IS 335: Chapter 5
IS 335 Chapter 5 Practice Questions
Question | Answer |
---|---|
Define Business Analytics: | Process of creating new insights from information |
Organizations relying on business analytics make extensive use of... | Data, statistical and quantitative analysis, explanatory and predictive modeling, and fact-based decision making. |
What are the two ways in which new insights are discovered? | 1) Discovery by using existing information about how decisions are made in order to build systems that make similar decisions 2) Discovery by finding useful patterns in observations, typically embodied in explicit data |
Define Knowledge Discovery in Databases: | Commonly known as Data Mining (DM). The "gold" mined from the databases refers to patterns buried among the data variables, which represent new insights about previously unknown relationships among these variables |
Define Knowledge Discovery in Databases (continued): | Process of applying statistical or intelligent algorithms. KDD is viewed as involving the whole business intelligence process, whereas DM may refer more specifically to the actual application of DM techniques |
What are the factors that drive BI? | 1) Exploding data volumes 2) Increasing decision complexity 3) Need for quick reflexes 4) Technological progress |
Exploding data volumes matter because... | Large amounts of data are now ready for analysis, a result of the convergence of improved and inexpensive data storage capabilities and the abundance of customer data produced by the explosion of e-commerce applications |
Increasing decision complexity results from... | Increased competition across industries and countries, globally distributed organizations, and the increasing requirement to incorporate both structured and unstructured information in decision making |
The need for quick reflexes arises because decision makers increasingly need to respond... | Efficiently and effectively to environmental changes, with greater accountability for their actions. |
Define E-Tailers: | Electronic retailers. Example: eBags, a web-based storefront for handbags, suitcases, wallets, and other similar products |
What are some examples of successful DM applications? | 1) Banking 2) Target Marketing 3) Insurance 4) Telecommunications 5) Operations Management 6) Retail Sales Forecasting 7) System Diagnosis |
Define Cross-Industry Standard Process For Data Mining (CRISP-DM): | A set of specifications. Industry-neutral and tool-neutral process for data mining that provides a hierarchical process model for defining the basic tasks of the BI process. |
Define Data Preparation: | Process of cleaning and transforming raw data prior to processing and analysis. Important step prior to processing and often involves reformatting data, making corrections to data and the combining of data sets to enrich data. |
What are the tasks for Data preparation? | 1) Selection 2) Construction and transformation of variables 3) Data integration 4) Formatting |
Step 1 (Selection) for Data Preparation involves... | a step that involves defining the predictor variables and the sample data set. Process of selecting the predictor variables is critical, because DM algorithms will not perform well if inconsequential variables are considered as potential predictors |
Step 2 (Construction and Transformation of Variables) for Data Preparation involves... | Many of the variables involved in the DM model will need to be transformed or constructed from existing raw data. Specific model may require transformations that group raw data values in ranges such as low, medium, and high |
Step 3 (Data integration) for Data Preparation involves... | Dataset required by the DM model may reside on multiple disjoint databases, which will need to be consolidated for the model to use. Data consolidation may require redefinition of some of the data fields to allow for consistency. |
Step 4 (Formatting) for Data Preparation involves... | Involves the reformatting and reordering of the data fields, as required by the DM model. |
Define Model Building and Validation: | It is the next task in the DM process. Involves repeatedly trying several options until the best quality model emerges. Therefore, developing an accurate model is an iterative process. |
The most popular validation technique is... | N-fold cross-validation, specifically ten-fold validation |
Define Ten-Fold Validation: | Divides the total dataset into ten approximately equal-sized subsets. Each of the ten subsets is used a single time as the validation set, and the model's accuracy on it is compared with the accuracy obtained on the remaining nine subsets, which serve as the training set. |
Define Model Evaluation and Interpretation: | Next task in the DM process (after Model Building and Validation), validation dataset is fed through the model. Model outcomes or predicted results can be compared with the actual outcomes in the validation dataset. |
Define Deployment: | Last task in the DM process, task implements the live DM model within an organization to aid the decision-making process. A valid model must make sense, and a pilot implementation is always required prior to live deployment |
What is the CRISP-DM Methodology? | 1) Business Understanding 2) Data Understanding 3) Data Preparation 4) Model Building and Validation 5) Model Evaluation and Interpretation 6) Deployment |
It is estimated that understanding the goals of the application, followed by understanding the characteristics of the data, consumes what percentage of project resources? | 50-80% |
Define Data Collection: | Defining the data sources for the study |
Define Data Description: | Describing the contents of each of the DM data sources |
Define Data Quality and Verification: | Defines whether any data should be ignored due to lack of quality or relevance to the study |
The first step in deciding which DM technique to use is to... | Define the purpose of the study: to describe what happened or to predict what will happen |
Once the purpose of the study has been defined, the next step is to... | Understand the characteristics of the input and outcome data that will be used. |
Define "To Describe What Happened": | Means to segment or cluster data based on certain input characteristics with no specific outcome to predict in mind. |
Define Descriptive Techniques: | Look for patterns in prior activities or actions that affect these actions or activities |
What are the different types of descriptive techniques? | 1) Affinity or Association 2) Clustering |
What is the Affinity or Association Descriptive Technique? | Find which items are closely associated in the database, including market basket analysis (MBA) and link analysis |
What is the Clustering Descriptive Technique? | Uncover the natural groupings of data that may not be obvious through casual inspection. The goal is to create clusters of input records based on a set of characteristics that identifies them as similar |
Market Basket or Association Analysis can include the use of... | 1) Apriori Association Rule Algorithm 2) Generalized Rule Induction (GRI) |
Apriori Association Rule Algorithm is... | Generally faster to train than GRI. Allows only logical input variables, such as (True or False) or (1, 0), which indicate the presence or absence of the item in the market basket |
Define To Predict What Will Happen: | Means to develop a model that uses historical data to predict an outcome based on a set of input characteristics. Predictive techniques require the use of past history with the intent to predict future behavior. |
DM techniques used to predict what will happen are classified into how many categories, and what are those categories called? | Three: 1) Statistical 2) Connectionist 3) Rule Induction |
Define Statistical DM Technique: | Find how two or more variables are related to each other, the correlations between the variables. |
A common way to find correlations (Statistical DM Techniques) is... | Curve fitting, a method used to identify a mathematical equation that can describe the relationship between the input variables and the outcomes |
What are the different curve-fitting methods? | Least squares method; Nonlinear Correlation Method; Multivariate Correlation Techniques; Inferential Statistical Techniques; Statistical Techniques |
Connectionist Methods use... | Artificial Neural Networks (ANN) techniques. |
What is ANN? | Computer based representation of what is theorized to be the human brain's physiological structure. Used as a predictive technique or as a clustering technique. |
What is the most important feature of ANN? | They can learn |
Define Memory-Based Reasoning: | DM technique that looks for the nearest neighbors of known data samples, and combines their values to assign classification or prediction values for new data samples. |
Define Decision Tree and Rule Induction Methods: | Also known as Symbolic Techniques, used to infer the rules that classify or partition the dataset. |
Define Rule Induction: | Provides automated techniques that are used for discovering how to generate classification schemes that resemble a decision tree. Once created, it is simple to transform the conditional attribute branches into the programming code conditional statements |
Define CART Algorithm: | Classification and Regression Tree, one of the most popular methods for building a decision tree. Performs a binary split of a continuous variable at each node. |
Define CHAID Method: | Chi-Squared Automatic Interaction Detector. Assumes one branch for each value taken on by the discrete variable. Can be used to build decision trees, but its use is restricted to discrete variables. |
Define the Storage Law: | Capacity of digital data storage worldwide has doubled every nine months for the last decade, at twice the rate predicted by Moore's Law for the growth of computing power |
Define Moore's Law: | Describes the growth of computing power |
Define Data Tombs: | Known as data stores, where data are deposited to merely rest in peace just in case one day the business may be able to analyze them. Most of the time, this data is never analyzed. |
What are the common mistakes that organizations seeking the deployment of DM techniques must avoid? | 1) User expectations are too high 2) Putting the right tools in the wrong hands 3) Dishing up data that users need to figure out how to use 4) Training users only at the beginning of the project |
What are the common mistakes that organizations seeking the deployment of DM techniques must avoid? (continued) | 5) Going for a quick win rather than planning for the long haul 6) Organization goes for the big bang 7) Data roles and governance are not adequately addressed 8) Organization fails to demonstrate value |
Define Real-Time Decision Support: | Refers to putting analytics into action. Necessary to put the results into action in order to drive business value |
Making analytics invisible requires that the following challenges be addressed: | 1) Scaling analysis to large databases 2) Scaling to high-dimensional data and models 3) Automating the search 4) Finding patterns and models understandable and interesting to users |
The World Competitiveness Yearbook (WCY) provides a competitiveness score for each country by synthesizing all collected information into eight major factors: | 1) Domestic Economy 2) Internationalization 3) Government 4) Finance 5) Infrastructure 6) Management 7) Science and Technology 8) People |
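
The ten-fold validation card above describes a procedure that is easier to follow in code. The following is a minimal sketch, not the chapter's own implementation: the function names, the toy dataset, and the choice of a one-nearest-neighbor model (a simple form of memory-based reasoning) are all assumptions made for illustration.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and split them into k roughly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nearest_neighbor_predict(train, x):
    """Memory-based reasoning: return the label of the closest training record."""
    closest = min(train, key=lambda rec: abs(rec[0] - x))
    return closest[1]

def cross_validate(records, k=10, seed=0):
    """Use each fold once for validation and the other k-1 folds for training."""
    folds = k_fold_indices(len(records), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [r for i, r in enumerate(records) if i not in held]
        valid = [records[i] for i in held_out]
        correct = sum(nearest_neighbor_predict(train, x) == y for x, y in valid)
        accuracies.append(correct / len(valid))
    return sum(accuracies) / k, accuracies

# Hypothetical toy data: one numeric input; the label is "high" when it exceeds 50.
records = [(x, "high" if x > 50 else "low") for x in range(100)]
mean_accuracy, per_fold = cross_validate(records, k=10)
print(f"folds: {len(per_fold)}, mean accuracy: {mean_accuracy:.3f}")
```

Each record is held out exactly once, so the mean of the ten fold accuracies estimates how the model behaves on data it has not seen. In practice the nearest-neighbor stand-in would be replaced by whichever DM model is being built and validated.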