Bus Stats Exam 1
Business Stats Ch 1-3
Term | Definition |
---|---|
Analytics | The scientific process of transforming data into insight for making better decisions. |
Big Data | A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. Big data are characterized by great volume |
Categorical Data | Labels or names used to identify an attribute of each element. Categorical data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric. |
Census | A survey to collect data on the entire population |
Cross-Sectional Data | Data collected at the same or approximately the same point in time |
Data | The facts and figures collected, analyzed, and summarized for presentation and interpretation |
Data Mining | The process of using procedures from statistics and computer science to extract useful information from extremely large databases. |
Data Set | All the data collected in a particular study |
Descriptive Analytics | The set of analytical techniques that describe what has happened in the past |
Descriptive Statistics | Tabular, graphical, and numerical summaries of data |
Elements | The entities on which data are collected |
Interval Scale | The scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric |
Nominal Scale | The scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric |
Observation | The set of measurements obtained for a particular element |
Ordinal Scale | The scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric |
Population | The set of all elements of interest in a particular study |
Predictive Analytics | The set of analytical techniques that use models constructed from past data to predict the future or assess the impact of one variable on another |
Prescriptive Analytics | The set of analytical techniques that yield a best course of action |
Quantitative Data | Numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement |
Quantitative Variable | A variable with quantitative data |
Ratio Scale | The scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric |
Sample | A subset of the population |
Sample Survey | A survey to collect data on a sample |
Statistical Inference | The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population |
Statistics | The art and science of collecting, analyzing, presenting, and interpreting data |
Time Series Data | Data collected over several time periods |
Variable | A characteristic of interest for the elements |
Bar Chart | A graphical device for depicting categorical data that have been summarized in a frequency, relative frequency, or percent frequency distribution |
Categorical Data | Labels or names used to identify categories of like items |
Class Midpoint | The value halfway between the lower and upper class limits |
Crosstabulation | A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns |
Cumulative Frequency Distribution | A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class. |
Cumulative Percent Frequency Distribution | A tabular summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class |
Cumulative Relative Frequency Distribution | A tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class |
Data Dashboard | A set of visual displays that organizes and presents information that is used to monitor the performance of a company or organization in a manner that is easy to read, understand, and interpret |
Data Visualization | A term used to describe the use of graphical displays to summarize and present information about a data set |
Dot Plot | A graphical device that summarizes data by the number of dots above each data value on the horizontal axis |
Frequency Distribution | A tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes |
Histogram | A graphical display of a frequency distribution |
Percent Frequency Distribution | A tabular summary of data showing the percentage of observations in each of several nonoverlapping classes |
Pie Chart | A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class |
Relative Frequency Distribution | A tabular summary of data showing the fraction or proportion of observations in each of several nonoverlapping categories or classes |
Scatter Diagram | A graphical display of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis |
Side-by-Side Bar Chart | A graphical display for depicting multiple bar charts on the same display |
Simpson's Paradox | Conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated into a single crosstabulation |
Stacked Bar Chart | A bar chart in which each bar is broken into rectangular segments of a different color showing the relative frequency of each class in a manner similar to a pie chart |
Stem-and-Leaf Display | A graphical display used to show simultaneously the rank order and shape of a distribution of data |
Trend Line | A line that provides an approximation of the relationship between two variables |
Box Plot | A graphical summary of data based on a five-number summary |
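Chebyshev's Theorem | A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean: for any z greater than 1, at least (1 − 1/z²) of the data values lie within z standard deviations of the mean |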
Coefficient of Variation | A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100 |
Correlation Coefficient | A measure of linear association between two variables that takes on values between −1 and +1. |
Covariance | A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship. |
Empirical Rule | A rule that can be used to approximate the percentage of data values within one, two, and three standard deviations of the mean (about 68%, 95%, and 99.7%, respectively) for data that exhibit a bell-shaped distribution |
Five-Number Summary | A technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value |
Geometric Mean | A measure of location that is calculated by finding the nth root of the product of n values |
Interquartile Range | A measure of variability, defined to be the difference between the third and first quartiles. |
Mean | A measure of central location computed by summing the data values and dividing by the number of observations. |
Median | A measure of central location provided by the value in the middle when the data are arranged in ascending order |
Mode | A measure of location, defined as the value that occurs with greatest frequency |
Outlier | An unusually small or unusually large data value |
Percentile | A value such that at least p percent of the observations are less than or equal to this value and at least (100 − p) percent of the observations are greater than or equal to this value |
Point Estimator | A sample statistic, such as x̄, s², and s, used to estimate the corresponding population parameter |
Population Parameter | A numerical value used as a summary measure for a population |
Quartiles | The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts |
Range | A measure of variability, defined to be the largest value minus the smallest value. |
Sample Statistic | A numerical value used as a summary measure for a sample |
Skewness | A measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness |
Standard Deviation | A measure of variability computed by taking the positive square root of the variance |
Variance | A measure of variability based on the squared deviations of the data values about the mean. |
Weighted Average | The mean obtained by assigning each observation a weight that reflects its importance |
Z-Score | A value computed by dividing the deviation about the mean (xᵢ − x̄) by the standard deviation s. A z-score is referred to as a standardized value and denotes the number of standard deviations xᵢ is from the mean |
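The Frequency Distribution, Relative Frequency Distribution, and Percent Frequency Distribution entries describe one table with three columns. Below is a minimal standard-library Python sketch that builds those columns for categorical data. The function name `frequency_distribution` and the soft-drink data are hypothetical, chosen only for illustration. Printing one `#` per observation gives a rough text version of a bar chart.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return (category, frequency, relative frequency, percent frequency) rows,
    most frequent category first (ties keep first-seen order)."""
    n = len(observations)
    counts = Counter(observations)
    return [(category, f, f / n, 100 * f / n) for category, f in counts.most_common()]

# Hypothetical categorical data: soft drink purchased by 8 customers
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke", "Pepsi", "Coke", "Sprite"]
for category, f, rel, pct in frequency_distribution(purchases):
    # Text "bar chart": one '#' per observation in the category
    print(f"{category:<8}{f:>3}{rel:>7.2f}{pct:>7.1f}%  {'#' * f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.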
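For quantitative data the same idea is applied to nonoverlapping classes. The cumulative distributions count the values less than or equal to each upper class limit. The sketch below assumes integer-valued data and equal-width classes, so a class written 10-14 contains 10 through 14. `class_frequency_table` and the audit-time data are hypothetical.

```python
def class_frequency_table(values, first_lower_limit, width, num_classes):
    """Frequency, relative, percent, and cumulative columns for equal-width classes.
    Assumes integer data, so the upper class limit is lower + width - 1."""
    n = len(values)
    rows, cumulative = [], 0
    for k in range(num_classes):
        lower = first_lower_limit + k * width
        upper = lower + width - 1
        f = sum(1 for x in values if lower <= x <= upper)
        cumulative += f  # number of values <= this class's upper limit
        rows.append({
            "class": f"{lower}-{upper}",
            "midpoint": (lower + upper) / 2,  # halfway between the class limits
            "frequency": f,
            "relative": f / n,
            "percent": 100 * f / n,
            "cumulative": cumulative,
            "cumulative_relative": cumulative / n,
            "cumulative_percent": 100 * cumulative / n,
        })
    return rows

# Hypothetical data: 10 audit times in days
times = [12, 15, 20, 22, 14, 14, 15, 27, 21, 18]
for row in class_frequency_table(times, first_lower_limit=10, width=5, num_classes=4):
    # One '#' per observation gives a rough text histogram
    print(f"{row['class']:>6} {row['frequency']:>3} "
          f"{row['cumulative_percent']:>6.1f}%  {'#' * row['frequency']}")
```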
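The Crosstabulation and Simpson's Paradox entries can be seen together in a small example. The counts below are made up to produce the paradox. Treatment X has the higher success rate within each severity class, yet treatment Y has the higher rate once the two classes are aggregated into one table. The name `success_rate` and all counts are illustrative only.

```python
# Hypothetical counts (successes, cases) for two treatments, split by severity
results = {
    ("X", "mild"): (8, 10),
    ("X", "severe"): (20, 90),
    ("Y", "mild"): (70, 90),
    ("Y", "severe"): (2, 10),
}

def success_rate(treatment, severity=None):
    """Success proportion for one treatment, within one severity class,
    or aggregated over all classes when severity is None."""
    cells = [v for (t, sev), v in results.items()
             if t == treatment and (severity is None or sev == severity)]
    successes = sum(succ for succ, _ in cells)
    cases = sum(n for _, n in cells)
    return successes / cases

# Crosstabulation: rows = treatment, columns = severity, plus the aggregated rate
print(f"{'':<4}{'mild':>8}{'severe':>8}{'overall':>9}")
for t in ("X", "Y"):
    print(f"{t:<4}{success_rate(t, 'mild'):>8.1%}"
          f"{success_rate(t, 'severe'):>8.1%}{success_rate(t):>9.1%}")
```

The reversal happens because most of X's cases are severe, and severe cases succeed less often under either treatment. The aggregated table hides that lurking variable (severity).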
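The Mean, Median, Mode, Weighted Average, and Geometric Mean entries translate directly into short functions. Below is a plain-Python sketch; the function names and data are hypothetical. Python's `statistics` module also offers `mean`, `median`, `multimode`, and `geometric_mean` for the same jobs.

```python
import math
from collections import Counter

def mean(values):
    """Sum the data values and divide by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value in ascending order; average of the two middle values when n is even."""
    xs = sorted(values)
    mid = len(xs) // 2
    if len(xs) % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def modes(values):
    """All values occurring with the greatest frequency (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def weighted_average(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    """nth root of the product of n positive values, computed through logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical data
print(mean(data), median(data), modes(data))
```

Working through logarithms keeps the geometric mean from overflowing when many values are multiplied. It requires every value to be positive, which is why it is usually applied to growth factors such as 1.10 or 0.95.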
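Percentiles can be computed in several slightly different ways. The sketch below assumes one convention that satisfies the Percentile entry's "at least p percent" wording:

- compute i = (p/100)n;
- if i is not an integer, round up and take the value in that position;
- if it is an integer, average the values in positions i and i + 1.

From this it builds the quartiles, the five-number summary, the interquartile range, the range, and the usual box-plot outlier limits. Those limits (1.5 × IQR beyond the quartiles) are an assumption not stated in the glossary. Other software, such as Python's `statistics.quantiles`, uses different interpolation rules and may return slightly different quartiles. All function names here are hypothetical.

```python
import math

def percentile(values, p):
    """pth percentile under the assumed rule: i = (p/100) * n; if i is not an
    integer, round up and take the ith value; otherwise average the ith and
    (i+1)th values (positions are 1-based in ascending order)."""
    xs = sorted(values)
    n = len(xs)
    if p <= 0:
        return xs[0]
    if p >= 100:
        return xs[-1]
    i = p * n / 100
    if i.is_integer():
        k = int(i)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(i) - 1]

def five_number_summary(values):
    """Smallest value, Q1, median, Q3, largest value."""
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))

def box_plot_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lower or x > upper]

data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical data
smallest, q1, q2, q3, largest = five_number_summary(data)
print("Five-number summary:", five_number_summary(data))
print("IQR:", q3 - q1, "Range:", largest - smallest)
print("Outliers after adding 40:", box_plot_outliers(data + [40]))
```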
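The Variance, Standard Deviation, Coefficient of Variation, and Z-Score entries chain together, each built on the one before. The sketch assumes sample data, so the variance divides by n − 1; dividing by N instead gives the population variance. Python's `statistics.variance` and `statistics.stdev` use the same n − 1 convention. The class-size numbers and function names are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

def sample_std_dev(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def coefficient_of_variation(values):
    """(standard deviation / mean) * 100, a relative measure of variability."""
    return sample_std_dev(values) / (sum(values) / len(values)) * 100

def z_scores(values):
    """(x_i - mean) / s: number of standard deviations each value is from the mean."""
    xbar = sum(values) / len(values)
    s = sample_std_dev(values)
    return [(x - xbar) / s for x in values]

sizes = [46, 54, 42, 46, 32]  # hypothetical class sizes
print(sample_variance(sizes), sample_std_dev(sizes))
print(round(coefficient_of_variation(sizes), 1), z_scores(sizes))
# Rule-of-thumb outlier check: |z| > 3
print([x for x, z in zip(sizes, z_scores(sizes)) if abs(z) > 3])
```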
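Chebyshev's Theorem gives a guaranteed minimum proportion for any data set. The Empirical Rule gives approximate percentages that hold only for bell-shaped data. The sketch compares both with the proportion actually observed in a small, hypothetical, deliberately skewed data set. `chebyshev_lower_bound` and `proportion_within` are illustrative names.

```python
import statistics

def chebyshev_lower_bound(z):
    """Minimum proportion of ANY data set within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

# Approximate proportions for bell-shaped data only (empirical rule)
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def proportion_within(values, z):
    """Observed proportion of values within z sample standard deviations of the mean."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for x in values if abs(x - xbar) <= z * s) / len(values)

data = [2, 4, 4, 5, 7, 9, 10, 12, 40]  # hypothetical, not bell-shaped
for z in (2, 3):
    print(z, round(chebyshev_lower_bound(z), 3),
          round(proportion_within(data, z), 3), EMPIRICAL_RULE[z])
```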
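The Skewness entry does not specify a formula. The sketch assumes one common sample measure, the adjusted Fisher-Pearson coefficient n/((n − 1)(n − 2)) Σ((xᵢ − x̄)/s)³. Its sign matches the glossary: negative for data skewed left, near zero for symmetric data, and positive for data skewed right. The function name is hypothetical.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed formula):
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in values)

print(sample_skewness([1, 2, 3, 4, 5]))    # symmetric data: close to zero
print(sample_skewness([1, 1, 1, 2, 10]))   # long right tail: positive
print(sample_skewness([10, 10, 10, 9, 1])) # long left tail: negative
```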
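The Covariance and Correlation Coefficient entries differ mainly in scale. Dividing the sample covariance by the product of the two sample standard deviations gives r, which always lies between −1 and +1. Below is a minimal sketch with hypothetical data and function names. Python 3.10 and later also provide `statistics.covariance` and `statistics.correlation` for the same sample quantities.

```python
import math

def sample_covariance(xs, ys):
    """Sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson r = s_xy / (s_x * s_y); s_x**2 is just the covariance of x with itself."""
    sx = math.sqrt(sample_covariance(xs, xs))
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y), round(correlation(x, y), 2))
```

The covariance's size depends on the units of x and y. The correlation coefficient is unit-free, which is why it is the usual way to compare the strength of linear association.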