Data Analysis
Year 12 Core Module
Term | Definition |
---|---|
Trend | is present when there is a long-term upward or downward movement in a time series. |
Cycles | are present when there is a periodic movement in a time series. The period is the time it takes for one complete up and down movement in the time series plot. This term is generally reserved for periodic movements with a period greater than one year. |
Seasonality | is present when there is a periodic movement in a time series that has a calendar related period – for example, a year, a month, a week. |
Irregular (random) fluctuations | are always present in any real-world time series plot. They include all of the variations in a time series that we cannot reasonably attribute to systematic changes like trend, cycles, seasonality, structural change or the presence of outliers. |
Smoothing | is a technique used to eliminate some of the irregular fluctuations in a time series plot so that features such as trend are more easily seen. |
Seasonal indices | are used to quantify the seasonal variation in a time series. |
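Deseasonalise | The process of removing the effects of seasonality from a time series, done by dividing each actual value by its seasonal index. |
Reseasonalise | The process of converting deseasonalised data back into its original (seasonal) form, done by multiplying each deseasonalised value by its seasonal index. |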
Bivariate Data | are data in which each observation involves recording information about two variables for the same person or thing. An example would be the heights and weights of the children in a preschool. |
Residuals | The vertical distances from the data points to the regression line (actual value minus predicted value). |
Interpolation | Predicting within the range of data |
Extrapolation | Predicting outside the range of data |
Slope | Gradient on a linear graph |
Coefficient of determination | gives a measure of the predictive power of a regression line |
Residual plot | can be used to test the linearity assumption by plotting the residuals against the explanatory variable (EV). |
Correlation coefficient | gives a measure of the strength of a linear association |
Scatterplot | is used to help identify and describe an association between two numerical variables |
Parallel box plots | can be used to display, identify and describe the association between a numerical and a categorical variable |
Segmented bar charts | can be used to graphically display the information contained in a two-way frequency table. They are a useful tool for identifying relationships between two categorical variables. |
Two-way frequency tables | are used as the starting point for investigating the association between two categorical variables |
z-score | also known as a standardised score. Its value gives the distance and direction of a data value from the mean in terms of standard deviations. |
68-95-99.7% rule | the rule for a normal distribution: approximately 68% of data values lie within one standard deviation of the mean, 95% within two, and 99.7% within three. |
The normal distribution | is a model for data distributions that have a symmetric bell shape. |
Outliers | are data points that lie well away from the majority of the data set. |
Box plots | a graphical representation of a five-number summary |
Five-number summary | A listing of the median, M, the quartiles Q1 and Q3, and the smallest and largest data values of a distribution, written in the order: minimum, Q1, M, Q3, maximum. |
Interquartile range | gives the spread of the middle 50% of data values |
Median | is the midpoint of a distribution, dividing an ordered dataset into two equal parts. |
Univariate Data | are generated when each observation involves recording information about a single variable, for example a dataset containing the heights of the children in a preschool |
Categorical Variables | are used to represent characteristics of individuals. |
Nominal Variables | generate data values that can only be used to name a category; the values have no natural order. |
Ordinal Variables | generate data values that can be used to both name and order categories. |
Numerical Variables | are used to represent quantities. |
Discrete Variables | represent quantities that are counted, for example the number of cars in a car park. |
Continuous Variables | represent quantities that are measured rather than counted – for example, weights in kg. |
Bar Charts | are used to display the frequency distribution of categorical data. |
Histograms | are used to display the frequency distribution of a numerical variable. They are suitable for medium- to large-sized datasets. |
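
The seasonal index, deseasonalise and reseasonalise entries in the table describe simple arithmetic, so a short worked sketch may help. The quarterly sales figures and the function names below are made up for illustration; the sketch assumes the usual convention that a seasonal index is the value for a season divided by the average over all seasons in that year.

```python
from statistics import mean

# Hypothetical quarterly sales for a single year (illustrative numbers only).
sales = {"Q1": 80, "Q2": 120, "Q3": 110, "Q4": 90}


def seasonal_indices(values):
    """Seasonal index = value for the season / mean of all seasons in the year."""
    yearly_mean = mean(values.values())
    return {season: value / yearly_mean for season, value in values.items()}


def deseasonalise(actual, index):
    """Remove the seasonal effect: actual value divided by its seasonal index."""
    return actual / index


def reseasonalise(deseasonalised, index):
    """Restore the seasonal effect: deseasonalised value times its seasonal index."""
    return deseasonalised * index


indices = seasonal_indices(sales)
q2_deseasonalised = deseasonalise(sales["Q2"], indices["Q2"])
q2_restored = reseasonalise(q2_deseasonalised, indices["Q2"])

print(indices)
print(q2_deseasonalised, q2_restored)
```

With four seasons the indices average to 1 (they sum to 4), and reseasonalising a deseasonalised value returns the original figure, which is a quick way to check the arithmetic by hand.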