| Term | Definition |
| --- | --- |
| Trend | is present when there is a long-term upward or downward movement in a time series. |
| Cycles | are present when there is a periodic movement in a time series. The period is the time it takes for one complete up and down movement in the time series plot. This term is generally reserved for periodic movements with a period greater than one year. |
| Seasonality | is present when there is a periodic movement in a time series that has a calendar-related period, for example a year, a month or a week. |
| Irregular (random) fluctuations | are always present in any real-world time series plot. They include all of the variations in a time series that we cannot reasonably attribute to systematic changes like trend, cycles, seasonality, structural change or the presence of outliers. |
| Smoothing | is a technique used to eliminate some of the irregular fluctuations in a time series plot so that features such as trend are more easily seen. |
| Seasonal indices | are used to quantify the seasonal variation in a time series. |
| Deseasonalise | The process of removing the effects of seasonality from a time series, typically by dividing each value by its seasonal index. |
| Reseasonalise | The process of converting deseasonalised data back into its original seasonal form. |
| Bivariate Data | are data in which each observation involves recording information about two variables for the same person or thing. An example would be the heights and weights of the children in a preschool. |
| Residuals | The vertical distance from a data point to the fitted regression line, calculated as the actual value minus the predicted value. |
| Interpolation | Predicting within the range of data |
| Extrapolation | Predicting outside the range of data |
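| Slope | The gradient of a straight-line graph. For a regression line, it is the change in the predicted value for each one-unit increase in the explanatory variable. |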
| Coefficient of determination | gives a measure of the predictive power of a regression line. |
| Residual plot | can be used to test the linearity assumption by plotting the residuals against the explanatory variable (EV). |
| Correlation coefficient | gives a measure of the strength and direction of a linear association. |
| Scatterplot | is used to help identify and describe an association between two numerical variables. |
| Parallel box plots | can be used to display, identify and describe the association between a numerical and a categorical variable. |
| Segmented bar charts | can be used to graphically display the information contained in a two-way frequency table. They are a useful tool for identifying relationships between two categorical variables. |
| Two-way frequency tables | are used as the starting point for investigating the association between two categorical variables |
| z-score | Also known as a standardised score. Its value gives the distance and direction of a data value from the mean, in terms of standard deviations. |
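| 68-95-99.7% rule | For a normal distribution, approximately 68% of values lie within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three. |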
| The normal distribution | A model for data distributions that have a bell shape. |
| Outliers | Data points that lie well away from the majority of the data. |
| Box plots | a graphical representation of a five-number summary. |
| Five-number summary | A listing of the median, M, the quartiles Q1 and Q3, and the smallest and largest data values of a distribution, written in the order: minimum, Q1, M, Q3, maximum. |
| Interquartile range | gives the spread of the middle 50% of data values |
| Median | The midpoint of a distribution, dividing an ordered dataset into two equal parts. |
| Univariate Data | are generated when each observation involves recording information about a single variable, for example a dataset containing the heights of the children in a preschool. |
| Categorical Variables | are used to represent characteristics or qualities of individuals rather than quantities. |
| Nominal Variables | generate data values that can be used only to name (group) individuals; the values have no natural order. |
| Ordinal Variables | generate data values that can be used to both name and order individuals. |
| Numerical Variables | used to represent quantities. |
| Discrete Variables | represent quantities that are counted, for example the number of cars in a car park. |
| Continuous Variables | represent quantities that are measured rather than counted, for example weights in kg. |
| Bar Charts | are used to display the frequency distribution of categorical data. |
| Histograms | are used to display the frequency distribution of a numerical variable. They are suitable for medium- to large-sized datasets. |
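
Several of the terms above describe calculations, so a small Python sketch may help make them concrete. It is illustrative only and rests on assumptions that the glossary does not state: the function names are hypothetical, quartiles are taken as the medians of the lower and upper halves (one common convention among several), and seasonal indices are computed by dividing each value by the mean of its own cycle and then averaging those ratios season by season, which assumes the series contains only whole cycles.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


def interquartile_range(values):
    """Spread of the middle 50% of the data: Q3 minus Q1."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


def z_score(value, mean_value, std_dev):
    """Distance and direction from the mean, in standard deviations."""
    return (value - mean_value) / std_dev


def seasonal_indices(values, period):
    """One seasonal index per season.

    Assumption: the series holds whole cycles of length `period`.
    Each value is divided by its own cycle's mean, then the ratios
    are averaged season by season.
    """
    cycles = [values[i:i + period] for i in range(0, len(values), period)]
    ratios = [[v / mean(cycle) for v in cycle] for cycle in cycles]
    return [mean([r[s] for r in ratios]) for s in range(period)]


def deseasonalise(values, indices):
    """Divide each value by the seasonal index for its season."""
    return [v / indices[i % len(indices)] for i, v in enumerate(values)]


def reseasonalise(values, indices):
    """Multiply each deseasonalised value by its seasonal index."""
    return [v * indices[i % len(indices)] for i, v in enumerate(values)]


# Made-up quarterly figures for two years, used only for illustration.
sales = [20, 30, 40, 30, 40, 60, 80, 60]
indices = seasonal_indices(sales, period=4)
smoothed = deseasonalise(sales, indices)
restored = reseasonalise(smoothed, indices)
```

With these definitions the seasonal indices average to 1 across a cycle, and reseasonalising the deseasonalised values returns the original series up to floating-point rounding.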