Basic Stats
For Life Sciences
Question | Answer |
---|---|
a statistic is? | a numerical summary of data, i.e. a number calculated from a sample |
the field of statistics is? | the collection, analysis and understanding of data that are measured with uncertainty |
what is a categorical variable? | one that places each individual into a group or category, e.g. hair colour or major at university |
define a quantitative variable | one that takes numerical values, e.g. the time it takes to get home from work |
graphical summary of one categorical variable? | bar graph |
graphical summary of one quantitative variable? | histogram or boxplot |
how to graphically summarise relationship between two categorical variables | clustered bar chart or jittered scatterplot |
how to graphically summarise relationship between two quantitative variables | scatterplot |
how to graphically summarise relationship between one categorical and one quantitative variable | comparative boxplots or comparative histograms |
what to look for in a graph | location, spread, shape, unusual observations |
define 'location' graphically | where most of the data lies |
define 'spread' graphically | the variability of the data: how far apart or close together the values are |
define 'shape' graphically | symmetric, skewed, etc. |
how to numerically summarise one categorical variable | table of frequencies or percentages |
how to numerically summarise one quantitative variable | location: mean or median; spread: standard deviation or interquartile range (IQR) |
formula for mean? | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; preferred for approximately normal data |
formula for median? | order the data; $M$ = the middle observation if $n$ is odd, or the average of the two middle observations if $n$ is even; less affected by outliers, so preferred for data with outliers |
formula for standard deviation? | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$; preferred for approximately normal data |
formula for interquartile range? | $\text{IQR} = Q_3 - Q_1$; less affected by outliers, so preferred for data with outliers |
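which numbers are needed to create a five-number summary? | minimum, Q1, median, Q3, maximum (the mean is sometimes reported alongside, but it is not one of the five numbers) |
when is an observation a suspected outlier (1.5 × IQR rule)? | when it lies more than 1.5 × IQR below Q1, or more than 1.5 × IQR above Q3 |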
define linear transformation | a change of variable from $x$ to $x_{\text{new}}$ by multiplying by a constant and/or adding a constant |
examples of linear transformation use | changing units (e.g. Celsius to Fahrenheit); standardising to z-scores when using the normal assumption |
formula for linear transformation? | $x_{\text{new}} = a + bx$ |
formula for new mean once linear transformation has occurred? | $\bar{x}_{\text{new}} = a + b\bar{x}$ |
formula for new median once linear transformation has occurred? | $M_{\text{new}} = a + bM$ |
formula for new standard deviation once linear transformation has occurred? | $s_{\text{new}} = \lvert b \rvert \, s$ (adding $a$ does not change the spread) |
formula for IQR once linear transformation has occurred? | $\text{IQR}_{\text{new}} = \lvert b \rvert \, \text{IQR}$ |
explain density curves | a curve, like a smoothed-out histogram, that describes the probabilistic behaviour of a quantitative variable; the area under the curve over any range of values is the proportion of all observations that fall within that range |
total area under density curve equals? | 1 |
explain the normality assumption | a normal curve can be used as a model if the histogram of the data looks approximately like a normal curve (symmetric and bell-shaped, with both tails falling towards 0); the assumption is then termed 'reasonable' |
how does a normal quantile plot confirm the normality assumption? | if the points lie on, or close to, a straight line, the data are approximately normal and the assumption is reasonable |
define the 68-95-99.7 rule | for a normal distribution, approximately 68% of observations fall within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and approximately 99.7% within 3 standard deviations |
symbol for mean of a density curve? | μ |
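symbol for standard deviation of a density curve? | σ |
normal distribution shorthand | $X \sim N(\mu, \sigma)$: X = random variable; N = normal distribution; first number in brackets = mean μ; second number in brackets = standard deviation σ |
explain the standard normal variable | $Z \sim N(0, 1)$, i.e. mean 0 and standard deviation 1; a probability such as $P(Z < z)$ equals the area under the curve over the corresponding region; the standard normal table always gives the area to the left of $z$ |
use of the standard normal distribution table | to find P: locate z using the row and column headings of the table and read the area $P(Z < z)$ from the body; to find z: locate P in the body of the table and read z from the headings; the entries are ordered from smallest to largest |
reverse use of the standard normal distribution table | given a probability p, find the value c with $P(Z < c) = p$: locate p in the body of the table and read c from the headings |
X ~ ? | $N(\mu, \sigma)$: normal with mean μ and standard deviation σ |
formula and use of standardising transformation | $Z = \frac{X - \mu}{\sigma}$; used when X is normal but not N(0, 1), to convert it to the standard normal so that the table can be used |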
relationships between variables best explored through? why? | scatterplot; can get a sense for the nature of the relationship |
how to describe the nature of a relationship? | present/absent; strong/weak; increasing/decreasing; linear/non-linear |
outliers in scatterplots? | may be unexplained anomalies in the data, or may reveal systematic structure worth investigating |
define causal relationship | a relationship between two variables where changes in one variable cause changes in the other |
define the explanatory variable | the variable that explains or causes the change; plotted on the x-axis |
define the response variable | the variable that changes in response; plotted on the y-axis |
useful numerical summaries for two quantitative variables? | correlation and regression |
formula for the correlation coefficient? | $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ |
define $x_i$ and $y_i$ | the observed values of x and y for the i-th observation |
define $\bar{x}$ and $\bar{y}$ | the means of the x values and of the y values |
define $s_x$ and $s_y$ | the standard deviations of the x values and of the y values |
state the properties of r | r is the correlation coefficient; it always lies between -1 and 1 and measures the direction and strength of a linear relationship; close to 1 = strong positive linear relationship; close to -1 = strong negative linear relationship; close to 0 = weak or no linear relationship |
state the cautions about the use of r | only useful for describing linear relationships; sensitive to outliers |
what is least squares regression used for? | to describe how a response variable y changes as an explanatory variable x changes (and to predict y from x); positive slope = y increases as x increases; negative slope = y decreases as x increases |
mathematical representation of regression | $b_1 = r\,\frac{s_y}{s_x}$; $b_0 = \bar{y} - b_1\bar{x}$; fitted line $\hat{y} = b_0 + b_1 x$ |
facts about b1 | $b_1$ is the slope of the fitted line; it has the same sign as r, but equals r only when $s_x = s_y$ (e.g. when both variables are standardised) |
how to determine the strength of a regression | $r^2 = s_{\hat{y}}^2 / s_y^2$; $r^2$ is the fraction (as a %) of the variation in y explained by the linear regression on x |
state the basic regression assumptions | $y = b_0 + b_1 x + \text{error}$; the errors average 0 and correspond to random scatter about the line; this is checked with residual plots |
formula for a residual? | residual $= y - \hat{y}$ (observed minus predicted) |
residual plot is a scatter plot of? | residuals (y-axis) against the explanatory variable (x-axis) |
interpreting residual plots | look for a pattern: there should be none, only random scatter about 0; if there is a pattern, the linear model is not appropriate |
what to do if any residuals stand out? | investigate them: a point may be an outlier and/or an influential point (one whose removal markedly changes the fitted line); remove it only if there is a justified special cause and it is affecting the results too much |
how to attach special cause to an outlier | check whether it is a recording error; refit the line without it and compare; if you remove it, justify why (or down-weight its influence instead) |
after removing an outlier, the residuals should show? | a random spread with no pattern |
points with a residual of 0 on a residual plot are? | points that lie exactly on the fitted regression line |
if a parabola (curved pattern) appears in the residual plot after outlier removal? | the linear model is not appropriate |
if the residuals stay spread randomly about 0, with no systematic change? | there is no pattern, so the linear model is reasonable |
when to remove outlier | if it influences the results and there is a justified special cause for it |
when will an outlier not influence results? | when its x value is close to the mean $\bar{x}$: it will then have little influence on the slope (gradient) and intercept of the fitted line |
what are lurking variables? | variables that are not taken into account in the study but that can influence the results |
to account for lurking variables you? | include them in the analysis, e.g. with analysis of covariance |
state the strategy for using data in research? | identify the question to be answered; identify the population to be studied; identify the variables: which is the explanatory (independent) variable and which is the response (dependent) variable; obtain data that answer the question |
define anecdotal data | haphazard collection of data; unreliable for drawing conclusions |
define available data | data already produced by another source, possibly collected for a purpose other than the one you intend to use it for |
define collecting your own data | use of a census, a survey, or observations from an experiment |
define census | collecting data from every individual in the whole population |
define sample | a subset of the population, ideally selected at random, used to represent the whole; smaller and easier to carry out than a census |
explain observational study | no variables are manipulated or influenced; data obtained from population as it is |
explain experiment | variables are deliberately manipulated (a treatment is imposed) so that responses can be observed and recorded; usually a control group is used: it does not receive the treatment and acts as a comparison group |
explain causation | changes in one variable directly produce changes in another, e.g. the moon's gravitational pull causes the tides |
common response in terms of variables means? | a (lurking) explanatory variable causes changes in two response variables; the response variables are associated with one another even though neither causes the other |
causation in terms of variables means? | explanatory variable causes response variable; response variable and explanatory variable are associated |
confounding in terms of variables means? | two or more explanatory variables are present and associated with one another; any of them could have caused the response variable, alone or together, so their effects cannot be separated; these explanatory variables are called confounded causes |
why an experiment? | allows demonstration of causation; intervention can be used to determine whether or not effect is present |
state the principles of experiment design | subjects, treatment, factor, levels, response variable |
definition of subjects in terms of experiment design | things upon which experiment is done; eg: people, animals, chemicals etc |
definition of treatment in terms of experiment design | the specific condition applied to the subjects; e.g. being given a medication |
definition of factor in terms of experiment design | an explanatory variable whose values differ between treatments; e.g. medication vs placebo |
definition of levels in terms of experiment design | the specific values a factor takes; each treatment is a combination of levels of the factors; e.g. dosage of medication and number of doses per day, vs dosage and number of placebo doses |
definition of response variable in terms of experiment design | the variable of most interest, measured on each subject after treatment, which answers the research question |
explain an experiment summary table | one factor along the top and the other down the side; the levels of each factor label the columns and rows; each remaining cell gives the number of subjects allocated to that treatment group |
state the three principles all experiments must follow | control: compare two or more treatments, one of which is a control, to reduce the effects of confounding variables; randomisation: assign subjects to treatments at random; replication: repeat the experiment on many subjects to reduce chance variation in the results |
how to randomise | give every subject a random number; order the subjects by those random numbers (smallest to largest, or largest to smallest); form the treatment groups systematically from that order, e.g. the first group of subjects receives treatment 1, the next group treatment 2, and so on |
define control group | a group that only appears to receive the explanatory variable (e.g. a placebo), which makes it different from all other treatments; the results of the other groups are compared against it |
explain randomised comparative experiment | subjects are randomly allocated to one of several treatments; responses are compared across treatment groups |
explain matched pairs design | match subjects with similar properties into pairs; the two treatments are assigned at random, one to each member of a pair; can produce more precise results; used in before-and-after and twin studies |
explain randomised block design | block = group of subjects known before the experiment to be similar in some way that would affect the response; treatments are randomly assigned to subjects separately within each block; matched pairs is a special case of this |
experimental caution: appropriate control | the only thing that varies across treatments should be the factor(s) being studied |
experimental caution: beware of bias | the administrator of the experiment may (even unintentionally) favour certain treatments for certain subjects; a double-blind design accounts for this: neither the subject nor the administrator knows which treatment is applied |
experimental caution: same procedure for all subjects | all steps of the experiment are performed in the same way for all subjects in all treatments |
experimental caution: realistic experiment | experiment needs to duplicate real-world conditions |
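
The mean, median and standard deviation cards can be checked numerically. This is a minimal Python sketch using only the standard library; the function names and the commute times are hypothetical, with one large outlier included to show that the mean is pulled towards it while the median barely moves.

```python
import math
import statistics


def mean(xs):
    """x-bar = (1/n) * sum of the x_i."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def std_dev(xs):
    """Sample standard deviation: square root of sum (x_i - x-bar)^2 / (n - 1)."""
    xbar = mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))


# Hypothetical commute times in minutes; 95 is an outlier.
times = [12, 15, 14, 18, 16, 95]
print(mean(times))    # about 28.33, pulled up by the outlier
print(median(times))  # 15.5, barely affected
print(std_dev(times))

# Cross-check: statistics.stdev also divides by n - 1.
assert math.isclose(std_dev(times), statistics.stdev(times))
```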
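
The five-number summary and 1.5 × IQR outlier cards can be sketched as below. The quartiles use one common textbook rule (median of each half of the ordered data, leaving out the overall median when n is odd); this is an assumption, since statistical software may use a different quartile rule and give slightly different values. Function names and data are hypothetical.

```python
import statistics


def quartiles(xs):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data.
    If n is odd, the overall median is left out of both halves. Assumes n >= 2."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2:]
    return statistics.median(lower), statistics.median(upper)


def five_number_summary(xs):
    """(minimum, Q1, median, Q3, maximum)."""
    q1, q3 = quartiles(xs)
    return min(xs), q1, statistics.median(xs), q3, max(xs)


def suspected_outliers(xs):
    """Values more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3."""
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < low or x > high]


times = [12, 15, 14, 18, 16, 95]  # hypothetical commute times (minutes)
print(five_number_summary(times))  # (12, 14, 15.5, 18, 95)
print(suspected_outliers(times))   # [95]
```

Here IQR = 18 - 14 = 4, so the fences are 14 - 6 = 8 and 18 + 6 = 24, and only 95 falls outside them.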
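
The linear transformation cards can be verified on a small example. This sketch assumes hypothetical Celsius temperatures converted to Fahrenheit (F = 32 + 1.8C); the helper name `transform` is hypothetical. It also checks a negative b, which is why the spread formulas use $\lvert b \rvert$.

```python
import math
import statistics


def transform(xs, a, b):
    """Apply the linear transformation x_new = a + b * x to every value."""
    return [a + b * x for x in xs]


# Hypothetical temperatures in degrees Celsius; Fahrenheit = 32 + 1.8 * Celsius.
celsius = [20.0, 22.5, 19.0, 25.0, 21.5]
fahrenheit = transform(celsius, a=32, b=1.8)

# Centre: mean and median are shifted by a and scaled by b.
assert math.isclose(statistics.mean(fahrenheit), 32 + 1.8 * statistics.mean(celsius))
assert math.isclose(statistics.median(fahrenheit), 32 + 1.8 * statistics.median(celsius))

# Spread: only scaled by |b|; adding a has no effect.
assert math.isclose(statistics.stdev(fahrenheit), 1.8 * statistics.stdev(celsius))

# With a negative b the spread is still multiplied by |b|, never by a negative number.
flipped = transform(celsius, a=10, b=-2)
assert math.isclose(statistics.stdev(flipped), abs(-2) * statistics.stdev(celsius))
print("all linear transformation checks passed")
```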
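
The 68-95-99.7 rule can be recovered from the normal curve itself. This sketch assumes Python 3.8+ for `statistics.NormalDist`; the helper `proportion_within` and the N(170, 10) example are hypothetical. It shows the percentages are approximations of the exact areas and do not depend on μ or σ.

```python
from statistics import NormalDist  # Python 3.8+


def proportion_within(k, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} sd: {proportion_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973

# The answer does not depend on mu or sigma, e.g. for a hypothetical N(170, 10):
print(f"{proportion_within(2, mu=170, sigma=10):.4f}")  # 0.9545
```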
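
The standardising transformation and the forward and reverse table lookups can be mimicked in code. This is a sketch assuming Python 3.8+: `NormalDist().cdf(z)` plays the role of the table (area to the left of z) and `inv_cdf(p)` the reverse lookup. The X ~ N(170, 10) example and the helper `standardise` are hypothetical.

```python
from statistics import NormalDist  # Python 3.8+

standard_normal = NormalDist(0, 1)


def standardise(x, mu, sigma):
    """Standardising transformation Z = (X - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical example: X ~ N(170, 10), e.g. heights in cm. Find P(X < 185).
mu, sigma = 170, 10
z = standardise(185, mu, sigma)      # 1.5
p_left = standard_normal.cdf(z)      # area to the left of z, as read from the table
p_right = 1 - p_left                 # area to the right, i.e. P(X > 185)
print(f"z = {z}, P(X < 185) = {p_left:.4f}, P(X > 185) = {p_right:.4f}")

# Reverse use: find c with P(Z < c) = 0.95, then undo the standardisation.
c = standard_normal.inv_cdf(0.95)
x_cutoff = mu + c * sigma
print(f"c = {c:.3f}, cutoff on the X scale = {x_cutoff:.1f}")
```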
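
The correlation coefficient formula, and the caution that r is sensitive to outliers, can be illustrated directly. A minimal sketch with a hypothetical `correlation` helper and hypothetical data: five points on an exact line give r = 1, and adding one outlier drops r to 1/7.

```python
import math


def correlation(xs, ys):
    """r = 1/(n-1) * sum of ((x_i - x-bar)/s_x) * ((y_i - y-bar)/s_y)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data lying exactly on a line: r = 1, up to floating-point rounding.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(correlation(xs, ys))

# Adding a single outlier changes r dramatically (r is not resistant).
print(correlation(xs + [6], ys + [0]))  # about 0.143
```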
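
The regression cards ($b_1 = r\,s_y/s_x$, $b_0 = \bar{y} - b_1\bar{x}$ and $r^2 = s_{\hat{y}}^2/s_y^2$) can be checked on a small hypothetical dataset. The function names `least_squares` and `sample_variance` are hypothetical; this is a sketch, not a replacement for a statistics package.

```python
import math


def least_squares(xs, ys):
    """Return (b0, b1, r) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx          # slope
    b0 = ybar - b1 * xbar     # intercept: the line passes through (x-bar, y-bar)
    return b0, b1, r


def sample_variance(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / (len(vs) - 1)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r = least_squares(xs, ys)
y_hat = [b0 + b1 * x for x in xs]
print(f"y-hat = {b0:.2f} + {b1:.2f}x, r^2 = {r ** 2:.2f}")  # y-hat = 2.20 + 0.60x, r^2 = 0.60

# r^2 equals the variance of the fitted values divided by the variance of y.
assert math.isclose(r ** 2, sample_variance(y_hat) / sample_variance(ys))
```

Note that b1 (0.6) differs from r (about 0.77) because $s_x \ne s_y$ here.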
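
The residual cards can be illustrated by fitting a straight line to clearly curved data. In this sketch (hypothetical data y = x², hypothetical helper names) the residuals follow the sign pattern + - - - +, the parabola that signals a linear model is not appropriate, even though they still sum to 0. The slope is computed as $S_{xy}/S_{xx}$, which is algebraically the same as $r\,s_y/s_x$.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope; b1 = Sxy / Sxx equals r * sy / sx."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def residuals(xs, ys):
    """Residual = observed y minus predicted y-hat, for each observation."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical curved data (y = x squared) fitted with a straight line.
xs = [-2, -1, 0, 1, 2]
ys = [4, 1, 0, 1, 4]
res = residuals(xs, ys)
print(res)  # [2.0, -1.0, -2.0, -1.0, 2.0]

# Text version of a residual plot: "+ - - - +" is a curved (parabolic) pattern.
print(" ".join("+" if e > 0 else "-" if e < 0 else "0" for e in res))
```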
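
The "how to randomise" card (random number per subject, sort, fill groups in order) translates almost step for step into code. A sketch assuming Python's `random.Random` as the source of random numbers; the function name, subject labels and group sizes are hypothetical. The seed only makes the allocation reproducible.

```python
import random


def randomise(subjects, group_sizes, seed=None):
    """Split subjects into treatment groups by the random-number method:
    1. give every subject a random number,
    2. order the subjects by those numbers,
    3. fill the groups in order: the first group_sizes[0] subjects form group 1, and so on."""
    if sum(group_sizes) != len(subjects):
        raise ValueError("group sizes must add up to the number of subjects")
    rng = random.Random(seed)
    tagged = [(rng.random(), subject) for subject in subjects]  # step 1
    tagged.sort(key=lambda pair: pair[0])                       # step 2
    ordered = [subject for _, subject in tagged]
    groups, start = [], 0
    for size in group_sizes:                                    # step 3
        groups.append(ordered[start:start + size])
        start += size
    return groups


# Hypothetical: 6 subjects, 3 to the treatment group and 3 to the control group.
subjects = ["S1", "S2", "S3", "S4", "S5", "S6"]
treatment, control = randomise(subjects, [3, 3], seed=42)
print("treatment:", treatment)
print("control:", control)
```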
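
The randomised block and matched pairs cards differ from complete randomisation only in that the random assignment is done separately inside each block. A sketch under that reading: subjects in each block are shuffled and then dealt out to the treatments in turn. The function name, twin labels and treatment names are hypothetical; matched pairs is shown as blocks of size two with two treatments.

```python
import random


def randomised_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    blocks: dict mapping a block name to the list of subjects in that block.
    Within each block, subjects are shuffled and then dealt out to the treatments in turn."""
    rng = random.Random(seed)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment


# Matched pairs as the special case of blocks of size two with two treatments.
# Hypothetical twin pairs:
pairs = {
    "pair 1": ["twin 1a", "twin 1b"],
    "pair 2": ["twin 2a", "twin 2b"],
    "pair 3": ["twin 3a", "twin 3b"],
}
for subject, treatment in randomised_block(pairs, ["treatment", "control"], seed=1).items():
    print(subject, "->", treatment)
```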