stat Midterm front
stat flashcards
Front | Back |
---|---|
Sample | Subgroup of the population |
Sampling | Process of selecting a sample from the population |
Random sampling | Independent selection |
Descriptive vs. Inferential Statistics | – Descriptive: primary purpose is to describe some aspect of the data – Inferential: primary purpose is to infer (to estimate, to make a decision, or to test a hypothesis) |
All inferential statistics have the following in common: | – use of some descriptive statistic – use of probability – potential for estimation – sampling variability – sampling distributions – use of a theoretical distribution – two hypotheses, two decisions, two types of error |
Research defined | Structured Problem Solving |
Scientific methods: steps (cyclic) | – 1. encounter and identify problem – 2. formulate hypotheses, define variables – 3. think through consequences of hypotheses – 4. design & run study, collect data, compute statistics, test hypotheses – 5. draw conclusions |
Variable | entity that is free to take on different values |
Independent variable (IV) | its values are manipulated by the researcher, comes first in time |
Dependent variable (DV) | measured by researcher, follows the IV in time |
Population | Target group for inference |
Extraneous variable (EV) | controlled by researcher • randomization of subjects to groups • keep all subjects constant on EV • include EV in the design of the experiment |
Predictor variable (PV) | comes first in time but there is no manipulation, analogous to IV. |
Criterion variable (CV): | follows PV in time, analogous to DV. |
Causal relationship: | IV causes the DV |
Predictive relationship: | PV predicts the CV |
2 Types of research | 1. experimental 2. observational |
True experiment | • manipulation of IV • randomization of subjects to groups • causal relationship between IV and DV |
Observational research | • no manipulation • minimal control of EV • predictive relationship between PV and CV |
Stem and Leaf Display | • The first digit(s) of a score form the stem, the last digit(s) form the leaf. • We want 10-20 total number of stems. • Number of stems per digit depends on total number of stems: can do 1, 2, or 5 stems per digit. |
Description with statistics: aspects or characteristics of data that we can describe are: | – Middle – Spread – Skewness – Kurtosis |
Other words that describe Middle | central tendency, location, center |
Statistics that Measure middle are: | mean, median, mode • “Middle” is the aspect of data we want to describe. • We describe/measure the middle of data in a population with the parameter μ (‘mu’); we usually don’t know μ, so we estimate it with X-bar. |
Other words that describe Spread | variability, dispersion, scatter |
Statistics that Measure spread are: | range, variance, standard deviation, midrange • “Spread” is the aspect of data we want to describe. • Any statistic that describes/measures spread should have these characteristics: it should – Equal zero when the spread is zero. – Inc |
Skewness | Departure from symmetry – Positive skewness = tail (extreme scores) in positive direction – Negative skewness = tail (extreme scores) in negative direction (The Few name the Skew) |
Kurtosis | peakedness relative to normal curve |
Sample Mean | • The sample mean is the sum of the scores divided by the number of scores, and is symbolized by X-bar: X-bar = ΣX/N. • For example, for X1=4, X2=1, X3=7, N=3, ΣX=12 and X-bar = ΣX/N = 12/3 = 4. • Characteristics: – X-bar is the balance point |
Sample Median | • The median is the middle of the ordered scores, and is symbolized as X50. • Median position (as distinct from the median itself) is (N+1)/2 and is used to find the median. • Example: X1=4, X2=1, X3=7, then N=3. • Characteristic |
Sample Mode | • The mode is the most frequent score. • Examples: – 1 1 4 7, the mode is 1. – 1 1 4 7 7, there are two modes, 1 and 7. – 1 4 7, there is no mode. • Characteristics: – Has problems: more than one, or none; maybe not in the mid |
Spread cont. | • We describe/measure the spread of data in a sample with the statistics: – Range = high score-low score. – Midrange, MR. – Sample variance, s*². – Sample standard deviation, s*. – Unbiased variance estimate, s². – s. • We des |
Midrange (MR) | • Formula is MR=UH-LH – UH=upper hinge – LH=lower hinge – Hinges cut off 25% of the data in each tail • Hinge position is ([median position]+1)/2. – [median position] is the whole number part of the median position (remember, median p |
Hinge position | ([median position]+1)/2 – [median position] is the whole number part of the median position (remember, median pos.=(N+1)/2) • Use hinge position to count in from the tails to find the hinges. |
Sample Variance, s*², and Sample Standard Deviation, s* | • Definitional formula: s*² = Σ(X − X-bar)²/N, the average squared deviation from X-bar. • Sample standard deviation: s*, the square root of s*². • Unbiased variance estimate: s² |
Box-plots | • A pictorial description that uses a box to show the middle of the data and lines called whiskers to show the tails of a distribution. |
3 Parts to Box Plot | 1. Box 2. Whiskers 3. Outliers |
Box | – Upper end is at the UH, lower end is at the LH – Line across the middle is X50 |
Whiskers | – Whiskers are lines drawn from the ends of the box (the hinges) to adjacent values, UAV & LAV. – Adjacent values are the first real data values inside the inner fences. – Inner fences, upper and lower • Upper, UIF=UH+1.5MR • Lower, LIF= L |
Outliers | Outliers: outside whiskers, marked with |
Midrange (MR) | UH- LH |
z Scores | • The aspect of the data we want to describe/measure is relative position. • z scores are statistics that describe the relative position of something in its distribution. |
Z score formula | z is something minus its mean, with that difference divided by its standard deviation: z = (something − its mean)/(its standard deviation). |
z score characteristics | – The mean of a distribution of z scores is zero. – The variance of a distribution of z scores is one. – The shape of a distribution of z scores is reflective: it is the same as the shape of the distribution of the Xs. |
Characteristics of Normal Distributions | – Symmetric, continuous, unimodal. – Bell-shaped. – Scores range from −∞ to +∞. – Mean, median, and mode are all the same value. – Each distribution has two parameters, μ and σ². |
Use of Z score | • We use this distribution to get probabilities associated with a z score (probability, proportion, and area under the curve are synonymous). • Look up z in a table to find probabilities. |
Correlation | – Defined as the degree of linear relationship between X and Y. – Is measured/described by the statistic r. |
Regression | – Is concerned with the prediction of Y from X. – Forms a prediction equation to predict Y from X. – Uses the formula for a straight line, Y’=bX+a. – Y’ is the predicted Y score on the criterion variable. – b is the slope, b = ΔY/ΔX = rise/run. – |
r= | r = Σ(z_X · z_Y)/N, the average product of z scores for X and Y – Works with two variables, X and Y – −1 ≤ r ≤ 1, r measures positive or negative relationships – Measures only the degree of linear relationship – r² = proportion of variability in Y that is e |
r² = | proportion of variability in Y that is explained by X. |
Correlation: Undefined | If there is no spread in X or Y, then r is undefined. Note that any z is undefined if the standard deviation is zero, and r = Σ(z_X · z_Y)/N. |
Population correlation coefficient, | ρ (rho) |
regression cont. | • Linear only. • Generalize only for X values in your sample. • Actual observed Y is different from Y’ by an amount called error, e, that is, Y=Y’+e. • Error in regression is e=Y-Y’. • Many different potential regression |
Line of Best Fit | The statistics b and a are computed so as to minimize the sum of squared errors, – Σe² = Σ(Y − Y’)² is a minimum. – This is called the Least Squares Criterion. |
Partition total spread | – Total = Explained + Not Explained – This is true for proportion of spread and amount of spread. • Proportion: 1 = r² + (1 − r²) • Amount: s²_Y = s²_Y·r² + s²_Y·(1 − r²) |
Probability | Defined as relative frequency of occurrence. |
Sample space | all possible outcomes of an experiment |
Elementary event | a single member of the sample space |
Event | any collection of elementary events |
p(elementary event) | 1/(total number) |
p(event) | (number in the event)/(total number) |
Conditional probability | • p(A|B)=(number in [A and B])/(number in B) • The probability of A in the redefined (reduced) sample space of B. |
Big 3 Probability Rules | 1. independence 2. multiplication, mutually exclusive 3. addition |
Independence (1) | events A and B are independent if • p(A|B)=p(A) • The A probability is not changed by reducing the sample space to B. |
Multiplication (And) Rule (2) | • p(A and B)=p(A)p(B|A)=p(A|B)p(B) |
Mutually exclusive: | • Events A and B do not have any elementary events in common. • Events A and B cannot occur simultaneously. • p(A and B)=0 |
Addition (Or) Rule (3) | p(A or B)=p(A)+p(B)-p(A and B) |
The sampling distribution of X-bar | – Has the purpose of any sampling distribution: to obtain probabilities… – Has the definition of any sampling distribution: the distribution of a statistic. – Has specific characteristics: • Mean: μ_X-bar = μ • Variance: σ²_X-bar = σ²/N • Shape i |
Hypothesis testing | is the process of testing tentative guesses about relationships between variables in populations. These relationships between variables are expressed in a statement, a hypothesis, about a population parameter. |
Test statistic | a statistic used only for the purpose of testing hypotheses; e.g. z_X-bar |
Assumptions | conditions placed on a test statistic necessary for its valid use in hypothesis testing; – for z_X-bar, the assumptions are that the population is normal in shape and that the observations are independent. |
Null hypothesis | the hypothesis that we test; Ho. |
Alternative hypothesis | where we put what we believe; H1. |
Significance level | The standard for what we mean by a “small” probability in hypothesis testing; α. The significance level is the small probability used in hypothesis testing to determine an unusual event that leads you to reject Ho. – The significance level is sym |
Directional vs. Non-Directional Hypothesis | >,<, or = • Directional hypotheses specify a particular direction for values of the parameter. – IQ of deaf children example: Ho: μ>100, H1: μ<100. • Non-directional hypotheses do not specify a particular direction for values of the paramet |
One- and two-tailed tests | – A one-tailed test is a statistical test that uses only one tail of the sampling distribution of the test statistic. – A two-tailed test is a statistical test that uses two tails of the sampling distribution of the test statistic. |
Critical values | values of the test statistic that cut off α or α/2 in the tail(s) of the theoretical reference distribution. |
Rejection values | the values of the test statistic that lead to rejection of Ho |
p-Value Decision Rules | • Reject Ho if – ½ the SAS p-value < α, and – the observed z_X-bar is in the tail specified by H1. |
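
Several of the formulas on the cards above can be checked with a short Python sketch: the sample mean, the median found from the median position (N+1)/2, the sample variance s*² (dividing by N), z scores, r as the average product of z scores, and the z statistic for X-bar. The function names are made up for this illustration, and the numbers in the z-test example (X-bar = 88, σ² = 225, N = 25) are hypothetical rather than taken from the cards. Averaging the two middle scores when N is even is the usual convention and is assumed here. The lower-tail probability is computed directly from the normal curve instead of halving a SAS p-value.

```python
import math


def mean(xs):
    """X-bar = sum of the scores divided by the number of scores."""
    return sum(xs) / len(xs)


def median(xs):
    """X50, located with the median position (N + 1) / 2 (1-based)."""
    ordered = sorted(xs)
    pos = (len(ordered) + 1) / 2
    lower = ordered[math.floor(pos) - 1]
    upper = ordered[math.ceil(pos) - 1]
    return (lower + upper) / 2  # the two are the same score when N is odd


def sample_variance(xs):
    """s*^2: the average squared deviation from X-bar (divides by N)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def z_scores(xs):
    """Each score minus the mean, divided by the standard deviation s*."""
    m = mean(xs)
    sd = math.sqrt(sample_variance(xs))  # undefined (division by zero) if no spread
    return [(x - m) / sd for x in xs]


def correlation(xs, ys):
    """r = average product of the z scores for X and Y."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def z_for_mean(xbar, mu, sigma2, n):
    """z for X-bar: (X-bar - mu) / sqrt(sigma^2 / N)."""
    return (xbar - mu) / math.sqrt(sigma2 / n)


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


if __name__ == "__main__":
    scores = [4, 1, 7]  # the example from the Sample Mean card
    print(mean(scores), median(scores), sample_variance(scores))

    # Hypothetical one-tailed test with H1: mu < 100 and alpha = .05
    z = z_for_mean(xbar=88, mu=100, sigma2=225, n=25)
    p_lower = normal_cdf(z)
    print(z, p_lower, "reject Ho" if p_lower < 0.05 else "retain Ho")
```

With the three scores from the Sample Mean card, the mean and the median are both 4 and s*² is 6. A perfectly linear pair such as X = 1, 2, 3 and Y = 2, 4, 6 gives r = 1, apart from floating-point rounding.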