Statistics PPPA 6002
2nd Quiz Study Guide
Question | Answer |
---|---|
Traditionalism | 1) Social Science is not a hard Science 2) Humans are too complex for quantification 3) Historical, anecdotal, journalistic approach |
Behavioralism (aka Basic Research) | 1) There are regularities that permit generalizations 2) Explicit, replicable, neutral methods 3) Priority: hypothesis testing to build theories; Goal: highly predictive interlocking theories |
Applied Research (Post-Behavioralism or Policy Analysis) | Accepted the merits of explicit, rigorous, replicable scientific methods; changed the goal from building theory to addressing practical/applied/policy questions; and acknowledged the role of values in setting research priorities |
Classic Model of the Scientific Process | 1) Theory 2) Deduce Hypothesis from theory 3) Design Study and operationalize concepts 4) Conduct the Study (collect the data) 5) Analyze data to accept/reject hypothesis 6) Support, modify, or reject initial theory |
Model of Applied Research | Begin with a specific, practical issue - Devise a testable research question - Design study and operationalize concepts - Conduct the study (collect the data) - Analyze data to accept/reject hypothesis - Use results to inform decision-making |
Hypothesis | A testable statement of the relationship between two or more variables |
Theory | A set of logically related propositions intended to explain a range of phenomena |
Main Structure of Research Reports | Intro (Problem Area; Issues) Literature Review Methodology Findings Discussion and Conclusion |
The Strong Lit Review | Primary (not secondary) sources Nonelectronic searches Contact leading researchers Add unpublished/forthcoming research Diagram/model key relationships Use elements of meta-analysis |
Meta-Analysis Steps | (1) Clear Statement of Hypothesis (2) Explicit and Replicable Lit Searches (3) Set Variables for Coding Studies (4) Analyze predictors of the results - Certain factors associated with certain outcomes? |
Good Individual Questions | Short as possible Shared, simple vocab Unbiased Language/premises Unambiguous Answers Confined to one issue Exhaustive/Exclusive Categories |
Good Format and Overall Flow | Brief Smooth Intro Easy Non-threatening start Early closed-ended questions Move from general to specific Delay sensitive issues until later Demographics last Fair Framing Short transitions Consistent series answer format |
Census vs Sample | Use a census if feasible, affordable, and not needed often; but samples are usually more practical |
Random vs Nonprobability | Use random samples unless desperate |
Nonprobability Sampling | Convenience Purposive Snowball |
Random sampling includes | Simple, Systematic (every nth), Stratified (proportionate or disproportionate) (see the sampling sketch after the table) |
Simple Random Sampling | Each sample chosen independently and randomly from the sampling frame |
Systematic | Selecting every nth item from a list (from a random point) |
Stratified | Draw random samples within groups if easier or to oversample a group intentionally. Proportionate or disproportionate |
Response Rate Determinants | Costs - Est. Lengths / Time / Complexity Benefits - Enjoyable / Important/ Satisfaction |
Evaluating a Sample Size | Overall precision (CI) needed Depth of Subgroup analysis As well as the research budget |
95% Confidence Interval - Sample 100 | +/- 10% |
95% Confidence Interval - Sample 600 | +/- 4% |
95% Confidence Interval - Sample 1100 | +/- 3% |
Nominal | Categories by names only (region, religion, sex) |
Ordinal | Categories can be ordered on a single dimension (agree/disagree; highest degree earned; young, middle, old) |
Interval | Increments are consistent but no absolute zero (Fahrenheit, year of birth) |
Ratio | Absolute quantities (amount of dollars, inches, siblings, years, pounds) ask yourself...can it be TWICE AS MUCH? |
Principles of Data Analysis | (1) Good Data are a prerequisite (2) All Statistics are reductionist (3) Context dictates interpretation (4) Avoid Exaggerating small gaps (Bill hates this!) (5) Correlation DOES NOT equal Causation (6) Start with Univariate Analysis |
Univariate Nominal Variables | Mode = Plurality but not always a majority Percentages = usually round % |
Univariate Nominal Variables - Interpretation Pitfalls | Misleading pictograms Confusing absolute and relative % Misinterpreting nominal modes as if they were midpoints/averages Misleading/oversimplified composites built from nominal and other modes |
n | Univariate Sample size |
N | Univariate population size |
Measures of Central Tendency | (1) Mean (2) Median (3) Trimmed Means |
Mean | Sum divided by # of cases; very sensitive to extreme values. x̄ (x with a line on top) is the sample mean; μ (mu, which looks like a u) is the population mean |
Median | 50th percentile; half of the cases below, half above; totally insensitive to higher and lower values |
Trimmed Means | Discard a percentage of the highest/lowest values (e.g., the top and bottom five percent); used in Olympic scoring |
Measures of Dispersion | (1) Range (2) Standard Deviation (3) Interquartile Range |
Range | Highest to lowest value; crude measure of dispersion |
Standard Deviation (Equation) | Square root of the sum of the squared differences of each case from the mean, divided by the number of cases: √[Σ(x − x̄)²/n] (see the dispersion sketch after the table) |
Standard Deviation | In a normal curve, ±1 SD around the mean covers the middle 68% of cases; otherwise it only indicates relative dispersion |
IQR | 25th to 75th percentiles; range of the middle 50% of all cases; easy to explain. |
Smaller IQR/SD Scores | Tight cluster of cases |
Measure of Shape | Skewness |
Skewness | Asymmetrical distribution; skewed positively if a few high scores pull the mean above the median; the reverse (mean below the median) reflects a negative skew. |
The Normal Curve | The bell-shaped curve; the Central Limit Theorem explains why sampling distributions approximate it |
+/- 1 Std Dev | 68.3% of all cases |
+/- 2 Std Dev | 95.4% of all cases |
+/- 3 Std Dev | 99.7% of all cases |
Descriptive Statistics | Data of the whole relevant population - treat results as real. |
Inferential Statistics | Used with samples because results are estimates. Keeps us from jumping to conclusions and treating sample estimates as more precise than they really are. |
Population based statistics are... | Descriptive Only |
Sample based statistics are... | Inferential and descriptive |
Formula for 95% CI around a proportion... | 1.96 × √[p(1 − p)/n]: the square root of p times (1 minus p) divided by the sample size, multiplied by 1.96 (see the proportion CI sketch after the table) |
Confidence Intervals for Means Formula | 1.96 × (s/√n): the standard deviation of the sample divided by the square root of the sample size, then multiplied by 1.96 (see the mean CI sketch after the table) |
When to use T-Test | Comparing means of two groups... (1) using sample data (derived from random sampling) (2) using experimental data (derived from random assignment) |
T-Test Steps | (1) State the null hypothesis (2) State the research hypothesis (3) State the decision rule (probability level) (4) Assume equal variance unless the F-test is significant (5) Reject or fail to reject the null (see the t-test sketch after the table) |
Easiest Null for T-Tests | There is no difference in the mean (dependent variable) of (group 1) and (group 2) |
T-Test Interpretation | (1) Prevents 'jumping to conclusions' when differences in two means may just be random variation (2) Statistical significance is not the same as substantive significance (3) Easy to get stat. sig. with large samples, hard with small samples |
T-Test and Population studies without randomized data | No t-test needed: the t-test is inferential, and population data are not a sample. |
Difference in steps between Chi Square and T-Test | T-Test adds the F-test step. |
Similarities between Chi Square and T-Test | (1) Stat. Sig. does NOT necessarily mean it is important or consequential. (2) If NOT stat. sig. remember we never prove the null we just fail to reject the null. (3) A small sample may not be Stat Sig, but could be Stat Sig in a larger sample |
Three Elements of Causal Inference | (1) X & Y covary (2) X precedes Y (3) Rule out the Z's |
Post Hoc Fallacy | Fallacy of concluding that since change in Y followed X, it was caused by X. |
Antecedent Variables | Before X (Z->X->Y) |
Intervening Variables | Between X and Y (X->Z->Y) |
Campbell and Stanley's Notation System | O = Observations (measures) of Y Left to Right = Chronological Order Each Row = One Group of Subjects |
Single Group posttest only | X O |
Single Group pretest-posttest (before and after design) | O X O |
Static Group Design | X O ----- O |
History | External event during period |
Maturation | Subjects change over time |
Practice | Familiarity with the measure |
Instrumentation | A changed measure |
Regression to the mean | If subjects are chosen due to extreme scores, they tend to regress to the mean on posttest |
Selection | Groups different from start |
Intragroup history | unique group event |
Mortality | groups differ in attrition |
What to do with Attrition... | (1) Omit pretest scores of lost subjects; (2) Omit all data of lost 'types' from all groups (3) Match by statistical weighting (4) Analyze by "intention to treat" (i.e. include dropouts) |
Between Group Reactivity | (1) Spillover (My buddy is sick and I know if I give him a lime he will get better) (2) Compensatory rivalry (controls try harder) (3) Resentful demoralization (Controls try less...I never get picked so I will just suck) |
Placebo Effect | Subject expectancy to get better and psychologically they do. (Reactivity) |
Novelty Effect | X works because its new. Innovation effect. Short term effect.(reactivity) |
Guinea Pig Effect | Subjects act differently because they feel that they are under surveillance. Evaluation apprehension: "I know I am being watched/evaluated." |
Demand Effect | Subjects think they know what the authority wants of them. The real pills are handed out with more conviction; requires a double-blind design to limit. |
Social desirability | (Reactivity) Political correctness, societal pressures/inhibitions: "I am supposed to act a certain way." |
Hawthorne Effect | Electric Plant Light Dimming Example. Refers to reactivity in general. |
Heisenberg Effect | Act of measuring something changes what you're measuring |
Two Elements of a true experiment | (1) Random Assignment of subjects to groups (2) Random Assignment of Treatments to groups |
Source of power of experiments | Comparability of the groups - the only real difference is one gets X, the other doesn't. Otherwise the two groups are identical. |
Classic Experimental Design | R) O X O R) O O |
Posttest Only Experiment | R) X O R) O |
Factorial Design | R) O Xa Xb O R) O Xa O R) O Xb O R) O O |
Complex X | Many ingredients in X |
Multiple Ys | Studies often measure the impact of X on several Ys. |
Compensatory Rivalry | Controls try harder |
Resentful demoralization | Controls try less |
Spillover effects/diffusion | Some X spills over to controls |
Strategies to minimize reactivity | (1) deceit (2) obscure / mislead (3) use placebo (4) double blind (5) time (hope they forget the study) |
Placebo | A dummy treatment given to the controls to 'hold constant' the impact of their expectations. Common in medical studies; not always possible. |
Natural Experiment | Both subjects and X were randomly assigned without a researcher's intervention; term is also sometimes used less strictly to refer to a close natural approximation even if lacking in randomization |
Big Four Categories of Validity | (1) Measurement Validity (2) Internal Validity (3) Statistical Conclusion Validity (4) External Validity |
External Validity | Generalizability; the essential yet unavoidably subjective judgment about the extent to which it is reasonable to generalize/extrapolate the findings of one study to other places, subjects, times, etc. |
How to strengthen external validity | (1) Test subjects representative of the subjects you want to generalize to (2) replications in varied settings (3) Consistent results in varied tests |
Limitations of Experiments | (1) Unethical or illegal to withhold X (2) Unethical or illegal to risk trying X (3) Unaffordable to finance in field (4) Infeasible to enforce X vs no X (5) Impractical to field test outside a lab |
Quasi-Experimental Designs | Commonly means any clever design lacking randomized control groups |
Causal-Comparative Designs | Studies that seek to infer causality using comparison groups without randomly assigned subjects |
Primary threat of Internal validity when no randomization | Selection |
NEC | Nonequivalent Comparison Group Design |
Nonequivalent Comparison Group Designs | O X O ----- O O |
Retrospective match / Ex post facto design | Creating a comparison group later by finding and matching subjects similar to those who previously got exposed to X. |
Time Series Designs | X may be short term or enduring. Top internal validity threat is history. Trend line makes it superior to O X O. |
Simple Interrupted Time Series | O O O O O O X O O O O O O |
Reiterative Time Series | O O X O O X O O X O O |
Comparison Time Series | O O O O O X O O O O O --------------------- O O O O O O O O O O |
Multiple Time Series | O O X O O O O O O --------------------- O O O O X O O O O --------------------- O O O O O O X O O ---------------------- O O O O O O O O |
Panel | Repeated data tracking same people; valuable but expensive, can produce reactivity |
Cross-sectional data | Repeated cross-sections: a time series built from new random samples of the same population; shows net change but masks the rest. |
Deceptive Time Series Charts | Using a truncated baseline plus an artificially narrow or wide axis. |
Retrospective pretests | Proxy pretests - recollections used for pretest measure. |
Danger of time series inferences from a single survey | Cannot infer that age differences equal change over time. Bill used the Navy officer surveys of high-ranking and low-ranking officers, inferring that low-ranking officers will think like high-ranking officers when they get there. |
Correlational Designs | Typically using a single survey to try to "statistically control" for alternative explanations, often using multiple regression. Issues with selection. |
Aggregate Data | Units of analysis are groups, such as precincts, cities, states. |
Ecological Fallacy | Drawing individual-level inferences from aggregate-level correlations. |
Check list of Empirical Studies | (1) Theory Building or Applied Research (2) Causal or Descriptive (3) Exact Hypothesis (4) Independent Variable(s) (5) Dependent Variable(s) |
When Something is NOT Statistically Significant | Do not report it as a difference. Consider the dispersion between the groups. |
T-Test Analysis | Analysis is black and white: it is or it isn't statistically significant. If you just hit .05, you have a slight relationship; state just that, a slight relationship. |
Grouping Ratio Data | Becomes ordinal |
Central Tendency | Mean, Median, Trimmed Mean |
Extreme Lopsided Distribution does what to Confidence Intervals? | Becomes smaller (narrower), because p(1 − p) shrinks as p moves away from .5 |
At what level is .012 statistically significant? | It is Stat. Sig at .05, but NOT at .001 or .01. |
True or False - Standard Deviation is a measure of Central Tendency? | False |
What is the biggest threat to NEC design? | Selection |
What is the biggest threat to Time Series Designs? | History |
What does comparing results to good existing records test? | Concurrent Validity |
What are two elements of dispersion? | IQR and SD |
Two Types of Empirical Validity | (1) Concurrent Validity (2) Predictive Validity |
Concurrent Validity | Testing a measure against existing data believed accurate. (Empirical) |
Predictive Validity | Testing a measure designed to predict future outcomes by the actual success of its forecasts. (Empirical) |
Subjective Validity | (1) Face Validity (2) Content Validity |
Face Validity | Operationalizing the usual usage of a word in a reasonable way. |
Content Validity | Operationalizing the full scope of the entire intended concept and not just a part of it. |
Multiple Measures (Triangulation) | Assessment using a variety of indicators (not just one) |
Unobtrusive Measures | No survey - Measuring actual behavior - not just self-reported behavior. |
Validity | Accuracy |
Reliability | Consistency |
According to SPSS, Scale measurement includes... | Interval and Ratio |
Content Analysis Steps | (1) Define exact scope of the study (dates, sources, search strategy); (2) Operationalize variables to code; (3) Refine coding system & test reliability; (4) Code the content under study; (5) Analyze Patterns |
Is Content Analysis Descriptive or Causal? | By itself it's descriptive. If part of a study it can be Causal. |
Intercoder Reliability Test | Where independent coders, at least 2, evaluate a characteristic of a message or artifact and reach the same conclusion. Must reach at least an 80% agreement rate (see the agreement sketch after the table). |
What to worry about in analyzing patterns in Content Analysis... | Caution in drawing inferences. |
Types of Operationalize Variables to Code | (1) Specific Word Count; (2) Sources Quoted; (3) Topics; (4) Overt Visual Image; (5) Voice Inflections; (6) Subtle Themes; (7) Global Code |
Uses in Content Analysis | History, Public Relations, National Intelligence, Lobbying, Detective Work, Mass Communication, Linguistics |
Content Analysis | Systematic analysis of patterns in communications |
When to use inferential Stats? | Randomized - ALWAYS! Population - use only if the group can be treated as a sample of a larger population. |
Qualitative Research | More exploratory, small purposive "samples", open-ended semi-structured interviews, more time per subject, narrative format, researcher's impact noted. |
Quantitative Research | More defined, specific hypothesis testing, large random samples, closed-ended instruments, less time per subject, data-based reports, researcher's impact distant/unacknowledged. |
Matching Qualitative and Quantitative | Start with Qualitative research to define the issues/vocabulary, to help generate/refine research questions, test a draft questionnaire. Then conduct quantitative study. Use qualitative to explore puzzles found. |
Purpose of Focus Groups | In-depth probing of views (pre-existing); Reactions to new stimuli (new responses); Group brainstorming (new idea generation); |
Focus Groups Format | Recruit relevant participants; 10-12 people, 1.5 to 2 hours long, audio/video taped, semi-structured format w/ open ended agenda questions, neutral moderator. |
The right number of Focus Group meetings | Depends on resources, how much is at stake, but at least more than one! |
Bivariate Regression | One X; Correlation Coefficient = r; Coefficient of Determination = r²; Y = a + bX (see the regression sketch after the table) |
Multiple Regression | Two or more Xs; Multiple Correlation Coefficient = Multiple R; Multiple Coefficient of Determination = Multiple R²; Y = a + b1X1 + b2X2 + ... + bkXk |
Y = a + bX | a=intercept; b=slope |
Multiple Correlation Coefficient | Multiple R |
Multiple Coefficient of Determination | R (squared) |
Unstandardized Coefficients in Multiple Regression Equations | Symbol: b; unstandardized partial regression coefficient/slope; slope change measured in original units (see the coefficients sketch after the table) |
How to interpret Unstandardized Coefficients in Multiple Regression Equations | If b is −3, predicted Y drops by 3 (e.g., subtract 3 years) for every additional pack of cigarettes. |
Standardized Coefficients in Multiple Regression Equations | Symbol: β (Greek beta); beta, beta weight, or standardized partial regression coefficient/slope; in units standardized as Z-scores (std. dev. units) to allow comparisons. |
How to interpret Standardized Coefficients in Multiple Regression Equations | Use for ranking variables: the higher the beta, the more powerful the X. |
Multicollinearity | Overlap of variables |
Dummy Variable | A variable recoded as a 0/1 dichotomy so a nominal category can be entered in a regression; the omitted category serves as the reference group. |
r | Correlation Coefficient |
Correlation Coefficient (r) | Summarizes the strength of the linear relationship between two scale variables. Perfect positive correlation = 1.0 (slopes up from left to right); perfect negative correlation = −1.0 (slopes down from left to right); 0 = no correlation. |
r(squared) | Coefficient of Determination |
Coefficient of Determination (r2) | Indicates strength of relationship but has no negative sign. Yields lower but more intuitive score. |
Role of Correlation Coefficient and Coefficient of Determination | Both summarize (in slightly different ways) the strength of the relationship between two scale variables. Neither is inferential. |
Feature of Correlation Coefficient | Shows strength and direction, though somewhat inflated. |
Feature of Coefficient of Determination | Shows strength and proportion of variation explained, but lacks direction sign. |
Homoscedasticity | Even variation around the slope (Homo is straight) |
Heteroscedasticity | Uneven variation around the slope (Hetero is balled up) |
Bivariate Analysis of Outliers | Could be bad data, but outliers may also provide lessons about what works very well or very badly. |
Standard Error of the Estimate (SEE) | Bands around the regression line that show where the middle 68% of cases fall (±1 SEE). |
Is Standard Error of the Estimate Inferential? | Not just no, but hell no! |
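
The sampling sketch referenced above: a minimal Python illustration of systematic selection (every nth case from a random start) and stratified selection (random samples within groups, here deliberately oversampling one stratum). The frame, group names, and sizes are hypothetical, not from the course.

```python
# Hypothetical sketch of systematic and stratified sampling from a sampling frame.
import random

def systematic_sample(frame, n):
    """Select every k-th element starting from a random point."""
    k = len(frame) // n                      # sampling interval
    start = random.randrange(k)              # random starting point
    return frame[start::k][:n]

def stratified_sample(frame_by_group, n_per_group):
    """Draw a simple random sample within each stratum (may be disproportionate)."""
    sample = []
    for group, members in frame_by_group.items():
        sample.extend(random.sample(members, n_per_group[group]))
    return sample

frame = list(range(1, 1001))                 # a frame of 1,000 case IDs
print(systematic_sample(frame, 50))
print(stratified_sample({"urban": frame[:800], "rural": frame[800:]},
                        {"urban": 40, "rural": 40}))   # oversamples the rural stratum
```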
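The dispersion sketch referenced above: central tendency and dispersion measures computed on a small made-up batch of scores with one extreme value, showing why the mean is sensitive to outliers while the median, trimmed mean, and IQR are not. Note the sample standard deviation below divides by n − 1, a slight refinement of the card's verbal formula.

```python
# Central tendency and dispersion in one pass; data are illustrative.
import numpy as np
from scipy import stats

x = np.array([3, 5, 6, 7, 8, 9, 10, 12, 14, 60])    # note the extreme value 60

print("mean:", x.mean())                             # pulled up by the outlier
print("median:", np.median(x))                       # insensitive to the outlier
print("trimmed mean:", stats.trim_mean(x, 0.10))     # drops top/bottom 10%
print("range:", x.max() - x.min())
print("std dev:", x.std(ddof=1))                     # sample SD (divides by n-1)
q1, q3 = np.percentile(x, [25, 75])
print("IQR:", q3 - q1)                               # range of the middle 50% of cases
```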
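The proportion CI sketch referenced above: the 1.96 × √[p(1 − p)/n] margin of error evaluated at p = .5 (the most conservative case), which reproduces the ±10%, ±4%, and ±3% figures for samples of 100, 600, and 1,100.

```python
# Margin of error for a 95% CI around a proportion.
import math

def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 600, 1100):
    moe = margin_of_error(0.5, n)
    print(f"n = {n}: +/- {moe * 100:.1f}%")
# n = 100: ~9.8%, n = 600: ~4.0%, n = 1100: ~3.0%
```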
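The mean CI sketch referenced above: mean ± 1.96 × (s/√n), applied to a made-up set of scores.

```python
# 95% confidence interval around a sample mean; data are illustrative.
import math
import statistics

scores = [72, 85, 90, 66, 78, 81, 95, 70, 88, 76]
mean = statistics.mean(scores)
s = statistics.stdev(scores)                 # sample standard deviation
moe = 1.96 * s / math.sqrt(len(scores))
print(f"95% CI: {mean - moe:.1f} to {mean + moe:.1f}")
```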
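The t-test sketch referenced above, with illustrative data: check whether the two variances differ (the F-test step), then run the two-group t-test assuming equal variances unless that check is significant, and reject or fail to reject the null at .05.

```python
# Two-group t-test with an F-test for equal variances; data are made up.
import numpy as np
from scipy import stats

group1 = np.array([23, 27, 31, 25, 29, 34, 28, 26])
group2 = np.array([20, 22, 25, 21, 24, 27, 23, 19])

# F-test: ratio of the two sample variances on an F distribution (two-sided p).
f = group1.var(ddof=1) / group2.var(ddof=1)
df1, df2 = len(group1) - 1, len(group2) - 1
p_f = 2 * min(stats.f.sf(f, df1, df2), stats.f.cdf(f, df1, df2))

equal_var = p_f >= 0.05                      # assume equal variances unless F is significant
t, p = stats.ttest_ind(group1, group2, equal_var=equal_var)

print(f"F p-value = {p_f:.3f}, equal_var = {equal_var}")
print(f"t = {t:.2f}, p = {p:.3f}",
      "-> reject the null" if p < 0.05 else "-> fail to reject the null")
```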
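The agreement sketch referenced above: a minimal percent-agreement check for two coders against the 80% threshold. The codes below are hypothetical.

```python
# Intercoder reliability as simple percent agreement between two coders.
coder_a = ["positive", "neutral", "negative", "positive", "neutral",
           "positive", "negative", "neutral", "positive", "positive"]
coder_b = ["positive", "neutral", "negative", "neutral", "neutral",
           "positive", "negative", "neutral", "positive", "negative"]

agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
print(f"Percent agreement: {agreement:.0%}")         # 80% in this example
print("Acceptable" if agreement >= 0.80 else "Below the 80% threshold")
```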
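The regression sketch referenced above: fitting Y = a + bX and reporting r and r² with scipy's linregress. The X and Y values are made up.

```python
# Bivariate regression: intercept (a), slope (b), r, and r-squared.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])            # e.g., years of experience
y = np.array([30, 33, 37, 38, 44, 45, 50, 52])    # e.g., salary in $1,000s

result = stats.linregress(x, y)
print(f"a (intercept) = {result.intercept:.2f}")
print(f"b (slope)     = {result.slope:.2f}")       # change in Y per one-unit change in X
print(f"r             = {result.rvalue:.3f}")      # strength and direction
print(f"r-squared     = {result.rvalue**2:.3f}")   # proportion of variation explained
```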
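The coefficients sketch referenced above: a multiple regression run twice, once in original units (unstandardized b) and once on z-scored variables (standardized beta, used to rank the Xs), using plain numpy least squares. Variable names and data are simulated purely for illustration.

```python
# Unstandardized b versus standardized beta in a multiple regression.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(50, 10, 200)                       # e.g., hours of training
x2 = rng.normal(5, 2, 200)                         # e.g., years of experience
y = 2.0 * x1 + 8.0 * x2 + rng.normal(0, 15, 200)   # outcome in original units

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]           # [a, b1, b2] in original units
print("unstandardized b:", b[1:].round(2))

def z(v):                                          # convert a variable to z-scores
    return (v - v.mean()) / v.std(ddof=1)

Xz = np.column_stack([z(x1), z(x2)])
beta = np.linalg.lstsq(Xz, z(y), rcond=None)[0]    # standardized betas (intercept ~0, omitted)
print("standardized beta:", beta.round(2))         # higher beta = more powerful X
```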