Ch10. Statistics Comprehensive Review — Essential Formulas and Practical Applications

Statistics: The Full Workflow

Data Collection → Descriptive Statistics → Probability Theory → Inferential Statistics
       ↓                    ↓                      ↓                      ↓
Sample Design       Mean, Std Dev         Distributions, CLT       Estimation, Testing

Essential Formula Reference

Central Tendency and Dispersion

Arithmetic mean:    x̄ = Σxᵢ / n
Sample variance:    s² = Σ(xᵢ−x̄)² / (n−1)
Standard deviation: s = √s²
Coefficient of variation: CV = s/x̄ × 100%
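
As a quick check of the four formulas above, the following Python sketch computes each of them with the standard-library statistics module. The scores list is made-up illustration data, not taken from this chapter.

```python
import statistics

# Hypothetical sample (illustration only)
scores = [4, 8, 6, 5, 7]

mean = statistics.mean(scores)          # x̄ = Σxᵢ / n
variance = statistics.variance(scores)  # s², with n−1 in the denominator
std_dev = statistics.stdev(scores)      # s = √s²
cv_percent = std_dev / mean * 100       # CV = s/x̄ × 100%

print(f"mean = {mean}, s² = {variance}, s = {std_dev:.3f}, CV = {cv_percent:.1f}%")
```

Note that statistics.variance and statistics.stdev use the sample (n−1) form; the population versions are statistics.pvariance and statistics.pstdev.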

Probability Fundamentals

Addition rule:   P(A∪B) = P(A) + P(B) − P(A∩B)
Multiplication:  P(A∩B) = P(A) × P(B|A)
Conditional:     P(A|B) = P(A∩B) / P(B)
Independence:    P(A∩B) = P(A) × P(B)
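
The rules above can be verified by direct counting on a small sample space. The sketch below assumes one fair six-sided die and two illustrative events (A = even, B = greater than 3); both events are chosen for this example only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one fair six-sided die
A = {2, 4, 6}                # event A: the roll is even
B = {4, 5, 6}                # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union = prob(A | B)
p_inter = prob(A & B)
p_b_given_a = p_inter / prob(A)  # conditional: P(B|A) = P(A∩B) / P(A)

# Addition rule: P(A∪B) = P(A) + P(B) − P(A∩B)
addition_holds = p_union == prob(A) + prob(B) - p_inter
# Multiplication rule: P(A∩B) = P(A) × P(B|A)
multiplication_holds = p_inter == prob(A) * p_b_given_a
# Independence would require P(A∩B) = P(A) × P(B)
independent = p_inter == prob(A) * prob(B)

print(p_union, p_inter, p_b_given_a, independent)
```

Here P(A∩B) = 1/3 while P(A) × P(B) = 1/4, so these two events are not independent, even though the addition and multiplication rules both hold.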

Key Probability Distributions

Binomial:  E(X) = np,  Var(X) = np(1−p)
Poisson:   E(X) = Var(X) = λ
Normal:    X~N(μ,σ²) → Z = (X−μ)/σ ~ N(0,1)
t-dist:    t = (x̄−μ₀)/(s/√n),  df = n−1
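
To see where the binomial moments and the Z transformation come from, the sketch below recomputes E(X) and Var(X) from the binomial probability mass function and checks that standardizing gives the same probability as working with N(μ, σ²) directly. The parameter values (n = 10, p = 0.3, μ = 100, σ = 15, x = 130) are arbitrary choices for illustration.

```python
from math import comb
from statistics import NormalDist

# Binomial: compare the moments from the pmf with np and np(1−p)
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(f"E(X) = {mean:.4f} (np = {n * p}), Var(X) = {var:.4f} (np(1-p) = {n * p * (1 - p):.4f})")

# Normal: Z = (X−μ)/σ maps N(μ, σ²) onto N(0, 1)
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma
p_via_z = NormalDist().cdf(z)              # P(Z ≤ z) on the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)    # P(X ≤ x) on N(μ, σ²)
print(f"z = {z}, P(X <= {x}) = {p_direct:.4f}")
```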

Estimation and Testing

Standard error:   SE = σ/√n
95% CI:           x̄ ± 1.96 × σ/√n
99% CI:           x̄ ± 2.576 × σ/√n
Z-test statistic: Z = (x̄−μ₀) / (σ/√n)
F-statistic:      F = MSB/MSW
Chi-square:       χ² = Σ(O−E)²/E
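
As one worked instance of these test statistics, the sketch below applies the chi-square formula to a goodness-of-fit check on a die. The observed counts are invented for illustration, and the critical value is the standard table entry for α = 0.05 with 5 degrees of freedom.

```python
# Hypothetical counts from 120 rolls of a die (illustration only)
observed = [18, 22, 20, 25, 15, 20]
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)  # 20 per face if the die is fair

# χ² = Σ(O−E)²/E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Critical value for α = 0.05 and df = 5, taken from a standard χ² table
CRITICAL_05_DF5 = 11.070
reject_h0 = chi_square > CRITICAL_05_DF5

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

With χ² = 2.90 well below 11.070, these counts give no evidence against a fair die.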

Regression Analysis

Slope:                β₁ = r × (sy/sx)
Intercept:            β₀ = ȳ − β₁x̄
Coefficient of det.:  R² = SSR/SST = 1 − SSE/SST   (SSR = regression sum of squares, SSE = error sum of squares)
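
The three regression formulas can be checked against each other on a small data set. The sketch below uses five invented (x, y) pairs and only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # this is SST

r = sxy / sqrt(sxx * syy)          # Pearson correlation
slope = r * stdev(y) / stdev(x)    # β₁ = r × (sy/sx)
intercept = y_bar - slope * x_bar  # β₀ = ȳ − β₁x̄

# R² = 1 − SSE/SST, with SSE taken from the fitted line
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.4f}, R² = {r_squared:.2f}")
```

In simple linear regression R² equals r², which is why the last two printed values agree (0.7746² ≈ 0.60).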

Key Values to Memorize

Quantity	Value
Z₀.₀₂₅ (95% two-tailed)	1.96
Z₀.₀₀₅ (99% two-tailed)	2.576
Z₀.₀₅ (90% two-tailed)	1.645
68–95–99.7% rule	±1σ, ±2σ, ±3σ
Binomial → Poisson approx.	Rule of thumb: n ≥ 30, p ≤ 0.05; use λ = np
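
The values in this table do not have to be taken on faith. The sketch below recovers them from the standard normal distribution with statistics.NormalDist and compares one binomial probability with its Poisson approximation; the case n = 100, p = 0.02 is an arbitrary illustration.

```python
from math import exp
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def two_tailed_z(confidence):
    """Critical value z such that P(−z < Z < z) equals the confidence level."""
    alpha = 1 - confidence
    return std_normal.inv_cdf(1 - alpha / 2)

def within(k):
    """Probability that a normal variable falls within ±k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} two-tailed: z = {two_tailed_z(level):.3f}")
for k in (1, 2, 3):
    print(f"within ±{k}σ: {within(k):.4f}")

# Binomial vs. Poisson for P(X = 0) with n = 100, p = 0.02, λ = np = 2
n, p = 100, 0.02
binom_p0 = (1 - p) ** n
poisson_p0 = exp(-n * p)
print(f"P(X=0): binomial {binom_p0:.4f}, Poisson {poisson_p0:.4f}")
```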

Common Statistical Fallacies and Traps

1. Confusing Correlation with Causation

Ice cream sales ↑ and drowning deaths ↑ move together (common cause: summer heat; neither one causes the other)

2. Misinterpreting the p-value

p = 0.03 does NOT mean H₀ has only a 3% probability of being true
→ It means “assuming H₀ is true, data at least as extreme as what was observed would occur only 3% of the time”

3. Statistical Significance ≠ Practical Importance

With very large n, even tiny differences can become statistically significant
→ Always report effect size (Cohen’s d, η²) alongside the p-value
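
A small numeric sketch makes the point concrete. It assumes two groups with a known common σ and invented summary statistics, so a two-sample Z-test applies; the 0.2 benchmark for a "small" Cohen's d is the conventional one.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (illustration only)
mean_a, mean_b = 100.0, 100.2
sigma = 15.0   # common standard deviation, assumed known
n = 100_000    # observations per group

se_diff = sigma * sqrt(2 / n)                 # standard error of the difference in means
z = (mean_b - mean_a) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
cohens_d = (mean_b - mean_a) / sigma          # effect size in standard-deviation units

print(f"z = {z:.2f}, p = {p_value:.4f}, Cohen's d = {cohens_d:.4f}")
```

The difference is significant at the 5% level (z ≈ 2.98), yet d ≈ 0.013 is far below the conventional 0.2 threshold for even a small effect.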

4. Survivorship Bias

Analyzing only businesses that survived ignores failures → distorted conclusions
→ Classic example: in WWII, bullet holes were mapped only on aircraft that returned. Those planes had survived hits in those places, so armor was needed where the returning planes showed no holes

5. Extreme Values in Small Samples

Outliers in small samples have outsized effects on the mean → use the median
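
The effect is easy to demonstrate. In the sketch below, a single extreme value is added to a five-point sample; all numbers are invented for illustration.

```python
import statistics

# Hypothetical small sample (illustration only)
sample = [30, 32, 35, 31, 33]
with_outlier = sample + [300]  # one extreme value added

mean_before = statistics.mean(sample)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(sample)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

The mean more than doubles (32.2 to about 76.8) while the median barely moves (32 to 32.5).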

Real-World Applications of Statistics

Quality Control

Six Sigma: Defect rate ≤ 3.4 per million opportunities (spec limits at μ ± 6σ, allowing the conventional 1.5σ long-term shift of the process mean)
Process capability index: Cp = (USL − LSL) / 6σ
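
The sketch below computes Cp for an invented process and shows where the 3.4 ppm figure comes from. It assumes normally distributed output and the conventional Six Sigma allowance of a 1.5σ drift in the process mean, which leaves the nearer spec limit 4.5σ away; the spec limits and σ are hypothetical.

```python
from statistics import NormalDist

# Hypothetical process (illustration only)
usl, lsl = 10.6, 9.4  # upper and lower specification limits
sigma = 0.1           # process standard deviation

cp = (usl - lsl) / (6 * sigma)  # Cp = (USL − LSL) / 6σ

std_normal = NormalDist()
# With a 1.5σ mean shift, defects are essentially the tail beyond 4.5σ
ppm_with_shift = std_normal.cdf(-4.5) * 1e6
# With a perfectly centered process, both tails beyond 6σ count
ppm_centered = 2 * std_normal.cdf(-6.0) * 1e6

print(f"Cp = {cp:.2f}")
print(f"defects with 1.5σ shift: {ppm_with_shift:.1f} ppm")
print(f"defects if centered:     {ppm_centered:.4f} ppm")
```

A Cp of 2.0 corresponds to spec limits at ±6σ. The 3.4 ppm target is the shifted-process figure; a perfectly centered process would produce about 0.002 ppm.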

A/B Testing

Statistically compares two versions (A and B) — used in website conversion rate optimization, pharmaceutical clinical trials, and advertising effectiveness.

Use a two-proportion Z-test to compare rates (e.g., conversion rates) and a two-sample t-test to compare means
Ensure sufficient sample size → adequate statistical power
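
How large is "sufficient"? One common textbook approximation for comparing two proportions is n per group ≈ (z_α/2 + z_β)² × [p₁(1−p₁) + p₂(1−p₂)] / (p₁ − p₂)². The sketch below implements that approximation; the function name, the default α = 0.05 and power = 0.80, and the 5% vs. 6% conversion rates are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-proportion Z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

# Hypothetical baseline of 5% and a hoped-for 6% conversion rate
n_needed = sample_size_per_group(0.05, 0.06)
print(f"about {n_needed} visitors per variant")
```

Detecting a one-point lift from a 5% baseline takes roughly eight thousand visitors per variant under these assumptions, and the requirement drops quickly as the expected difference grows.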

Economic Indicator Interpretation

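GDP growth rate: % change in real GDP from a prior period. Year-over-year compares with the same period of the prior year; the U.S. headline figure is the quarter-over-quarter change in seasonally adjusted GDP at an annual rate (published quarterly by BEA)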
Consumer Price Index (CPI): Price level relative to a base year (published monthly by BLS)
Unemployment rate: Unemployed / Labor force (published monthly by BLS, Current Population Survey)

Key Concept Cards

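Central Limit Theorem (Review) ★★★★★ : For independent draws from almost any population with finite variance, the distribution of the sample mean approaches a normal distribution as n grows. n ≥ 30 is a common rule of thumb, though strongly skewed populations may need more. The theoretical foundation of much of statistical inference. Memory tip: CLT = the most important theorem in statistics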

Limits of Statistical Significance ★★★★★ : p < 0.05 does not guarantee practical significance. Always interpret alongside effect size. Memory tip: significant ≠ important; large n can make small differences statistically significant

Survivorship Bias ★★★★☆ : Drawing conclusions only from observable (surviving) cases, ignoring those that were eliminated. Memory tip: WWII aircraft. Only the planes that came back could be examined, so adding armor where their bullet holes clustered would have reinforced the wrong places
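
The Central Limit Theorem card can be illustrated with a short simulation. The sketch below draws repeated samples of size 30 from a strongly skewed exponential population (mean 1, standard deviation 1) and looks at how the sample means behave; the seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

N = 30       # size of each sample
REPS = 5000  # number of samples drawn

# Population: exponential with rate 1, so μ = 1 and σ = 1 (right-skewed)
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1 / sqrt(N)  # σ/√n

# Share of sample means within ±1.96 standard errors of μ
inside = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means) / REPS

print(f"mean of sample means = {mean_of_means:.3f} (μ = 1)")
print(f"sd of sample means   = {sd_of_means:.3f} (σ/√n = {theoretical_se:.3f})")
print(f"share within ±1.96 SE = {inside:.3f}")
```

Even though individual observations are far from normal, the sample means center on μ, spread out by about σ/√n, and fall inside ±1.96 standard errors roughly 95% of the time.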

Practice Questions

Q. An online course has a mean satisfaction rating of 7.2 (σ = 1.5, n = 400). What is the 95% confidence interval for the population mean?

SE = 1.5/√400 = 0.075. 95% CI = 7.2 ± 1.96 × 0.075 = [7.053, 7.347].
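
The same calculation as a short Python sketch, using the numbers from the question. The function name z_confidence_interval is illustrative, and σ is treated as known, as in the worked answer.

```python
from math import sqrt

def z_confidence_interval(mean, sigma, n, z=1.96):
    """Confidence interval for a mean with known σ: x̄ ± z × σ/√n."""
    se = sigma / sqrt(n)
    margin = z * se
    return mean - margin, mean + margin

lower, upper = z_confidence_interval(mean=7.2, sigma=1.5, n=400)
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")  # 95% CI = [7.053, 7.347]
```

Passing z=2.576 instead gives the 99% interval, which is wider because more confidence requires a larger margin.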

Q. Two marketing strategies have click-through rates of A = 5.2% (n₁ = 1,000) and B = 5.8% (n₂ = 1,000). Is the difference statistically significant?

Using a two-proportion Z-test: difference = 0.058 − 0.052 = 0.006. Pooled proportion p̂ = (52 + 58)/2000 = 0.055, so pooled SE = √[0.055 × 0.945 × (1/1000 + 1/1000)] ≈ 0.0102. Z ≈ 0.006/0.0102 ≈ 0.59, two-tailed p ≈ 0.56 > 0.05 → not statistically significant. The absolute difference is too small relative to sampling variability given these sample sizes.
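
A two-proportion Z-test with a pooled standard error can be sketched in a few lines of Python. The click counts (52 and 58 out of 1,000) are derived from the rates in the question; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Strategy A: 52 clicks in 1,000; strategy B: 58 clicks in 1,000
z, p_value = two_proportion_z_test(52, 1000, 58, 1000)
print(f"z = {z:.2f}, p = {p_value:.2f}")  # z = 0.59, p = 0.56
```

Since p is well above 0.05, a 0.6-point gap between click-through rates is consistent with sampling noise at n = 1,000 per group.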