Ch10. Statistics Comprehensive Review — Essential Formulas and Practical Applications
Statistics: The Full Workflow
Data Collection (sample design) → Descriptive Statistics (mean, standard deviation) → Probability Theory (distributions, CLT) → Inferential Statistics (estimation, testing)
Essential Formula Reference
Central Tendency and Dispersion
Arithmetic mean: x̄ = Σxᵢ / n
Sample variance: s² = Σ(xᵢ−x̄)² / (n−1)
Standard deviation: s = √s²
Coefficient of variation: CV = s/x̄ × 100%
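The four formulas above map directly onto Python's standard `statistics` module. The following is a minimal sketch; the sample values are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
import statistics

# Hypothetical sample, chosen only for illustration
data = [4.0, 7.0, 6.0, 5.0, 8.0]

mean = statistics.mean(data)          # x̄ = Σxᵢ / n
variance = statistics.variance(data)  # s², using the n − 1 denominator
std_dev = statistics.stdev(data)      # s = √s²
cv_percent = std_dev / mean * 100     # CV = s / x̄ × 100%

print(f"mean={mean}, s2={variance}, s={std_dev:.3f}, CV={cv_percent:.1f}%")
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n − 1) form shown above; `pvariance` and `pstdev` are the population (n) counterparts.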
Probability Fundamentals
Addition rule: P(A∪B) = P(A) + P(B) − P(A∩B)
Multiplication: P(A∩B) = P(A) × P(B|A)
Conditional: P(A|B) = P(A∩B) / P(B)
Independence: P(A∩B) = P(A) × P(B)
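A small worked example can make these rules concrete. The sketch below assumes one roll of a fair six-sided die and two illustrative events (neither comes from the text), and uses exact fractions so the identities can be checked without rounding.

```python
from fractions import Fraction

# Illustrative assumption: one roll of a fair six-sided die
omega = set(range(1, 7))
A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

p_union = prob(A) + prob(B) - prob(A & B)       # addition rule
p_a_given_b = prob(A & B) / prob(B)             # conditional: P(A|B)
p_b_given_a = prob(A & B) / prob(A)             # conditional: P(B|A)
p_joint = prob(A) * p_b_given_a                 # multiplication rule
independent = prob(A & B) == prob(A) * prob(B)  # independence check

print(p_union, p_a_given_b, p_joint, independent)
```

Here P(A∩B) = 1/3 while P(A) × P(B) = 1/4, so the two events are not independent.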
Key Probability Distributions
Binomial: E(X) = np, Var(X) = np(1−p)
Poisson: E(X) = Var(X) = λ
Normal: X~N(μ,σ²) → Z = (X−μ)/σ ~ N(0,1)
t-dist: t = (x̄−μ₀)/(s/√n), df = n−1
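The binomial moments and the standardization step can be checked numerically. This is a sketch under assumed parameters (n = 10, p = 0.3 for the binomial; μ = 100, σ = 15 for the normal), none of which come from the text.

```python
import math
from statistics import NormalDist

n, p = 10, 0.3  # hypothetical binomial parameters

# Binomial pmf: P(X = k) = C(n, k) p^k (1 - p)^(n - k)
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
# mean and var should match np and np(1 - p) up to floating-point error

# Standardization under an assumed model X ~ N(100, 15²)
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma           # Z = (X − μ) / σ
p_below = NormalDist().cdf(z)  # P(X ≤ 130) read from the standard normal

print(mean, var, z, round(p_below, 4))
```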
Estimation and Testing
Standard error: SE = σ/√n (estimate with s/√n when σ is unknown)
95% CI: x̄ ± 1.96 × σ/√n
99% CI: x̄ ± 2.576 × σ/√n
Z-test statistic: Z = (x̄−μ₀) / (σ/√n)
F-statistic: F = MSB/MSW
Chi-square: χ² = Σ(O−E)²/E
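The Z and chi-square statistics above are simple enough to compute by hand, but a short sketch shows how the pieces fit together. The function names and all input values below are hypothetical; the Z-test treats σ as known, as the formula assumes.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample Z-test with a known population sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical inputs, for illustration only
z, p_value = z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
chi2 = chi_square_stat(observed=[18, 22, 20], expected=[20, 20, 20])

print(f"Z = {z:.2f}, p = {p_value:.4f}, chi-square = {chi2:.2f}")
```

The chi-square value still has to be compared against a critical value for the appropriate degrees of freedom, which the standard library does not provide.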
Regression Analysis
Slope: β₁ = r × (s_y / s_x)
Intercept: β₀ = ȳ − β₁x̄
Coefficient of det.: R² = SSR/SST = 1 − SSE/SST
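The three regression formulas can be verified on a small data set. The sketch below uses hypothetical paired data and computes r from the sample covariance, so it relies only on `statistics.mean` and `statistics.stdev`.

```python
import statistics

# Hypothetical paired data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Pearson r = sample covariance / (s_x * s_y)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
r = cov_xy / (s_x * s_y)

beta1 = r * (s_y / s_x)        # slope
beta0 = y_bar - beta1 * x_bar  # intercept

# R² = 1 − SSE/SST
sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"slope={beta1:.2f}, intercept={beta0:.2f}, R2={r_squared:.2f}")
```

In simple linear regression R² equals r², which gives a quick consistency check on the result.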
Key Values to Memorize
| Quantity | Value |
|---|---|
| Z₀.₀₂₅ (95% two-tailed) | 1.96 |
| Z₀.₀₀₅ (99% two-tailed) | 2.576 |
| Z₀.₀₅ (90% two-tailed) | 1.645 |
| 68–95–99.7% rule | ±1σ, ±2σ, ±3σ |
| Binomial → Poisson approx. | n ≥ 30, p ≤ 0.05, λ = np |
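These values do not have to be taken on faith. The sketch below reproduces the critical values and the 68–95–99.7 shares with `statistics.NormalDist`, and compares a binomial with its Poisson approximation at hypothetical values n = 100, p = 0.02 (an assumption for illustration, chosen to satisfy the rule of thumb in the table).

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Two-tailed critical values: z such that P(|Z| <= z) = confidence level
z_crit = {level: std_normal.inv_cdf(1 - (1 - level) / 2)
          for level in (0.90, 0.95, 0.99)}

# Share of a normal distribution within ±k standard deviations
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

# Binomial(n, p) versus Poisson(λ = np) at hypothetical n and p
n, p = 100, 0.02
lam = n * p

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

max_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(11))

for level, z in z_crit.items():
    print(f"{level:.0%} two-tailed: z = {z:.3f}")
print({k: round(v, 4) for k, v in within.items()}, f"max pmf gap = {max_gap:.4f}")
```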
Common Statistical Fallacies and Traps
1. Confusing Correlation with Causation
Ice cream sales ↑ → Drowning deaths ↑ (Common cause: summer, not causation)
2. Misinterpreting the p-value
p = 0.03 does NOT mean H₀ has only a 3% probability of being true
→ It means “if H₀ were true, data at least this extreme would occur only 3% of the time”
3. Statistical Significance ≠ Practical Importance
With very large n, tiny differences become statistically significant
→ Always report effect size (Cohen’s d, η²) alongside the p-value
4. Survivorship Bias
Analyzing only businesses that survived ignores failures → distorted conclusions
→ Classic example: in WWII, analysts could inspect only the aircraft that returned. The bullet holes on those planes marked places where a plane could be hit and still fly home, so the armor was needed elsewhere
5. Extreme Values in Small Samples
Outliers in small samples have outsized effects on the mean → use the median
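Traps 3 and 5 above lend themselves to a quick numerical check. The sketch below is illustrative only: the scores, the common standard deviation, and the income figures are all hypothetical, and the two-group comparison assumes equal group sizes with a known shared standard deviation to keep the arithmetic simple.

```python
import math
import statistics
from statistics import NormalDist

def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Z statistic, two-sided p-value, and Cohen's d for two equal-size
    groups sharing a common standard deviation (simplifying assumption)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = (mean_b - mean_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = (mean_b - mean_a) / sd
    return z, p_value, cohens_d

# Trap 3: the same 0.3-point gap, with very different sample sizes
small = two_sample_z(100.0, 100.3, sd=15.0, n_per_group=100)
large = two_sample_z(100.0, 100.3, sd=15.0, n_per_group=1_000_000)
print(f"n=100:       p={small[1]:.3f}, d={small[2]:.3f}")
print(f"n=1,000,000: p={large[1]:.3g}, d={large[2]:.3f}")

# Trap 5: one extreme value in a small sample (hypothetical incomes)
incomes = [3.1, 2.9, 3.4, 3.0, 3.2, 45.0]
mean_all = statistics.mean(incomes)
median_all = statistics.median(incomes)
mean_trimmed = statistics.mean(incomes[:-1])
print(f"mean={mean_all:.2f}, median={median_all:.2f}, "
      f"mean without outlier={mean_trimmed:.2f}")
```

The effect size (d = 0.02) is identical in both runs; only the p-value changes with n. In the second part, the single outlier roughly triples the mean while the median barely moves.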
Real-World Applications of Statistics
Quality Control
Six Sigma: Defect rate ≤ 3.4 ppm (spec limits at μ ± 6σ, with the conventional allowance for a 1.5σ long-term shift in the process mean)
Process capability index: Cp = (USL − LSL) / (6σ)
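Both quality-control figures can be reproduced from the normal distribution. This is a sketch under stated assumptions: the spec limits and process σ are hypothetical, and the 3.4 ppm figure is computed as the one-sided tail beyond 4.5σ, which is the usual Six Sigma convention (6σ design margin minus an assumed 1.5σ drift of the mean).

```python
from statistics import NormalDist

def process_capability(usl, lsl, sigma):
    """Cp = (USL - LSL) / (6 * sigma); ignores how well the process is centered."""
    return (usl - lsl) / (6 * sigma)

# Hypothetical spec limits and process standard deviation (spec width = 12σ)
cp = process_capability(usl=10.6, lsl=9.4, sigma=0.1)

std_normal = NormalDist()
# One-sided tail beyond 4.5σ: the conventional Six Sigma defect rate
ppm_shifted = (1 - std_normal.cdf(4.5)) * 1e6
# Two-sided tails beyond ±6σ for a perfectly centered process
ppm_centered = 2 * (1 - std_normal.cdf(6.0)) * 1e6

print(f"Cp = {cp:.2f}, shifted = {ppm_shifted:.2f} ppm, "
      f"centered = {ppm_centered:.4f} ppm")
```

A perfectly centered ±6σ process would produce far fewer defects than 3.4 ppm, which is why the shift assumption matters when quoting the figure.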
A/B Testing
Statistically compares two versions (A and B). Used in website conversion rate optimization, pharmaceutical clinical trials, and advertising effectiveness.
Use a two-proportion Z-test to compare rates, or a t-test to compare means
Ensure sufficient sample size → adequate statistical power
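"Sufficient sample size" can be estimated before the test is run. The sketch below uses the standard normal-approximation formula for a two-sided two-proportion test with equal group sizes. The function name, the 5% significance level, and the 80% power target are assumptions for illustration; the rates 5.2% and 5.8% are borrowed from the click-through practice question in this chapter.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion
    Z-test (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

n_needed = n_per_group(0.052, 0.058)
print(f"About {n_needed:,} users per group")
```

Under these assumptions the answer is in the low tens of thousands per group, far more than the 1,000 per group in the practice question. Halving the detectable difference roughly quadruples the required n.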
Economic Indicator Interpretation
- GDP growth rate: % change from same period prior year (seasonally adjusted)
- Consumer Price Index (CPI): Price level relative to a base year (published monthly by BLS)
- Unemployment rate: Unemployed / Labor force (published monthly by BLS, Current Population Survey)
Key Concept Cards
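Central Limit Theorem (Review) ★★★★★ : For a population with finite variance, the distribution of the sample mean approaches a normal distribution as n grows, whatever the shape of the population. n ≥ 30 is a common rule of thumb; strongly skewed populations may need more. The theoretical foundation of much of statistical inference. Memory tip: CLT = the most important theorem in statistics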
Limits of Statistical Significance ★★★★★ : p < 0.05 does not guarantee practical significance. Always interpret alongside effect size. Memory tip: significant ≠ important; large n can make small differences statistically significant
Survivorship Bias ★★★★☆ : Drawing conclusions only from observable (surviving) cases, ignoring those that were eliminated. Memory tip: with the WWII aircraft, only planes that came back could be examined, so armoring the visible bullet holes would have reinforced the wrong places
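The Central Limit Theorem card above can be illustrated with a small simulation. This sketch assumes an Exponential(1) population (strongly right-skewed, with mean 1 and standard deviation 1), a sample size of 30, and a fixed random seed; all three are choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_means(n, trials=5000):
    """Means of `trials` independent samples of size n from Exponential(1)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

means = sample_means(n=30)
center = statistics.fmean(means)  # expected to be near the population mean, 1.0
spread = statistics.stdev(means)  # expected to be near σ/√n = 1/√30 ≈ 0.183

print(f"mean of sample means = {center:.3f}, SD of sample means = {spread:.3f}")
```

Plotting a histogram of `means` shows a roughly bell-shaped distribution even though the underlying population is far from normal.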
Practice Questions
Q. An online course has a mean satisfaction rating of 7.2 (σ = 1.5, n = 400). What is the 95% confidence interval for the population mean?
SE = 1.5/√400 = 0.075. 95% CI = 7.2 ± 1.96 × 0.075 = [7.053, 7.347].
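The interval above can be reproduced in a few lines. The helper name is hypothetical, and the sketch treats σ = 1.5 as known, as the question does.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, level=0.95):
    """Z-based confidence interval for a mean, treating sigma as known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = mean_confidence_interval(7.2, 1.5, 400)
low99, high99 = mean_confidence_interval(7.2, 1.5, 400, level=0.99)

print(f"95% CI: [{low:.3f}, {high:.3f}]")
print(f"99% CI: [{low99:.3f}, {high99:.3f}]")
```

Raising the confidence level to 99% widens the interval, since the critical value grows from about 1.96 to about 2.576.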
Q. Two marketing strategies have click-through rates of A = 5.2% (n₁ = 1,000) and B = 5.8% (n₂ = 1,000). Is the difference statistically significant?
Using a two-proportion Z-test: difference = 0.006, pooled proportion = (0.052 + 0.058)/2 = 0.055, pooled SE = √[0.055×0.945×(1/1000 + 1/1000)] ≈ 0.0102. Z = 0.006/0.0102 ≈ 0.59, two-sided p ≈ 0.56 > 0.05 → not statistically significant. The absolute difference is too small relative to sampling variability given these sample sizes.
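The arithmetic of the two-proportion test can be checked with a short sketch. The function name is hypothetical; it uses the pooled standard error, which is the usual choice when testing H₀: p₁ = p₂.

```python
import math
from statistics import NormalDist

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided two-proportion Z-test using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_value

se, z, p_value = two_proportion_z(0.052, 1000, 0.058, 1000)
print(f"SE = {se:.4f}, Z = {z:.2f}, p = {p_value:.2f}")
```

With these inputs the sketch gives SE ≈ 0.0102, Z ≈ 0.59, and a two-sided p-value of about 0.56, so the difference is not statistically significant at the 5% level.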
OIYO Editorial
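Content Editor. A knowledge incubator and professional content creator who researches and shares in-depth, practice- and certification-oriented information on business, economics, law, and everyday life.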