Ch4. Sampling Distributions and the Central Limit Theorem

Population and Sample

Population: The entire set of subjects of interest in a study
Sample: A subset selected from the population

Why use samples?

A census of the entire population is often impractical due to cost and time
Samples allow us to infer population characteristics

Parameter: A characteristic of the population (μ, σ) — for example, the true mean household income of all US adults
Statistic: A value calculated from the sample (x̄, s) — for example, mean income in a BLS survey sample

Sampling Methods

Method	Description
Simple Random Sampling	Every possible sample of size n is equally likely, so every individual has an equal probability of selection
Stratified Sampling	Divide the population into strata (subgroups), then draw a random sample from each stratum
Cluster Sampling	Randomly select clusters (groups), then take a census within each selected cluster
Systematic Sampling	Randomly select a starting unit among the first k, then select every kth unit after it
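
The four methods in the table can be sketched in a few lines of Python. This is a minimal illustration, not a survey-design tool: the population of unit IDs 0..99, the five groups g0..g4, and all function names are hypothetical, and the same groups are reused as both strata and clusters purely to keep the example short.

```python
import random

def simple_random_sample(units, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(units, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(units, k, rng):
    """Random start among the first k units, then every kth unit."""
    start = rng.randrange(k)
    return units[start::k]

rng = random.Random(0)                      # fixed seed for reproducibility
population = list(range(100))               # hypothetical unit IDs 0..99
groups = {f"g{i}": population[i * 20:(i + 1) * 20] for i in range(5)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(groups, 2, rng)   # 2 units from each of 5 strata
clus = cluster_sample(groups, 2, rng)       # all 20 units from 2 clusters
syst = systematic_sample(population, 10, rng)
```

Note the contrast between the stratified and cluster results: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.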

Distribution of the Sample Mean

When we repeatedly draw samples of size n from the same population, the sample mean x̄ has its own distribution.

Population with mean μ and variance σ²:

Sample mean x̄ ~ N(μ, σ²/n)  (exact if the population is normal; approximate for large n otherwise)

Expected value: E(x̄) = μ
Variance:       Var(x̄) = σ²/n
Standard error: SE = σ/√n

Key insight: As sample size n increases, the variability of the sample mean (standard error) decreases.
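
A small simulation can make the formulas above concrete. The sketch below assumes a normal population with μ = 50 and σ = 10 and a sample size of n = 25 (all values chosen only for illustration), draws many samples, and compares the mean and standard deviation of the resulting sample means with μ and σ/√n.

```python
import math
import random
import statistics

rng = random.Random(42)      # fixed seed for reproducibility
mu, sigma = 50.0, 10.0       # assumed population parameters
n, reps = 25, 20_000         # sample size and number of repeated samples

# Draw `reps` independent samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
sd_of_means = statistics.pstdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_se = sigma / math.sqrt(n)            # 10 / 5 = 2.0

print(f"mean of sample means: {mean_of_means:.3f} (theory: {mu})")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {theoretical_se})")
```

Rerunning with a larger n (for example 100) should shrink the spread of the sample means, in line with the key insight above.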

Central Limit Theorem (CLT)

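One of the most important theorems in statistics, because it justifies normal-based inference about means even when the population is not normal.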

Regardless of the shape of the population's distribution (provided its variance is finite), when n is sufficiently large (n ≥ 30 is a common rule of thumb), the distribution of the sample mean is approximately normal.

x̄ ~ N(μ, σ²/n)  (when n is sufficiently large)

Standardization: Z = (x̄ − μ) / (σ/√n)

Significance: Even if the population is not normally distributed (e.g., uniform, binomial, skewed), the mean of a large enough sample is approximately normally distributed. The more skewed the population, the larger n must be for the approximation to be good.
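
The CLT can be checked empirically with a skewed population. The sketch below assumes an Exponential(1) population (mean 1, SD 1), standardizes each sample mean with Z = (x̄ − μ)/(σ/√n), and measures how often Z falls in each tail beyond ±1.645. For an exactly normal Z, each tail would hold about 5% of the values. The sample sizes 2, 30, and 200 and the function name tail_fractions are choices made for this illustration only.

```python
import math
import random
import statistics

rng = random.Random(7)   # fixed seed for reproducibility
mu = sigma = 1.0         # Exponential(rate=1) has mean 1 and SD 1
reps = 10_000            # number of repeated samples per sample size

def tail_fractions(n):
    """Fractions of standardized sample means below -1.645 and above +1.645."""
    se = sigma / math.sqrt(n)
    z_values = [
        (statistics.fmean(rng.expovariate(1.0) for _ in range(n)) - mu) / se
        for _ in range(reps)
    ]
    lower = sum(z < -1.645 for z in z_values) / reps
    upper = sum(z > 1.645 for z in z_values) / reps
    return lower, upper

results = {n: tail_fractions(n) for n in (2, 30, 200)}
for n, (lower, upper) in results.items():
    print(f"n={n:>3}: lower tail {lower:.3f}, upper tail {upper:.3f}")
```

With n = 2 the lower tail is empty, because x̄ cannot be negative and so Z can never fall below −√2 ≈ −1.414. As n grows, both tail fractions should move toward 0.05, which is the skew fading out as the CLT predicts.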

Standard Error (SE)

SE = σ/√n  (when population SD is known)
SE = s/√n  (estimated SE, when σ is unknown and replaced by the sample SD s)

Standard Deviation vs Standard Error:

Standard deviation: variability of individual data points
Standard error: variability of the sample mean (precision of the estimate)

If the sample size is multiplied by 4, the standard error is cut in half, because SE is proportional to 1/√n and √4 = 2.
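
As a quick numerical check of the two SE formulas and the "n × 4" rule, the sketch below uses an assumed population SD of 12 and a small made-up data set. The helper name standard_error is hypothetical.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Known population SD (assumed value for illustration).
sigma = 12.0
se_n25 = standard_error(sigma, 25)     # 12 / 5
se_n100 = standard_error(sigma, 100)   # 12 / 10, half of the value above

# Unknown population SD: plug in the sample SD s (n - 1 in the denominator).
sample = [4.1, 5.0, 3.8, 4.6, 5.3, 4.4, 4.9, 4.2, 4.7]  # made-up measurements
s = statistics.stdev(sample)
se_estimated = standard_error(s, len(sample))

print(se_n25, se_n100, se_n25 / se_n100)
print(round(se_estimated, 4))
```

Going from n = 25 to n = 100 quadruples the sample size and halves the standard error, while the standard deviation of individual observations is unchanged.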

Distribution of Sample Proportions

Sample proportion p̂ = number of successes in sample / n

Expected value: E(p̂) = p
Variance:       Var(p̂) = p(1−p)/n
Standard error: √[p(1−p)/n]

By the CLT, p̂ is approximately normal when n is large. A common check is that np and n(1−p) are both at least 10.
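
The same kind of simulation works for proportions. The sketch below assumes a true proportion p = 0.3 and a sample size n = 200 (illustrative values only), simulates many samples of success/failure outcomes, and compares the mean and standard deviation of p̂ with p and √[p(1−p)/n].

```python
import math
import random
import statistics

rng = random.Random(1)          # fixed seed for reproducibility
p, n, reps = 0.3, 200, 10_000   # assumed true proportion, sample size, repetitions

theoretical_se = math.sqrt(p * (1 - p) / n)

# Each sample: count successes among n independent trials, divide by n.
p_hats = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(reps)
]

mean_p_hat = statistics.fmean(p_hats)   # should be close to p
sd_p_hat = statistics.pstdev(p_hats)    # should be close to theoretical_se

print(f"mean of p-hat: {mean_p_hat:.4f} (theory: {p})")
print(f"SD of p-hat:   {sd_p_hat:.4f} (theory: {theoretical_se:.4f})")
```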

Key Concept Cards

Central Limit Theorem (CLT) ★★★★★ : Regardless of the population distribution, when n is large (rule of thumb: n ≥ 30), the sample mean is approximately N(μ, σ²/n). The foundation of statistical inference. Memory tip: large n → sample mean → normal distribution

Standard Error (SE) ★★★★★ : The standard deviation of the sample mean. SE = σ/√n. Larger n → smaller SE → more precise estimates. Memory tip: SE = σ/√n; n × 4 → SE ÷ 2

Parameter vs Statistic ★★★★☆ : Parameters (μ, σ) are fixed population values. Statistics (x̄, s) are values computed from a sample; they estimate the parameters and vary from sample to sample. Memory tip: parameter = population; statistic = sample

Practice Questions

Q. A population has standard deviation 15. What is the standard error for a sample of n = 225?

SE = σ/√n = 15/√225 = 15/15 = 1. The sample mean therefore has a standard deviation of 1, compared with 15 for individual observations.
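
As a follow-up to this answer, the standardization Z = (x̄ − μ)/(σ/√n) turns the standard error into a probability statement. The sketch below relies on the normal approximation from the CLT and uses a hypothetical margin of 2 units, which is not part of the question, to estimate the probability that the sample mean lands within that margin of μ.

```python
import math
from statistics import NormalDist

sigma, n = 15.0, 225
se = sigma / math.sqrt(n)   # 15 / 15 = 1.0

margin = 2.0                # hypothetical margin around the population mean
z = margin / se             # the margin expressed in standard errors

# P(|x-bar - mu| <= margin) under the normal approximation
std_normal = NormalDist()   # mean 0, SD 1
prob_within = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"SE = {se}, z = {z}, P(within margin) = {prob_within:.4f}")
```

Because the margin equals two standard errors here, the result is the familiar "about 95% within two standard deviations" figure for a normal distribution.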

Q. A population has a right-skewed distribution. What is the distribution of the sample mean for n = 50?

By the Central Limit Theorem, with n = 50 the distribution of the sample mean is approximately normal even though the population is right-skewed. Its expected value is the population mean μ, and its standard deviation (the standard error) is σ/√50.