What Is Statistics?

Statistics: The discipline of collecting, organizing, analyzing, and interpreting data to support decision-making under uncertainty.

Descriptive Statistics: Summarizes and describes a dataset
Inferential Statistics: Uses sample data to draw conclusions about a population

Measures of Central Tendency

Indicators that show where data tends to cluster.

Mean (Arithmetic Mean)

Arithmetic Mean = (Sum of all values) / (Number of data points)

Advantage: Intuitive and easy to compute
Disadvantage: Sensitive to extreme values (outliers)
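
The formula and the outlier sensitivity can be checked with a short sketch. The data values here are made up for illustration; only Python's standard statistics module is used.

```python
from statistics import mean

# Hypothetical data: five typical values, then the same set plus one outlier.
values = [10, 12, 11, 13, 14]
with_outlier = values + [100]

mean_plain = mean(values)            # 60 / 5 = 12
mean_outlier = mean(with_outlier)    # 160 / 6, about 26.67

print(mean_plain, round(mean_outlier, 2))
```

A single extreme value more than doubles the mean here, even though five of the six values sit between 10 and 14.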

Median

The middle value when data is sorted in order.
Odd count: The middle-positioned value
Even count: Average of the two middle values

Advantage: Robust to outliers
Usage: Real estate prices, income distributions (e.g., US Census median household income)
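
The odd and even cases, and the robustness to outliers, can be sketched as follows. The lists are hypothetical illustration data.

```python
from statistics import median

odd_count = [7, 3, 9, 1, 5]          # sorted: 1, 3, 5, 7, 9
even_count = [7, 3, 9, 1, 5, 11]     # sorted: 1, 3, 5, 7, 9, 11

median_odd = median(odd_count)       # middle value: 5
median_even = median(even_count)     # average of 5 and 7: 6.0

# Replacing the largest value with an extreme one leaves the median unchanged.
with_outlier = [7, 3, 9, 1, 5, 1000]
median_outlier = median(with_outlier)

print(median_odd, median_even, median_outlier)
```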

Mode

The value that appears most frequently. There can be multiple modes.
Usage: Categorical data (clothing sizes, preferred colors)
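
A minimal sketch of finding one mode and several tied modes in categorical data. The size and color lists are invented examples; multimode requires Python 3.8 or later.

```python
from statistics import mode, multimode

# Hypothetical categorical data.
sizes = ["M", "L", "M", "S", "L", "M"]
colors = ["red", "blue", "red", "blue", "green"]

most_common_size = mode(sizes)     # "M" appears three times
tied_colors = multimode(colors)    # "red" and "blue" both appear twice

print(most_common_size, tied_colors)
```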

Relationship Between Mean, Median, and Mode

Distribution Shape	Typical Relationship
Symmetric and unimodal (e.g., normal distribution)	Mean = Median = Mode
Right-skewed (positive skew)	Mode < Median < Mean
Left-skewed (negative skew)	Mean < Median < Mode
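
The right-skewed ordering can be illustrated with a small made-up sample that has a long right tail. This is a tendency rather than a guarantee, so other skewed samples may order differently.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

sample_mode = mode(sample)        # 2 (appears three times)
sample_median = median(sample)    # average of the two middle values: 3.0
sample_mean = mean(sample)        # 46 / 10 = 4.6

print(sample_mode, sample_median, sample_mean)
```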

Measures of Dispersion

Indicators that show how spread out the data is.

Range

Range = Maximum value − Minimum value

Simple but highly sensitive to outliers.

Variance

Population variance  σ² = Σ(Xᵢ − μ)² / N
Sample variance      s² = Σ(xᵢ − x̄)² / (n−1)

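The average of squared deviations from the mean. The sample version divides by n−1 instead of n to correct for bias. Difficult to interpret directly because the unit is the square of the data's unit.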
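
Both formulas can be worked by hand and compared with the standard library. The data set is a hypothetical example chosen so that the mean is a whole number.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical values, mean = 5

# By hand: sum of squared deviations from the mean.
mu = sum(data) / len(data)
ss = sum((x - mu) ** 2 for x in data)    # 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32

population_var = ss / len(data)          # 32 / 8 = 4.0
sample_var = ss / (len(data) - 1)        # 32 / 7, about 4.571

# The library functions implement the same two formulas.
print(population_var, pvariance(data))
print(round(sample_var, 3), round(variance(data), 3))
```

The sample variance is always a little larger than the population formula applied to the same numbers, because it divides by n−1.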

Standard Deviation

σ = √Variance (population)
s = √Sample variance (sample)

The square root of variance. Has the same units as the original data.
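
A short sketch of the unit argument, using hypothetical heights in centimeters: the variance comes out in cm², while the standard deviation is back in cm.

```python
import math
from statistics import pstdev, pvariance, stdev

heights_cm = [160, 165, 170, 175, 180]   # hypothetical heights, mean = 170 cm

pop_var = pvariance(heights_cm)          # 250 / 5 = 50 (unit: cm squared)
pop_sd = pstdev(heights_cm)              # sqrt(50), about 7.07 cm
sample_sd = stdev(heights_cm)            # sqrt(250 / 4), about 7.91 cm

print(pop_var, round(pop_sd, 2), round(sample_sd, 2))
```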

Applications:

Stock market volatility (higher SD = higher risk)
Quality control (output is expected to stay within the mean ±3σ; for normally distributed data about 99.7% of values fall in that band)
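
The ±3σ idea in the quality-control item can be sketched as follows. The baseline measurements, the new readings, and the choice to estimate the limits from the sample mean and sample SD are all assumptions made for illustration, not a prescribed procedure.

```python
from statistics import mean, stdev

# Hypothetical baseline measurements from a stable process (e.g., part width in mm).
baseline = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9]

center = mean(baseline)
s = stdev(baseline)              # sample standard deviation
lower = center - 3 * s           # lower 3-sigma limit
upper = center + 3 * s           # upper 3-sigma limit

# Hypothetical new readings checked against the limits.
new_readings = [10.05, 9.7, 10.5]
flagged = [x for x in new_readings if x < lower or x > upper]

print(f"limits: {lower:.3f} to {upper:.3f}")
print("outside limits:", flagged)
```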

Coefficient of Variation (CV)

CV = (Standard Deviation / Mean) × 100%

Used to compare dispersion between groups measured in different units or with very different means, because it expresses the spread relative to the mean.
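
A minimal sketch comparing spread across different units. The height and weight lists are hypothetical, and the helper name cv_percent is introduced here for illustration only.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    return stdev(values) / mean(values) * 100

heights_cm = [160, 165, 170, 175, 180]   # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]        # hypothetical weights

cv_height = cv_percent(heights_cm)       # about 4.65
cv_weight = cv_percent(weights_kg)       # about 22.59

print(round(cv_height, 2), round(cv_weight, 2))
```

The raw standard deviations (in cm and kg) cannot be compared directly, but the unit-free CV suggests the weights vary far more relative to their mean.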

Quartiles and Box Plots

Quartiles:

Q1 (25th percentile): lower 25% boundary
Q2 (50th percentile): median
Q3 (75th percentile): upper 25% boundary
IQR = Q3 − Q1 (Interquartile Range)

Outlier Detection: below Q1 − 1.5×IQR or above Q3 + 1.5×IQR
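
The quartiles, IQR, and the 1.5×IQR rule can be sketched with the standard library. The data are hypothetical. Quartile conventions differ between textbooks and software, so the method="inclusive" setting used here is one assumption among several valid choices and other methods may give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = [2, 4, 5, 7, 8, 9, 11, 12, 40]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

These five numbers (minimum and maximum inside the fences, Q1, Q2, Q3) are what a box plot draws, with values beyond the fences plotted as individual points.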

Skewness and Kurtosis

Skewness: The degree of asymmetry in the distribution

Positive skew (+): longer right tail
Negative skew (−): longer left tail

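Kurtosis: The heaviness of the tails of the distribution (often loosely described as peakedness)

Normal distribution kurtosis = 3 (excess kurtosis = 0)
Kurtosis > 3: heavier tails than the normal distribution, so extreme values are more likely (tail risk)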
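
Both measures can be computed from central moments. The sketch below assumes the common population (moment-based) definitions, skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, which the notes above do not spell out; statistical packages often apply small-sample corrections and may report excess kurtosis instead. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis (not excess): m4 / m2 ** 2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_tailed = [1, 2, 3, 4, 15]    # same data with a long right tail

skew_sym = skewness(symmetric)         # 0.0: no asymmetry
skew_right = skewness(right_tailed)    # positive: longer right tail
kurt_sym = kurtosis(symmetric)         # 6.8 / 4 = 1.7
excess_sym = kurt_sym - 3              # negative: lighter tails than normal

print(skew_sym, round(skew_right, 3), round(kurt_sym, 3), round(excess_sym, 3))
```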

Key Concept Cards

Mean vs Median ★★★★★ : When outliers are present, the median is a better representative value. This is why median household income (reported by the US Census Bureau) is preferred over mean income. Memory tip: outliers present → median is more appropriate

Standard Deviation ★★★★★ : How far data points typically lie from the mean, computed as the square root of the average squared deviation. The larger the SD, the more spread out the data. Memory tip: SD = typical distance from the mean

IQR (Interquartile Range) ★★★★☆ : Q3 − Q1. The range of the middle 50% of data. Used for outlier detection and drawing box plots. Memory tip: IQR = Q3 − Q1; outlier fences = Q1 − 1.5×IQR and Q3 + 1.5×IQR

Practice Questions

Q. A company's employee salaries are [$45,000; $48,000; $46,500; $45,750; $225,000]. Which is the better representative measure: mean or median?

The median ($46,500) is more representative. The $225,000 salary (CEO) is an extreme outlier that inflates the mean to $82,050, well above what four of the five employees earn. The median is essentially unaffected by a single outlier.
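
The answer can be verified in a few lines, using the five salaries from the question.

```python
from statistics import mean, median

salaries = [45_000, 48_000, 46_500, 45_750, 225_000]

salary_mean = mean(salaries)        # 410,250 / 5 = 82,050
salary_median = median(salaries)    # middle of the sorted list: 46,500

# Dropping the largest salary shows how much it pulls the mean.
mean_without_top = mean(sorted(salaries)[:-1])   # 185,250 / 4 = 46,312.5

print(salary_mean, salary_median, mean_without_top)
```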

Q. Compare the risk-return efficiency of Stock A (mean return 10%, SD 5%) and Stock B (mean return 20%, SD 8%) using the coefficient of variation.

CV of A = 5/10 = 50%; CV of B = 8/20 = 40%. Stock B has lower variability per unit of return, making it relatively more efficient.
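
The same arithmetic as a short check, using the means and SDs given in the question. The helper name cv_percent is introduced here for illustration.

```python
def cv_percent(sd, mean_return):
    """Coefficient of variation in percent from a given SD and mean."""
    return 100 * sd / mean_return

cv_a = cv_percent(sd=5, mean_return=10)    # 50.0
cv_b = cv_percent(sd=8, mean_return=20)    # 40.0

lower_cv_stock = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, lower_cv_stock)
```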

Ch1. Descriptive Statistics — Summarizing Data with Mean, Variance, and Standard Deviation