
Ch1. Descriptive Statistics — Summarizing Data with Mean, Variance, and Standard Deviation

OIYO Editorial Contributor

What Is Statistics?

Statistics: The discipline of collecting, organizing, analyzing, and interpreting data to support decision-making under uncertainty.

Descriptive Statistics: Summarizes and describes a dataset
Inferential Statistics: Uses sample data to draw conclusions about a population


Measures of Central Tendency

Indicators that show where data tends to cluster.

Mean (Arithmetic Mean)

Arithmetic Mean = (Sum of all values) / (Number of data points)

Advantage: Intuitive and easy to compute
Disadvantage: Sensitive to extreme values (outliers)
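As a minimal sketch (Python, standard library only; the scores are hypothetical), the formula translates directly into code. The last line adds one extreme value to show the outlier sensitivity noted above.

```python
import statistics

def arithmetic_mean(values):
    """Sum of all values divided by the number of data points."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)

scores = [70, 80, 90, 100]  # hypothetical exam scores
print(arithmetic_mean(scores))   # 85.0
print(statistics.mean(scores))   # standard-library equivalent

# One extreme value drags the mean far away from the bulk of the data.
print(arithmetic_mean(scores + [500]))  # 168.0
```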

Median

The middle value when data is sorted in order.
Odd count: The middle-positioned value
Even count: Average of the two middle values

Advantage: Robust to outliers
Usage: Real estate prices, income distributions (e.g., US Census median household income)
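A short Python sketch of the odd/even rule, assuming plain numeric lists; `median` here is a hypothetical helper, cross-checked against the standard library's `statistics.median`:

```python
import statistics

def median(values):
    """Middle value of the sorted data; mean of the two middle values for an even count."""
    if not values:
        raise ValueError("median requires at least one data point")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd count: the single middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle elements

print(median([7, 1, 3]))                 # 3
print(median([7, 1, 3, 10]))             # 5.0
print(statistics.median([7, 1, 3, 10]))  # 5.0
```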

Mode

The value that appears most frequently. There can be multiple modes.
Usage: Categorical data (clothing sizes, preferred colors)
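Because several values can tie for the highest frequency, a sketch that returns all of them is more faithful than one that returns a single value. The helper `modes` and the size data below are hypothetical; Python's `statistics.multimode` (3.8+) behaves the same way:

```python
import statistics
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency (there may be several)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

sizes = ["M", "L", "M", "S", "L"]    # hypothetical clothing-size survey
print(modes(sizes))                  # ['M', 'L']
print(statistics.multimode(sizes))   # ['M', 'L']
```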

Relationship Between Mean, Median, and Mode

Distribution shape | Relationship
Symmetric and unimodal (e.g., normal) | Mean = Median = Mode
Right-skewed (positive skew) | Typically Mode < Median < Mean
Left-skewed (negative skew) | Typically Mean < Median < Mode
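A small hypothetical right-skewed dataset illustrates the typical ordering. Treat this as a rule of thumb for unimodal data rather than a guarantee; unusual distributions can break it.

```python
import statistics

# Hypothetical right-skewed data: most values are small, with a long right tail.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle of the 9 sorted values
mean = statistics.mean(data)      # 31 / 9

print(mode, median, round(mean, 2))  # 2 3 3.44
print(mode < median < mean)          # True
```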

Measures of Dispersion

Indicators that show how spread out the data is.

Range

Range = Maximum value − Minimum value

Simple but highly sensitive to outliers.
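A minimal sketch with hypothetical temperature readings; `data_range` is an illustrative name chosen to avoid shadowing Python's built-in `range`:

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    if not values:
        raise ValueError("range requires at least one data point")
    return max(values) - min(values)

temps = [18, 21, 19, 22, 20]      # hypothetical daily temperatures (°C)
print(data_range(temps))          # 4
print(data_range(temps + [45]))   # 27: a single outlier changes the range completely
```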

Variance

Population variance  σ² = Σ(Xᵢ − μ)² / N
Sample variance      s² = Σ(xᵢ − x̄)² / (n−1)

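The average of the squared deviations from the mean. The sample version divides by n − 1 instead of n (Bessel's correction), which makes s² an unbiased estimator of the population variance σ². Because the deviations are squared, the unit is squared too (e.g., cm²), which makes variance hard to interpret directly.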
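The two formulas differ only in the divisor. This sketch implements both by hand on hypothetical data and compares them with Python's `statistics.pvariance` and `statistics.variance`:

```python
import statistics

def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N"""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """s^2 = sum((x - xbar)^2) / (n - 1); needs at least two data points."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two data points")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical measurements (mean = 5)
print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))       # 32 / 7, about 4.571
print(statistics.pvariance(data), statistics.variance(data))  # standard-library equivalents
```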

Standard Deviation

σ = √Variance (population)
s = √Sample variance (sample)

The square root of variance. Has the same units as the original data.
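Using hypothetical measurements, Python's `statistics.pstdev` and `statistics.stdev` return the population and sample standard deviations, i.e., the square roots of the corresponding variances:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, e.g. in cm

pop_sd = statistics.pstdev(data)   # sigma = sqrt(population variance) = sqrt(4)
samp_sd = statistics.stdev(data)   # s = sqrt(sample variance) = sqrt(32 / 7)

print(pop_sd)             # 2.0 (in cm, like the data)
print(round(samp_sd, 3))  # 2.138
print(math.isclose(samp_sd, math.sqrt(statistics.variance(data))))  # True
```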

Applications:

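  • Stock market volatility (the SD of returns: a higher SD means more volatile returns, commonly read as higher risk)
  • Quality control (e.g., control limits at the process mean ±3σ, which for a normally distributed process contain about 99.7% of output)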
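For the ±3σ rule, a quick check with `statistics.NormalDist` (Python 3.8+) shows how much of a normal distribution falls inside the limits; the fill-weight process below is hypothetical and assumes normality:

```python
from statistics import NormalDist

# Share of a normal distribution lying within +/-3 SD of the mean.
z = NormalDist()  # standard normal: mean 0, SD 1
coverage = z.cdf(3) - z.cdf(-3)
print(round(coverage, 4))  # 0.9973

# Hypothetical process: target fill weight 500 g, process SD 2 g.
mu, sigma = 500.0, 2.0
lower, upper = mu - 3 * sigma, mu + 3 * sigma
print(lower, upper)  # 494.0 506.0
```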

Coefficient of Variation (CV)

CV = (Standard Deviation / Mean) × 100%

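Used to compare dispersion between groups measured in different units or with very different means. CV is only meaningful for data with a positive mean (e.g., ratio-scale data such as prices, heights, or weights).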
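A sketch comparing hypothetical height (cm) and weight (kg) data. It assumes the sample standard deviation; using the population SD would change the numbers slightly but not the comparison.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (standard deviation / mean) x 100%, using the sample SD."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("CV is only meaningful for data with a positive mean")
    return statistics.stdev(values) / mean * 100

heights_cm = [160, 165, 170, 175, 180]  # hypothetical
weights_kg = [55, 60, 70, 80, 85]       # hypothetical

print(round(coefficient_of_variation(heights_cm), 1))  # 4.7
print(round(coefficient_of_variation(weights_kg), 1))  # 18.2
```

Although the units differ, the CVs show that weight varies far more than height relative to its own mean.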


Quartiles and Box Plots

Quartiles:

  • Q1 (25th percentile): lower 25% boundary
  • Q2 (50th percentile): median
  • Q3 (75th percentile): upper 25% boundary
  • IQR = Q3 − Q1 (Interquartile Range)

Outlier Detection: values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are flagged as potential outliers (these cutoffs are often called Tukey's fences).
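A sketch of the 1.5×IQR rule on hypothetical wait times. Quartile conventions differ between tools; this assumes `method="inclusive"` in `statistics.quantiles`, and the default `"exclusive"` method can give slightly different Q1 and Q3.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return (q1, q2, q3), iqr, (lower, upper), outliers

wait_minutes = [22, 25, 24, 23, 26, 25, 24, 27, 23, 60]  # hypothetical
quartiles, iqr, fences, outliers = tukey_outliers(wait_minutes)
print(quartiles)  # (23.25, 24.5, 25.75)
print(iqr)        # 2.5
print(fences)     # (19.5, 29.5)
print(outliers)   # [60]
```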


Skewness and Kurtosis

Skewness: The degree of asymmetry in the distribution

  • Positive skew (+): longer right tail
  • Negative skew (−): longer left tail
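One common way to quantify skewness is the moment coefficient g₁ = m₃ / m₂^(3/2). The sketch below assumes population moments; spreadsheets and statistics packages often apply a small-sample adjustment, so their values may differ somewhat, although the sign agrees.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9]  # hypothetical
print(skewness([1, 2, 3, 4, 5]))             # 0.0 (symmetric)
print(skewness(right_tailed) > 0)            # True (longer right tail)
print(skewness([-x for x in right_tailed]) < 0)  # True (mirror image: longer left tail)
```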

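Kurtosis: The heaviness of a distribution's tails relative to a normal distribution (often loosely described as "peakedness")

  • Normal distribution kurtosis = 3 (excess kurtosis = 0)
  • Kurtosis > 3 (excess kurtosis > 0): heavier tails than the normal, so extreme values occur more often (greater tail risk)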
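A sketch computing kurtosis as the fourth-moment ratio m₄ / m₂², assuming population moments (no small-sample correction). The two hypothetical datasets contrast an evenly spread sample with one dominated by two extreme values:

```python
import statistics

def kurtosis(values):
    """Moment coefficient of kurtosis m4 / m2**2 (population moments); normal = 3."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]        # evenly spread, no tails
heavy = [0] * 8 + [-10, 10]   # mostly central values plus two extremes

for name, data in [("flat", flat), ("heavy", heavy)]:
    k = kurtosis(data)
    print(name, round(k, 2), "excess:", round(k - 3, 2))
# flat 1.7 excess: -1.3
# heavy 5.0 excess: 2.0
```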

Key Concept Cards

Mean vs Median ★★★★★ : When outliers are present, the median is a better representative value. This is why median household income (reported by the US Census Bureau) is preferred over mean income. Memory tip: outliers present → median is more appropriate

Standard Deviation ★★★★★ : The typical distance of data points from the mean (formally, the square root of the average squared deviation). The larger the SD, the more spread out the data. Memory tip: SD ≈ the typical size of a deviation from the mean

IQR (Interquartile Range) ★★★★☆ : Q3 − Q1. The range of the middle 50% of data. Used for outlier detection and drawing box plots. Memory tip: IQR = Q3 − Q1; outlier fences = Q1 − 1.5×IQR and Q3 + 1.5×IQR


Practice Questions

Q. A company's employee salaries are [$45,000; $48,000; $46,500; $45,750; $225,000]. Which is the better representative measure: mean or median?

The median ($46,500) is more representative. The $225,000 salary (the CEO's) is an extreme outlier that pulls the mean up to $82,050, far above what any of the other four employees earns. The median is essentially unaffected by that outlier.
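To check the answer, a sketch that recomputes both measures with Python's `statistics` module on the five salaries from the question:

```python
import statistics

# Salaries from the practice question (USD); the last value is the CEO's.
salaries = [45_000, 48_000, 46_500, 45_750, 225_000]

print(statistics.fmean(salaries))   # 82050.0
print(statistics.median(salaries))  # 46500
```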

Q. Compare the risk-return efficiency of Stock A (mean return 10%, SD 5%) and Stock B (mean return 20%, SD 8%) using the coefficient of variation.

CV of A = 5/10 = 50%; CV of B = 8/20 = 40%. Stock B has lower variability per unit of return, making it relatively more efficient.


OIYO Editorial

Content Editor

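A knowledge incubator and professional content creator. Researches and shares in-depth, practice- and certification-focused information useful for business, economics, law, and everyday life.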