CFA Level 1 Hypothesis Testing: Sampling, CLT, Confidence Intervals & Statistical Tests
What’s Covered: Three Learning Modules at a Glance
| Learning Module | Topic | Core Skills |
|---|---|---|
| LM 7 | Estimation and Inference | Sampling methods, sampling error, Central Limit Theorem, standard error, confidence intervals, bootstrapping |
| LM 8 | Hypothesis Testing | 6-step process, null/alternative hypotheses, Type I & II errors, power, t-tests, chi-square, F-test, p-values |
| LM 9 | Parametric & Non-Parametric Tests of Independence | Pearson correlation test, Spearman rank correlation, contingency tables, chi-square test of independence |
LM 7: Estimation and Inference
Before you can test hypotheses, you need to understand how samples relate to populations. This module covers how to draw samples, what happens to sample statistics as sample size grows, and how to construct confidence intervals.
Sampling Methods
The CFA curriculum distinguishes between probability sampling (where every element has a known chance of selection) and non-probability sampling (where it doesn’t).
| Method | How It Works | When to Use |
|---|---|---|
| Simple Random | Every member of the population has an equal probability of being selected | Default method; works well when the population is relatively homogeneous |
| Stratified Random | Divide population into subgroups (strata), then sample randomly within each stratum | When subgroups differ meaningfully — e.g., sampling a bond index by credit rating and maturity |
| Cluster | Divide population into clusters, randomly select entire clusters, then sample within them | When a full population list is impractical — e.g., geographic clusters |
| Convenience (non-probability) | Select whatever data is readily available | Quick and cheap, but introduces selection bias |
| Judgmental (non-probability) | Researcher handpicks elements based on expertise | Relies on the researcher’s knowledge; results may not generalize |
Sampling Error
Sampling error is the difference between a sample statistic (like the sample mean) and the corresponding population parameter. It’s unavoidable — you’re estimating a population from a subset. The goal isn’t to eliminate sampling error but to understand and control it.
The Central Limit Theorem (CLT)
The CLT is one of the most powerful results in statistics, and it’s a favorite CFA exam topic.
The CLT states that, for a population with mean μ and finite variance σ², the sampling distribution of the sample mean approaches a normal distribution with mean μ and variance σ²/n as the sample size n grows, regardless of the shape of the population distribution. Why this matters: even if the underlying returns are skewed or non-normal, the distribution of the sample mean will be approximately normal for large samples (typically n ≥ 30). This is what justifies using z-tests and t-tests on real financial data.
Standard Error of the Sample Mean
The standard error tells you how much the sample mean is expected to vary from the population mean: SE = σ/√n, or s/√n when the population standard deviation is unknown. As sample size increases, standard error decreases, so your estimate gets more precise. Doubling the sample size divides the standard error by √2 (a reduction of about 29%), not by half.
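A minimal Python sketch of the √n effect, using a hypothetical sample standard deviation of 8% (the numbers are illustrative, not from the curriculum):

```python
import math

# Standard error of the sample mean: SE = s / sqrt(n)
s = 8.0  # hypothetical sample standard deviation, in %
for n in (25, 50, 100):
    se = s / math.sqrt(n)
    print(f"n = {n:3d}  SE = {se:.3f}")

# Going 25 -> 50 shrinks SE by a factor of sqrt(2) (about 29%);
# only the quadrupling 25 -> 100 cuts SE in half (1.600 -> 0.800).
```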
Confidence Intervals
A 95% confidence interval means: if we repeated this sampling procedure many times, 95% of the resulting intervals would contain the true population mean. It does not mean there’s a 95% probability the population mean is in this particular interval.
| Confidence Level | z Critical Value (two-tailed) | Interpretation |
|---|---|---|
| 90% | ±1.645 | Narrower interval, lower confidence |
| 95% | ±1.960 | Most commonly used in practice |
| 99% | ±2.576 | Very high confidence, very wide interval |
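The critical values in the table can be reproduced with SciPy's inverse normal CDF. Here's a sketch that also builds a 95% interval from hypothetical sample numbers (SciPy itself is not part of the curriculum, just a convenient check):

```python
from scipy import stats

# Two-tailed z critical values for common confidence levels
for conf in (0.90, 0.95, 0.99):
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    print(f"{conf:.0%}: ±{z:.3f}")

# 95% CI for a mean (hypothetical: x_bar = 7.5%, s = 12%, n = 64)
x_bar, s, n = 7.5, 12.0, 64
z95 = stats.norm.ppf(0.975)
half_width = z95 * s / n ** 0.5
print(f"95% CI: ({x_bar - half_width:.2f}%, {x_bar + half_width:.2f}%)")
```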
LM 8: Hypothesis Testing
This is the core of the three modules. You need to know the 6-step process cold and be able to apply it to questions about means, differences between means, and variances.
The 6-Step Hypothesis Testing Process
| Step | What You Do | Key Details |
|---|---|---|
| 1. State the hypotheses | Define H₀ (null) and Hₐ (alternative) | The null is what you’re trying to reject. The alternative is what you’re trying to support. |
| 2. Identify the test statistic | Choose the right test (z, t, chi-square, F) | Depends on what you’re testing (mean, variance, proportion) and what you know about the population |
| 3. Specify significance level | Set α (typically 0.05 or 0.01) | α = probability of Type I error = probability of rejecting a true null |
| 4. State the decision rule | Determine the critical value(s) | Reject H₀ if test statistic falls in the rejection region (beyond critical values) |
| 5. Calculate test statistic | Plug sample data into the formula | Compare computed value to critical value |
| 6. Make a decision | Reject or fail to reject H₀ | You never “accept” the null — you either reject it or fail to reject it |
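The six steps can be walked through in code. This sketch uses hypothetical numbers (mean monthly return of 0.9% against a hypothesized 0.5%, s = 1.2%, n = 36) and a two-tailed t-test:

```python
from scipy import stats

# Step 1: H0: mu = 0.5  vs  Ha: mu != 0.5  (two-tailed)
# Step 2: population sigma unknown -> one-sample t-test
# Step 3: significance level
n, x_bar, s, mu0, alpha = 36, 0.9, 1.2, 0.5, 0.05
# Step 4: decision rule -> reject H0 if |t| > critical value
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
# Step 5: compute the test statistic
t_stat = (x_bar - mu0) / (s / n ** 0.5)
# Step 6: decide
decision = "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"
print(f"t = {t_stat:.2f}, critical = ±{t_crit:.3f} -> {decision}")
```

Note the borderline result: t = 2.00 falls just inside the t critical value of about ±2.03 (df = 35), so you fail to reject, even though the z critical value ±1.96 would have rejected. This is exactly why the t-distribution matters at moderate sample sizes.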
One-Tailed vs. Two-Tailed Tests
| Test Type | Hypotheses | Rejection Region |
|---|---|---|
| Two-tailed | H₀: μ = μ₀ vs. Hₐ: μ ≠ μ₀ | Both tails — reject if test stat is too far in either direction |
| Upper one-tailed | H₀: μ ≤ μ₀ vs. Hₐ: μ > μ₀ | Right tail only |
| Lower one-tailed | H₀: μ ≥ μ₀ vs. Hₐ: μ < μ₀ | Left tail only |
Type I and Type II Errors
This is tested constantly. You must know the trade-off.
| | H₀ is Actually True | H₀ is Actually False |
|---|---|---|
| Reject H₀ | Type I Error (false positive) — probability = α | Correct decision — probability = Power (1 − β) |
| Fail to reject H₀ | Correct decision — probability = (1 − α) | Type II Error (false negative) — probability = β |
Key relationships:
- α (significance level) = P(Type I error) = P(rejecting a true null)
- β = P(Type II error) = P(failing to reject a false null)
- Power = 1 − β = probability of correctly rejecting a false null
- Decreasing α (say from 5% to 1%) increases β — there’s a direct trade-off
- Increasing sample size reduces both types of error
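These trade-offs can be made concrete. The sketch below assumes a simplified upper one-tailed z-test (known σ, which is rare in practice) with a hypothetical true mean of 0.5% and σ = 2%, and computes β directly:

```python
from scipy import stats

# Hypothetical setup: H0: mu <= 0 vs Ha: mu > 0, true mean = 0.5, sigma = 2
sigma, mu_true = 2.0, 0.5
for alpha in (0.05, 0.01):
    for n in (64, 256):
        se = sigma / n ** 0.5
        crit = stats.norm.ppf(1 - alpha) * se   # reject H0 if x_bar > crit
        # beta = chance the sample mean lands below the cutoff
        # even though the true mean is mu_true
        beta = stats.norm.cdf(crit, loc=mu_true, scale=se)
        print(f"alpha={alpha:.2f} n={n:3d}  beta={beta:.3f}  power={1 - beta:.3f}")
```

At fixed n, moving α from 0.05 to 0.01 raises β; moving n from 64 to 256 cuts β sharply at either α, which is why larger samples reduce both error types.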
The p-Value Approach
The p-value is the smallest significance level at which you would reject the null. If p-value ≤ α, reject H₀. If p-value > α, fail to reject. Many CFA questions give you a p-value and ask for the conclusion at a given significance level — just compare the two numbers.
Tests of a Single Mean
The test statistic is t = (X̄ − μ₀) / (s/√n) with n − 1 degrees of freedom. This is the workhorse test on the exam. You'll be given a sample mean, hypothesized population mean, sample standard deviation, and sample size.
Tests of Differences Between Means
The exam tests two scenarios:
| Scenario | Test | When to Use |
|---|---|---|
| Independent samples, equal variances | Pooled t-test | Two separate groups (e.g., returns of fund A vs. fund B) |
| Dependent (paired) samples | Paired t-test (test of mean differences) | Same group measured twice (e.g., returns before and after an event) |
The test statistic is t = d̄ / (s_d/√n), where d̄ is the mean of the differences and s_d is the standard deviation of the differences. Degrees of freedom: n − 1 (number of pairs minus 1).
Test of a Single Variance (Chi-Square)
The test statistic is χ² = (n − 1)s² / σ₀², with n − 1 degrees of freedom. The chi-square distribution is always non-negative and right-skewed. Use this when testing whether a portfolio's volatility matches a claimed level.
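A sketch with hypothetical numbers: a manager claims σ = 4% and a sample of n = 25 months shows s = 5%, tested two-tailed at α = 0.05:

```python
from scipy import stats

# Hypothetical: claimed sigma0 = 4%, observed s = 5% over n = 25 months
n, s, sigma0, alpha = 25, 5.0, 4.0, 0.05
chi2_stat = (n - 1) * s ** 2 / sigma0 ** 2   # chi2 = (n-1) s^2 / sigma0^2

# Two-tailed test of H0: sigma^2 = sigma0^2 uses both chi-square tails
lower = stats.chi2.ppf(alpha / 2, df=n - 1)
upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
decision = "reject H0" if chi2_stat < lower or chi2_stat > upper else "fail to reject H0"
print(f"chi2 = {chi2_stat:.2f}, bounds = ({lower:.2f}, {upper:.2f}) -> {decision}")
```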
Test of Equality of Two Variances (F-Test)
The test statistic is F = s₁² / s₂², with the larger sample variance in the numerator. Degrees of freedom: (n₁ − 1, n₂ − 1). The F-distribution is always positive. You'll use this to test whether two portfolios have significantly different risk levels.
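A sketch with hypothetical sample variances from two portfolios, tested two-tailed at α = 0.05 (so the right-tail critical value is taken at 0.975):

```python
from scipy import stats

# Hypothetical sample variances; larger variance goes in the numerator
s1_sq, n1 = 64.0, 31
s2_sq, n2 = 36.0, 25

f_stat = s1_sq / s2_sq
f_crit = stats.f.ppf(0.975, dfn=n1 - 1, dfd=n2 - 1)  # two-tailed at alpha = 0.05
decision = "reject H0" if f_stat > f_crit else "fail to reject H0"
print(f"F = {f_stat:.3f}, critical = {f_crit:.3f} -> {decision}")
```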
Parametric vs. Nonparametric Tests
| Feature | Parametric Tests | Nonparametric Tests |
|---|---|---|
| Assumptions | Specific distributional assumptions (e.g., normality) | Minimal or no distributional assumptions |
| Data type | Continuous, interval/ratio scale | Ordinal, ranked, or non-normal data |
| Power | More powerful when assumptions hold | Less powerful but more robust |
| Examples | z-test, t-test, F-test | Spearman rank correlation, chi-square test of independence |
LM 9: Parametric & Non-Parametric Tests of Independence
This module applies hypothesis testing specifically to testing whether two variables are related. It covers three tests you need to know.
Parametric Test of Correlation (Pearson)
Tests whether the population correlation coefficient (ρ) equals zero.
The test statistic is t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. Reject H₀: ρ = 0 if |t| exceeds the critical value. An important nuance: as sample size increases, smaller correlations become statistically significant. A correlation of r = 0.35 might not be significant with n = 12, but it could be significant with n = 32.
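The n = 12 versus n = 32 comparison can be checked directly; this sketch applies the correlation t-formula at both sample sizes for r = 0.35 and α = 0.05 (two-tailed):

```python
from scipy import stats

# t = r * sqrt(n-2) / sqrt(1 - r^2), df = n - 2: same r, different n
r, alpha = 0.35, 0.05
for n in (12, 32):
    t_stat = r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    sig = "significant" if abs(t_stat) > t_crit else "not significant"
    print(f"n={n:2d}: t={t_stat:.3f}, critical=±{t_crit:.3f} -> {sig}")
```

At n = 12 the statistic (about 1.18) falls well short of the critical value; at n = 32 it just clears it, so the same correlation flips to significant purely because of sample size.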
Spearman Rank Correlation
The nonparametric alternative to Pearson. Instead of testing raw data values, you rank them first and then calculate the correlation on the ranks. Use it when:
- Data may not be normally distributed
- You’re working with ordinal data (rankings, ratings)
- Outliers are a concern
- The relationship might be monotonic but not linear
The test for significance uses the same t-formula as Pearson, just applied to the rank correlation coefficient (r_s) instead of the raw correlation.
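A sketch of why ranks help with outliers, using hypothetical data where one extreme value distorts the raw (Pearson) correlation but barely moves the rank (Spearman) correlation:

```python
from scipy import stats

# Hypothetical data: y generally grows with x, but x = 7 has an extreme outlier
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 300.0, 7.9]

r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)   # correlation computed on ranks
print(f"Pearson r = {r_pearson:.3f}, Spearman r_s = {r_spearman:.3f}")
# The outlier dominates the raw covariance but only shifts one rank,
# so the Spearman coefficient stays near 1.
```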
Chi-Square Test of Independence (Contingency Tables)
Tests whether two categorical variables are independent using observed vs. expected frequencies in a contingency table.
The test statistic is χ² = Σ (O − E)² / E, where O is the observed frequency and E is the expected frequency in each cell (calculated assuming independence). Degrees of freedom: (rows − 1)(columns − 1). This is always a one-sided test: the rejection region is on the right because the chi-square statistic is always non-negative.
Example application: testing whether ETF performance category (outperform/underperform) is independent of fund type (equity/bond/alternative). If the chi-square statistic exceeds the critical value, you reject independence — the two variables are related.
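The ETF example can be sketched with a hypothetical 2×3 contingency table and SciPy's `chi2_contingency`, which computes the expected frequencies and degrees of freedom for you:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: performance category (rows) vs fund type (columns)
#                equity  bond  alternative
observed = [[60,     30,   10],   # outperform
            [40,     50,   10]]   # underperform

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# df = (rows - 1)(cols - 1) = (2 - 1)(3 - 1) = 2; small p -> reject independence
```

With these made-up counts the statistic comes out to 9.0 against a 5% critical value of about 5.99 (df = 2), so independence is rejected: performance category and fund type appear related.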
Which Test to Use: Decision Framework
The exam often tests whether you can pick the right test for the scenario. Here’s a quick decision guide:
| What You’re Testing | Test Statistic | Distribution | Degrees of Freedom |
|---|---|---|---|
| Single mean (σ unknown) | t = (X̄ − μ₀) / (s/√n) | t | n − 1 |
| Difference of means (independent) | Pooled t-test | t | n₁ + n₂ − 2 |
| Difference of means (paired) | t = d̄ / (s_d/√n) | t | n − 1 |
| Single variance | χ² = (n−1)s²/σ₀² | Chi-square | n − 1 |
| Equality of two variances | F = s₁²/s₂² | F | (n₁−1, n₂−1) |
| Correlation (parametric) | t = r√(n−2)/√(1−r²) | t | n − 2 |
| Independence (categorical) | χ² = Σ(O−E)²/E | Chi-square | (r−1)(c−1) |
How These Modules Connect to the Rest of the Curriculum
| Concept from LM 7–9 | Where It Appears Later |
|---|---|
| Central Limit Theorem | Justifies normal-distribution-based tests throughout the curriculum |
| Confidence intervals | Economics (forecasting), equity valuation (range estimates) |
| t-tests on means | Testing whether portfolio returns exceed a benchmark in Portfolio Management |
| F-test on variances | Comparing risk levels across portfolios; ANOVA in regression (LM 10) |
| Correlation tests | Beta estimation, factor models, diversification analysis |
| Chi-square test of independence | Testing relationships between categorical financial variables (e.g., sector and performance) |
Study Strategy for LM 7–9
- Memorize the 6-step process. Every hypothesis test question follows this framework. If you can lay out the steps, the rest is mechanical.
- Know the Type I/II error trade-off cold. This is tested on almost every exam. Practice articulating what happens when you change α or increase sample size.
- Practice the “which test?” decision. The exam often gives you a scenario and asks you to identify the correct test. Use the decision framework table above until it becomes automatic.
- Don’t memorize every formula in isolation. The t-test structure (point estimate − hypothesized value) / standard error is the same pattern for all mean tests. Recognize the pattern.
- p-value questions are free points. If p-value ≤ α → reject. If p-value > α → fail to reject. That’s it.
For all formulas consolidated, see the CFA Level 1 Formula Sheet. For additional drill problems, visit Practice Questions. And for broader exam strategy, check Tips & Strategies.
Key Takeaways
- Stratified random sampling ensures every subgroup is represented — it’s the preferred method for bond index replication.
- The Central Limit Theorem guarantees the sample mean is approximately normal for large n, regardless of the population distribution.
- Standard error = s/√n. Increasing sample size improves precision but with diminishing returns (you need to quadruple n to halve standard error).
- Type I error (α) = rejecting a true null. Type II error (β) = failing to reject a false null. Power = 1 − β. Decreasing α increases β.
- Use the t-statistic when population variance is unknown (almost always on the exam). Use chi-square for single variance tests and F for comparing two variances.
- The p-value is the smallest α at which you’d reject H₀. If p ≤ α, reject. Period.
- Spearman rank correlation is the nonparametric alternative to Pearson — use it when normality is in question or data are ordinal.
- The chi-square test of independence uses a contingency table of observed vs. expected frequencies. Degrees of freedom = (rows − 1)(columns − 1).
Frequently Asked Questions
What’s the difference between a Type I and Type II error on the CFA exam?
A Type I error means you rejected the null hypothesis when it was actually true — a false positive. A Type II error means you failed to reject a false null — a false negative. The significance level α directly controls the Type I error rate. There’s a trade-off: decreasing α makes Type I errors less likely but Type II errors more likely, unless you also increase sample size.
When should I use a t-test vs. a z-test?
Use a z-test only when the population variance is known — which almost never happens in practice. On the CFA exam, you’ll use the t-test in nearly every question about means because you’ll be working with a sample standard deviation. As sample size gets large (n > 30 or so), t and z values converge and the distinction becomes less important, but the t-test is still technically correct.
How do I decide between a one-tailed and two-tailed test?
Read the alternative hypothesis. If Hₐ says “not equal to” (≠), it’s two-tailed. If Hₐ says “greater than” (>) or “less than” (<), it's one-tailed. The CFA exam usually tells you which one to use. One-tailed tests are more powerful for detecting an effect in a specific direction because the entire rejection region is on one side.
What does “power of a test” mean?
Power is the probability of correctly rejecting a false null hypothesis — it equals 1 − β, where β is the Type II error probability. A test with high power is good at detecting a real effect. Power increases with larger sample size, with a higher α (at the cost of more Type I errors), and when the true effect size is larger, since an effect farther from the null is easier to detect.
Why is the chi-square test of independence always one-sided?
Because the chi-square statistic sums squared differences between observed and expected frequencies — it’s always non-negative. Large values indicate that observed data differ significantly from what you’d expect under independence. There’s no concept of a “negative” chi-square value, so the rejection region is always in the right tail only.
How does the Central Limit Theorem help with hypothesis testing?
The CLT guarantees that the sampling distribution of the mean is approximately normal for large sample sizes, even if the underlying population isn’t normal. This allows you to use z-based and t-based tests on real financial data — which is typically skewed and leptokurtic — as long as your sample is large enough. Without the CLT, you’d need to know the exact population distribution to run most tests.