
CFA Level 1 Hypothesis Testing: Sampling, CLT, Confidence Intervals & Statistical Tests

This page covers Learning Modules 7–9 of the 2026 CFA Level 1 Quantitative Methods curriculum. These three modules build on the statistical foundations from Probability Concepts (LM 3–6) and take you from sampling and estimation all the way through parametric and nonparametric hypothesis tests. Quant Methods carries a 6–9% exam weight, and hypothesis testing is one of its most testable sections.

What’s Covered: Three Learning Modules at a Glance

| Learning Module | Topic | Core Skills |
|---|---|---|
| LM 7 | Estimation and Inference | Sampling methods, sampling error, Central Limit Theorem, standard error, confidence intervals, bootstrapping |
| LM 8 | Hypothesis Testing | 6-step process, null/alternative hypotheses, Type I & II errors, power, t-tests, chi-square, F-test, p-values |
| LM 9 | Parametric & Non-Parametric Tests of Independence | Pearson correlation test, Spearman rank correlation, contingency tables, chi-square test of independence |

LM 7: Estimation and Inference

Before you can test hypotheses, you need to understand how samples relate to populations. This module covers how to draw samples, what happens to sample statistics as sample size grows, and how to construct confidence intervals.

Sampling Methods

The CFA curriculum distinguishes between probability sampling (where every element has a known chance of selection) and non-probability sampling (where it doesn’t).

| Method | How It Works | When to Use |
|---|---|---|
| Simple Random | Every member of the population has an equal probability of being selected | Default method; works well when the population is relatively homogeneous |
| Stratified Random | Divide population into subgroups (strata), then sample randomly within each stratum | When subgroups differ meaningfully — e.g., sampling a bond index by credit rating and maturity |
| Cluster | Divide population into clusters, randomly select entire clusters, then sample within them | When a full population list is impractical — e.g., geographic clusters |
| Convenience (non-probability) | Select whatever data is readily available | Quick and cheap, but introduces selection bias |
| Judgmental (non-probability) | Researcher handpicks elements based on expertise | Relies on the researcher’s knowledge; results may not generalize |

Exam Tip: Stratified vs. Cluster
Stratified sampling ensures representation from every subgroup. Cluster sampling selects entire groups at random. For bond index replication, stratified sampling is the standard approach — you match the index’s duration, credit quality, and sector breakdown.
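To make the stratified idea concrete, here is a minimal Python sketch. The bond universe, the rating buckets, and the 10% sampling fraction are all hypothetical; the point is that every stratum contributes to the sample in proportion.

```python
import random

random.seed(7)  # reproducible illustration

# Hypothetical bond universe keyed by credit rating (the strata)
universe = {
    "AAA": list(range(0, 100)),     # 100 bonds
    "BBB": list(range(100, 160)),   # 60 bonds
    "HY":  list(range(160, 200)),   # 40 bonds
}

def stratified_sample(strata, fraction):
    """Draw the same fraction from every stratum so each subgroup is represented."""
    return {name: random.sample(members, max(1, int(len(members) * fraction)))
            for name, members in strata.items()}

sample = stratified_sample(universe, 0.10)
# Every rating bucket contributes in proportion: 10 AAA, 6 BBB, 4 HY bonds
```

A simple random sample of 20 bonds could, by chance, miss the HY bucket entirely; the stratified draw cannot.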

Sampling Error

Sampling error is the difference between a sample statistic (like the sample mean) and the corresponding population parameter. It’s unavoidable — you’re estimating a population from a subset. The goal isn’t to eliminate sampling error but to understand and control it.

The Central Limit Theorem (CLT)

The CLT is one of the most powerful results in statistics, and it’s a favorite CFA exam topic.

Central Limit Theorem: Given a population with mean μ and finite variance σ², the sampling distribution of the sample mean approaches a normal distribution with mean μ and variance σ²/n as the sample size n becomes large — regardless of the shape of the original population distribution.

Why this matters: even if the underlying returns are skewed or non-normal, the distribution of the sample mean will be approximately normal for large samples (typically n ≥ 30). This is what justifies using z-tests and t-tests on real financial data.
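You can see the CLT directly with a quick simulation. This sketch uses only the Python standard library and an exponential (heavily right-skewed) population whose mean and standard deviation are both 1:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

# Skewed population: exponential with rate 1 (mean = 1, std dev = 1)
def draw_sample(n):
    return [random.expovariate(1.0) for _ in range(n)]

# Distribution of the sample mean for n = 50, across 2,000 repeated samples
n = 50
sample_means = [statistics.mean(draw_sample(n)) for _ in range(2000)]

# CLT: mean of the sampling distribution approaches mu = 1,
# and its std dev approaches sigma / sqrt(n) = 1 / sqrt(50), about 0.141
print(statistics.mean(sample_means))   # close to 1.0
print(statistics.stdev(sample_means))  # close to 0.141
```

Plot a histogram of `sample_means` and it looks bell-shaped, even though the population itself is strongly skewed.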

Standard Error of the Sample Mean

Standard Error (known population σ) σ_x̄ = σ / √n
Standard Error (unknown σ, use sample s) s_x̄ = s / √n

The standard error tells you how much the sample mean is expected to vary from the population mean. As sample size increases, standard error decreases — your estimate gets more precise. Doubling the sample size reduces the standard error by a factor of √2 (about 29%), not by half.
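A two-line check of the √n effect (the 12% standard deviation and the sample sizes are hypothetical):

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Hypothetical: sample standard deviation of 12%, n = 36 observations
se_36 = standard_error(0.12, 36)   # 0.12 / 6 = 0.02
se_72 = standard_error(0.12, 72)   # doubling n...
ratio = se_36 / se_72              # ...shrinks SE by sqrt(2) (about 1.414), not by 2
```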

Confidence Intervals

Confidence Interval for Population Mean X̄ ± (Critical Value × Standard Error)

A 95% confidence interval means: if we repeated this sampling procedure many times, 95% of the resulting intervals would contain the true population mean. It does not mean there’s a 95% probability the population mean is in this particular interval.

| Confidence Level | z Critical Value (two-tailed) | Interpretation |
|---|---|---|
| 90% | ±1.645 | Narrower interval, lower confidence |
| 95% | ±1.960 | Most commonly used in practice |
| 99% | ±2.576 | Very high confidence, very wide interval |

z vs. t: When to Use Which
Use the z-statistic when population variance is known (rare in practice). Use the t-statistic when population variance is unknown and you’re using the sample standard deviation — this is the case on almost every CFA exam question. As sample size grows large, the t-distribution converges to the z-distribution.
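Putting the interval together in code, a sketch assuming SciPy is available; the 8% mean return, 12% standard deviation, and n = 25 are made-up numbers:

```python
import math
from scipy import stats

def mean_confidence_interval(x_bar, s, n, level=0.95):
    """t-based confidence interval for the mean (population sigma unknown)."""
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)  # two-tailed critical value
    half_width = t_crit * s / math.sqrt(n)               # critical value x std error
    return x_bar - half_width, x_bar + half_width

# Hypothetical: sample mean return 8%, s = 12%, n = 25, so df = 24, t_crit near 2.064
lo, hi = mean_confidence_interval(0.08, 0.12, 25)
# Interval is roughly 8% +/- 4.95%
```

Swapping `stats.t.ppf` for `stats.norm.ppf` gives the z-based interval; with n = 25 the t interval is slightly wider, reflecting the extra uncertainty from estimating s.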

LM 8: Hypothesis Testing

This is the core of the three modules. You need to know the 6-step process cold and be able to apply it to questions about means, differences between means, and variances.

The 6-Step Hypothesis Testing Process

| Step | What You Do | Key Details |
|---|---|---|
| 1. State the hypotheses | Define H₀ (null) and Hₐ (alternative) | The null is what you’re trying to reject. The alternative is what you’re trying to support. |
| 2. Identify the test statistic | Choose the right test (z, t, chi-square, F) | Depends on what you’re testing (mean, variance, proportion) and what you know about the population |
| 3. Specify significance level | Set α (typically 0.05 or 0.01) | α = probability of Type I error = probability of rejecting a true null |
| 4. State the decision rule | Determine the critical value(s) | Reject H₀ if test statistic falls in the rejection region (beyond critical values) |
| 5. Calculate test statistic | Plug sample data into the formula | Compare computed value to critical value |
| 6. Make a decision | Reject or fail to reject H₀ | You never “accept” the null — you either reject it or fail to reject it |

One-Tailed vs. Two-Tailed Tests

| Test Type | Hypotheses | Rejection Region |
|---|---|---|
| Two-tailed | H₀: μ = μ₀ vs. Hₐ: μ ≠ μ₀ | Both tails — reject if test stat is too far in either direction |
| Upper one-tailed | H₀: μ ≤ μ₀ vs. Hₐ: μ > μ₀ | Right tail only |
| Lower one-tailed | H₀: μ ≥ μ₀ vs. Hₐ: μ < μ₀ | Left tail only |

Type I and Type II Errors

This is tested constantly. You must know the trade-off.

| Decision | H₀ is Actually True | H₀ is Actually False |
|---|---|---|
| Reject H₀ | Type I Error (false positive) — probability = α | Correct decision — probability = Power (1 − β) |
| Fail to reject H₀ | Correct decision — probability = (1 − α) | Type II Error (false negative) — probability = β |

Key relationships:

  • α is the probability of a Type I error; the researcher sets it directly as the significance level.
  • β is the probability of a Type II error, and power = 1 − β.
  • Decreasing α (all else equal) increases β; the only way to reduce both error probabilities at once is to increase the sample size.

The p-Value Approach

The p-value is the smallest significance level at which you would reject the null. If p-value ≤ α, reject H₀. If p-value > α, fail to reject. Many CFA questions give you a p-value and ask for the conclusion at a given significance level — just compare the two numbers.

Tests of a Single Mean

t-Statistic for a Population Mean t = (X̄ − μ₀) / (s / √n)

Degrees of freedom: n − 1. This is the workhorse test on the exam. You’ll be given a sample mean, hypothesized population mean, sample standard deviation, and sample size.
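Here is the whole 6-step process run on hypothetical data (Python with SciPy; the return series is invented), testing H₀: μ = 0 against Hₐ: μ ≠ 0 at α = 0.05:

```python
from scipy import stats

# Step 1: H0: mu = 0 vs. Ha: mu != 0 (two-tailed)
# Hypothetical monthly fund returns (%)
returns = [1.2, -0.4, 0.8, 2.1, -1.0, 0.5, 1.7, 0.3, -0.2, 1.1]

# Steps 2-3: one-sample t-test, alpha = 0.05
alpha = 0.05
t_stat, p_value = stats.ttest_1samp(returns, popmean=0.0)   # df = n - 1 = 9

# Steps 4-6: compare to the critical value (or just use the p-value)
t_crit = stats.t.ppf(1 - alpha / 2, df=len(returns) - 1)    # about 2.262
reject = p_value <= alpha
# Here t is about 2.0, below 2.262, so we fail to reject H0 at the 5% level
```

Note the conclusion: the sample mean is positive, but the evidence is not strong enough at α = 0.05, so we fail to reject (we do not "accept") the null.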

Tests of Differences Between Means

The exam tests two scenarios:

| Scenario | Test | When to Use |
|---|---|---|
| Independent samples, equal variances | Pooled t-test | Two separate groups (e.g., returns of fund A vs. fund B) |
| Dependent (paired) samples | Paired t-test (test of mean differences) | Same group measured twice (e.g., returns before and after an event) |

Paired t-Test t = d̄ / (s_d / √n)

Where d̄ is the mean of the differences and s_d is the standard deviation of the differences. Degrees of freedom: n − 1 (number of pairs minus 1).
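A sketch with SciPy on invented before/after data for six funds; `ttest_rel` is equivalent to running a one-sample t-test on the differences:

```python
from scipy import stats

# Hypothetical: the same six funds' returns (%) before and after a strategy change
before = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8]
after  = [5.6, 5.0, 6.5, 5.9, 5.2, 6.1]

# Paired test on the differences; df = number of pairs - 1 = 5
t_stat, p_value = stats.ttest_rel(after, before)
# The differences are consistently positive (mean ~0.33, small spread), so t is large
reject_at_5pct = p_value <= 0.05
```

Treating these as independent samples would be a mistake: the pairing removes fund-to-fund variation, which is exactly what makes the test powerful here.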

Test of a Single Variance (Chi-Square)

Chi-Square Test for Variance χ² = (n − 1)s² / σ₀²

Degrees of freedom: n − 1. The chi-square distribution is always non-negative and right-skewed. Use this when testing whether a portfolio’s volatility matches a claimed level.
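For example (hypothetical numbers, SciPy only for the critical value): a manager claims volatility is 10%, but a 25-month sample shows s = 13%. Testing H₀: σ² ≤ 0.10² against Hₐ: σ² > 0.10² at the 5% level:

```python
from scipy import stats

n, s, sigma_0 = 25, 0.13, 0.10              # hypothetical sample and claimed sigma

chi2_stat = (n - 1) * s**2 / sigma_0**2     # (24)(0.0169)/(0.01) = 40.56
chi2_crit = stats.chi2.ppf(0.95, df=n - 1)  # right-tail 5% critical value, ~36.42

reject = chi2_stat > chi2_crit              # 40.56 > 36.42, so reject H0
```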

Test of Equality of Two Variances (F-Test)

F-Statistic F = s₁² / s₂²

Put the larger variance in the numerator. Degrees of freedom: (n₁ − 1, n₂ − 1). The F-distribution is always positive. You’ll use this to test whether two portfolios have significantly different risk levels.
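A sketch with hypothetical sample variances for two funds (SciPy for the critical value; a 5% two-tailed test puts 2.5% in the upper tail because the larger variance always goes on top):

```python
from scipy import stats

# Hypothetical: fund 1 sample variance 0.0256 (n=25), fund 2 variance 0.0144 (n=31)
s1_sq, n1 = 0.0256, 25   # larger variance goes in the numerator
s2_sq, n2 = 0.0144, 31

f_stat = s1_sq / s2_sq                               # about 1.78
f_crit = stats.f.ppf(0.975, dfn=n1 - 1, dfd=n2 - 1)  # upper 2.5% tail, about 2.14

reject = f_stat > f_crit   # 1.78 < 2.14, so fail to reject equal variances
```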

Parametric vs. Nonparametric Tests

| Feature | Parametric Tests | Nonparametric Tests |
|---|---|---|
| Assumptions | Specific distributional assumptions (e.g., normality) | Minimal or no distributional assumptions |
| Data type | Continuous, interval/ratio scale | Ordinal, ranked, or non-normal data |
| Power | More powerful when assumptions hold | Less powerful but more robust |
| Examples | z-test, t-test, F-test | Spearman rank correlation, chi-square test of independence |

When to Go Nonparametric
Use nonparametric tests when: (1) data don’t meet normality assumptions, (2) data are ranked or ordinal, (3) sample is small and you can’t verify normality, or (4) there are significant outliers.

LM 9: Parametric & Non-Parametric Tests of Independence

This module applies hypothesis testing specifically to testing whether two variables are related. It covers three tests you need to know.

Parametric Test of Correlation (Pearson)

Tests whether the population correlation coefficient (ρ) equals zero.

t-Test for Correlation t = r√(n − 2) / √(1 − r²)

Degrees of freedom: n − 2. Reject H₀: ρ = 0 if the calculated t exceeds the critical value. An important nuance: as sample size increases, smaller correlations become statistically significant — a correlation of r = 0.35 might not be significant with n = 12, but it could be significant with n = 32.
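That sample-size nuance can be checked directly. A Python sketch (SciPy for critical values) using the same r = 0.35 with n = 12 and n = 32:

```python
import math
from scipy import stats

def corr_t_test(r, n, alpha=0.05):
    """Two-tailed t-test of H0: rho = 0 for a sample correlation r with n pairs."""
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_stat, t_stat > t_crit

t12, significant12 = corr_t_test(0.35, 12)   # t ~ 1.18 < 2.228: not significant
t32, significant32 = corr_t_test(0.35, 32)   # t ~ 2.05 > 2.042: significant
```

The same correlation flips from "not significant" to "significant" purely because √(n − 2) in the numerator grows with the sample.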

Statistical vs. Economic Significance
A large sample can make a tiny correlation “statistically significant” even when it’s economically meaningless. Always consider the practical significance of your findings, not just the p-value.

Spearman Rank Correlation

The nonparametric alternative to Pearson. Instead of testing raw data values, you rank them first and then calculate the correlation on the ranks. Use it when:

  • the data don’t meet normality assumptions,
  • the data are ordinal or already in rank form, or
  • outliers would otherwise distort the result.

The test for significance uses the same t-formula as Pearson, just applied to the rank correlation coefficient (r_s) instead of the raw correlation.
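A sketch showing why ranks help: the scores and returns below are invented, and the last observation is an extreme outlier that would distort a raw Pearson correlation.

```python
from scipy import stats

# Hypothetical: analyst quality scores vs. subsequent fund returns (%)
analyst_score = [1, 2, 3, 4, 5, 6, 7, 8]
fund_return   = [2.1, 2.5, 2.4, 3.0, 3.2, 3.1, 3.8, 40.0]   # 40.0 is an outlier

# Spearman correlates the ranks, so the outlier only counts as "largest"
rho_s, p_value = stats.spearmanr(analyst_score, fund_return)
# rho_s ~ 0.95: the monotonic relationship survives the outlier
```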

Chi-Square Test of Independence (Contingency Tables)

Tests whether two categorical variables are independent using observed vs. expected frequencies in a contingency table.

Chi-Square Test of Independence χ² = Σ [(Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ]

Where O is the observed frequency and E is the expected frequency (calculated assuming independence). Degrees of freedom: (rows − 1)(columns − 1). This is always a one-sided test — the rejection region is on the right because the chi-square statistic is always positive.

Example application: testing whether ETF performance category (outperform/underperform) is independent of fund type (equity/bond/alternative). If the chi-square statistic exceeds the critical value, you reject independence — the two variables are related.
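That ETF example can be sketched in code. The counts below are invented; SciPy's `chi2_contingency` computes the expected frequencies under independence for you:

```python
from scipy import stats

# Hypothetical contingency table:
#                outperform  underperform
observed = [[40, 60],    # equity
            [30, 70],    # bond
            [50, 50]]    # alternative

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
# dof = (3 rows - 1)(2 cols - 1) = 2; chi2 ~ 8.33 exceeds the 5.99
# critical value at the 5% level, so reject independence
```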

Which Test to Use: Decision Framework

The exam often tests whether you can pick the right test for the scenario. Here’s a quick decision guide:

| What You’re Testing | Test Statistic | Distribution | Degrees of Freedom |
|---|---|---|---|
| Single mean (σ unknown) | t = (X̄ − μ₀) / (s/√n) | t | n − 1 |
| Difference of means (independent) | Pooled t-test | t | n₁ + n₂ − 2 |
| Difference of means (paired) | t = d̄ / (s_d/√n) | t | n − 1 |
| Single variance | χ² = (n−1)s²/σ₀² | Chi-square | n − 1 |
| Equality of two variances | F = s₁²/s₂² | F | (n₁−1, n₂−1) |
| Correlation (parametric) | t = r√(n−2)/√(1−r²) | t | n − 2 |
| Independence (categorical) | χ² = Σ(O−E)²/E | Chi-square | (r−1)(c−1) |

How These Modules Connect to the Rest of the Curriculum

| Concept from LM 7–9 | Where It Appears Later |
|---|---|
| Central Limit Theorem | Justifies normal-distribution-based tests throughout the curriculum |
| Confidence intervals | Economics (forecasting), equity valuation (range estimates) |
| t-tests on means | Testing whether portfolio returns exceed a benchmark in Portfolio Management |
| F-test on variances | Comparing risk levels across portfolios; ANOVA in regression (LM 10) |
| Correlation tests | Beta estimation, factor models, diversification analysis |
| Chi-square test of independence | Testing relationships between categorical financial variables (e.g., sector and performance) |

Study Strategy for LM 7–9

  1. Memorize the 6-step process. Every hypothesis test question follows this framework. If you can lay out the steps, the rest is mechanical.
  2. Know the Type I/II error trade-off cold. This is tested on almost every exam. Practice articulating what happens when you change α or increase sample size.
  3. Practice the “which test?” decision. The exam often gives you a scenario and asks you to identify the correct test. Use the decision framework table above until it becomes automatic.
  4. Don’t memorize every formula in isolation. The t-test structure (point estimate − hypothesized value) / standard error is the same pattern for all mean tests. Recognize the pattern.
  5. p-value questions are free points. If p-value ≤ α → reject. If p-value > α → fail to reject. That’s it.

For all formulas consolidated, see the CFA Level 1 Formula Sheet. For additional drill problems, visit Practice Questions. And for broader exam strategy, check Tips & Strategies.

Key Takeaways

  • Stratified random sampling ensures every subgroup is represented — it’s the preferred method for bond index replication.
  • The Central Limit Theorem guarantees the sample mean is approximately normal for large n, regardless of the population distribution.
  • Standard error = s/√n. Increasing sample size improves precision but with diminishing returns (you need to quadruple n to halve standard error).
  • Type I error (α) = rejecting a true null. Type II error (β) = failing to reject a false null. Power = 1 − β. Decreasing α increases β.
  • Use the t-statistic when population variance is unknown (almost always on the exam). Use chi-square for single variance tests and F for comparing two variances.
  • The p-value is the smallest α at which you’d reject H₀. If p ≤ α, reject. Period.
  • Spearman rank correlation is the nonparametric alternative to Pearson — use it when normality is in question or data are ordinal.
  • The chi-square test of independence uses a contingency table of observed vs. expected frequencies. Degrees of freedom = (rows − 1)(columns − 1).

Frequently Asked Questions

What’s the difference between a Type I and Type II error on the CFA exam?

A Type I error means you rejected the null hypothesis when it was actually true — a false positive. A Type II error means you failed to reject a false null — a false negative. The significance level α directly controls the Type I error rate. There’s a trade-off: decreasing α makes Type I errors less likely but Type II errors more likely, unless you also increase sample size.

When should I use a t-test vs. a z-test?

Use a z-test only when the population variance is known — which almost never happens in practice. On the CFA exam, you’ll use the t-test in nearly every question about means because you’ll be working with a sample standard deviation. As sample size gets large (n > 30 or so), t and z values converge and the distinction becomes less important, but the t-test is still technically correct.

How do I decide between a one-tailed and two-tailed test?

Read the alternative hypothesis. If Hₐ says “not equal to” (≠), it’s two-tailed. If Hₐ says “greater than” (>) or “less than” (<), it's one-tailed. The CFA exam usually tells you which one to use. One-tailed tests are more powerful for detecting an effect in a specific direction because the entire rejection region is on one side.

What does “power of a test” mean?

Power is the probability of correctly rejecting a false null hypothesis — it equals 1 − β, where β is the Type II error probability. A test with high power is good at detecting a real effect. You can increase power by increasing sample size, increasing α (at the cost of more Type I errors), or testing larger true effect sizes.

Why is the chi-square test of independence always one-sided?

Because the chi-square statistic sums squared differences between observed and expected frequencies — it’s always non-negative. Large values indicate that observed data differ significantly from what you’d expect under independence. There’s no concept of a “negative” chi-square value, so the rejection region is always in the right tail only.

How does the Central Limit Theorem help with hypothesis testing?

The CLT guarantees that the sampling distribution of the mean is approximately normal for large sample sizes, even if the underlying population isn’t normal. This allows you to use z-based and t-based tests on real financial data — which is typically skewed and leptokurtic — as long as your sample is large enough. Without the CLT, you’d need to know the exact population distribution to run most tests.