
Probability Concepts – CFA Level 1 Study Guide

Why it matters: Probability concepts are the quantitative backbone of investment analysis. Every portfolio allocation, risk model, and valuation forecast relies on expected values, variance, and conditional probabilities. This topic spans two CFA Level 1 learning modules — Statistical Measures of Asset Returns and Probability Trees & Conditional Expectations — and appears across Quantitative Methods, Portfolio Management, and Derivatives.

This guide covers the full scope of probability-related content tested on the CFA Level 1 exam: measures of central tendency and dispersion, the shape of return distributions, correlation between variables, expected value and variance of random variables, probability trees, and Bayes’ formula for updating probabilities with new information.

Measures of Central Tendency

A measure of central tendency tells you where the data are centered. For return series, it shows where the empirical distribution of returns is concentrated — essentially the “expected” return based on the observed sample.

Arithmetic Mean

The arithmetic mean is the sum of all observations divided by the number of observations. It is the most commonly used measure of central tendency in finance.

Sample Mean: X̄ = (Σ Xi) / n

The arithmetic mean is sensitive to outliers. A single extreme return can pull the mean significantly upward or downward. When outliers are a concern, analysts may use alternative measures.

Trimmed and Winsorized Means

A trimmed mean excludes a stated percentage of the highest and lowest values before computing the average. A 5% trimmed mean drops the top 2.5% and bottom 2.5% of observations. A winsorized mean replaces extreme values with the nearest non-extreme value instead of removing them — keeping the sample size unchanged but limiting the influence of outliers.
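As a rough illustration of the two adjustments described above, the sketch below computes a 10% trimmed mean and a 10% winsorized mean. The function names, the integer-percent argument, and the return data are hypothetical, and the tail count is simply rounded down.

```python
from statistics import mean

def trimmed_mean(values, pct):
    """Drop pct/2 percent of the observations from each tail, then average."""
    data = sorted(values)
    k = len(data) * pct // 200          # observations cut per tail (rounded down)
    return mean(data[k:len(data) - k])

def winsorized_mean(values, pct):
    """Replace pct/2 percent in each tail with the nearest retained value."""
    data = sorted(values)
    k = len(data) * pct // 200
    if k > 0:
        data[:k] = [data[k]] * k        # pull the low tail up
        data[-k:] = [data[-k - 1]] * k  # pull the high tail down
    return mean(data)

# Hypothetical sample: 20 returns in percent, with one extreme gain
returns = [-3, -2, -1, -1, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 40]

print(mean(returns))                 # plain arithmetic mean
print(trimmed_mean(returns, 10))     # drops -3 and 40, averages 18 values
print(winsorized_mean(returns, 10))  # keeps 20 values, -3 -> -2 and 40 -> 6
```

In this sample the single 40% return lifts the arithmetic mean to 3.5, while both adjusted means stay below 2.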

Geometric and Harmonic Means

The geometric mean captures compound growth. For a time series of percentage returns, it answers: “What constant rate would produce the same cumulative result?” The geometric mean is always less than or equal to the arithmetic mean, and the gap widens as dispersion increases.

The harmonic mean is useful for averaging ratios such as price-to-earnings, or for finding the average cost per share when a fixed dollar amount is invested periodically. It weights smaller values more heavily. For positive values it is never larger than the other two means: harmonic ≤ geometric ≤ arithmetic, with equality only when all observations are identical.
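A minimal sketch of both ideas, using made-up numbers: the geometric mean return is found by compounding the growth factors (1 + R), and the harmonic mean of purchase prices is compared with the average cost per share from investing a fixed amount each period. The variable names and the 1,000 investment amount are assumptions for illustration.

```python
from math import prod
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical annual returns in decimal form
returns = [0.10, -0.05, 0.20]
arithmetic_return = mean(returns)
# Compound the growth factors, take the n-th root, subtract 1
geometric_return = prod(1 + r for r in returns) ** (1 / len(returns)) - 1

# Hypothetical share prices at three purchase dates, 1,000 invested each time
prices = [10.0, 20.0, 40.0]
shares_bought = sum(1000 / p for p in prices)
average_cost = 3 * 1000 / shares_bought   # total spent / total shares

print(arithmetic_return, geometric_return)
print(harmonic_mean(prices), average_cost)  # the two should agree
print(harmonic_mean(prices), geometric_mean(prices), mean(prices))
```

The last line prints the three means of the same positive prices in ascending order, which is the ordering the text describes.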

Median and Mode

The median is the middle value when observations are sorted. For an odd-numbered sample of n observations, it is the value at position (n + 1)/2. For an even-numbered sample, it is the average of the two middle values. Unlike the mean, the median is largely unaffected by extreme values, making it valuable for skewed distributions.

The mode is the most frequently occurring value. A distribution can be unimodal (one mode), bimodal (two modes), trimodal, or have no mode at all. For continuous return data grouped into intervals, the modal interval is the one with the highest frequency.
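The median and mode rules above can be checked with Python's standard `statistics` module. The samples are hypothetical and chosen only to show the odd, even, and bimodal cases.

```python
from statistics import median, multimode

odd_sample = [7, 1, 3, 2, 2]    # n = 5, middle position is (5 + 1) / 2 = 3
even_sample = [4, 1, 3, 2]      # n = 4, average the two middle values

median_odd = median(odd_sample)     # sorted: 1, 2, 2, 3, 7
median_even = median(even_sample)   # sorted: 1, 2, 3, 4

# multimode returns every value tied for the highest frequency
modes_unimodal = multimode(odd_sample)
modes_bimodal = multimode([1, 1, 2, 2, 3])

print(median_odd, median_even, modes_unimodal, modes_bimodal)
```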

Measures of Location: Quantiles

Quantiles describe where specific proportions of data fall within a distribution. They are widely used in portfolio performance evaluation and risk management.

Quantile | Divides Data Into | Common Use
Quartiles | 4 equal parts (Q1, Q2, Q3) | Box plots, IQR calculation
Quintiles | 5 equal parts | Factor-based portfolio sorting
Deciles | 10 equal parts | Market cap ranking, fund performance
Percentiles | 100 equal parts | VaR thresholds, peer group ranking

The interquartile range (IQR) equals Q3 − Q1 and measures the spread of the middle 50% of data. It is the basis for box-and-whisker plots, where the whiskers typically extend no further than the fences at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. Observations outside the fences are flagged as potential outliers.
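The quartile and fence calculation can be sketched as follows. The data are hypothetical, and the choice of `method="exclusive"` is an assumption: it places the y-th percentile at position (n + 1) × y / 100 in the sorted sample, which is one common textbook convention. Other conventions give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical returns in percent (11 observations)
returns = [-8, -3, -1, 0, 1, 2, 3, 4, 5, 7, 25]

# Three cut points split the sorted data into four groups
q1, q2, q3 = quantiles(returns, n=4, method="exclusive")
iqr = q3 - q1

# Fences at 1.5 x IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in returns if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

With 11 observations the quartiles fall exactly on the 3rd, 6th, and 9th sorted values, so no interpolation is needed in this example.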

Measures of Dispersion

If central tendency tells you about reward, dispersion tells you about risk. These measures quantify how spread out returns are around their center.

Range and Mean Absolute Deviation

The range is simply Maximum − Minimum. It is easy to compute but uses only two data points and is highly sensitive to outliers. The mean absolute deviation (MAD) uses all observations:

Mean Absolute Deviation: MAD = (Σ |Xi − X̄|) / n
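A short sketch of the range and MAD calculations on a hypothetical five-observation sample:

```python
from statistics import mean

returns = [2, 4, 6, 8, 10]   # hypothetical returns in percent

value_range = max(returns) - min(returns)   # uses only two observations
avg = mean(returns)
# Average absolute distance from the mean, using every observation
mad = sum(abs(x - avg) for x in returns) / len(returns)

print(value_range, avg, mad)
```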

Variance and Standard Deviation

Variance is the average of squared deviations from the mean. Squaring eliminates the sign problem but produces units in squared terms. Standard deviation is the square root of variance and returns to the original units — making it the most widely used measure of volatility in finance.

Sample Variance: s² = Σ(Xi − X̄)² / (n − 1)
Sample Standard Deviation: s = √[Σ(Xi − X̄)² / (n − 1)]

The denominator uses n − 1 (not n) for sample statistics. This is the number of degrees of freedom: because the deviations from the sample mean must sum to zero, only n − 1 of them are free to vary once the mean is known. This correction makes the sample variance an unbiased estimator of the population variance.
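To make the effect of the n − 1 denominator concrete, the sketch below computes the variance both ways on a hypothetical sample and cross-checks the result against the standard library's `variance` and `stdev`, which use the n − 1 convention.

```python
from math import sqrt
from statistics import stdev, variance

returns = [2, 4, 6, 8, 10]   # hypothetical returns in percent
n = len(returns)
avg = sum(returns) / n

sum_sq_dev = sum((x - avg) ** 2 for x in returns)
sample_var = sum_sq_dev / (n - 1)      # divides by degrees of freedom
divide_by_n_var = sum_sq_dev / n       # smaller, biased downward for a sample
sample_sd = sqrt(sample_var)

print(sample_var, divide_by_n_var, sample_sd)
print(variance(returns), stdev(returns))   # library values for comparison
```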

Target Downside Deviation

Investors are typically more concerned with downside risk than upside surprises. The target semideviation measures dispersion only for observations below a specified target return B:

Target Semideviation: s_Target = √[Σ(Xi − B)² / (n − 1)]   for all Xi ≤ B

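Because it counts only downside deviations, the target semideviation cannot exceed the standard deviation when the target equals the sample mean. With a target set well above the mean it can be larger, because every shortfall is then measured from the higher target.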
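A minimal sketch of the target semideviation formula, using hypothetical returns. Note that the sum runs only over observations at or below the target, while the denominator still uses the full sample size minus one. The function name is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev

def target_semideviation(values, target):
    """Square root of summed squared shortfalls below target, over n - 1."""
    shortfalls = [(x - target) ** 2 for x in values if x <= target]
    return sqrt(sum(shortfalls) / (len(values) - 1))

returns = [-4, -1, 2, 3, 5, 7]   # hypothetical returns in percent, mean = 2

semidev_at_mean = target_semideviation(returns, mean(returns))
semidev_at_zero = target_semideviation(returns, 0)

print(stdev(returns))       # full standard deviation
print(semidev_at_mean)      # target B = sample mean
print(semidev_at_zero)      # target B = 0%
```

Here the standard deviation is 4.0, while the semideviation around the mean is 3.0, since only the three observations at or below 2% contribute to the sum.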

Coefficient of Variation

The coefficient of variation (CV) standardizes risk relative to return, making it possible to compare assets with different means:

Coefficient of Variation: CV = s / X̄

A higher CV means more risk per unit of return. The CV is scale-free and unitless, which makes it ideal for comparing investments with different average returns or measured in different currencies.
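As a quick sketch, the CV comparison can be run on two hypothetical return series. The asset names and figures are invented for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

asset_a = [2, 6, 10, 14]   # hypothetical returns in percent, mean 8
asset_b = [4, 5, 6, 5]     # hypothetical returns in percent, mean 5

cv_a = coefficient_of_variation(asset_a)
cv_b = coefficient_of_variation(asset_b)

print(round(cv_a, 4), round(cv_b, 4))
```

Asset A has the higher mean return but also the higher CV, so it carries more risk per unit of return in this sample.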

Exam Tip
For positive values, the three means always satisfy harmonic ≤ geometric ≤ arithmetic. The greater the dispersion in the data, the larger the gap between them. This relationship is frequently tested.

Skewness: Asymmetry in Return Distributions

A distribution that is not symmetrical about its mean is skewed. Skewness has direct implications for risk assessment because it reveals whether extreme returns tend to occur on the upside or downside.

Property | Positive Skew (Right-Tailed) | Negative Skew (Left-Tailed)
Tail | Long right tail | Long left tail
Mean vs. Median | Mean > Median > Mode | Mean < Median < Mode
Extreme Returns | Few large gains | Few large losses
Investor Preference | Preferred (upside potential) | Disliked (crash risk)

Sample Skewness (large n): Skewness ≈ (1/n) × Σ[(Xi − X̄) / s]³

A skewness of zero indicates symmetry. Most equity return series show negative skewness — frequent small gains offset by occasional large losses. For a given expected return and standard deviation, investors prefer positive skew because the mean return lies above the median.
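The large-sample skewness approximation can be sketched directly from the formula. The function name and data are hypothetical; with only a handful of observations the approximation is rough, so the example is meant to show the sign, not a precise estimate.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Approximate skewness: average cubed standardized deviation."""
    n = len(values)
    avg = mean(values)
    s = stdev(values)                      # sample standard deviation (n - 1)
    return sum(((x - avg) / s) ** 3 for x in values) / n

left_tailed = [-12, 1, 2, 2, 3, 4]   # one large loss, several small gains
symmetric = [-2, -1, 0, 1, 2]

print(sample_skewness(left_tailed))   # negative
print(sample_skewness(symmetric))     # approximately zero
```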

Kurtosis: Tail Risk in Return Distributions

Kurtosis measures the combined weight of the tails relative to the rest of the distribution. It tells you how likely extreme outcomes are — a critical concern for risk management.

Distribution Type | Kurtosis | Excess Kurtosis | Tail Behavior
Leptokurtic (fat-tailed) | > 3.0 | > 0 | More frequent extreme returns
Mesokurtic (normal) | = 3.0 | = 0 | Baseline (normal distribution)
Platykurtic (thin-tailed) | < 3.0 | < 0 | Fewer extreme returns

Sample Excess Kurtosis (large n): KE ≈ [(1/n) × Σ[(Xi − X̄) / s]⁴] − 3

Most equity return series are leptokurtic — they have positive excess kurtosis. This means extreme events (crashes and spikes) occur more often than a normal distribution would predict. If you use a normal distribution model on a fat-tailed dataset, you will systematically underestimate the probability of very bad (or very good) outcomes.
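A sketch of the large-sample excess kurtosis approximation, applied to two artificial samples of 100 observations each. Both samples are constructed for illustration and are not real return data: one is mostly zeros with a few extreme values, the other alternates between two moderate values.

```python
from statistics import mean, stdev

def sample_excess_kurtosis(values):
    """Approximate excess kurtosis: average 4th-power standardized deviation, minus 3."""
    n = len(values)
    avg = mean(values)
    s = stdev(values)                      # sample standard deviation (n - 1)
    return sum(((x - avg) / s) ** 4 for x in values) / n - 3

fat_tailed = [0] * 96 + [10, -10, 10, -10]   # rare, large moves
thin_tailed = [-1, 1] * 50                   # no extreme moves at all

print(sample_excess_kurtosis(fat_tailed))    # well above 0 (leptokurtic)
print(sample_excess_kurtosis(thin_tailed))   # below 0 (platykurtic)
```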

Common Mistake
Many statistical packages report “kurtosis” when they actually mean excess kurtosis (kurtosis minus 3). Always check which definition is being used. On the CFA exam, excess kurtosis (where normal = 0) is the standard reference.

Covariance and Correlation

Understanding how two variables move together is essential for portfolio diversification. Two measures capture this relationship: covariance and correlation.

Covariance

Covariance measures the joint variability of two random variables. Positive covariance means they tend to move in the same direction; negative covariance means they tend to move in opposite directions.

Sample Covariance: sXY = Σ(Xi − X̄)(Yi − Ȳ) / (n − 1)

The problem with covariance is interpretation — its magnitude depends on the units and scale of the variables, making it hard to compare across datasets.

Correlation Coefficient

The correlation coefficient standardizes covariance by dividing by the product of the two variables’ standard deviations. This produces a unitless measure bounded between −1 and +1:

Sample Correlation: rXY = sXY / (sX × sY)

Correlation | Interpretation
r = +1 | Perfect positive linear relationship
r = 0 | No linear relationship (may still be nonlinearly related)
r = −1 | Perfect inverse linear relationship

Limitations of Correlation

Correlation has important blind spots that appear frequently on the exam. It only measures linear relationships — two variables with a strong nonlinear association can have a correlation near zero. Correlation is sensitive to outliers. And correlation does not imply causation.
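The sketch below works through the sample covariance and correlation formulas on hypothetical data and illustrates two points from the text: rescaling a variable changes the covariance but not the correlation, and an exact nonlinear relationship (here Y = X²) can still produce a correlation of zero. Function names and data are illustrative assumptions.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(sample_covariance(xs, xs))   # variance is covariance with itself
    sd_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
x_scaled = [100 * v for v in x]              # same data in different units

print(sample_covariance(x, y), sample_correlation(x, y))
print(sample_covariance(x_scaled, y), sample_correlation(x_scaled, y))

# Perfect nonlinear dependence, yet no linear association
u = [-2, -1, 0, 1, 2]
u_squared = [v ** 2 for v in u]
print(sample_correlation(u, u_squared))
```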

Watch out for spurious correlation: apparent relationships that arise from chance, a confounding third variable, or data mining. Investment strategies built on spurious correlations tend to fail out of sample.

Expected Value and Variance of Random Variables

Moving from descriptive statistics to probability, the expected value of a random variable is its probability-weighted average outcome. While the sample mean describes historical data, expected value is forward-looking — it represents the forecast.

Expected Value: E(X) = Σ P(Xi) × Xi

The variance of a random variable is the probability-weighted average of squared deviations from the expected value. Standard deviation is the positive square root of variance and is measured in the same units as the random variable.

Variance of a Random Variable: σ²(X) = Σ P(Xi) × [Xi − E(X)]²

Expected value tells you where outcomes are centered; variance tells you how dispersed they are around that center. The two together summarize the risk-return tradeoff for a discrete probability distribution.
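A minimal sketch of both formulas for a discrete random variable. The three scenarios and their probabilities are hypothetical.

```python
from math import sqrt

# (probability, return in percent) for three hypothetical scenarios
scenarios = [(0.25, -10.0), (0.50, 5.0), (0.25, 20.0)]

# Probabilities of an exhaustive set of outcomes must sum to 1
total_probability = sum(p for p, _ in scenarios)

expected_value = sum(p * x for p, x in scenarios)
variance = sum(p * (x - expected_value) ** 2 for p, x in scenarios)
std_dev = sqrt(variance)

print(total_probability, expected_value, variance, std_dev)
```

The expected return here is 5%, and the variance of 112.5 is in percent squared, which is why the standard deviation (about 10.6%) is the easier figure to interpret.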

Probability Trees and Conditional Expectations

A probability tree is a visual tool for organizing scenarios involving sequential or conditional events. Each branch represents a possible scenario with its probability, and terminal nodes show the outcome values.

Conditional Expected Value

The expected value of a variable given a particular scenario is a conditional expected value, written E(X | S). It uses only the probabilities and outcomes that apply under that specific scenario:

Conditional Expected Value: E(X | S) = P(X1 | S) × X1 + P(X2 | S) × X2 + … + P(Xn | S) × Xn

Total Probability Rule for Expected Value

To find the unconditional expected value, you weight each conditional expected value by the probability of its scenario occurring. If S1, S2, …, Sn are mutually exclusive and exhaustive scenarios:

Total Probability Rule: E(X) = E(X | S1)P(S1) + E(X | S2)P(S2) + … + E(X | Sn)P(Sn)

This is fundamental for scenario analysis. You build expectations under different macro environments (e.g., recession vs. expansion), then combine them to get the overall expected outcome.
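The scenario approach can be sketched as a two-level probability tree. All probabilities and EPS values below are hypothetical. The code computes each conditional expected value, combines them with the total probability rule, and cross-checks the result by summing over the joint probabilities at the terminal nodes.

```python
# Each scenario: probability of the scenario, then (conditional probability, EPS) branches
tree = {
    "recession": {"prob": 0.25, "branches": [(0.50, 1.0), (0.50, 2.0)]},
    "expansion": {"prob": 0.75, "branches": [(0.25, 3.0), (0.75, 4.0)]},
}

# Conditional expected value E(X | S) for each scenario
conditional_ev = {
    name: sum(p * eps for p, eps in node["branches"])
    for name, node in tree.items()
}

# Total probability rule: weight each E(X | S) by P(S)
unconditional_ev = sum(
    conditional_ev[name] * node["prob"] for name, node in tree.items()
)

# Cross-check: joint probability of each terminal node times its outcome
joint_ev = sum(
    node["prob"] * p * eps
    for node in tree.values()
    for p, eps in node["branches"]
)

print(conditional_ev, unconditional_ev, joint_ev)
```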

Exam Tip
The total probability rule works both for probabilities and for expected values. If you can decompose a problem into mutually exclusive and exhaustive scenarios, you can always compute the unconditional result by probability-weighting the conditional results.

Bayes’ Formula: Updating Probabilities

Bayes’ formula provides a rational method for updating your beliefs when new information arrives. It reverses the conditioning: given that you observed some new information, what is the updated probability of the event that caused it?

Bayes’ Formula: P(Event | Info) = [P(Info | Event) / P(Info)] × P(Event)

The key components are:

Term | Name | What It Represents
P(Event) | Prior probability | Your belief before new information
P(Info | Event) | Likelihood | Probability of observing the info if the event is true
P(Info) | Unconditional probability | Total probability of observing the info under all scenarios
P(Event | Info) | Posterior probability | Updated belief after incorporating the new information

The denominator P(Info) is computed using the total probability rule across all mutually exclusive scenarios. When the likelihood is greater than the unconditional probability, the posterior probability is higher than the prior — the new information increases your confidence in the event.
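A worked sketch of the update, with entirely hypothetical inputs: a prior probability that a company beats earnings expectations, and the likelihood of seeing a positive signal under each outcome. The denominator is expanded with the total probability rule, as described above.

```python
# Hypothetical inputs
prior_beat = 0.40                 # P(Event): company beats expectations
p_signal_given_beat = 0.75        # P(Info | Event)
p_signal_given_miss = 0.25        # P(Info | not Event)

# Total probability rule for the denominator P(Info)
p_signal = (p_signal_given_beat * prior_beat
            + p_signal_given_miss * (1 - prior_beat))

# Bayes' formula: posterior = (likelihood / unconditional) * prior
posterior_beat = (p_signal_given_beat / p_signal) * prior_beat

print(round(p_signal, 4), round(posterior_beat, 4))
```

Because the likelihood (0.75) exceeds the unconditional probability of the signal (0.45), the posterior of about 0.67 is higher than the 0.40 prior.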

Bayes’ formula is used extensively in credit analysis (updating default probabilities given new data), earnings forecasting (revising EPS estimates after company announcements), and quantitative strategy development.

Exam Tip
On the CFA exam, Bayes’ problems typically give you prior probabilities and likelihoods, then ask you to compute the posterior. The hardest part is usually computing P(Info) in the denominator — always use the total probability rule to expand it across all scenarios.

Summary: Key Formulas at a Glance

Concept | Formula
Sample Mean | X̄ = ΣXi / n
Sample Variance | s² = Σ(Xi − X̄)² / (n − 1)
Sample Std Dev | s = √(s²)
CV | CV = s / X̄
Skewness | (1/n) × Σ[(Xi − X̄)/s]³
Excess Kurtosis | (1/n) × Σ[(Xi − X̄)/s]⁴ − 3
Covariance | sXY = Σ(Xi − X̄)(Yi − Ȳ) / (n − 1)
Correlation | rXY = sXY / (sX × sY)
Expected Value | E(X) = ΣP(Xi)Xi
Variance (RV) | σ²(X) = ΣP(Xi)[Xi − E(X)]²
Bayes’ Formula | P(A|B) = [P(B|A)/P(B)] × P(A)

Key Takeaways

  • The arithmetic mean is the most common measure of central tendency, but it is sensitive to outliers. The median is more robust for skewed distributions.
  • The three means always follow: harmonic ≤ geometric ≤ arithmetic. Greater dispersion widens the gap.
  • Standard deviation is the primary risk measure, but target semideviation captures downside risk specifically.
  • The coefficient of variation (CV = s/X̄) allows risk comparison across assets with different mean returns.
  • Most equity return distributions are negatively skewed and leptokurtic (fat-tailed) — normal distribution models underestimate tail risk.
  • Correlation ranges from −1 to +1 and measures linear association only. It does not imply causation and is sensitive to outliers.
  • Expected value is the probability-weighted average outcome. Combined with variance, it defines the risk-return profile of any discrete distribution.
  • The total probability rule lets you compute unconditional expected values by weighting conditional expectations across mutually exclusive scenarios.
  • Bayes’ formula updates prior probabilities into posterior probabilities when new information arrives — essential for dynamic investment decisions.

Frequently Asked Questions

What is the difference between variance and standard deviation?

Variance is the average of squared deviations from the mean, measured in squared units. Standard deviation is the square root of variance, returning to the same units as the data. Standard deviation is more intuitive — if returns are in percent, standard deviation is also in percent, while variance is in “percent squared.”

Why does the sample variance formula divide by n − 1 instead of n?

Dividing by n − 1 (degrees of freedom) corrects for the fact that we estimate the population mean with the sample mean. This correction makes the sample variance an unbiased estimator of the population variance. Without it, the sample variance would systematically underestimate the true variance.

How does skewness affect investment risk?

Negative skewness means the distribution has a long left tail — large losses are more frequent than a symmetric distribution would suggest. Most equity returns exhibit negative skewness. For a given mean and standard deviation, investors prefer positive skew (upside surprises) over negative skew (crash risk).

What does it mean when a distribution is leptokurtic?

A leptokurtic distribution has fatter tails and a higher peak than a normal distribution (excess kurtosis > 0). This means extreme returns — both very large gains and very large losses — are more likely than a normal distribution would predict. Using normal distribution assumptions for a leptokurtic return series will underestimate the probability of extreme events.

How is Bayes’ formula used in investment analysis?

Bayes’ formula updates your probability estimates when new information arrives. For example, if you initially estimate a 45% probability that a company will beat earnings expectations, and then observe a positive signal (like a capacity expansion), Bayes’ formula lets you calculate the revised probability by combining your prior belief with the likelihood of observing that signal under each scenario.

What is the difference between correlation and covariance?

Covariance measures how two variables move together but its scale depends on the units of the variables, making it hard to interpret in isolation. Correlation standardizes covariance to a range of −1 to +1, making it comparable across any pair of variables. Both measure linear relationships only and neither implies causation.

Related CFA Level 1 Topics

Topic | Connection
Time Value of Money | Expected value calculations underpin TVM-based valuation models
Hypothesis Testing | Builds directly on variance, standard deviation, and distributional properties
Portfolio Management | Correlation, covariance, and expected return drive portfolio construction
Standard Deviation | The primary risk measure used throughout the CFA curriculum
Sharpe Ratio | Uses mean return and standard deviation to measure risk-adjusted performance
Beta | Derived from covariance of asset returns with market returns divided by market variance
Diversification | Depends on imperfect correlation between portfolio holdings
Correlation Matrix Cheat Sheet | Quick reference for reading and interpreting correlation data