Regression Analysis Cheat Sheet for Finance
Regression analysis quantifies the relationship between a dependent variable and one or more independent variables. In finance, it is the workhorse behind beta estimation, factor models, earnings forecasting, and economic research. If you work with data, you work with regression.
Why Regression Matters in Finance
Regression lets you isolate how one variable moves in response to another — while controlling for everything else. Analysts use it to estimate a stock’s beta against the market, test whether EPS growth drives stock prices, or forecast revenue based on macroeconomic variables like GDP and inflation.
Without regression, you are guessing at relationships. With it, you can quantify, test, and defend your conclusions with statistical rigor.
Simple Linear Regression (SLR)
One dependent variable (Y), one independent variable (X). The model fits a straight line through your data that minimizes the sum of squared residuals (OLS, or Ordinary Least Squares):

Y = α + βX + ε

where α is the intercept, β is the slope coefficient (how much Y changes for a one-unit change in X), and ε is the error term.
| Component | Meaning | Finance Example |
|---|---|---|
| Y (dependent) | Variable you want to explain | Stock return |
| X (independent) | Variable you think drives Y | Market return (S&P 500) |
| α (intercept) | Y when X = 0 | Alpha — excess return |
| β (slope) | Sensitivity of Y to X | Beta — market sensitivity |
| ε (error) | Unexplained variation | Firm-specific risk |
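The table above maps directly onto code. Here is a minimal sketch of a simple OLS fit using the closed-form formulas (slope = Cov(X, Y) / Var(X)); the return series are synthetic, so all numbers are illustrative, not real market data.

```python
import numpy as np

# Hypothetical monthly returns in percent: market (X) and stock (Y)
rng = np.random.default_rng(0)
x = rng.normal(0.8, 4.0, 60)        # market return
noise = rng.normal(0.0, 2.0, 60)    # firm-specific risk (the ε term)
y = 0.2 + 1.3 * x + noise           # true alpha = 0.2, true beta = 1.3

# OLS closed form for simple regression
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")  # estimates near the true values
```

With 60 observations and modest noise, the estimated beta lands close to the true 1.3, which is exactly the behavior the Y/X/α/β/ε decomposition above describes.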
Multiple Regression
More than one independent variable. This is what you will use most often in practice because outcomes in finance rarely depend on a single factor.
Example: modeling a stock’s return using market return, interest rate changes, and oil prices simultaneously. Each β tells you the marginal effect of that variable, holding the others constant.
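That three-factor example can be sketched with a design matrix and a least-squares solve. The factor values and coefficients below are invented for illustration; in practice you would use actual return, rate, and oil-price series.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
market = rng.normal(0.7, 4.0, n)    # market return, %
rates = rng.normal(0.0, 0.3, n)     # change in interest rates
oil = rng.normal(0.5, 6.0, n)       # oil price return, %
noise = rng.normal(0.0, 2.0, n)
ret = 0.1 + 1.1 * market - 2.0 * rates + 0.05 * oil + noise

# Design matrix: a column of ones for the intercept, then each factor
X = np.column_stack([np.ones(n), market, rates, oil])
coefs, *_ = np.linalg.lstsq(X, ret, rcond=None)
alpha, b_mkt, b_rate, b_oil = coefs
```

Each estimated β is the marginal effect of its factor holding the others constant, which is why b_mkt recovers the market sensitivity even though rates and oil move at the same time.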
Key Regression Metrics
| Metric | What It Measures | Good Value |
|---|---|---|
| R-squared (R²) | % of Y’s variation explained by the model | Higher is better (0 to 1) |
| Adjusted R² | R² penalized for extra variables | Use for model comparison |
| Standard Error | Average prediction error | Lower is better |
| t-statistic | Is β significantly ≠ 0? | \|t\| > 2 (at 5% level) |
| p-value | Probability β = 0 by chance | < 0.05 for significance |
| F-statistic | Overall model significance | Higher → model is useful |
| Durbin-Watson | Autocorrelation in residuals | Close to 2 = no autocorrelation |
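Several of these metrics fall out of the residuals directly. A minimal sketch, again on synthetic data, computing R², adjusted R², and the slope's t-statistic for a simple regression:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x = rng.normal(0.8, 4.0, n)
y = 0.2 + 1.3 * x + rng.normal(0.0, 2.0, n)

X = np.column_stack([np.ones(n), x])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coefs

k = X.shape[1]                            # parameters, incl. intercept
ss_res = resid @ resid                    # unexplained variation
ss_tot = ((y - y.mean()) ** 2).sum()      # total variation in Y
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k) # penalized for extra variables

# Standard error of the slope and its t-statistic
s2 = ss_res / (n - k)                     # residual variance
se_beta = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())
t_beta = coefs[1] / se_beta               # |t| > 2 → significant at 5%
```

Note that adjusted R² is always at or below R²; the gap widens as you add variables that do not pull their weight.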
OLS Assumptions (CLRM)
For your regression results to be valid, the Classical Linear Regression Model requires these assumptions to hold:
| Assumption | What It Means | If Violated |
|---|---|---|
| Linearity | Y and X have a linear relationship | Coefficients are biased |
| No multicollinearity | Independent variables are not highly correlated | Coefficients unstable, high standard errors |
| Homoscedasticity | Residual variance is constant | Standard errors are wrong |
| No autocorrelation | Residuals are not correlated over time | t-stats and F-stats unreliable |
| Normality of errors | Residuals are normally distributed | Hypothesis tests less reliable |
| No endogeneity | X is not correlated with ε | Coefficients are biased |
Common Regression Problems and Fixes
| Problem | How to Detect | How to Fix |
|---|---|---|
| Multicollinearity | VIF > 5-10; correlation matrix | Drop or combine variables |
| Heteroscedasticity | Breusch-Pagan test; residual plots | White’s robust standard errors |
| Autocorrelation | Durbin-Watson; residual plots | Newey-West standard errors; add lags |
| Omitted variable bias | Theory; intuition | Add the missing variable |
| Non-normality | Jarque-Bera test; histogram | Larger sample; transform variables |
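The VIF check in the first row is easy to implement by hand: regress each independent variable on the others and compute 1 / (1 − R²). This sketch builds a deliberately collinear pair to show the diagnostic firing; the data and the `vif` helper are illustrative.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])
        coefs, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ coefs
        r2 = 1 - resid @ resid / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.1, size=200)   # nearly collinear with a
c = rng.normal(size=200)                  # independent factor
v = vif(np.column_stack([a, b, c]))       # v[0], v[1] large; v[2] near 1
```

The collinear pair blows past the VIF > 5–10 threshold while the independent variable stays near 1, matching the detect-and-fix guidance in the table.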
Interpreting Regression Output
When you run a regression, here is how to read the results step by step:
1. Check overall fit: Look at R² and adjusted R². An R² of 0.60 means 60% of Y’s variation is explained by your model.
2. Check overall significance: The F-statistic tests whether at least one β ≠ 0. If the p-value of F is below 0.05, the model has explanatory power.
3. Check individual coefficients: Each β has its own t-statistic and p-value. If |t| > 2 (or p < 0.05), that variable is statistically significant.
4. Check residuals: Plot them. They should look random with no patterns. Patterns signal violated assumptions.
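Step 4 can also be checked numerically. The Durbin-Watson statistic from the metrics table is just a ratio of sums over the residual series; the sketch below contrasts white-noise residuals with autocorrelated ones (both synthetic).

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: close to 2 means no first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(4)
white = rng.normal(size=500)        # well-behaved residuals
dw_white = durbin_watson(white)     # near 2

ar = np.empty(500)                  # residuals with a pattern (AR(1), rho = 0.8)
ar[0] = white[0]
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + white[t]
dw_ar = durbin_watson(ar)           # well below 2 → autocorrelation
```

A DW statistic far from 2 is the numerical counterpart of the visual pattern you would see in a residual plot.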
Regression in CAPM Beta Estimation
The most common regression in equity analysis: regress a stock’s excess return on the market’s excess return. The slope is beta.
If β = 1.3, the stock moves 1.3% for every 1% move in the market. The intercept (α) represents alpha — the return not explained by market exposure. The R² tells you how much of the stock’s movement is driven by the market versus firm-specific factors.
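The beta regression described above is short enough to sketch end to end. The excess-return series and the risk-free rate here are synthetic stand-ins, not actual market data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60                                   # 60 months, a common estimation window
rf = 0.3                                 # hypothetical monthly risk-free rate, %
mkt = rng.normal(0.9, 4.0, n)            # market return, %

ex_mkt = mkt - rf                        # market excess return
noise = rng.normal(0.0, 3.0, n)          # firm-specific variation
ex_stock = 0.15 + 1.3 * ex_mkt + noise   # stock excess return (true beta = 1.3)

# CAPM regression: slope = beta, intercept = alpha
beta = np.cov(ex_mkt, ex_stock, ddof=1)[0, 1] / np.var(ex_mkt, ddof=1)
alpha = ex_stock.mean() - beta * ex_mkt.mean()
```

The estimated slope recovers a beta near 1.3, and the intercept is the alpha: return the market exposure does not explain.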
Common Pitfalls
Relying on plain R² for model comparison. Always use adjusted R² when comparing models with different numbers of variables. Regular R² mechanically increases when you add variables, even useless ones. Adjusted R² penalizes complexity, so it only rises when a new variable genuinely improves the model.
Confusing correlation with causation. A regression may show a strong relationship between two variables, but that does not prove one causes the other. There could be an omitted variable driving both. Always pair statistical results with economic logic.
Key Takeaways
- Simple regression uses one X variable; multiple regression uses several — multiple regression is standard in practice.
- R² tells you explanatory power; t-statistics tell you if individual variables matter; the F-test tells you if the whole model is significant.
- Always check OLS assumptions — violated assumptions make your results unreliable.
- The CAPM regression (stock vs. market returns) gives you beta and alpha directly.
- Correlation ≠ causation. Back your regression results with economic reasoning.
Frequently Asked Questions
What is the difference between simple and multiple regression?
Simple regression has one independent variable (Y = α + βX + ε). Multiple regression has two or more independent variables (Y = α + β₁X₁ + β₂X₂ + ε). Multiple regression is used far more often in finance because most outcomes are driven by several factors simultaneously.
What does R-squared tell you in a financial regression?
R-squared measures how much of the dependent variable’s variation is explained by the model. In a CAPM regression, an R² of 0.40 means 40% of the stock’s return variation is explained by market movements. The remaining 60% is firm-specific.
How do you test if a regression coefficient is statistically significant?
Check the t-statistic or p-value for that coefficient. If |t| > 2 (or the p-value < 0.05), the coefficient is statistically different from zero at the 5% significance level, meaning the observed relationship is unlikely to be due to chance alone.
What is multicollinearity and why is it a problem?
Multicollinearity occurs when two or more independent variables are highly correlated. It makes individual coefficients unreliable and inflates standard errors, even though the overall model fit (R²) may still look good. Check using the Variance Inflation Factor (VIF).
How is regression used to estimate beta?
Regress a stock’s excess returns (stock return minus risk-free rate) against the market’s excess returns over a historical period (typically 60 months). The slope coefficient is the stock’s beta, measuring its systematic risk relative to the market.