
Regression Analysis Cheat Sheet for Finance

Regression analysis quantifies the relationship between a dependent variable and one or more independent variables. In finance, it is the workhorse behind beta estimation, factor models, earnings forecasting, and economic research. If you work with data, you work with regression.

Why Regression Matters in Finance

Regression lets you isolate how one variable moves in response to another while holding the other variables in the model constant. Analysts use it to estimate a stock’s beta against the market, test whether EPS growth drives stock prices, or forecast revenue based on macroeconomic variables like GDP and inflation.

Without regression, you are guessing at relationships. With it, you can quantify, test, and defend your conclusions with statistical rigor.

Simple Linear Regression (SLR)

One dependent variable (Y), one independent variable (X). The model fits a straight line through your data that minimizes the sum of squared residuals (OLS — Ordinary Least Squares).

Simple Linear Regression: Y = α + βX + ε

Where α is the intercept, β is the slope coefficient (how much Y changes for a one-unit change in X), and ε is the error term.

Component | Meaning | Finance Example
Y (dependent) | Variable you want to explain | Stock return
X (independent) | Variable you think drives Y | Market return (S&P 500)
α (intercept) | Y when X = 0 | Alpha: excess return
β (slope) | Sensitivity of Y to X | Beta: market sensitivity
ε (error) | Unexplained variation | Firm-specific risk
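As a minimal sketch of how the intercept and slope in the table above are obtained, the closed-form OLS solution for a single X can be written in a few lines of plain Python. The function name `fit_simple_ols` and the return figures are hypothetical and purely illustrative.

```python
def fit_simple_ols(x, y):
    """Return (alpha, beta) for Y = alpha + beta * X fitted by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-movement of X and Y divided by the variation in X.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical monthly returns in percent (illustrative, not real data).
market = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
stock = [1.6, -2.9, 3.8, 0.2, -1.4, 2.9]

alpha, beta = fit_simple_ols(market, stock)
# Residuals are the part of each stock return the fitted line does not explain.
residuals = [s - (alpha + beta * m) for m, s in zip(market, stock)]
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

With an intercept in the model, the residuals sum to zero by construction, which is a quick sanity check on any hand-rolled fit.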

Multiple Regression

More than one independent variable. This is what you will use most often in practice because outcomes in finance rarely depend on a single factor.

Multiple Regression: Y = α + β₁X₁ + β₂X₂ + … + βₖXₖ + ε

Example: modeling a stock’s return using market return, interest rate changes, and oil prices simultaneously. Each β tells you the marginal effect of that variable, holding the others constant.
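The example above could be sketched with NumPy's least-squares solver. The helper name `fit_ols`, the simulated series, and the "true" coefficients used to generate them are all assumptions made for illustration; real work would load actual return data.

```python
import numpy as np


def fit_ols(X, y):
    """Return [alpha, beta_1, ..., beta_k] from an OLS fit that includes an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated monthly data (hypothetical, seeded so the run is repeatable).
rng = np.random.default_rng(0)
n = 120
market = rng.normal(0.8, 4.0, n)   # market return, %
rates = rng.normal(0.0, 0.25, n)   # change in interest rates, percentage points
oil = rng.normal(0.5, 8.0, n)      # oil price change, %
noise = rng.normal(0.0, 2.0, n)    # firm-specific component
stock = 0.2 + 1.1 * market - 2.0 * rates + 0.05 * oil + noise

coef = fit_ols(np.column_stack([market, rates, oil]), stock)
for name, value in zip(["alpha", "market", "rates", "oil"], coef):
    print(f"{name:>6}: {value: .3f}")
```

Each printed slope is the estimated effect of that variable with the other two held constant, so it should land near the coefficient used to simulate the data, within sampling error.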

Key Regression Metrics

Metric | What It Measures | Good Value
R-squared (R²) | % of Y’s variation explained by X | Higher is better (0 to 1)
Adjusted R² | R² penalized for extra variables | Use for model comparison
Standard Error | Typical size of the prediction error (residual) | Lower is better
t-statistic | Is β significantly ≠ 0? | Absolute value above 2 (roughly the 5% level)
p-value | Probability of an estimate at least this extreme if the true β were 0 | < 0.05 for significance
F-statistic | Overall model significance | Higher is stronger evidence; check its p-value
Durbin-Watson | Autocorrelation in residuals | Close to 2 = no autocorrelation
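Three of the metrics in the table follow directly from the fitted values and residuals. The sketch below uses their standard textbook definitions; the function names and the sample numbers are hypothetical.

```python
def r_squared(y, fitted):
    """Share of Y's variation explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """R-squared penalized for k independent variables, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def durbin_watson(residuals):
    """Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical actual and fitted values from some regression with one X.
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
resid = [a - f for a, f in zip(actual, fitted)]

r2 = r_squared(actual, fitted)
print(f"R2 = {r2:.3f}")
print(f"adjusted R2 = {adjusted_r_squared(r2, n=len(actual), k=1):.3f}")
print(f"Durbin-Watson = {durbin_watson(resid):.3f}")
```

Durbin-Watson values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.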

OLS Assumptions (CLRM)

For your regression results to be valid, the Classical Linear Regression Model requires these assumptions to hold:

Assumption | What It Means | If Violated
Linearity | Y and X have a linear relationship | Coefficients are biased
No multicollinearity | Independent variables are not highly correlated | Coefficients unstable, high standard errors
Homoscedasticity | Residual variance is constant | Standard errors are wrong
No autocorrelation | Residuals are not correlated over time | t-stats and F-stats unreliable
Normality of errors | Residuals are normally distributed | Hypothesis tests less reliable
No endogeneity | X is not correlated with ε | Coefficients are biased

Common Regression Problems and Fixes

Problem | How to Detect | How to Fix
Multicollinearity | VIF > 5-10; correlation matrix | Drop or combine variables
Heteroscedasticity | Breusch-Pagan test; residual plots | White’s robust standard errors
Autocorrelation | Durbin-Watson; residual plots | Newey-West standard errors; add lags
Omitted variable bias | Theory; intuition | Add the missing variable
Non-normality | Jarque-Bera test; histogram | Larger sample; transform variables
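The VIF check in the multicollinearity row can be sketched by regressing each independent variable on the others and computing 1 / (1 − R²) from that auxiliary fit. The function name and the simulated variables below are assumptions for illustration only.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations, columns are variables)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression: explain variable j with the remaining variables.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        # Note: perfectly collinear columns give r2 = 1 and a division by zero here.
        vifs.append(float(1 / (1 - r2)))
    return vifs


# Simulated factors (hypothetical): large_cap is nearly a copy of market.
rng = np.random.default_rng(1)
n = 200
market = rng.normal(0.0, 1.0, n)
large_cap = market + rng.normal(0.0, 0.1, n)
oil = rng.normal(0.0, 1.0, n)

vifs = variance_inflation_factors(np.column_stack([market, large_cap, oil]))
for name, vif in zip(["market", "large_cap", "oil"], vifs):
    print(f"{name:>9}: VIF = {vif:.1f}")
```

In this setup the two near-duplicate factors should show VIFs far above the 5 to 10 threshold, while the unrelated oil series should stay close to 1.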

Interpreting Regression Output

When you run a regression, here is how to read the results step by step:

1. Check overall fit: Look at R² and adjusted R². An R² of 0.60 means 60% of Y’s variation is explained by your model.

2. Check overall significance: The F-statistic tests whether at least one β ≠ 0. If the p-value of F is below 0.05, the model has explanatory power.

3. Check individual coefficients: Each β has its own t-statistic and p-value. If |t| > 2 (or p < 0.05), that variable is statistically significant.

4. Check residuals: Plot them. They should look random with no patterns. Patterns signal violated assumptions.
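The numbers used in steps 1 to 3 can be reproduced from scratch, which helps when reading the output of a statistics package. This is a sketch under classical OLS assumptions; `ols_summary` is a hypothetical helper, and it stops short of p-values because those need a t or F distribution from a statistics library.

```python
import numpy as np


def ols_summary(X, y):
    """Fit OLS with an intercept and return the statistics used to read the output."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]  # estimated parameters, including the intercept
    k = p - 1            # number of independent variables
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    # Classical coefficient covariance: residual variance times (X'X)^-1.
    sigma2 = ss_res / (n - p)
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    r2 = 1 - ss_res / ss_tot
    return {
        "coef": coef,
        "std_err": std_err,
        "t_stat": coef / std_err,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - p),
        "f_stat": (r2 / k) / ((1 - r2) / (n - p)),
        "residuals": resid,  # step 4: plot these and look for patterns
    }


# Small hypothetical data set with one independent variable.
summary = ols_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("R2:", round(summary["r2"], 3), "adjusted R2:", round(summary["adj_r2"], 3))
print("F:", round(summary["f_stat"], 3))
print("t-stats:", np.round(summary["t_stat"], 3))
```

With a single independent variable the F-statistic equals the square of the slope's t-statistic, so steps 2 and 3 give the same verdict in simple regression.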

Regression in CAPM Beta Estimation

The most common regression in equity analysis: regress a stock’s excess return on the market’s excess return. The slope is beta.

CAPM Regression: Rᵢ − Rf = α + β(Rₘ − Rf) + ε

If β = 1.3, the stock’s excess return moves about 1.3% on average for every 1% move in the market’s excess return. The intercept (α) represents alpha, the return not explained by market exposure. The R² tells you how much of the stock’s movement is driven by the market versus firm-specific factors.
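A minimal sketch of the CAPM regression, assuming stock returns, market returns, and risk-free rates are already aligned period by period. The function name and the six data points are hypothetical; a real estimate would use a much longer history.

```python
def capm_regression(stock_returns, market_returns, risk_free_rates):
    """Regress stock excess returns on market excess returns; return (alpha, beta, r2)."""
    stock_excess = [r - rf for r, rf in zip(stock_returns, risk_free_rates)]
    market_excess = [r - rf for r, rf in zip(market_returns, risk_free_rates)]
    n = len(stock_excess)
    mean_s = sum(stock_excess) / n
    mean_m = sum(market_excess) / n
    # Beta is the covariance with the market divided by the market's variance.
    cov_sm = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_excess, market_excess))
    var_m = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov_sm / var_m
    alpha = mean_s - beta * mean_m
    # R-squared: share of excess-return variation explained by the market.
    ss_res = sum((s - (alpha + beta * m)) ** 2 for s, m in zip(stock_excess, market_excess))
    ss_tot = sum((s - mean_s) ** 2 for s in stock_excess)
    return alpha, beta, 1 - ss_res / ss_tot


# Hypothetical monthly returns in percent (illustrative, not real data).
stock = [2.1, -3.0, 4.4, 0.9, -1.6, 3.2]
market = [1.5, -2.0, 3.0, 0.8, -1.0, 2.2]
risk_free = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]

alpha, beta, r2 = capm_regression(stock, market, risk_free)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, R2 = {r2:.3f}")
```

Subtracting the risk-free rate before regressing matters mainly for the intercept: the slope changes little when the risk-free rate is nearly constant, but alpha is only interpretable in excess-return terms.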

Analyst Tip

Always use adjusted R² when comparing models with different numbers of variables. Regular R² mechanically increases when you add variables, even useless ones. Adjusted R² penalizes complexity, so it rises only when a new variable adds enough explanatory power to offset the penalty.
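The tip can be checked with a small simulation: add a variable that is pure noise by construction and compare both measures before and after. The helper name and the simulated series are assumptions for illustration.

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Return (R2, adjusted R2) for an OLS fit with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1] - 1  # number of independent variables
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


# Simulated data (hypothetical): junk is unrelated to the stock by construction.
rng = np.random.default_rng(42)
n = 60
market = rng.normal(0.0, 4.0, n)
stock = 0.3 + 1.2 * market + rng.normal(0.0, 3.0, n)
junk = rng.normal(0.0, 1.0, n)

r2_small, adj_small = r2_and_adjusted(market, stock)
r2_big, adj_big = r2_and_adjusted(np.column_stack([market, junk]), stock)
print(f"market only:   R2 = {r2_small:.4f}, adjusted R2 = {adj_small:.4f}")
print(f"market + junk: R2 = {r2_big:.4f}, adjusted R2 = {adj_big:.4f}")
```

R² in the second line can never be lower than in the first, whatever the junk series contains. Adjusted R² can move either way in a given sample, which is why it is the fairer basis for comparison.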

Common Mistake

Confusing correlation with causation. A regression may show a strong relationship between two variables, but that does not prove one causes the other. There could be an omitted variable driving both. Always pair statistical results with economic logic.

Key Takeaways

  • Simple regression uses one X variable; multiple regression uses several — multiple regression is standard in practice.
  • R² tells you explanatory power; t-statistics tell you if individual variables matter; the F-test tells you if the whole model is significant.
  • Always check OLS assumptions — violated assumptions make your results unreliable.
  • The CAPM regression (stock vs. market returns) gives you beta and alpha directly.
  • Correlation ≠ causation. Back your regression results with economic reasoning.

Frequently Asked Questions

What is the difference between simple and multiple regression?

Simple regression has one independent variable (Y = α + βX + ε). Multiple regression has two or more independent variables (Y = α + β₁X₁ + β₂X₂ + ε). Multiple regression is used far more often in finance because most outcomes are driven by several factors simultaneously.

What does R-squared tell you in a financial regression?

R-squared measures how much of the dependent variable’s variation is explained by the model. In a CAPM regression, an R² of 0.40 means 40% of the stock’s return variation is explained by market movements. The remaining 60% is attributed to firm-specific factors and anything else the market does not capture.

How do you test if a regression coefficient is statistically significant?

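Check the t-statistic or p-value for that coefficient. If |t| > 2 (or the p-value < 0.05), the coefficient is statistically different from zero at the 5% significance level, meaning the data show a statistically reliable relationship between that variable and Y. Significance alone does not establish that the variable causes Y.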

What is multicollinearity and why is it a problem?

Multicollinearity occurs when two or more independent variables are highly correlated. It makes individual coefficients unreliable and inflates standard errors, even though the overall model fit (R²) may still look good. Check using the Variance Inflation Factor (VIF).

How is regression used to estimate beta?

Regress a stock’s excess returns (stock return minus risk-free rate) against the market’s excess returns over a historical period (typically 60 months). The slope coefficient is the stock’s beta, measuring its systematic risk relative to the market.