Regression Analysis Cheat Sheet for Finance
Regression analysis quantifies the relationship between a dependent variable and one or more independent variables. In finance, it is the workhorse behind beta estimation, factor models, earnings forecasting, and economic research. If you work with data, you work with regression.
Why Regression Matters in Finance
Regression lets you isolate how one variable moves in response to another — while controlling for everything else. Analysts use it to estimate a stock’s beta against the market, test whether EPS growth drives stock prices, or forecast revenue based on macroeconomic variables like GDP and inflation.
Without regression, you are guessing at relationships. With it, you can quantify, test, and defend your conclusions with statistical rigor.
Simple Linear Regression (SLR)
One dependent variable (Y), one independent variable (X). The model fits a straight line through your data that minimizes the sum of squared residuals (OLS, or Ordinary Least Squares):

Y = α + βX + ε

where α is the intercept, β is the slope coefficient (how much Y changes for a one-unit change in X), and ε is the error term.
| Component | Meaning | Finance Example |
|---|---|---|
| Y (dependent) | Variable you want to explain | Stock return |
| X (independent) | Variable you think drives Y | Market return (S&P 500) |
| α (intercept) | Y when X = 0 | Alpha — excess return |
| β (slope) | Sensitivity of Y to X | Beta — market sensitivity |
| ε (error) | Unexplained variation | Firm-specific risk |
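The table above maps directly onto code. Here is a minimal sketch of a simple OLS fit using the closed-form formulas (slope = Cov(X, Y) / Var(X)); the return series are synthetic, so all numbers are illustrative, not real market data.

```python
import numpy as np

# Hypothetical monthly returns in percent: market (X) and stock (Y)
rng = np.random.default_rng(0)
x = rng.normal(0.8, 4.0, 60)        # market return
noise = rng.normal(0.0, 2.0, 60)    # firm-specific risk (the ε term)
y = 0.2 + 1.3 * x + noise           # true alpha = 0.2, true beta = 1.3

# OLS closed form for simple regression
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")  # estimates near the true values
```

With 60 observations and modest noise, the estimated beta lands close to the true 1.3, which is exactly the behavior the Y/X/α/β/ε decomposition above describes.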
Multiple Regression
More than one independent variable. This is what you will use most often in practice because outcomes in finance rarely depend on a single factor.
Example: modeling a stock’s return using market return, interest rate changes, and oil prices simultaneously. Each β tells you the marginal effect of that variable, holding the others constant.
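That three-factor example can be sketched with a design matrix and a least-squares solve. The factor values and coefficients below are invented for illustration; in practice you would use actual return, rate, and oil-price series.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
market = rng.normal(0.7, 4.0, n)    # market return, %
rates = rng.normal(0.0, 0.3, n)     # change in interest rates
oil = rng.normal(0.5, 6.0, n)       # oil price return, %
noise = rng.normal(0.0, 2.0, n)
ret = 0.1 + 1.1 * market - 2.0 * rates + 0.05 * oil + noise

# Design matrix: a column of ones for the intercept, then each factor
X = np.column_stack([np.ones(n), market, rates, oil])
coefs, *_ = np.linalg.lstsq(X, ret, rcond=None)
alpha, b_mkt, b_rate, b_oil = coefs
```

Each estimated β is the marginal effect of its factor holding the others constant, which is why b_mkt recovers the market sensitivity even though rates and oil move at the same time.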
Key Regression Metrics
| Metric | What It Measures | Good Value |
|---|---|---|
| R-squared (R²) | % of Y’s variation explained by the model | Higher is better (0 to 1) |
| Adjusted R² | R² penalized for extra variables | Use for model comparison |
| Standard Error | Average prediction error | Lower is better |
| t-statistic | Is β significantly ≠ 0? | \|t\| > 2 (at 5% level) |
| p-value | Probability β = 0 by chance | < 0.05 for significance |
| F-statistic | Overall model significance | Higher → model is useful |
| Durbin-Watson | Autocorrelation in residuals | Close to 2 = no autocorrelation |
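Several of these metrics fall out of the residuals directly. A minimal sketch, again on synthetic data, computing R², adjusted R², and the slope's t-statistic for a simple regression:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x = rng.normal(0.8, 4.0, n)
y = 0.2 + 1.3 * x + rng.normal(0.0, 2.0, n)

X = np.column_stack([np.ones(n), x])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coefs

k = X.shape[1]                            # parameters, incl. intercept
ss_res = resid @ resid                    # unexplained variation
ss_tot = ((y - y.mean()) ** 2).sum()      # total variation in Y
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k) # penalized for extra variables

# Standard error of the slope and its t-statistic
s2 = ss_res / (n - k)                     # residual variance
se_beta = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())
t_beta = coefs[1] / se_beta               # |t| > 2 → significant at 5%
```

Note that adjusted R² is always at or below R²; the gap widens as you add variables that do not pull their weight.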
OLS Assumptions (CLRM)
For your regression results to be valid, the Classical Linear Regression Model requires these assumptions to hold:
| Assumption | What It Means | If Violated |
|---|---|---|
| Linearity | Y and X have a linear relationship | Coefficients are biased |
| No multicollinearity | Independent variables are not highly correlated | Coefficients unstable, high standard errors |
| Homoscedasticity | Residual variance is constant | Standard errors are wrong |
| No autocorrelation | Residuals are not correlated over time | t-stats and F-stats unreliable |
| Normality of errors | Residuals are normally distributed | Hypothesis tests less reliable |
| No endogeneity | X is not correlated with ε | Coefficients are biased |
Common Regression Problems and Fixes
| Problem | How to Detect | How to Fix |
|---|---|---|
| Multicollinearity | VIF > 5-10; correlation matrix | Drop or combine variables |
| Heteroscedasticity | Breusch-Pagan test; residual plots | White’s robust standard errors |
| Autocorrelation | Durbin-Watson; residual plots | Newey-West standard errors; add lags |
| Omitted variable bias | Theory; intuition | Add the missing variable |
| Non-normality | Jarque-Bera test; histogram | Larger sample; transform variables |
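The VIF check in the first row is easy to implement by hand: regress each independent variable on the others and compute 1 / (1 − R²). This sketch builds a deliberately collinear pair to show the diagnostic firing; the data and the `vif` helper are illustrative.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])
        coefs, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ coefs
        r2 = 1 - resid @ resid / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.1, size=200)   # nearly collinear with a
c = rng.normal(size=200)                  # independent factor
v = vif(np.column_stack([a, b, c]))       # v[0], v[1] large; v[2] near 1
```

The collinear pair blows past the VIF > 5–10 threshold while the independent variable stays near 1, matching the detect-and-fix guidance in the table.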
Interpreting Regression Output
When you run a regression, here is how to read the results step by step:
1. Check overall fit: Look at R² and adjusted R². An R² of 0.60 means 60% of Y’s variation is explained by your model.
2. Check overall significance: The F-statistic tests whether at least one β ≠ 0. If the p-value of F is below 0.05, the model has explanatory power.
3. Check individual coefficients: Each β has its own t-statistic and p-value. If |t| > 2 (or p < 0.05), that variable is statistically significant.
4. Check residuals: Plot them. They should look random with no patterns. Patterns signal violated assumptions.
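Step 4 can also be checked numerically. The Durbin-Watson statistic from the metrics table is just a ratio of sums over the residual series; the sketch below contrasts white-noise residuals with autocorrelated ones (both synthetic).

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: close to 2 means no first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(4)
white = rng.normal(size=500)        # well-behaved residuals
dw_white = durbin_watson(white)     # near 2

ar = np.empty(500)                  # residuals with a pattern (AR(1), rho = 0.8)
ar[0] = white[0]
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + white[t]
dw_ar = durbin_watson(ar)           # well below 2 → autocorrelation
```

A DW statistic far from 2 is the numerical counterpart of the visual pattern you would see in a residual plot.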
Regression in CAPM Beta Estimation
The most common regression in equity analysis: regress a stock’s excess return on the market’s excess return. The slope is beta.
If β = 1.3, the stock moves 1.3% for every 1% move in the market. The intercept (α) represents alpha — the return not explained by market exposure. The R² tells you how much of the stock’s movement is driven by the market versus firm-specific factors.
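The beta regression described above is short enough to sketch end to end. The excess-return series and the risk-free rate here are synthetic stand-ins, not actual market data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60                                   # 60 months, a common estimation window
rf = 0.3                                 # hypothetical monthly risk-free rate, %
mkt = rng.normal(0.9, 4.0, n)            # market return, %

ex_mkt = mkt - rf                        # market excess return
noise = rng.normal(0.0, 3.0, n)          # firm-specific variation
ex_stock = 0.15 + 1.3 * ex_mkt + noise   # stock excess return (true beta = 1.3)

# CAPM regression: slope = beta, intercept = alpha
beta = np.cov(ex_mkt, ex_stock, ddof=1)[0, 1] / np.var(ex_mkt, ddof=1)
alpha = ex_stock.mean() - beta * ex_mkt.mean()
```

The estimated slope recovers a beta near 1.3, and the intercept is the alpha: return the market exposure does not explain.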
Common Pitfalls
Relying on plain R² for model comparison. Always use adjusted R² when comparing models with different numbers of variables. Regular R² mechanically increases when you add variables, even useless ones. Adjusted R² penalizes complexity, so it only rises when a new variable genuinely improves the model.
Confusing correlation with causation. A regression may show a strong relationship between two variables, but that does not prove one causes the other. There could be an omitted variable driving both. Always pair statistical results with economic logic.
Key Takeaways
- Simple regression uses one X variable; multiple regression uses several — multiple regression is standard in practice.
- R² tells you explanatory power; t-statistics tell you if individual variables matter; the F-test tells you if the whole model is significant.
- Always check OLS assumptions — violated assumptions make your results unreliable.
- The CAPM regression (stock vs. market returns) gives you beta and alpha directly.
- Correlation ≠ causation. Back your regression results with economic reasoning.
Frequently Asked Questions
What is the difference between simple and multiple regression?
Simple regression has one independent variable (Y = α + βX + ε). Multiple regression has two or more independent variables (Y = α + β₁X₁ + β₂X₂ + ε). Multiple regression is used far more often in finance because most outcomes are driven by several factors simultaneously.
What does R-squared tell you in a financial regression?
R-squared measures how much of the dependent variable’s variation is explained by the model. In a CAPM regression, an R² of 0.40 means 40% of the stock’s return variation is explained by market movements. The remaining 60% is firm-specific.
How do you test if a regression coefficient is statistically significant?
Check the t-statistic or p-value for that coefficient. If |t| > 2 (or the p-value < 0.05), the coefficient is statistically different from zero at the 5% significance level, meaning the observed relationship is unlikely to be due to chance alone.
What is multicollinearity and why is it a problem?
Multicollinearity occurs when two or more independent variables are highly correlated. It makes individual coefficients unreliable and inflates standard errors, even though the overall model fit (R²) may still look good. Check using the Variance Inflation Factor (VIF).
How is regression used to estimate beta?
Regress a stock’s excess returns (stock return minus risk-free rate) against the market’s excess returns over a historical period (typically 60 months). The slope coefficient is the stock’s beta, measuring its systematic risk relative to the market.