Why You Should Stop Adding 1 Before Taking Logs

Iterated OLS (iOLS) offers a principled, easy-to-implement solution to the log-of-zero problem

If you have ever run a regression with a logged dependent variable, you have almost certainly encountered the dreaded log-of-zero problem. Your outcome variable contains zeros (trade flows between countries, patent counts, hospital visits, firm revenues) and log(0) is undefined. What do you do?

If you are like most applied researchers, you add a small constant and compute log(Y + 1) or log(Y + Δ), hoping the choice doesn’t matter too much. If you are more careful, you might turn to Poisson Pseudo-Maximum Likelihood (PPML). But both approaches carry important limitations that can silently distort your results.

In our paper “Dealing with Logs and Zeros in Regression Models”, my co-authors Christophe Bellégo, Louis-Daniel Pape, and I develop iterated OLS (iOLS), an estimator that resolves the log-of-zero problem while retaining the simplicity and flexibility of linear regression.

The problem is everywhere

We documented the scope of this issue by reviewing all empirical articles published in the American Economic Review between 2016 and 2020. The findings are striking: nearly 40% of empirical papers use a log specification, and 36% of those face the log-of-zero problem. The most common response? Adding a positive constant before taking logs (48% of cases). This is followed by PPML or GPML (35%) and the inverse hyperbolic sine (15%).

We also surveyed researchers at economics seminars and found a remarkably similar distribution of responses. The log-of-zero problem is pervasive, and practitioners overwhelmingly rely on ad-hoc fixes.

Adding a constant before taking logs, as in log(Y + Δ), is simple, but the choice of Δ is arbitrary and consequential. Different values of Δ produce different estimates, different standard errors, and potentially different conclusions. More fundamentally, as Chen and Roth (2024) show, no monotone transformation of the outcome, whether log(Y + 1), the inverse hyperbolic sine, or any other, can consistently identify the average percentage treatment effect when zeros are present. A quick way to see the problem: rescaling the outcome (say, from dollars to cents) leaves proportional changes among positive values untouched but mechanically inflates the apparent "effect" of moving an observation from zero to a positive value, so the estimate depends on the units of measurement.

Dropping zeros throws away data and introduces selection bias. In many applications (trade, innovation, healthcare), the zeros are the phenomenon of interest.

PPML and GPML are theoretically grounded but face practical limitations. They suffer from an incidental-parameter bias when the model includes multi-way fixed effects (a common setting in trade and panel data), and they do not easily accommodate instrumental variables for endogenous regressors. These are not minor technical caveats: they affect exactly the settings where applied researchers most need these tools.

What iOLS does differently

The key insight behind iOLS is that the popular log(Y + Δ) transformation is actually an approximation to a well-defined econometric model, the exponential conditional mean model, but with the wrong constant. Instead of choosing Δ arbitrarily, iOLS lets the data choose it through an iterative procedure.
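One way to see this (a sketch under the standard multiplicative-error form of the model, not the paper's full argument): write the exponential conditional mean model as Y = exp(X′β)·U with E[U | X] = 1. Then log(Y + Δ) = X′β + log(U + Δ·exp(−X′β)). With a fixed Δ, the term added inside the log varies with the regressors, which is exactly what distorts log(Y + Δ) regressions. Letting the added constant move with the fitted mean removes that dependence, and this is the role the fitted values play in the iteration described next.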

The algorithm is remarkably simple. Start with an initial guess, run OLS, use the fitted values to update the transformation, run OLS again, and repeat until convergence. Each step is just a standard OLS regression, which means you can use all the familiar tools of linear regression: Frisch-Waugh-Lovell projections for high-dimensional fixed effects, two-stage least squares for endogenous regressors, and clustered standard errors.
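To make that structure concrete, here is a minimal sketch in Python. It is not the estimator from the paper or from our packages: the outcome transformation is left as a user-supplied function (the paper derives the specific one iOLS uses), and the names iterated_ols, transform, example_transform, delta, tol, and max_iter are purely illustrative.

```python
import numpy as np

def iterated_ols(y, X, transform, tol=1e-8, max_iter=1000):
    """Generic iterated-OLS loop: transform the outcome using the current
    fitted values, re-run OLS, and repeat until the coefficients settle.
    transform(y, xb) returns the working outcome given the fitted index xb;
    the specific transformation iOLS uses is derived in the paper."""
    beta = np.zeros(X.shape[1])                    # initial guess
    for _ in range(max_iter):
        xb = X @ beta                              # current fitted index
        z = transform(y, xb)                       # update the working outcome
        beta_new, *_ = np.linalg.lstsq(X, z, rcond=None)  # one plain OLS step
        if np.max(np.abs(beta_new - beta)) < tol:  # stop when coefficients settle
            return beta_new
        beta = beta_new
    raise RuntimeError("iterated OLS did not converge")

# Placeholder transformation for illustration only (NOT the paper's formula):
# shift the outcome by a constant scaled with the current fitted mean before
# taking logs, so the shift is no longer fixed across observations.
delta = 1.0

def example_transform(y, xb):
    return np.log(y + delta * np.exp(xb))
```

Every pass through the loop is a single least-squares fit, so high-dimensional fixed effects (via demeaning) or instruments (via two-stage least squares) can be slotted into that one line without changing the overall structure.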

At convergence, iOLS delivers a consistent estimator of the normalized average treatment effect, the treatment effect expressed as a percentage of the mean untreated outcome. This is the parameter most researchers are actually trying to estimate when they use log specifications.
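For a binary treatment with potential outcomes Y(1) and Y(0), one way to write this parameter is E[Y(1) − Y(0)] / E[Y(0)]: the average treatment effect scaled by the average untreated outcome, so a value of 0.15 reads as "treatment raises the outcome by 15% of the untreated mean."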

Why this should matter to you

The advantages of iOLS are both theoretical and practical:

It is as simple as OLS. Each iteration involves a single matrix inversion. There are no specialized optimization routines, no Newton-Raphson convergence issues, and separation problems are far less severe than in maximum-likelihood estimation. If you can run a linear regression, you can run iOLS.

It handles zeros naturally. Unlike the log transformation, iOLS is well-defined when the outcome takes zero values. No arbitrary constants, no discarded observations.

It mitigates incidental-parameter bias. Because iOLS works through demeaning (Frisch-Waugh-Lovell), it eliminates fixed effects without estimating them (see the sketch below). Under regularity conditions, this helps avoid the bias that can affect PPML and GPML in short panels with multi-way fixed effects.

It extends to instrumental variables. Through iterated 2SLS (i2SLS), our framework accommodates endogenous regressors naturally, something that is notoriously difficult with PPML and GPML.

It is globally convergent. We prove that the iOLS iteration is a contraction mapping under standard regularity conditions, guaranteeing convergence from any starting point.
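As a footnote on the Frisch-Waugh-Lovell point above, the demeaning step that each OLS iteration can rely on looks like this for one-way fixed effects. The helper within_demean is illustrative, not part of our packages.

```python
import numpy as np
import pandas as pd

def within_demean(values, groups):
    """Within transformation: subtract group means from each column.
    By Frisch-Waugh-Lovell, regressing a demeaned outcome on demeaned
    regressors yields the same slope coefficients as a regression that
    includes one dummy per group, without ever estimating those dummies."""
    df = pd.DataFrame(np.asarray(values, dtype=float))
    return (df - df.groupby(np.asarray(groups)).transform("mean")).to_numpy()
```

Inside each iteration this would be applied to both the working outcome and the regressors before the OLS step. One caveat the sketch omits: the fixed-effect component of the fitted values still has to be recovered when updating the transformation, since the transformation depends on the full fitted mean.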

Revisiting published results

To demonstrate the practical relevance of iOLS, we revisit three influential studies that each faced a variant of the log-of-zero problem. In each case, iOLS confirms the qualitative findings but reveals quantitative differences, sometimes substantial, compared to the original estimates. These are not cherry-picked examples of failure; they illustrate how the choice of method can quietly shift results in the kind of settings that define modern empirical economics.

Getting started

Stata and R packages are currently under development and available on GitHub. The paper is under review. Stay tuned for updates on the final publication and stable software releases.

If you use log specifications in your empirical work, and our review of the AER suggests a large share of applied researchers do, iOLS offers a principled, transparent, and easy-to-implement alternative to the ad-hoc fixes that currently dominate practice. The gap between what theory recommends and what practitioners do is rarely this easy to close.


Bellégo, C., Benatia, D., and Pape, L.-D. (2025). “Dealing with Logs and Zeros in Regression Models.” arXiv:2203.11820. Under review.

Chen, J., and Roth, J. (2024). “Logs with Zeros? Some Problems and Solutions.” Quarterly Journal of Economics.

David Benatia
Associate Professor of Economics