# When calculating the Gini coefficient for the US, how should the portion of the population which has not filed a return be incorporated?

Cross Validated Asked on January 5, 2022

The Gini coefficient $$G$$ is a commonly used measure of income-distribution inequality, taking values from 0 (every individual in the population has an identical income) to 1 (a single individual earns the entirety of the population's income… and violent revolution is likely imminent ;). The Lorenz curve gives cumulative income (or wealth, or another social resource) as a function of the cumulative share of the population, ranked from poorest to richest. $$G$$ is the area between the 'line of equality' (income distributed uniformly across the populace) and the Lorenz curve, divided by the total area under the line of equality.
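As a minimal sketch of that geometric definition (in Python, which the question itself does not use), the Lorenz curve of individual incomes can be drawn as straight segments between the points (i/n, share of total income held by the poorest i people). Because the area under the line of equality is 1/2, the ratio reduces to $$G = 1 - 2A$$, where $$A$$ is the area under the Lorenz curve. The function names are illustrative, not from any library, and the sketch assumes a positive total income.

```python
from typing import List, Sequence, Tuple


def lorenz_points(incomes: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (cumulative population share, cumulative income share),
    with individuals ranked from lowest to highest income."""
    x = sorted(incomes)
    n, total = len(x), sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, xi in enumerate(x, start=1):
        running += xi
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(incomes: Sequence[float]) -> float:
    """G = (area between equality line and Lorenz curve) / (area under the
    equality line) = 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    pts = lorenz_points(incomes)
    # Trapezoid rule over consecutive Lorenz points.
    area = sum((f1 - f0) * (l0 + l1) / 2
               for (f0, l0), (f1, l1) in zip(pts, pts[1:]))
    return 1 - 2 * area


print(gini_from_lorenz([10, 10, 10, 10]))  # 0.0: perfect equality
print(gini_from_lorenz([0, 0, 0, 100]))    # 0.75 = (n - 1)/n: one person has everything
```

Note that with a finite population of size n the maximum is (n − 1)/n, which approaches 1 only as n grows.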

In the US, and in US states, $$G$$ can be calculated from income tax return data provided by the IRS, for example from the IRS' Tax Year 2017: Historic Table 2 (SOI Bulletin). However, US tax return data make clear that the number of returns filed is (much) smaller than the US population.
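Bracket tables of this kind typically report, for each income size class, a number of returns and a total income rather than individual incomes. A common approximation, sketched here under the assumption that brackets are ordered from lowest to highest and have nonnegative totals, joins the bracket endpoints of the Lorenz curve with straight lines. `gini_grouped` is a hypothetical helper and the numbers below are toy values, not IRS figures.

```python
from typing import Sequence


def gini_grouped(counts: Sequence[float], totals: Sequence[float]) -> float:
    """Approximate G from bracketed data, e.g. number of returns and total
    income in each income size class, ordered from lowest to highest class.

    Straight lines between bracket endpoints are equivalent to assuming that
    everyone in a bracket earns the bracket mean. Within-bracket inequality
    is ignored, so for non-overlapping brackets the result understates G.
    """
    if len(counts) != len(totals):
        raise ValueError("counts and totals must have the same length")
    n, y = sum(counts), sum(totals)
    cum_pop = cum_inc = 0.0
    area = 0.0  # area under the piecewise-linear Lorenz curve
    for c, t in zip(counts, totals):
        next_pop, next_inc = cum_pop + c / n, cum_inc + t / y
        area += (next_pop - cum_pop) * (cum_inc + next_inc) / 2
        cum_pop, cum_inc = next_pop, next_inc
    return 1 - 2 * area


# Hypothetical brackets (NOT real IRS figures): 2 returns with total 2, and
# 2 returns with total 6, i.e. incomes 1, 1, 3, 3 with no within-bracket spread.
print(gini_grouped([2, 2], [2, 6]))  # 0.25
```

Because the toy brackets contain no within-bracket spread, the result matches the individual-level Gini of incomes 1, 1, 3, 3; with real brackets the grouped value would be somewhat lower than the true G.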

This is to be expected, I suppose: we generally do not expect 2-year-olds to be earning income, or filing taxes on it. On the other hand, those not filing returns are probably a heterogeneous group, likely including the jobless (i.e., people who could work but are not working, not actively seeking work, and not earning taxable income, such as many full-time college students); possibly some wealthy people whose income is entirely (or nearly entirely) exempt from federal income tax (e.g., interest on municipal bonds from their state of residence); the unemployable (e.g., the aforementioned toddlers, persistently lacking language); and possibly others.

How are individuals who do not file returns typically accounted for when calculating $$G$$ as a measure of income-distribution inequality (assuming we have reliable estimates of the population size)?

Are they:

1. Ignored? (I.e., is $$G$$ typically estimated as a measure of inequality among tax filers only?)

2. Incorporated into the calculation of $$G$$ with an assumed income of zero?

3. Incorporated into the calculation of $$G$$ with some estimated mean income below the filing threshold?

4. Incorporated into the calculation of $$G$$ with some other kind of estimated mean income?

5. Something else?
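The mechanical effect of options 1 to 3 can be illustrated with a toy sketch. For zero-income additions there is an exact result: adding $$m$$ people with zero income to $$n$$ people whose Gini is $$G$$ gives $$G' = \frac{nG + m}{n + m} = G + (1 - G)\frac{m}{n + m}$$, so option 2 always raises $$G$$. The filer incomes, the number of non-filers, and the imputed income of 5 below are all hypothetical, chosen only to show the direction of each effect.

```python
from typing import Sequence


def gini(incomes: Sequence[float]) -> float:
    """Sorted-rank form: G = 2*sum(i * x_(i)) / (n * sum(x)) - (n + 1)/n,
    with x_(1) <= ... <= x_(n). Assumes a positive total income."""
    x = sorted(incomes)
    n, total = len(x), sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    ranked = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


filers = [20, 35, 50, 80, 215]  # hypothetical filer incomes, not IRS data
m = 3                           # hypothetical number of non-filers

g_ignore = gini(filers)              # option 1: non-filers left out
g_zero = gini(filers + [0] * m)      # option 2: non-filers at zero income
g_imputed = gini(filers + [5] * m)   # option 3: hypothetical imputed income of 5

# Closed form for adding m zero incomes to n incomes with Gini g_ignore.
n = len(filers)
closed_form = (n * g_ignore + m) / (n + m)

for label, g in [("ignore", g_ignore), ("zero", g_zero), ("imputed", g_imputed)]:
    print(f"{label:8s} {g:.3f}")
```

This prints 0.435, 0.647 and 0.601 respectively. In this toy case, an imputed positive income lands between the two extremes, which is why the choice among the options above matters for the reported figure.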

Bonus question: If there is some means of incorporating the whole population into the estimate of $$G$$, is this for all ages, or only for some range, such as 18–62 years?

P.S. One place where $$G$$ breaks down as a measure is when some people have negative incomes, which is certainly possible in the US today. It is probably OK to leave this nuance out of the answer to this question… unless it isn't. 🙂
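To see the breakdown concretely, here is a small sketch using the equivalent relative-mean-absolute-difference form, $$G = \frac{\sum_i \sum_j |x_i - x_j|}{2 n^2 \mu}$$. Negative incomes pull the Lorenz curve below zero, so $$G$$ can exceed 1. The incomes are hypothetical, and clipping losses at zero is shown only as one crude workaround, not as what any statistical agency does.

```python
from itertools import product
from typing import Sequence


def gini_mad(incomes: Sequence[float]) -> float:
    """Gini as relative mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean).
    O(n^2), fine for illustration; undefined when the mean is <= 0."""
    n = len(incomes)
    if n == 0:
        raise ValueError("need at least one income")
    mean = sum(incomes) / n
    if mean <= 0:
        raise ValueError("G is undefined for a non-positive mean income")
    total_diff = sum(abs(a - b) for a, b in product(incomes, repeat=2))
    return total_diff / (2 * n * n * mean)


incomes = [-50, 0, 10, 60]          # hypothetical; one person reports a loss
print(gini_mad(incomes))            # 4.25, far outside [0, 1]
clipped = [max(x, 0) for x in incomes]
print(f"{gini_mad(clipped):.3f}")   # 0.679 after bottom-coding losses at zero
```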
