Science & Society

An Analysis of Social Security Data: Not the Picture You Have Been Told!

Written by Bruce R. Copeland on April 06, 2025

Tags: annuity, data science, economics, finance, government, pension, retirement, social security

It is accurate to say that few people understand the United States Social Security system very well, and sadly this appears to include most members of Congress and even the Social Security Administration (SSA) itself. I do not expect this article to completely change the situation, but perhaps it will help correct some of the more serious problems with Social Security funding and administration.

Over the years, the U.S. Social Security program has come under fire for various things. Some of the better-known criticisms are that it is an entitlement, a money pit for the federal government, the reason for an ever-expanding federal budget deficit, a Ponzi scheme, and so on. In recent years, a number of politicians have started claiming that current Social Security beneficiaries are getting too much in benefits, and that this is why the system is in trouble. One variation on this latter argument claims that the current cohort of beneficiaries (boomers born between 1946 and 1964) is simply too large for the system to support.

What Is Social Security?

So what is Social Security? It is a government retirement program which most workers are entitled to use, and therefore in a very narrow technical sense it is an entitlement. On the other hand, the term 'entitlement' usually means something you get because of who you are instead of what you have done. For the retirement portion of Social Security (officially Old-Age and Survivors Insurance, or OASI), beneficiary payments are all about what workers have contributed to Social Security, so the latter meaning of an entitlement clearly does NOT apply.

Functionally, Social Security is closest to a pension or lifetime income annuity funded by contributions from both workers and their employers. (It has some features of both defined benefit and defined contribution pension plans.) Contributions to pensions and annuities are invariably invested, and benefits are then paid at retirement from the investment gain and some of the investment principal. The main question with pensions and annuities is always 'how much in benefits are you receiving compared to what you (and your employers) paid in?' This is actually a great question to ask about Social Security.

Before looking at this question, it is important to agree on a few terms and concepts. I am going to use the term 'retirement' to mean the age at which a worker is eligible for full Social Security benefits. Right now this is age 66 (plus a few months) for most workers; it was age 65 for earlier generations, and it is age 67 for workers born in 1960 or later. Some workers begin taking reduced benefits as early as age 62, and some wait until age 70 to receive higher benefits. I will not distinguish between those who take early, late, or regular full benefits, because the system is designed so that the total amounts early and late beneficiaries receive are roughly financially equivalent to the amount an individual receives at normal full retirement. I also note that some Social Security beneficiaries continue to work after they begin receiving benefits. This does not really complicate the definition of retirement I am using, but it does make it necessary to include any payroll contributions these retired workers make to Social Security.

The term Social Security benefits also needs some explanation. OASI makes payments to workers and, in many cases, to their spouses and other dependents (usually children). I am going to include all these payments as worker 'benefits'. SSA also runs other programs, such as disability insurance, which involve significantly smaller overall amounts and are administered somewhat separately from worker/spouse/dependent benefits. I am therefore excluding disability benefits from this analysis. Finally, Social Security contributions come from both employers and employees, but both amounts are based on worker earnings. Henceforth I will refer to the total of these contributions as worker contributions.
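
As a rough illustration of why early and late claiming can be treated as roughly equivalent, the sketch below applies the claiming-age adjustments used for workers born in 1943 or later, assuming a full retirement age of 66: benefits are reduced by 5/9 of 1% per month for the first 36 months of early claiming and 5/12 of 1% for each additional month, and delayed retirement credits add 2/3 of 1% per month (8% per year) up to age 70. It then finds the undiscounted break-even age at which cumulative benefits from early or late claiming catch up with claiming at 66. This is a simplified sketch that ignores interest, cost-of-living adjustments, and mortality; the function names are hypothetical.

```python
from fractions import Fraction

FULL_RETIREMENT_AGE = 66  # assumption: the FRA of 66 used throughout this article


def benefit_factor(claim_age: int, fra: int = FULL_RETIREMENT_AGE) -> Fraction:
    """Monthly benefit when claiming at claim_age, as a fraction of the full (FRA) benefit."""
    if not 62 <= claim_age <= 70:
        raise ValueError("benefits can be claimed between ages 62 and 70")
    months = (claim_age - fra) * 12
    if months < 0:
        early = -months
        # 5/9 of 1% per month for the first 36 months, 5/12 of 1% per month after that
        reduction = Fraction(5, 900) * min(early, 36) + Fraction(5, 1200) * max(early - 36, 0)
        return 1 - reduction
    # Delayed retirement credits: 2/3 of 1% per month (8% per year)
    return 1 + Fraction(2, 300) * months


def breakeven_age(claim_age: int, fra: int = FULL_RETIREMENT_AGE) -> Fraction:
    """Age at which undiscounted cumulative benefits from claim_age equal those from the FRA."""
    f = benefit_factor(claim_age, fra)
    if f == 1:
        raise ValueError("claiming at the full retirement age has no break-even point")
    # Solve f * (age - claim_age) == 1 * (age - fra) for age
    return (f * claim_age - fra) / (f - 1)


for age in (62, 70):
    print(age, float(benefit_factor(age)), float(breakeven_age(age)))
```

With these rules, claiming at 62 pays 75% of the full benefit and breaks even at 78, while waiting until 70 pays 132% and breaks even at 82.5. Both break-even ages sit near typical U.S. life expectancy, which is the intuition behind treating the options as roughly equivalent; a proper actuarial comparison would also discount and weight by survival probabilities.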

What Can Be Learned From Social Security Data?

SSA, like most federal agencies, makes extensive data about its operations publicly available. Every year SSA publishes the Social Security Trustees Report, which summarizes that year's operations along with some historical context and projections for the future. Table I (columns A-M) reproduces Table VI.A1, "Operations of the OASI Trust Fund, Calendar Years 1937-2023," from the 2024 Trustees Report. This table contains the historical yearly worker contributions to Social Security and the yearly benefits paid out to retired workers and/or their dependents, along with quite a bit of additional supporting information. Table I is an ideal starting point for comparing total worker contributions to Social Security with total benefits paid to workers. There is a lot of information in this table, so I have highlighted the three columns of SSA data (dark blue, purple, and green) that are of primary interest for this analysis.

Image by Alexander from Pixabay

Some aspects of the data analysis for this table are complex, so before discussing the analysis in detail, I think it is helpful to jump ahead to the results, which Figure 1 summarizes. Figure 1 compares Lifetime Worker Payroll Contributions (blue) with Lifetime Worker/Dependent Benefits (magenta) for each Year at Full Retirement (age 66) from 1948 to 2023. Social Security and SSA were still fairly new in 1948, and the first ten or so years of data after 1948 are somewhat sketchy. There are, however, some clear trends. For most of the first forty years of this plot, SSA paid out substantially more in retired worker/dependent benefits than workers had actually contributed. Figure 1 also shows the value of the OASI trust fund as a percentage of yearly benefits (green). This curve confirms what the other two curves together show: by 1980, the Social Security Trust Fund was nearly insolvent because benefits were significantly exceeding lifetime worker contributions.
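
The green curve is a trust fund ratio. The Trustees define the trust fund ratio as the reserves at the beginning of a year expressed as a percentage of that year's cost. The sketch below assumes that definition (the exact calculation behind Figure 1 may differ) and uses made-up round numbers, not Table I data.

```python
def trust_fund_ratios(start_reserves: float, income: list[float], cost: list[float]) -> list[float]:
    """Trust fund ratio (%) for each year: beginning-of-year reserves / that year's cost.

    Reserves roll forward as reserves + income - cost; `income` should already
    include any interest earned on the reserves.
    """
    if len(income) != len(cost):
        raise ValueError("income and cost must cover the same years")
    ratios = []
    reserves = start_reserves
    for inc, cst in zip(income, cost):
        ratios.append(100.0 * reserves / cst)
        reserves += inc - cst
    return ratios


# Hypothetical figures (billions of dollars), NOT taken from Table I:
# cost exceeds income by 10 each year, starting from reserves of 50.
print([round(r, 1) for r in trust_fund_ratios(50.0, [100.0] * 3, [110.0] * 3)])
```

When cost exceeds income year after year, the ratio falls steadily (here from about 45% to 36% to 27%), which is the kind of decline the article describes in the years leading up to 1980.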

Interestingly, the Social Security Trust Fund did not go broke after 1980. What changed? Two major demographic changes in the U.S. labor force began in the 1970s. First, the boomer generation of U.S. children became adults and entered the workforce; the boomer generation was about 20% larger than the generations before and after it. Second, women began to enter the U.S. labor force in quite significant numbers. These two changes meant more income for the Social Security program. The substantial entry of women into the workforce had another important effect on Social Security: the proportion of dependent benefits to worker benefits began to drop, because retirement-age women could now draw their own worker benefits, which were typically larger than dependent spousal benefits. This effect is clearly evident in Table II (Table V.C4 of the 2024 Trustees Report), which shows that the number of dependent spouses relative to workers peaked around 1980 to 1985 and has been dropping since. This likely contributed substantially to the decline in Lifetime Benefit Payments that appears after 1980 in Figure 1.

How the Social Security Data Was Analyzed

Now let's talk about the data analysis for Figure 1. To do the analysis, it helps to focus on the group of workers who start taking Social Security benefits in any given year. Most of those workers are about 66 years old, though not all. I refer to each year's new worker beneficiaries as a cohort.

It is possible to estimate fairly accurately the total Social Security payroll contributions made by one of these worker cohorts over their working lifetime by summing (integrating) a fraction of the contributions made every year for the previous 50 years (more detail on this shortly).

Benefits paid to a cohort pose more complicated problems. Summing benefits over the years following retirement might work, but it limits the analysis to cohorts that retired more than 20 years ago. A bigger problem is that SSA does not clearly spell out the numbers and types of new beneficiaries each year. Fortunately, there is a straightforward way around this problem. Benefits paid to workers/dependents in any given year span a number of different cohorts, including the cohort that reached full retirement age in that year (see e.g. Tables 5.A1 - 5.A1.6 in the SSA Annual Statistical Supplement for 2024). It turns out that the numbers, ages, sexes, dependents, etc. reflected in any year's total benefits stand in the same proportional relationship to that year's cohort as would be needed to track the cohort's benefits forward in time. This means the total benefits paid in any year are a very good statistical representation of the total payments that would be made to that year's retirement cohort over time. Treating transverse (cross-sectional) and longitudinal distributions as equivalent is a fairly common statistical simplification in the physical sciences, and I use it here to estimate the lifetime worker/dependent benefits for the retirement cohort associated with each year at full retirement. [Note that transverse (cross-sectional) vs. longitudinal statistical equivalence works for quantities (numbers of workers, dollars, etc.). It does not work for opinions or other rapidly changing distributions, which is why social scientists do not use it for surveys.]
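
The cross-sectional shortcut can be checked with a small simulation. The sketch below assumes a hypothetical, fixed schedule of benefits by years since retirement (a geometric decline standing in for mortality and dependent effects) and compares (a) the total paid in one calendar year across all retired cohorts with (b) the lifetime total paid to the single cohort that retires in that year. All names and numbers are illustrative assumptions, not SSA data.

```python
# Hypothetical benefit per retiree by years since retirement (0..19);
# the 5% yearly decline stands in for mortality. Illustrative numbers only.
BENEFIT_BY_YEARS_RETIRED = [0.95 ** k for k in range(20)]


def cross_sectional_total(year, cohort_size):
    """Benefits paid in one calendar year, summed over every retired cohort."""
    return sum(cohort_size(year - k) * b for k, b in enumerate(BENEFIT_BY_YEARS_RETIRED))


def longitudinal_total(retire_year, cohort_size):
    """Benefits paid over time to the single cohort retiring in retire_year."""
    return cohort_size(retire_year) * sum(BENEFIT_BY_YEARS_RETIRED)


def constant(year):
    return 1000.0  # every retirement cohort has the same size


def growing(year):
    return 1000.0 * 1.01 ** (year - 2000)  # cohorts grow 1% per year


for name, size in [("constant", constant), ("growing 1%/yr", growing)]:
    ratio = cross_sectional_total(2020, size) / longitudinal_total(2020, size)
    print(f"{name}: cross-sectional / longitudinal = {ratio:.3f}")
```

With constant cohort sizes the two totals agree exactly (ratio 1.000). With cohorts growing 1% per year the ratio drops to about 0.93, because the older cohorts being paid in 2020 are smaller than the cohort retiring that year. The equivalence is therefore tightest when cohort sizes and benefit patterns change slowly relative to the length of retirement.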

In order to compare worker contributions and benefits (income) over a 90-year span, it helps significantly to convert these quantities into present-day relative income. For this purpose I use per-capita GDP as the deflator. Historical per-capita GDP data is available from the U.S. Bureau of Economic Analysis via FRED (the St. Louis Federal Reserve Bank's data service), and I have augmented Table I with that data in column P for the appropriate years. It also helps to adjust historical data to a constant labor force size. Historical civilian labor force data is available from the U.S. Bureau of Labor Statistics (BLS) via FRED, and I have reproduced the necessary data for 1948-2023 in column O of Table I. Correcting for changes in labor force size is especially important for properly handling the large changes in Social Security due to the entry of women into the workforce; these changes are quite dramatic in rows 40-60 of column O. I have used the labor force and per-capita GDP data to convert Total SSA Income from worker payroll contributions (column B) to Levelized Worker Contributions (column R), and to convert Total SSA Cost from worker/dependent benefits (column G) to Lifetime (cohort) Benefits (column T).
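
The exact conversion formula is not spelled out here, so the sketch below assumes one plausible form, not necessarily the one used for columns R and T: scale each year's dollar amount by the ratio of reference-year to current-year per-capita GDP (the income deflator) and by the ratio of reference-year to current-year labor force size (the constant-labor-force adjustment). The function name and all numbers are hypothetical.

```python
def levelize(amount, gdp_pc, labor_force, gdp_pc_ref, labor_force_ref):
    """Express a historical dollar amount in reference-year relative-income terms,
    scaled to a reference-year labor force size (an assumed formula)."""
    income_scale = gdp_pc_ref / gdp_pc           # per-capita GDP deflator
    labor_scale = labor_force_ref / labor_force  # constant-labor-force adjustment
    return amount * income_scale * labor_scale


# Hypothetical inputs, NOT actual Table I / FRED values:
# $10 billion collected in a year when per-capita GDP was $5,000 and the labor
# force was 80 million, levelized to a reference year with per-capita GDP of
# $80,000 and a labor force of 160 million.
print(levelize(10e9, 5_000, 80e6, 80_000, 160e6))
```

Here the 16x income scale and 2x labor force scale turn $10 billion into $320 billion in reference-year terms. Using per-capita GDP rather than a price index such as the CPI measures amounts relative to typical incomes rather than relative to consumer prices.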

To total the Social Security contributions for any given retirement cohort, it is simply necessary to add that cohort's fractional contributions for the fifty years ending in its full retirement year. What do I mean by fractional contributions? Each year of worker contributions to Social Security consists of contributions from workers of all ages in the labor force. The yearly contribution for any given retirement cohort can be calculated using the cohort's age fraction of the labor force. BLS periodically reports the distribution of workers by age. This distribution changes somewhat over time, which could be a complication. Column W in Table I shows the Labor Force Distribution reported by BLS for 2023, and I have used this distribution to sum Lifetime Worker Contributions in column S. It turns out the data in column S is very similar to data obtained using a flat age distribution for workers (i.e., 0.02 for each of the 50 years being totaled). This indicates the summation of yearly contributions is not very sensitive to the age distribution of the labor force, which is not terribly surprising given that contributions are totaled over 50 years.
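
The cohort summation amounts to a weighted sum over a 50-year window. In the sketch below, `weights[i]` is the assumed fraction of a year's total contributions attributable to the cohort in the i-th year of its window, and a flat distribution uses 0.02 for every year. The contribution series and the hump-shaped weight profile are hypothetical stand-ins for columns R and W, not the actual data.

```python
WINDOW = 50  # years of contributions attributed to each retirement cohort


def lifetime_contributions(levelized, retire_year, weights):
    """Sum a cohort's share of contributions over the 50 years ending in retire_year.

    levelized: dict mapping calendar year -> levelized total contributions
    weights:   WINDOW age-fraction weights; weights[0] applies to the earliest
               year of the window, weights[-1] to the retirement year itself
    """
    if len(weights) != WINDOW:
        raise ValueError(f"expected {WINDOW} weights")
    first_year = retire_year - WINDOW + 1
    return sum(w * levelized[first_year + i] for i, w in enumerate(weights))


# Hypothetical, steadily rising contribution series (NOT Table I data)
levelized = {year: 100 + 2 * (year - 1950) for year in range(1950, 2031)}

flat = [1 / WINDOW] * WINDOW                          # 0.02 per year
raw = [min(i + 1, WINDOW - i) for i in range(WINDOW)]  # hump peaking mid-career
hump = [r / sum(raw) for r in raw]                    # normalized to sum to 1

print(f"flat: {lifetime_contributions(levelized, 2020, flat):.2f}")
print(f"hump: {lifetime_contributions(levelized, 2020, hump):.2f}")
```

Both profiles print 191.00: for a series that rises steadily, any weight profile that sums to 1 and is symmetric about the middle of the window gives the same total as the flat profile. Real age distributions are not perfectly symmetric, but this shows why a 50-year sum is forgiving of the exact weights.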

Some Additional Nuances in this Analysis of Social Security Data

It is unclear why SSA has not carried out the type of analysis presented here, or why the yearly actuarial certification of the Social Security Trustees Report has not suggested or demanded such an analysis. Over the years SSA has been fairly good about balancing total yearly income from worker payroll contributions against total costs from benefit payments. It is quite evident this has not always been sufficient to keep Social Security healthy. Payroll contributions lagged seriously during some earlier periods, and the success of the program seems to be unreasonably sensitive to worker demographics. Note that comparing lifetime contributions with lifetime benefits is an established way for the financial/investment community to assess the performance of pensions and annuities. If SSA were to carry out this type of analysis, it would have access to detailed historical data that would eliminate the need for some of the approximations made here.

Somewhat similar analyses of lifetime contributions and lifetime benefits for Social Security have also been done by Steuerle and Smith. Their analysis focuses more on differences in contributions and benefits across income levels than on the generational health of Social Security. Their results are, however, broadly consistent with what is presented here. There is also an SSA actuarial note by Nichols, Clingman, and Wade which seems to reinforce elements of what is presented here, although SSA has given that note little prominence.

Could There Be Massive Fraud in the Social Security Program?

While I was completing the analysis for this article, there began to be claims that the Social Security program might contain 0.5 trillion or even 1 trillion dollars in fraudulent payments. Is this plausible?

Internal consistency, and consistency with other similar types of data, are measures data scientists use to decide whether a given data set is sufficiently valid. Data on most aspects of Social Security and SSA (numbers of beneficiaries, ages, employment contributions, benefit payments, etc.) are extensively published, and have been for seven decades or more. Other sources (BLS, FRED, other data aggregators, and other population studies) allow us to estimate roughly how many retired Americans there are. U.S. life expectancy is about 80 years, so it is reasonable to guess that the U.S. retirement population is something like 15 years (ages 65 to 80) times a yearly population fraction between 0.01 and 0.0125, out of a total population of 340 million. Using the middle of that range, this amounts to a retired population of about 57 million. You can readily confirm from neighbors, relatives, and financial news articles that the typical Social Security beneficiary receives around $1800/month in benefits. Multiplying these numbers gives an estimate of 1.23 trillion dollars in Social Security retirement benefits per year. This estimate is remarkably close to the $1.25 trillion in paid benefits reported by SSA for 2024.
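
The back-of-envelope estimate above can be reproduced in a few lines, which also makes it easy to see how sensitive it is to the assumed population fraction. The inputs are the figures quoted in the paragraph; the function name is hypothetical.

```python
POPULATION = 340e6       # total U.S. population
YEARS_RETIRED = 15       # ages 65 to 80
MONTHLY_BENEFIT = 1800   # typical monthly benefit in dollars
REPORTED_2024 = 1.25e12  # benefits paid in 2024, as cited above


def annual_benefits(fraction_per_year):
    """Estimated yearly retirement benefits for a given population fraction per year of age."""
    retirees = YEARS_RETIRED * fraction_per_year * POPULATION
    return retirees * MONTHLY_BENEFIT * 12


for fraction in (0.01, 0.01125, 0.0125):  # low end, midpoint, high end
    estimate = annual_benefits(fraction)
    gap = 100 * (estimate - REPORTED_2024) / REPORTED_2024
    print(f"fraction {fraction}: ${estimate / 1e12:.2f} trillion ({gap:+.1f}% vs. reported)")
```

The midpoint gives about $1.24 trillion (the $1.23 trillion above comes from first rounding the retiree count to 57 million), within 1% of the reported figure. Even the ends of the assumed range stay within roughly 12% of it, far from the 40% gap that $0.5 trillion of hidden payments would imply.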

There ARE some meaningful deficiencies in how Social Security is administered, and there is always some chance (even likelihood) of fraud on the order of a few percent or less. But there is no possible way for fraud to account for 0.5 trillion dollars, let alone 0.5 trillion dollars per year. As discussed in this article, SSA's typical yearly income AND cost are each a little more than 1 trillion dollars. There is no room in those numbers for 0.5 trillion dollars in fraud. If fraud on the scale of 0.5 trillion dollars existed, it would have to be off the books (i.e., absent from both what SSA publishes and what the Department of the Treasury sees). Such a discrepancy would only be possible if most Treasury workers and leaders AND nearly every member of Congress (BOTH parties), past and present, were complicit.

Conclusions

This analysis of Social Security leads to a number of important conclusions:

  • Most of the problems with the Social Security system (today and in the past) stem from the failure of SSA to apply sound data science (including forward- and backward-looking analyses) and from abject failures of oversight by the U.S. Congress (BOTH parties).
  • The worker/beneficiary groups which have been most criticized for use of the Social Security system have actually done the most to strengthen and rescue the Social Security system (boomers and working women).
  • There is no evidence of widespread (trillion dollar) fraud within the Social Security system, and standard data science measures indicate that available SSA data is credible and consistent with other similar known economic data.

It is important to note that the real problems with Social Security are mostly NOT the problems critics usually list. Also, the periods when the Social Security system has been in the greatest trouble have mostly not been the times when these criticisms were leveled. SSA seems to spend a great deal of effort projecting demographics and Trust Fund performance forward (sometimes by 50 years or more) rather than analyzing and understanding current and past performance.

Clearly Social Security does face some headwinds as contributions from the boomer generation decline, and there are indications of this in the most recent years of Figure 1. The historical results in Figure 1 strongly suggest that a healthy trust fund is best maintained when worker contributions exceed benefits by a small margin (3 to 5%). No retiree wants to give up a few percent of yearly benefits, but this would be far preferable to a 25 or 30% decrease in benefits eight or ten years down the line. There seems to be much concern that U.S. life expectancy will climb to 90 or 95. This is a possibility, but far from a certainty (COVID-19 clearly exposed the weakness of confident predictions about longevity). It would be far better for SSA to focus on sound trust fund management based on historical results than to try to guess how health is going to change.
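
To see how a small surplus translates into trust fund strength, the sketch below projects a trust fund ratio forward when income exceeds cost by a fixed margin. It is a deliberately simplified toy model, not SSA's projection method and not the basis for the 3 to 5% figure: costs grow at a constant hypothetical rate, reserves earn a constant hypothetical interest rate, and the starting ratio is arbitrary.

```python
def project_ratio(start_ratio, margin, years, cost_growth=0.0, interest=0.0):
    """Trust fund ratio (reserves / current-year cost) after `years` years.

    Each year, reserves earn `interest`, income exceeds cost by `margin`
    (as a fraction of cost), and cost then grows by `cost_growth`.
    All parameters are hypothetical.
    """
    cost = 1.0
    reserves = start_ratio * cost
    for _ in range(years):
        reserves = reserves * (1 + interest) + margin * cost
        cost *= 1 + cost_growth
    return reserves / cost


# Hypothetical setting: costs grow 4%/yr, reserves earn 2%/yr, starting ratio 1.0
for margin in (0.0, 0.03, 0.05):
    ratio = project_ratio(1.0, margin, 10, cost_growth=0.04, interest=0.02)
    print(f"margin {margin:.0%}: ratio after 10 years = {ratio:.2f}")
```

In this toy setting, a zero margin lets the ratio erode because costs outgrow interest, while a margin of a few percent is enough to keep it rising over the decade.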

Boomers and working women rescued Social Security in the 1980s and 1990s to the tune of some 400% of yearly benefit payments (2.4 trillion dollars). What are Congress and the SSA going to do to ensure boomers have at least this amount of Social Security Trust Fund to cover their retirement?

Have A Question?

Have a question about this article? Have something to add on this topic? Email me, and I’ll get back to you personally.

Notes and Licensing

This work by Bruce R. Copeland is licensed under a Creative Commons BY-NC 4.0 International License. The overwhelming majority of the data in this article is U.S. Government data in the public domain. There are two important exceptions: the per-capita GDP and labor force data in Table I (columns P and O) are derived from public domain U.S. Government data, but were separately aggregated and provided by FRED under a non-commercial use license. I am accordingly making this entire article available under a Creative Commons BY-NC license. Note that it is also acceptable to use/share any other parts of this article commercially, with attribution (BY), as long as columns O and P of Table I (or Table I in its entirety) are excluded. It may also be possible to get permission from FRED to use the data in columns O and P of Table I commercially, in which case this entire article is licensed under a Creative Commons BY license.