MPI Methodology for Assigning Legal Status to Noncitizen Respondents in U.S. Census Bureau Survey Data

Policymakers, researchers, advocates, and others rely on the American Community Survey (ACS)—the U.S. Census Bureau’s annual large-scale household survey—for data on U.S. residents, including immigrants. The ACS includes detailed information about immigrants such as where they were born, whether they are naturalized U.S. citizens, how long they have been in the United States, and their demographic, human-capital, and socioeconomic characteristics. It also includes very large samples that allow for analysis of small populations at the state and local level. But the ACS does not distinguish legal immigrants from those who are unauthorized.

Given the significant demand for information about unauthorized immigrants living in the United States, researchers at the Migration Policy Institute (MPI)—in collaboration with demographers—have developed a three-stage method to estimate the number and characteristics of the unauthorized immigrant population. The method employs data from three sources: (1) the ACS, which offers a large sample size and a wide range of characteristics; (2) the Survey of Income and Program Participation (SIPP), another Census Bureau product that uniquely collects information about respondents’ legal permanent resident status (i.e., whether they have a green card); and (3) federal agency administrative data—mostly from the Department of Homeland Security (DHS)—which provide actual counts of immigrants in various legal statuses rather than estimates from population surveys. By utilizing these data sources, MPI identifies noncitizens in the ACS who are likely to have a legal status of one type or another, for example lawful permanent residents (LPRs), refugees, asylees, and nonimmigrants with temporary visas such as international students and H-1B workers (see Box 1). The remainder of the noncitizen population represents the likely unauthorized immigrant population.

BOX 1. Defining “Immigrants” in U.S. Census Bureau Survey Data

The term "immigrants" (or “foreign born”) refers to people residing in the United States at the time of the population survey who were not U.S. citizens at birth. The immigrant population includes naturalized U.S. citizens, lawful permanent immigrants (or green-card holders), refugees and asylees, certain legal nonimmigrants (including those on student, work, or certain other temporary visas), and unauthorized immigrants (those who entered the United States illegally, generally by crossing the border with Mexico, or who overstayed or otherwise violated the terms of their visas).


The term "U.S. born" refers to people residing in the United States who were U.S. citizens at birth, in one of three categories: those born in one of the 50 states or the District of Columbia; people born in U.S. territories such as Puerto Rico or Guam; or those who were born abroad to at least one U.S.-citizen parent.

In brief, MPI’s three-stage method starts with estimating the number of unauthorized immigrants by comparing the number of all immigrants in the ACS with a count of legal immigrants in DHS administrative data. In the second stage, researchers identify which noncitizens in the ACS are legal immigrants and which are unauthorized by comparing their characteristics to immigrants counted in the SIPP. In the third and final stage, the ACS estimates of unauthorized immigrants derived in the second stage are weighted upwards to account for coverage error in the ACS and to match the totals derived in the first stage by comparing the ACS with the DHS data.

Each stage of the method used to estimate the number of unauthorized immigrants residing in the United States, and to develop a profile of them, is described in greater detail below:

Stage 1. Estimating the Total Number of Unauthorized Immigrants by Origin Country

To estimate the total number of unauthorized immigrants in the United States, MPI researchers collaborated with leading demographer Jennifer Van Hook at The Pennsylvania State University to employ a well-tested “residual method” also used by researchers at the Pew Research Center, Center for Migration Studies of New York, and the Department of Homeland Security’s Office of Immigration Statistics. This method involves subtracting an estimate of lawfully present immigrants, based on administrative immigrant admissions data, from the total foreign-born population recorded in the ACS or decennial census. Unauthorized immigrants make up the difference between the total foreign-born and lawfully present populations. Using this residual method, unauthorized-immigrant population estimates can be generated for specific groups by national origin, age, gender, and length of U.S. residence, as well as over time.

The lawfully present foreign-born population comprises three groups: lawful permanent residents (LPRs), nonimmigrants, and refugees and asylees who have not yet become LPRs.

  • The LPR population—which for the purposes of this method includes anyone who obtained LPR status, including those who eventually naturalized—is estimated by adding up the number of LPR admissions from 1982 to the most recent year of available data—2018 at the time this was written—using publicly available data from DHS. The LPR population is then aged forward and adjusted downward slightly for deaths and emigration. LPR deaths are estimated using race, ethnicity, and nativity-specific rates from the U.S. Human Mortality Database and the National Health Interview Survey. The number of LPRs who left the country is estimated using annual emigration rates derived from a 2009 analysis of Social Security Administration (SSA) records by Jonathan Schwabish and adjusted for observed changes in the foreign-born population since that time based on analysis by Mark Leach.
  • Nonimmigrants are identified in the ACS as noncitizens who have occupations, immigration histories, and family/household characteristics congruent with the eligibility criteria for specific nonimmigrant visa categories. For example, international students (F-1 visa holders) are identified based on age of U.S. arrival, full-time school enrollment, and lack of full-time employment. H-1B “specialty occupation” workers are identified based on their educational attainment (a bachelor’s or higher degree), years in the United States, and employment in certain industries. MPI’s totals of nonimmigrants identified in the ACS are comparable with administrative data from DHS.
  • Finally, non-LPR refugees and asylees are estimated by summing annual refugee admissions and grants of asylum over the average number of years, as estimated by DHS, that it takes each group to adjust to LPR status: two years for refugees and four years for asylees in 2018.
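
The bookkeeping behind the LPR and humanitarian components can be sketched in a few lines of Python. This is a minimal illustration, not MPI's implementation: the function names are hypothetical, the single flat death and emigration rates are a simplifying assumption (the method described above uses race-, ethnicity-, and nativity-specific mortality rates and annual emigration rates), and any numbers passed in would be placeholders rather than DHS data.

```python
def surviving_lpr_stock(admissions, death_rate, emigration_rate, end_year):
    """Sum LPR admission cohorts after yearly attrition through end_year.

    admissions: dict mapping admission year -> number admitted (hypothetical input).
    death_rate, emigration_rate: flat annual rates (an assumption of this sketch).
    """
    stock = 0.0
    for year, admitted in admissions.items():
        years_elapsed = end_year - year
        if years_elapsed < 0:
            continue  # cohort admitted after the reference year
        survival = (1.0 - death_rate) ** years_elapsed
        retention = (1.0 - emigration_rate) ** years_elapsed
        stock += admitted * survival * retention
    return stock


def non_lpr_humanitarian_stock(refugee_admissions, asylum_grants, end_year,
                               refugee_lag=2, asylee_lag=4):
    """Count refugees and asylees assumed not yet adjusted to LPR status.

    Sums the most recent `refugee_lag` years of refugee admissions and the most
    recent `asylee_lag` years of asylum grants (two and four years in 2018).
    """
    refugees = sum(refugee_admissions.get(end_year - k, 0) for k in range(refugee_lag))
    asylees = sum(asylum_grants.get(end_year - k, 0) for k in range(asylee_lag))
    return refugees + asylees
```

Applying attrition cohort by cohort means that earlier admission cohorts are reduced more than recent ones, which is the effect of aging the population forward.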

Once the estimate of the lawfully present population has been constructed, it is subtracted from the total foreign-born population in the ACS to yield the unauthorized immigrant population. Then, the estimated number of unauthorized immigrants is adjusted upward, taking into account their long-acknowledged undercount in the ACS and other Census Bureau surveys. (LPRs are also undercounted in the ACS, though to a lesser degree than unauthorized immigrants. This undercount implies that more LPRs are counted in the administrative data than were counted in the ACS. The LPR total derived from administrative data is therefore adjusted downward in order to not subtract more LPRs than were counted in the ACS data. When the LPR total is adjusted downward, this results in a higher estimated unauthorized immigrant population.) The undercount adjustment is made using ACS coverage-error estimates from prior studies updated to the most recent ACS year using unpublished data.
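
The residual calculation and its two undercount adjustments can be written as a short Python sketch. The exact form of the adjustment is not spelled out here, so the formula below is an assumption: the administrative count of lawfully present immigrants is scaled down by an assumed undercount rate before subtraction, and the residual is then divided by one minus an assumed undercount rate for unauthorized immigrants. The function and parameter names are hypothetical.

```python
def residual_unauthorized(total_foreign_born_acs, lawful_admin_count,
                          lawful_undercount, unauthorized_undercount):
    """Residual estimate of the unauthorized population (illustrative only).

    lawful_undercount, unauthorized_undercount: assumed shares (0 to 1) of each
    group missed by the ACS; real values come from coverage-error studies.
    """
    # Scale the administrative count down to what the ACS would have counted,
    # so that no more lawful residents are subtracted than the survey recorded.
    lawful_counted_in_acs = lawful_admin_count * (1.0 - lawful_undercount)
    # Residual: foreign born counted in the ACS minus lawfully present counted.
    unauthorized_counted = total_foreign_born_acs - lawful_counted_in_acs
    # Inflate the residual for unauthorized immigrants missed by the survey.
    return unauthorized_counted / (1.0 - unauthorized_undercount)
```

Under this formulation, raising either undercount rate raises the final estimate, consistent with the direction of both adjustments described above.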

These unauthorized population estimates are constructed for each data year and each major country and world region of birth. Estimates are averaged across entry years in order to smooth artificial peaks (because people are more likely to report having entered the United States at the beginning of each decade—e.g. 1990, 2000, 2010—than in other years), and to reduce the incidence of negative estimates for populations with very small numbers in certain entry years. The unauthorized immigrant population profiles and related data on MPI’s website reflect estimates developed in this way, and the total unauthorized immigrant population estimated for 2018 is 11.0 million.
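
As an illustration of how averaging across entry years dampens heaping and negative residuals, the following Python sketch applies a centered moving average. The three-year window and the truncation at the ends of the series are assumptions made for the example; the actual averaging scheme is not specified here, and the function name is hypothetical.

```python
def smooth_by_entry_year(estimates, window=3):
    """Centered moving average of residual estimates keyed by entry year.

    estimates: dict mapping entry year -> residual estimate (may be negative).
    window: number of adjacent entry years averaged (an assumed value).
    """
    years = sorted(estimates)
    half = window // 2
    smoothed = {}
    for i, year in enumerate(years):
        # Use fewer neighbors at the start and end of the series.
        neighbors = years[max(0, i - half): i + half + 1]
        smoothed[year] = sum(estimates[y] for y in neighbors) / len(neighbors)
    return smoothed
```

With made-up inputs of 10, 40, and -5 for entry years 1999, 2000, and 2001, the spike at 2000 and the negative value at 2001 are both pulled toward their neighbors. Note that this simple version does not preserve the overall total at the edges of the series.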

Stage 2. Identifying Likely Unauthorized Immigrants in the ACS Data Using the SIPP

In the second stage, the Census Bureau datasets used—SIPP and ACS—are linked to transfer information about the legal status of immigrants garnered from the much smaller SIPP to the far larger ACS. The two datasets are linked using procedures that were developed by Van Hook at Penn State and James Bachmeier of Temple University and refined in consultation with MPI.

The SIPP is particularly useful for this task because it is the only nationally representative Census Bureau survey that asks noncitizens to report whether they have LPR status. Noncitizens without LPR status may be unauthorized immigrants or they may be lawfully present under a different, temporary legal status. For instance, they may be recent refugees or asylees who have not yet adjusted to LPR status (as they are eligible to do after one year). Or they may be nonimmigrants: temporary visa holders such as international students, H-2A agricultural workers, or H-1B high-skilled workers.

In a process similar to that undertaken in Stage 1, nonimmigrants are identified in the ACS and in the SIPP and removed from both, and citizens (including naturalized citizens) are removed from the analytical sample. The remaining foreign-born population consists of two groups: (1) LPRs who entered the United States after 1982 and (2) a group that is mostly unauthorized immigrants but also includes recent refugees and asylees who have not obtained permanent residency. (Refugees and asylees are eligible to apply for permanent residency after one year in the United States. Due to backlogs in processing LPR applications, and the fact that not everyone applies as soon as they are eligible, it currently takes on average two years for refugees and four years for asylees to become permanent residents.)

Next, MPI researchers merge the SIPP and ACS datasets and use a statistical process known as “multiple imputation” to assign immigration status in the ACS based on the status that noncitizens report in the SIPP. Multiple imputation maps noncitizens’ characteristics such as country of birth, year of U.S. entry, age, gender, and educational attainment across the two surveys. Using this mapping process, MPI assigns LPR status to noncitizens in the ACS who have similar characteristics to LPRs in the SIPP, and unauthorized status to those in the ACS with similar characteristics to unauthorized immigrants in the SIPP. In other words, the multiple imputation process “fills in” the missing information about unauthorized versus LPR status in the ACS, using the immigration-status variable in the SIPP and the multiple variables in common between the two datasets. After this imputation process, MPI researchers use federal government administrative data to identify recent refugees and asylees as those from countries with high shares of these groups among recent arrivals. Those who were initially assigned to be unauthorized immigrants but who match the characteristics of refugees are reassigned to be legal immigrants.
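
The idea of transferring status from a donor survey to a recipient survey can be illustrated with a deliberately simplified Python sketch. It is a stand-in, not the procedure MPI uses: it swaps the model-based multiple imputation for a cell-based random draw, in which the share of LPRs is computed within each combination of shared characteristics in the donor (SIPP-like) records and status is then drawn several times for each recipient (ACS-like) record. The record layout, field names, and function names are all hypothetical.

```python
import random
from collections import defaultdict


def lpr_share_by_cell(donors, keys):
    """Share of donor records reporting LPR status within each cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [LPR count, total count]
    for rec in donors:
        cell = tuple(rec[k] for k in keys)
        counts[cell][0] += rec["status"] == "LPR"
        counts[cell][1] += 1
    return {cell: lpr / total for cell, (lpr, total) in counts.items()}


def impute_unauthorized_totals(recipients, shares, keys, m=5, seed=0,
                               default_share=0.5):
    """Draw status m times and return the weighted unauthorized total per draw.

    default_share is an arbitrary fallback for cells absent from the donor data.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(m):
        total = 0.0
        for rec in recipients:
            cell = tuple(rec[k] for k in keys)
            p_lpr = shares.get(cell, default_share)
            # Assign unauthorized status with probability 1 - p_lpr.
            if rng.random() >= p_lpr:
                total += rec["weight"]
        totals.append(total)
    return totals
```

Averaging the returned totals gives a point estimate, and the spread across draws hints at the uncertainty introduced by imputation. A full multiple-imputation procedure would also account for uncertainty in the estimated shares themselves, which this sketch ignores.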

Stage 3. Reweighting the Unauthorized Immigrant Population Identified in the ACS to Match the Totals Developed Using the Residual Estimate

The ACS misses some immigrants during the survey process, and it is weighted to reflect counts in the decennial census, which also undercounts this population. As a result, the total noncitizen population in the ACS is too low, and the estimate of the unauthorized immigrant population that results from MPI’s imputation process using the ACS data is therefore also too low at this stage.

In Stage 1, the unauthorized immigrant population estimates generated by comparing the ACS with DHS administrative data were weighted upwards to reflect coverage error in the ACS and the Census, producing population totals by major birth country and region. Therefore, here in Stage 3 of MPI’s data analysis, the unauthorized immigrants identified in the ACS data during Stage 2 are upweighted to match these country/region-specific totals.

To increase sample size and improve the reliability of state- and county-level estimates, MPI researchers combine data from five years of the ACS. They also weight that file to the size of the total unauthorized immigrant population for a single year based on the estimates generated in Stage 1. For example, to conduct analyses for 2018, MPI researchers use ACS data from 2014-2018, but weight the data to match the topline estimate of 11 million unauthorized immigrants in 2018. Therefore, even though five years of ACS data are employed in the analysis, the weighting process generates results that represent the population as of a single year.
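
The reweighting step amounts to rescaling survey weights within each origin group so that they sum to the Stage 1 total for a single year. The Python sketch below shows that rescaling under assumed inputs: the record layout, the `origin` and `weight` field names, and the function name are hypothetical, and the targets would be the country- or region-specific totals from Stage 1.

```python
from collections import defaultdict


def reweight_to_targets(records, targets):
    """Scale weights so each origin group sums to its single-year target.

    records: pooled records (for example, five ACS years), each a dict with
             'origin' and 'weight' keys (hypothetical layout).
    targets: dict mapping origin -> Stage 1 population total for one year.
    """
    current = defaultdict(float)
    for rec in records:
        current[rec["origin"]] += rec["weight"]
    adjusted = []
    for rec in records:
        factor = targets[rec["origin"]] / current[rec["origin"]]
        adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted
```

Because the scaling factor is computed from the pooled weights, the same function handles both corrections at once: it shrinks the pooled multi-year file down to a single year's population and raises the weights to offset undercount.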

Groups Included in MPI’s Unauthorized Immigrant Population Estimates

Immigrants become unauthorized when they enter the United States illegally, overstay a valid visa, or otherwise violate their terms of admission. Robert Warren has estimated that 54 percent of unauthorized immigrants are border crossers while 46 percent are overstays. In general, most Mexican and Central American unauthorized immigrants crossed the U.S.-Mexico border illegally, while most from other regions traveled to the United States on a valid visa and then overstayed or otherwise invalidated it (for example by working on a visa that does not permit work).

Most unauthorized immigrants remain out of status, but some are able to obtain either a temporary status or a short-term reprieve from deportation. MPI’s estimates of the unauthorized immigrant population include three such groups: recipients of Temporary Protected Status (TPS), recipients of Deferred Action for Childhood Arrivals (DACA), and asylum seekers with work authorization.

In 1990, Congress created TPS for immigrants who cannot return to their home countries due to hurricanes, earthquakes, other natural disasters, armed conflict, or other “extraordinary and temporary” conditions. In 2018, an estimated 318,000 individuals had TPS. Their most common origin countries were El Salvador, Honduras, Haiti, and Nepal. TPS holders are authorized to work in the United States.

In 2012, through executive action, the Obama administration created DACA, which provides two-year, renewable reprieves from deportation for certain unauthorized immigrants who came to the United States before age 16 and are enrolled in school or completed U.S. high school. In 2018, 704,000 unauthorized immigrants were enrolled in this program. DACA, like TPS, includes a grant of work authorization.

Since 2013, a large number of asylum seekers, mostly from Central America but increasingly from a broad range of world regions, have come to the United States, generally by crossing the U.S.-Mexico border illegally or requesting asylum at an official border crossing. After six months in the United States, migrants with pending asylum claims are eligible to apply for work authorization (a regulation that took effect in August 2020 extended the waiting period to one year). Using DHS data on employment authorization documents (EADs), MPI estimates that as many as 738,000 unauthorized immigrants were work-authorized asylum seekers in 2018. This estimate is based on the number of EADs approved for asylum seekers in fiscal years (FY) 2017 and 2018, excluding approvals for replacement cards. (MPI uses two years of EAD approvals because EADs are granted to asylum seekers with a two-year validity period.)

For a more detailed explanation of the methodology, see Jennifer Van Hook, James D. Bachmeier, Donna Coffman, and Ofer Harel, “Can We Spin Straw into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches,” Demography 52 (1): 329-54, www.ncbi.nlm.nih.gov/pmc/articles/PMC4318768/. For examples of other methodologies using the residual method, see Bryan Baker, Illegal Alien Population Residing in the United States: January 2015 (Washington, DC: Department of Homeland Security, 2018); Frank D. Bean et al., “Circular, Invisible, and Ambiguous Migrants: Components of Difference in Estimates of the Number of Unauthorized Mexican Migrants in the United States,” Demography, 38, no. 3 (2001): 411-22; Robert Warren and Jeffrey S. Passel, “A Count of the Uncountable: Estimates of Undocumented Aliens Counted in the 1980 United States Census,” Demography, 24 (1987): 375-93.


Allison, Paul D. 2002. Missing Data. Thousand Oaks, CA: SAGE Publications.

Congressional Research Service. 2018. Temporary Protected Status: Overview and Current Issues. Washington, DC: Congressional Research Service.

Department of Demography, University of California, Berkeley; Max Planck Institute for Demographic Research; and Center on the Economics and Development of Aging, French Institute for Demographic Studies. 2019. National Health Interview Survey. Accessed from Lynn A. Blewett, Julia A. Rivera Drew, Miriam L. King and Kari C.W. Williams. Integrated Public Use Microdata Series: Health Surveys: Version 6.4 [dataset]. Minneapolis, University of Minnesota, 2019.

---. N.d. The Human Mortality Database. Accessed July 6, 2018.

Leach, Mark. 2017. Recent Innovations in the U.S. Census Bureau's Method of Estimating Foreign-born Emigration. Poster presented at the 2017 annual meetings of the Population Association of America, Chicago, 2017.

Leach, Mark and Eric Jensen. 2013. Estimating Foreign-Born Emigration from the United States Using Data from the American Community Survey. Paper presented at the Federal Committee on Statistical Methodology Research Conference, Washington, DC, November 4-6, 2013.

Mule, Thomas. 2012. Census Coverage Measurement Estimation Report: Summary of Estimates of Coverage for Persons in the United States. Washington, DC: U.S. Census Bureau.

Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. New York, NY: John Wiley and Sons.

Schwabish, Jonathan A. 2009. Identifying Rates of Emigration in the United States Using Administrative Earnings Records. Working paper 2009-01, Congressional Budget Office, Washington, DC, March 2009.

U.S. Census Bureau. 2020. American Community Survey. Accessed from Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. Integrated Public Use Microdata Series: Version 7.0 [dataset]. Minneapolis: University of Minnesota, 2020.

U.S. Citizenship and Immigration Services (USCIS). 2018. Approximate Active DACA Recipients as of July 31, 2018, accessed October 20, 2020.

---. N.d. Form I-765 Application for Employment Authorization: All Receipts, Approvals, Denials Grouped by Eligibility Category Filing Type, FY 2017 and 2018. Accessed October 15, 2020.

U.S. Immigration and Naturalization Service (INS). N.d. Immigrants Admitted to the United States, 1982-2000. Accessed October 20, 2020.

U.S. Department of Homeland Security (DHS) Office of Immigration Statistics. N.d. Yearbook of Immigration Statistics, 2001 and 2002. Washington, DC: DHS Office of Immigration Statistics.

---. N.d. Profiles on Lawful Permanent Residents (FY 2003-2018). Washington, DC: DHS Office of Immigration Statistics.

---. N.d. Adjustments to Lawful Permanent Residence by Year of Entry: FY 2000 to 2018. Washington, DC: DHS Office of Immigration Statistics.

Van Hook, Jennifer, Frank D. Bean, James D. Bachmeier, and Catherine Tucker. 2014. Recent Trends in Coverage of the Mexican-Born Population of the United States: Results from Applying Multiple Methods Across Time. Demography, 51, no. 2: 699-726.

Warren, Robert. 2020. Reverse Migration to Mexico Led to US Undocumented Population Decline: 2010 to 2018. Journal on Migration and Human Security 8, no. 1: 32-41.