Studying Second-Generation Immigrants: Methodological Challenges and Innovative Solutions

Douglas D. Heckathorn

Without a public sampling frame, the second generation is a "hidden population."

How many second-generation children are not fluent in English? Which ones have earned college degrees? Why have members of the second generation chosen certain types of occupations and not others?

These questions are not only interesting to researchers but also relevant for policymakers. In order to study the U.S.-born children of immigrants, commonly called the second generation, researchers need both demographic information and qualitative information that can only be learned through surveys and interviews.

Although imperfect, demographic information is readily available from the U.S. Census Bureau. In contrast, researchers have no easy "list" they can use to find and contact second-generation immigrants they would like to survey or interview.

The first part of this article discusses Census Bureau data, and the second part examines ways to survey and interview the second generation, with a particular focus on a relatively new methodology called respondent-driven sampling (RDS).

Demographic Information

The U.S. Census Bureau provides three types of data relevant to studying the second generation: the decennial census, the American Community Survey (ACS), and the Current Population Survey (CPS).

The decennial census, last conducted in 2000, aimed to reach every person in the United States, regardless of their status. The 2000 census asked respondents for their country of birth but did not ask for their parents' country of birth. As a result, the 2000 census did not identify the number of adults born in the United States who have one or more foreign-born parents. Therefore, the 2000 census can only tell researchers about second-generation members who still live with their parents; the majority of this population is under age 18.

On a positive note, the 2000 census provides detailed information about the children's parents. The education level and income of parents, for instance, can help researchers understand trends among the youngest members of the second generation.

Meant to provide up-to-date statistical "snapshots" of communities between decennial census years, ACS was fully rolled out in 2005 and will be conducted each year through 2010 and beyond. ACS, which is sent to 250,000 addresses per month, does not have as large a sample as the decennial census but will have collected enough information by the summer of 2010 to report data on individual census tracts, the smallest geographic unit.

Like the decennial census, ACS does not ask for parents' country of birth and thus can only be used to gather information about children of immigrants who live with their parents.

The following information about the second generation is available from 2000 census and 2005 ACS data:

Where the children of immigrants and their parents live (state and certain levels of geography for 2000 census; areas with populations of 65,000 or more for 2005 ACS)
Ages of children and parents
Country of origin of children's parents
Year in which the parents arrived in the United States
Level of self-reported English ability of the children and their parents
Grade level of children
Parents' employment
Parents' occupation
Parents' education level
Parents' income level and whether they are above or below the federal poverty line

CPS, specifically the March supplement, does ask respondents about their parents' country of birth. This makes it possible for researchers to obtain information about members of the second generation of any age. However, second-generation adults who have established their own households cannot be "matched" with their immigrant parents, and thus nothing can be said about parents' characteristics.

It must also be noted that CPS surveys only 50,000 households per month — a far smaller sample than ACS. Consequently, data can only be analyzed at the national level for any given year. By combining CPS years together, the sample size can be increased and researchers can conduct analysis at the state or large metro area level. However, the sample size would still be too small to examine characteristics of, for example, second-generation Dominican adults in a particular suburb.
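The gain from pooling years can be sketched with the textbook standard error of a proportion. The inputs below (400 subgroup respondents per year and a true share of 30 percent) are hypothetical and are not CPS figures. The formula also assumes simple random sampling with independent yearly samples; CPS uses a complex design and re-interviews some households in consecutive years, so real gains would be smaller than this idealized calculation suggests.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of an estimated proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical inputs, chosen only for illustration (not CPS figures).
share = 0.30          # assumed true share with some characteristic
n_per_year = 400      # assumed subgroup respondents in one survey year
years_pooled = 3

se_one_year = proportion_standard_error(share, n_per_year)
se_pooled = proportion_standard_error(share, n_per_year * years_pooled)

# Approximate 95 percent margins of error.
print(f"one year:    +/- {1.96 * se_one_year:.3f}")
print(f"three years: +/- {1.96 * se_pooled:.3f}")
```

Under these assumptions, tripling the sample shrinks the margin of error by a factor of the square root of 3 (about 1.7), not by a factor of 3, which is why even pooled data remain too thin for very small subgroups.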

The following information about the second generation is available from the CPS March supplement:

Age
Marital status
Employment
Occupation
Education level
Income level and whether the individual is above or below the federal poverty line
Welfare status

Interview and Survey Methodology

If a researcher is interested in surveying foreign-born Chinese parents and their U.S.-born children in a particular New York City neighborhood, census data can only be so helpful. By law, the Census Bureau must protect and keep confidential the information respondents provide. In other words, researchers cannot obtain from the Census Bureau the addresses or phone numbers of those who meet the research criteria.

Indeed, a challenge to the study of second-generation immigrants is the lack of a comprehensive public list, termed a "sampling frame," from which representative samples can be drawn.

In contrast, general population surveys can draw on telephone records, property tax rolls, voter registrations, and other public lists of residents or residences. Similarly, studies of special groups such as physicians or lawyers can use lists of those who hold professional licenses. However, no comparable lists exist for immigrants, including the second generation.

Of course, lists can be constructed based on general population surveys, but in some settings this is infeasible because the target population (e.g., immigrants from a particular country or region) is such a small part of the general population that costs would be prohibitive. Another reason, also relevant to immigrants and their children, is that some groups' social networks are difficult for outsiders to penetrate.

For all these reasons, immigrants are an example of what is now termed a "hidden" or "hard-to-reach" population. The importance of developing means for sampling these populations has been recognized for several decades because these populations are important to many research areas, including arts and culture, public policy, and public health.

Sampling hard-to-reach populations has its problems. The first approach relies on institutional records to find population members. However, such records have limitations because institutions never sample randomly.

Voluntary associations, such as social clubs and professional associations, tend to oversample the more fortunate within a population. For example, in a study of jazz musicians, union members earned 50 percent to 100 percent more than nonmembers, and they were nearly 10 years older.

In contrast, it is well known that involuntary institutions, such as prisons and jails, tend to oversample the dispossessed. Similarly, location-based samples are valid only for geographically concentrated populations. Samples of ethnic communities, for example, miss those who live in other communities.

Despite these limitations, samples drawn from an institution or location provide a valid statistical basis for generalizing to the entire institution or location. However, this provides a valid sample only of that nonrandom portion of the population that is accessible via institutions or locations.

The second approach to sampling hidden populations relies on social networks, as in snowball sampling (referrals from initial subjects generate additional subjects) and other chain-referral methods. These methods are appealing because respondents are reached through connections to relatives, friends, and acquaintances, and hence the sample can reach even those who lack institutional affiliations or those who reside outside of ethnic communities.

Chain-referral methods also tend to reduce nonresponse bias, because respondents are referred by those with whom they already have trusting relationships. This is especially important when studying vulnerable or stigmatized groups, such as unauthorized immigrants. Consequently, network-based samples have more comprehensive coverage than institutional or location samples.

However, these samples have been seen as convenience rather than probability sampling methods due to biases inherent in snowball-type methods, such as oversampling those who are well-connected (i.e., those with larger personal networks), since more recruitment paths lead to them. Biases also result when some groups recruit more effectively, and hence their distinctive recruitment patterns shape the sample.
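The bias toward the well-connected can be illustrated with an idealized model in which every respondent passes a single referral to one contact chosen at random. The five-person network below is hypothetical, and the random-referral rule is a simplifying assumption rather than a description of any actual study. In this model, the long-run share of referrals reaching a person is proportional to that person's number of ties.

```python
# Hypothetical friendship network: each key lists that person's contacts.
network = {
    "A": ["B", "C", "D", "E"],   # well connected
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["A", "D"],
}

def long_run_referral_shares(contacts, waves=200):
    """Share of referrals reaching each person when every respondent
    passes one referral to a contact chosen uniformly at random."""
    shares = {person: 1 / len(contacts) for person in contacts}
    for _ in range(waves):
        updated = {person: 0.0 for person in contacts}
        for person, friends in contacts.items():
            for friend in friends:
                updated[friend] += shares[person] / len(friends)
        shares = updated
    return shares

shares = long_run_referral_shares(network)
total_ties = sum(len(friends) for friends in network.values())

# Columns: person, long-run referral share, share of all ties held.
for person, friends in network.items():
    print(person, round(shares[person], 3), round(len(friends) / total_ties, 3))
```

Person A is one-fifth of this small population but, with four of the twelve tie endpoints, receives about one-third of the referrals.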

Owing to these biases, results from a chain-referral sample cannot be validly generalized to the population from which the sample was drawn. Hence the dilemma: statistical validity with limited coverage of the target population, or broader coverage but conclusions that cannot be generalized.

Respondent-Driven Sampling: A New Approach

Respondent-driven sampling (RDS) resolves this dilemma by converting chain-referral into a probability sampling method, thereby providing the means for combining broad coverage of the target population with the ability to generalize study results to the population from which the sample was drawn. This method has been used to study jazz musicians and Vietnam War era draft resisters, and in more than 20 other countries to study intravenous drug users, gay men, prostitutes, and street youth.

In RDS, as in other snowball-type samples, respondents recruit peers, who then recruit their friends and acquaintances who qualify for entry into the sample, who in turn recruit their peers, so that the sample expands through successive waves of peer recruitment.

Tests of RDS have shown that if referral chains are sufficiently long — that is, if the chain-referral process consists of enough waves or cycles of recruitment — the composition of the final sample with respect to key characteristics and behaviors will become independent of the seeds from which it began. To create long chains, respondents need to be recruited by their peers rather than by researchers. Also, the researchers need to set a recruitment quota so a few respondents cannot do all the recruiting.
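This independence from the seeds can be sketched using the recruitment counts reported in Table 1 of this article. The sketch treats recruitment as a simple chain in which the status of a recruit depends only on the status of the recruiter, which is a modeling assumption; the function and variable names are illustrative.

```python
# Recruitment counts from Table 1 (rows: recruiter; columns: recruit).
GROUPS = ["First Generation", "Second Generation", "Native"]
COUNTS = [
    [172, 34, 15],
    [28, 16, 7],
    [9, 14, 11],
]

# Transition probabilities: each row of counts divided by its row total.
TRANSITIONS = [[count / sum(row) for count in row] for row in COUNTS]

def composition_after(seed_mix, waves):
    """Expected group composition of recruits after a number of waves."""
    mix = list(seed_mix)
    for _ in range(waves):
        mix = [
            sum(mix[i] * TRANSITIONS[i][j] for i in range(len(mix)))
            for j in range(len(mix))
        ]
    return mix

all_first_generation_seeds = [1.0, 0.0, 0.0]
all_native_seeds = [0.0, 0.0, 1.0]

# Compare the two starting points wave by wave.
for waves in (1, 3, 10):
    from_first = composition_after(all_first_generation_seeds, waves)
    from_native = composition_after(all_native_seeds, waves)
    print(waves, [round(x, 3) for x in from_first], [round(x, 3) for x in from_native])
```

Under this assumption, the two starting points differ sharply after one wave but are nearly indistinguishable after ten, and both approach the composition that Table 1 labels Equilibrium.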

The researchers keep track of who recruited whom and their numbers of social contacts. A mathematical model of the recruitment process then weights the sample to compensate for nonrandom recruitment patterns, thereby producing statistically unbiased results.
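One simple form of such weighting can be sketched with figures reported in Table 1 of this article: the equilibrium sample composition and each group's mean network size. The sketch divides each group's equilibrium share by its mean network size and renormalizes, so that groups with larger networks, which are easier to reach, count for less. This is an assumed simplification for illustration; the original analysis may have involved additional steps.

```python
# Figures reported in Table 1.
equilibrium = {"First Generation": 0.671, "Second Generation": 0.217, "Native": 0.111}
mean_network_size = {"First Generation": 7.1, "Second Generation": 6.5, "Native": 9.2}

def degree_adjusted_shares(sample_shares, network_sizes):
    """Down-weight each group by its mean network size, then renormalize
    so that the adjusted shares sum to 1."""
    weighted = {group: sample_shares[group] / network_sizes[group] for group in sample_shares}
    total = sum(weighted.values())
    return {group: weight / total for group, weight in weighted.items()}

estimate = degree_adjusted_shares(equilibrium, mean_network_size)
for group, share in estimate.items():
    print(f"{group}: {share:.1%}")
```

The result is about 67.5, 23.9, and 8.6 percent, close to the Population Estimate row of Table 1 (67.6, 23.9, and 8.6 percent). The small gap is consistent with rounding in the published inputs, although the agreement does not by itself establish that this is the exact procedure behind the table.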

RDS analyses can also provide information on the social network connections among respondents. In the case of the Chicago Latino data set, compiled by Jesus Ramirez-Valles in 2004, it is possible to measure immigrant groups' insularity (see Table 1). Here insularity is measured by the homophily index (the degree to which people tend to resemble one another).

Table 1. Recruitment by Immigration Status (Recruitment Count; Transition Probability)

Immigration Status of Recruiter	Immigration Status of Recruit
	First Generation	Second Generation	Native	Total
First Generation (number)	172	34	15	221
First Generation (percent)	77.8%	15.4%	6.8%	100%
Second Generation (number)	28	16	7	51
Second Generation (percent)	54.9%	31.4%	13.7%	100%
Native (number)	9	14	11	34
Native (percent)	26.5%	41.2%	32.4%	100%
Total Distribution of Recruits	209	64	33	306
Sample Distribution	68.3%	20.9%	10.8%	100%
Equilibrium	67.1%	21.7%	11.1%	100%
Mean Network Size	7.1	6.5	9.2
Homophily	0.316	0.099	0.260
Population Estimate	67.6%	23.9%	8.6%	100%
Standard Error	3.4%	1.9%	0.0%

The first generation is the most insular, with a homophily index of .32. This indicates that 32 percent of the time they form a tie to another member of the first generation, and the rest of the time form ties consistent with random mixing (i.e., forming ties without regard to immigration status). Natives have a similar index of .26, so they are also substantially insular. In contrast, the second generation has a minimal index of .10, indicating that it serves as a bridge connecting the first generation to natives because 90 percent of their ties are formed irrespective of immigration status.
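The interpretation above implies a simple formula. If a group forms in-group ties a fraction H of the time and mixes randomly otherwise, its in-group recruitment rate S equals H + (1 - H)P, where P is the group's population share, so H = (S - P) / (1 - P). The sketch below applies this to the counts and population estimates in Table 1. It covers only groups whose in-group rate is at least their population share, and the function name is illustrative.

```python
def homophily(in_group_rate, population_share):
    """Homophily index for a group that recruits its own members at least
    as often as random mixing would predict."""
    if in_group_rate < population_share:
        raise ValueError("this sketch covers only the in-group-preference case")
    return (in_group_rate - population_share) / (1 - population_share)

# (in-group recruits, total recruits made, estimated population share), from Table 1
table_1 = {
    "First Generation": (172, 221, 0.676),
    "Second Generation": (16, 51, 0.239),
    "Native": (11, 34, 0.086),
}

indexes = {
    group: homophily(own / total, share)
    for group, (own, total, share) in table_1.items()
}
for group, value in indexes.items():
    print(f"{group}: {value:.2f}")
```

To two decimal places this gives 0.32, 0.10, and 0.26, matching the Homophily row of Table 1; differences in the third decimal are consistent with rounding in the published population estimates.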

The applicability of RDS to the study of an immigrant group depends on the density of the ties through which its members are linked. For studies of the second generation, the empirical question is whether ties among them are dense enough to sustain a robust chain-referral process. If not, members of the first generation or natives may also have to be included in the sampling frame to provide indirect links among members of the second generation. Establishing a sense of trust, important in other RDS studies, will be equally important in RDS studies of immigrant groups.

Sources:

Abdul-Quader, Abu S., Douglas D. Heckathorn, Courtney McKnight, Heidi Bramson, Chris Nemeth, Keith Sabin, Kathleen Gallagher, and Don C. Des Jarlais. 2006. "Effectiveness of Respondent Driven Sampling for Recruiting Drug Users in New York City: Findings From a Pilot Study." Journal of Urban Health, 83: 459-476.

Erickson, Bonnie H. 1979. "Some Problems of Inference from Chain Data." Sociological Methodology 10:276–302.

Heckathorn, Douglas D. 1997. "Respondent Driven Sampling: A New Approach to the Study of Hidden Populations." Social Problems 44:174–99.

Heckathorn, Douglas D. 2002. "Respondent Driven Sampling II: Deriving Statistically Valid Population Estimates from Chain-Referral Samples of Hidden Populations." Social Problems 49:11–34.

Heckathorn, Douglas D., and Joan Jeffri. 2003. "Social Networks of Jazz Musicians." Pp. 48-61 in Changing the Beat: A Study of the Worklife of Jazz Musicians, Volume III: Respondent-Driven Sampling: Survey Results by the Research Center for Arts and Culture. National Endowment for the Arts Research Division Report #43. Washington, DC.

Kalton, Graham. 1983. Introduction to Survey Sampling. Newbury Park, CA: Sage Publications.

Ramirez-Valles, Jesus, Douglas D. Heckathorn, Raquel Vázquez, Rafael M. Diaz, and Richard T. Campbell. 2005. "From Networks to Populations: The Development and Application of Respondent-Driven Sampling Among IDUs and Latino Gay Men." AIDS and Behavior, 9(4):387-402.

Salganik, Matthew J. and Douglas D. Heckathorn. 2004. "Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling." Sociological Methodology, 34:193-238.

Sudman, Seymour, and Graham Kalton. 1986. "New Developments in the Sampling of Special Populations." Annual Review of Sociology 12:401–29.

Thompson, S. K. and O. Frank. 2000. "Model-based estimation with linktracing sampling designs." Survey Methodology, 26(1):87-98.