How sure can we be about COMBO results?

As a statistician and methodologist, I take a critical view of the entire process of survey research, from drawing a sample, to constructing a questionnaire, to analyzing the data. Conducting a good survey is much harder than most people realize, and there are difficulties at every step. The COMBO questionnaire was, in general, very well constructed. The questions were easy to follow, and they were neither leading nor vague. The answer categories were also generally well constructed. They were mutually exclusive and exhaustive - meaning that everyone could pick one and only one answer to each question - and, in general, meaningfully different. That is, the answer categories captured important distinctions between people.

Two of the most important questions in the survey, however, are somewhat problematic. First, the income item asked respondents to choose an income range that best represented their family's income in the previous year. The ranges for the most part coincided with standard of living differences between persons at different income levels in the United States. For example, the first category was "under $25,000," and the second was "$25,001-$50,000." These two categories represent different standards of living. Yet, two of the higher income categories may be too broad to be meaningful: $150,000-$250,000 and $250,000-$500,000. There is a different standard of living at the bottom and top ends of each of these categories.

I was glad to see a break at $250,000, since that is the often-touted break-point for being "rich" in the United States (households making more money than that make up the top 1.5 percent of American families). But the standard of living for those with incomes of $150,000 is very different from that of those at $250,000, and the standard of living for those with incomes of $500,000 is very different from that of families making $250,000. Both of these large categories should have been split into smaller ones to better capture the standard of living differences between income levels. In addition, the survey should have asked respondents for their families' home state, as incomes and the accompanying standards of living vary considerably from state to state. In parts of the South, $50,000 will buy a small house; not so in New Jersey.

Second, self-assessed social class, while interesting, must be considered with caution. Asking respondents to self-assess produces a very unreliable measure of social class. For example, in the General Social Survey, a biennial survey of adults in the United States, more than 90 percent of respondents claimed to be middle class. As a result, there was only a weak relationship between self-assessed social class and income. Of course, one could argue that social class is about more than just income - it is also about consumption patterns and power - but with the COMBO survey (and others like it), when class differences are discussed, the real focus is on earnings differences. It is unclear, given that the survey asked a more reliable question (income), why it also asked the self-assessment question, and why the report's discussion of the results doesn't center on income.

These relatively minor issues aside, the biggest problem with the survey is the sample. According to the report, an e-mail requesting participation was sent to the entire student body. Roughly 30 percent responded. The consequences of this low response rate for the validity of the results cannot be overstated. We cannot rule out that non-responders differ from responders in significant ways that make the sample unrepresentative of the larger population from which it was drawn. To take an extreme case, if every one of the 70 percent of students who did not complete the survey were from a single class, the class distribution in the sample would differ drastically from that of the general Princeton population.
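To put rough numbers on that worry, here is a minimal sketch of the worst-case arithmetic. It assumes nothing about the non-responders except that they exist, so the true share of students with some characteristic can only be bounded. The function name `nonresponse_bounds` and the 50 percent figure are hypothetical illustrations, not results from the COMBO report; only the roughly 30 percent response rate comes from the report.

```python
def nonresponse_bounds(p_respondents, response_rate):
    """Worst-case bounds on a population proportion when some people do not respond.

    p_respondents: share of respondents with the characteristic (0 to 1).
    response_rate: share of the population that responded (0 to 1, above 0).
    """
    if not (0 <= p_respondents <= 1 and 0 < response_rate <= 1):
        raise ValueError("shares must lie in [0, 1] and response_rate must be positive")
    # Part of the population we actually observed with the characteristic.
    observed_share = response_rate * p_respondents
    # Lower bound: no non-responder has the characteristic.
    lower = observed_share
    # Upper bound: every non-responder has the characteristic.
    upper = observed_share + (1 - response_rate)
    return lower, upper


# Hypothetical example: half of respondents report the characteristic,
# and 30 percent of the population responded.
lower, upper = nonresponse_bounds(p_respondents=0.5, response_rate=0.3)
print(f"True share lies between {lower:.2f} and {upper:.2f}")
```

Under these assumptions the true share could be anywhere from 15 to 85 percent, a range too wide to say much of anything. The bounds narrow only if we are willing to assume something about how non-responders resemble responders, and the survey gives us no basis for such an assumption.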

The report attempts to argue that the sample seems representative of the student body because, for example, the racial composition of the sample is similar to the racial composition of the student body. This approach, however, is misleading because individuals are multidimensional: Not all people of the same race are interchangeable. For the sample to be representative, the joint distribution of every characteristic of respondents must be similar to the joint distribution in the population. Unfortunately, this is virtually impossible to evaluate, which is why random sampling is necessary: It gives us a basis for knowing the probability that the sample does not match the population. The non-random sample obtained by the COMBO study, in contrast, gives us no basis for determining how representative it is.
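A small, entirely made-up illustration shows why matching one characteristic at a time is not enough. In the sketch below, the groups "A" and "B", the income labels, and all of the counts are invented for the example and have nothing to do with the actual COMBO data. The sample matches the population on group alone and on income alone, yet the combination of the two is quite different.

```python
from collections import Counter


def proportions(values):
    """Return the share of each distinct value in an iterable."""
    values = list(values)
    counts = Counter(values)
    total = len(values)
    return {key: count / total for key, count in counts.items()}


def marginal(records, index):
    """Distribution of a single characteristic, ignoring the others."""
    return proportions(record[index] for record in records)


# Hypothetical population: group and income are strongly related.
population = (
    [("A", "low")] * 40 + [("A", "high")] * 10
    + [("B", "low")] * 10 + [("B", "high")] * 40
)

# Hypothetical sample: group and income are unrelated.
sample = (
    [("A", "low")] * 25 + [("A", "high")] * 25
    + [("B", "low")] * 25 + [("B", "high")] * 25
)

same_group = marginal(population, 0) == marginal(sample, 0)
same_income = marginal(population, 1) == marginal(sample, 1)
same_joint = proportions(population) == proportions(sample)

print(same_group, same_income, same_joint)  # True True False
```

Here both groups make up half of the population and half of the sample, and the same holds for the two income levels. But 40 percent of the population is low-income members of group A, against only 25 percent of the sample, so any conclusion about how income varies by group would be wrong.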

In sum, the COMBO study is interesting, but its value is limited because the sample is not random, and we therefore cannot know whether the results are reflective of the entire student body.

Scott Lynch is an associate sociology professor and is teaching SOC 301: Sociological Research Methods this semester. He can be reached at slynch@princeton.edu.
