9.5 Complexities in quantitative measurement

Matthew DeCarlo

44 9.5 Complexities in quantitative measurement

Learning Objectives

Define and provide examples for the four levels of measurement
Identify potential sources of error
Differentiate between systematic and random error

For quantitative methods, you should now have some idea about how conceptualization and operationalization work, and you should also know how to assess the quality of your measures. But measurement is sometimes a complex process, and some concepts are more complex than others. Measuring a person’s political party affiliation, for example, is less complex than measuring their sense of alienation. In this section, we’ll consider some of these complexities in measurement. First, we’ll take a look at the various levels of measurement that exist, and then we’ll consider how measures can be subject to bias and error.

Levels of measurement

When social scientists measure concepts, they sometimes use the language of variables and attributes. A variable refers to a grouping of several characteristics. Attributes are the characteristics that make up a variable. For example, the variable hair color would contain attributes like blonde, brown, black, red, gray, etc. A variable’s attributes determine its level of measurement. There are four possible levels of measurement: nominal, ordinal, interval, and ratio. The first two levels of measurement are categorical, meaning their attributes are categories rather than numbers. The latter two levels of measurement are continuous, meaning their attributes are numbers, not categories.

Hair color is an example of a nominal level of measurement. Nominal measures are categorical, and those categories cannot be mathematically ranked. As a brown-haired person (with some gray), I can’t say for sure that brown-haired people are better than blonde-haired people. There is no ranking order between hair colors. They are simply different. That is what constitutes a nominal level of measurement. Gender and race are also measured at the nominal level.

But what attributes are contained in the variable hair color? Blonde, brown, black, and red are common colors. However, if we listed only these attributes, my wife, who currently has purple hair, wouldn’t fit anywhere. That means our attributes were not exhaustive. Exhaustiveness means that all possible attributes are listed. We may have to list a lot of colors before we can meet the criteria of exhaustiveness. Clearly, there is a point at which exhaustiveness has been reasonably met. If a person insists that their hair color is light burnt sienna, it is not your responsibility to list that as an option. Rather, that person would reasonably be described as brown-haired. Perhaps listing a category for other color would suffice to make our list of colors exhaustive.

What about a person who has multiple hair colors at the same time, such as red and black? They would fall into multiple attributes. This violates the rule of mutual exclusivity, in which a person cannot fall into two different attributes. Instead of listing all of the possible combinations of colors, perhaps you might include a multi-color attribute to describe people with more than one hair color.

The discussion of hair color elides an important point with measurement—reification. You should remember reification from our previous discussion in this chapter. For many years, the attributes for gender were male and female. Now, our understanding of gender has evolved to encompass more attributes including transgender, non-binary, or genderqueer. Children of parents from different races were often classified as one race or another, even if they identified with both cultures equally. The option for bi-racial or multi-racial on a survey not only more accurately reflects the racial diversity in the real world but validates and acknowledges people who identify in that manner.

Unlike nominal-level measures, attributes at the ordinal level can be rank ordered. For example, someone’s degree of satisfaction in their romantic relationship can be ordered by rank. That is, you could say you are not at all satisfied, a little satisfied, moderately satisfied, or highly satisfied. Note that even though these have a rank order to them (not at all satisfied is certainly worse than highly satisfied), we cannot calculate a mathematical distance between those attributes. We can simply say that one attribute of an ordinal-level variable is more or less than another attribute.

This can get a little confusing when using Likert scales. If you have ever taken a customer satisfaction survey or completed a course evaluation for school, you are familiar with Likert scales. “On a scale of 1-5, with one being the lowest and 5 being the highest, how likely are you to recommend our company to other people?” Sound familiar? Likert scales use numbers but only as a shorthand to indicate what attribute (highly likely, somewhat likely, etc.) the person feels describes them best. You wouldn’t say you are “2” more likely to recommend the company. But you could say you are not very likely to recommend the company. Ordinal-level attributes must also be exhaustive and mutually exclusive, as with nominal-level variables.

a cartoon of a person choosing between a happy, neutral, and sad face

At the interval level, attributes must also be exhaustive and mutually exclusive. As well, the distance between attributes is known to be equal. Interval measures are also continuous, meaning their attributes are numbers, rather than categories. IQ scores are interval level, as are temperatures. Interval-level variables are not particularly common in social science research, but their defining characteristic is that we can say how much more or less one attribute differs from another. We cannot, however, say with certainty what the ratio of one attribute is in comparison to another. For example, it would not make sense to say that 50 degrees is half as hot as 100 degrees.

Finally, at the ratio level, attributes are mutually exclusive and exhaustive, attributes can be rank ordered, the distance between attributes is equal, and attributes have a true zero point. Thus, with these variables, we can say what the ratio of one attribute is in comparison to another. Examples of ratio-level variables include age and years of education. We know, for example, that a person who is 12 years old is twice as old as someone who is 6 years old. The differences between each level of measurement are visualized in Table 9.1.

Table 9.1 Criteria for Different Levels of Measurement
	Nominal	Ordinal	Interval	Ratio
Exhaustive	X	X	X	X
Mutually exclusive	X	X	X	X
Rank-ordered		X	X	X
Equal distance between attributes			X	X
True zero point				X

Challenges in measurement

Unfortunately, measures never perfectly describe what exists in the real world. Good measures demonstrate validity and reliability but will always have some degree of error. Systematic error causes our measures to consistently output incorrect data, usually due to an identifiable process. Imagine you created a measure of height, but you didn’t put an option for anyone over six feet tall. If you gave that measure to your local college or university, some of the taller members of the basketball team might not be measured accurately. In fact, you would be under the mistaken impression that the tallest person at your school was six feet tall, when in actuality there are likely people taller than six feet at your school. This error seems innocent, but if you were using that measure to help you build a new building, those people might hit their heads!

A less innocent form of error arises when researchers using question wording that might cause participants to think one answer choice is preferable to another. For example, if I were to ask you “Do you think global warming is caused by human activity?” you would probably feel comfortable answering honestly. But what if I asked you “Do you agree with 99% of scientists that global warming is caused by human activity?” Would you feel comfortable saying no, if that’s what you honestly felt? I doubt it. That is an example of a leading question, a question with wording that influences how a participant responds. We’ll discuss leading questions and other problems in question wording in greater detail in Chapter 11.

two cartoon people, one with question marks emanating from their head another with lightbulbs

In addition to error created by the researcher, your participants can cause error in measurement. Some people will respond without fully understanding a question, particularly if the question is worded in a confusing way. That’s one source of error. Let’s consider another. If we asked people if they always washed their hands after using the bathroom, would we expect people to be perfectly honest? Polling people about whether they wash their hands after using the bathroom might only elicit what people would like others to think they do, rather than what they actually do. This is an example of social desirability bias, in which participants in a research study want to present themselves in a positive, socially desirable way to the researcher. People in your study will want to seem tolerant, open-minded, and intelligent, but their true feelings may be closed-minded, simple, and biased. So, they lie. This occurs often in political polling, which may show greater support for a candidate from a minority race, gender, or political party than actually exists in the electorate.

A related form of bias is called acquiescence bias, also known as “yea-saying.” It occurs when people say yes to whatever the researcher asks, even when doing so contradicts previous answers. For example, a person might say yes to both “I am a confident leader in group discussions” and “I feel anxious interacting in group discussions.” Those two responses are unlikely to both be true for the same person. Why would someone do this? Similar to social desirability, people want to be agreeable and nice to the researcher asking them questions or they might ignore contradictory feelings when responding to each question. Respondents may also act on cultural reasons, trying to “save face” for themselves or the person asking the questions. Regardless of the reason, the results of your measure don’t match what the person truly feels.

So far, we have discussed sources of error that come from choices made by respondents or researchers. Usually, systematic errors will result in responses that are incorrect in one direction or another. For example, social desirability bias usually means more people will say they will vote for a third party in an election than actually do. Systematic errors such as these can be reduced, but there is another source of error in measurement that can never be eliminated, and that is random error. Unlike systematic error, which biases responses consistently in one direction or another, random error is unpredictable and does not consistently result in scores that are consistently higher or lower on a given measure. Instead, random error is more like statistical noise, which will likely average out across participants.

coffee spilled on papers

Random error is present in any measurement. If you’ve ever stepped on a bathroom scale twice and gotten two slightly different results, maybe a difference of a tenth of a pound, then you’ve experienced random error. Maybe you were standing slightly differently or had a fraction of your foot off of the scale the first time. If you were to take enough measures of your weight on the same scale, you’d be able to figure out your true weight. In social science, if you gave someone a scale measuring depression on a day after they lost their job, they would likely score differently than if they had just gotten a promotion and a raise. Even if the person were clinically depressed, our measure is subject to influence by the random occurrences of life. Thus, social scientists speak with humility about our measures. We are reasonably confident that what we found is true, but we must always acknowledge that our measures are only an approximation of reality.

Humility is important in scientific measurement, as errors can have real consequences. At the time of the writing of this textbook, my wife and I are expecting our first child. Like most people, we used a pregnancy test from the pharmacy. If the test said my wife was pregnant when she was not pregnant, that would be a false positive. On the other hand, if the test indicated that she was not pregnant when she was in fact pregnant, that would be a false negative. Even if the test is 99% accurate, that means that one in a hundred women will get an erroneous result when they use a home pregnancy test. For us, a false positive would have been initially exciting, then devastating when we found out we were not having a child. A false negative would have been disappointing at first and then quite shocking when we found out we were indeed having a child. While both false positives and false negatives are not very likely for home pregnancy tests (when taken correctly), measurement error can have consequences for the people being measured.

Key Takeaways

In social science, our variables can be one of four different levels of measurement: nominal, ordinal, interval, or ratio.
Systematic error may arise from the researcher, participant, or measurement instrument.
Systematic error biases results in a particular direction, whereas random error can be in any direction.
All measures are prone to error and should interpreted with humility.

Glossary

Acquiescence bias- when respondents say yes to whatever the researcher asks
Attributes- are the characteristics that make up a variable
Categorical measures- a measure with attributes that are categories
Continuous measures- a measures with attributes that are numbers
Exhaustiveness- all possible attributes are listed
False negative- when a measure does not indicate the presence of a phenomenon, when in reality it is present
False positive- when a measure indicates the presence of a phenomenon, when in reality it is not present
Interval level- a level of measurement that is continuous, can be rank ordered, is exhaustive and mutually exclusive, and for which the distance between attributes is known to be equal
Leading question- a question with wording that influences how a participant responds
Likert scales- ordinal measures that use numbers as a shorthand (e.g., 1=highly likely, 2=somewhat likely, etc.) to indicate what attribute the person feels describes them best
Mutual exclusivity- a person cannot identify with two different attributes simultaneously
Nominal- level of measurement that is categorical and those categories cannot be mathematically ranked, though they are exhaustive and mutually exclusive
Ordinal- level of measurement that is categorical, those categories can be rank ordered, and they are exhaustive and mutually exclusive
Random error- unpredictable error that does not consistently result in scores that are consistently higher or lower on a given measure
Ratio level- level of measurement in which attributes are mutually exclusive and exhaustive, attributes can be rank ordered, the distance between attributes is equal, and attributes have a true zero point
Social desirability bias- when respondents answer based on what they think other people would like, rather than what is true
Systematic error- measures consistently output incorrect data, usually in one direction and due to an identifiable process
Variable- refers to a grouping of several characteristics

Image attributions

user satisfaction by mcmurryjulie CC-0

question by jambulboy CC-0

mistake by stevepb CC-0

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License