Individual Variation?

Phonetics, Statistics, Experimentation
Karthik Durvasula
2024-02-14

Individual variation has become an important issue in Phonetics and Sociophonetics over the last 10-15 years.[1] A typical experiment measures some phonetic dimension in production (F1, F2, vowel duration, …) or some perceptual response to stimuli varying along a phonetic dimension (in an identification/discrimination/… task). The researcher then notices that not everyone behaves the same: there is a range of mean values, and some participants stand apart from the others. “Eureka! There is individual variation, let’s say something more about it in the paper.” Worse still, “let’s try to interpret this as something important.” Even worse, reviewers want you to explain the variation.

However, participant mean responses will vary in any experiment: unless the variance of the random variable under study is zero,[2] the sampled means will differ from one participant to the next.[3] Furthermore, if the variance is sufficiently high (as is typically the case in phonetic or sociophonetic measurements) and the sample size is sufficiently small, it is easy to get participants who look quite far from the rest.[4]

Let me show these (somewhat obvious) points with some simulations. Imagine there are 10 participants in your experiment, each with exactly the same underlying distribution. You sample 10 values for each speaker and calculate the mean for each speaker. You will get a distribution of mean values, even though everyone has the same underlying distribution. In the histogram produced by the code below, it even looks like there are two groups, but this is just superficial: we know all the speakers have the same underlying distribution, and that the differences are purely due to randomness.[5]

#Set the randomisation seed to make it replicable
set.seed(1000)

#Assigning relevant values
numValuesPerSpeaker = 10
numSpeakers = 10

#Generating a mean for each speaker 
speakerMeans = replicate(n=numSpeakers, 
                         mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=40)))

#Histogram
hist(speakerMeans,
     xlab="Speaker means for some phonetic dimension (say duration)",
     main=NULL)
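
To put a rough number on how much spread to expect, recall that the standard deviation of a mean of n independent values is sd/sqrt(n). The sketch below reuses the values from the simulation above (mean 60, sd 40, 10 values per speaker); the names `theoreticalSE`, `manyMeans` and `expectedRange` are mine and purely illustrative.

```r
# Values reused from the simulation above
populationSD = 40
numValuesPerSpeaker = 10

# Theoretical standard deviation of a speaker mean (the standard error)
theoreticalSE = populationSD / sqrt(numValuesPerSpeaker)

# Check it against a large number of simulated speaker means
set.seed(1000)
manyMeans = replicate(n=10000,
                      mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=populationSD)))
simulatedSE = sd(manyMeans)

# Interval in which about 95% of speaker means should fall by chance alone
expectedRange = 60 + c(-1, 1) * qnorm(0.975) * theoreticalSE

print(round(c(theoretical=theoreticalSE, simulated=simulatedSE), 2))
print(round(expectedRange, 1))
```

With these values the standard error is about 12.6, so speaker means anywhere between roughly 35 and 85 are unremarkable, even though every speaker has the same underlying mean of 60.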

Of course, if you had many more measurements per speaker, such an observation would be less likely. But, crucially, it is far from impossible, for a few reasons. First, that is how random variation works, and this is what is happening in the figure produced by the code below, where each speaker contributes 100 measurements instead of 10. Such differences are easier to see if the variance of the underlying distribution is sufficiently large.

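#Same simulation as before, but with 100 values per speaker instead of 10
set.seed(1000)
numValuesPerSpeaker = 100
numSpeakers = 10

#Generating a mean for each speaker
speakerMeans = replicate(n=numSpeakers, 
                         mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=40)))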

hist(speakerMeans,
     xlab="Speaker means for some phonetic dimension (say duration)",
     main=NULL)

Second, a small number of participants exacerbates the issue, but it could happen even with a large number of participants, if you are unlucky. Below, I increased the number of participants to 100, and you still see at the right tail of the distribution that someone “looks different” from the other participants.

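#Same simulation again, now with 100 speakers and 100 values per speaker
set.seed(1000)
numValuesPerSpeaker = 100
numSpeakers = 100

#Generating a mean for each speaker
speakerMeans = replicate(n=numSpeakers, 
                         mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=40)))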

hist(speakerMeans,
     xlab="Speaker means for some phonetic dimension (say duration)",
     main=NULL)
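
How unlucky do you need to be? A rough sketch: pick a threshold for “looks different” (the cutoff of 2.5 standard errors is my assumption, not a standard), and compute the chance that at least one of the 100 speakers crosses it, both analytically and by repeating the simulation above many times.

```r
numSpeakers = 100
numValuesPerSpeaker = 100
cutoff = 2.5   # illustrative threshold, in standard errors from the true mean

# Probability that a single speaker mean lands beyond the cutoff (either tail)
pSingle = 2 * pnorm(-cutoff)

# Probability that at least one of the speakers does
pAtLeastOne = 1 - (1 - pSingle)^numSpeakers

# Simulation check: repeat the whole experiment many times
set.seed(1000)
standardError = 40 / sqrt(numValuesPerSpeaker)
hasExtremeSpeaker = replicate(n=1000, {
  speakerMeans = replicate(n=numSpeakers,
                           mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=40)))
  any(abs(speakerMeans - 60) > cutoff * standardError)
})

print(round(c(analytic=pAtLeastOne, simulated=mean(hasExtremeSpeaker)), 2))
```

Under these assumptions, an “extreme-looking” speaker shows up in roughly 70% of such experiments, so seeing one is the expected outcome rather than a surprise.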

Third, within-participant measurements are not independent. On any given experiment day, there could be a variety of reasons why a participant’s measurements are clustered away from the others even though the participant has the same underlying distribution. For example, say we are measuring durations, and a participant just happens to be very tired. If the effect of being tired is to lengthen durations, then it is easy to get much longer durations for that participant even if they have the same underlying distribution for the category as everyone else.
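
The non-independence point can be sketched by giving each speaker a single session-level shift (tiredness, mood, time of day) that applies to all of their measurements on that day. The size of the shift (`sessionSD = 15`) is a made-up value for illustration.

```r
set.seed(1000)
numSpeakers = 10
numValuesPerSpeaker = 100
sessionSD = 15   # hypothetical size of day-to-day shifts

# One shift per speaker, shared by all of that speaker's measurements that day
sessionShift = rnorm(n=numSpeakers, mean=0, sd=sessionSD)

# Everyone still has the same long-run mean of 60
speakerMeans = sapply(sessionShift, function(shift) {
  mean(rnorm(n=numValuesPerSpeaker, mean=60 + shift, sd=40))
})

# Spread of speaker means expected if measurements were fully independent
independentSE = 40 / sqrt(numValuesPerSpeaker)

# Spread expected once the shared session shift is included
clusteredSE = sqrt(sessionSD^2 + independentSE^2)

print(round(c(independent=independentSE, clustered=clusteredSE,
              observed=sd(speakerMeans)), 2))
```

The shared shift does not average out within a session, so collecting more measurements on the same day does not shrink this part of the spread.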

So, how do we identify individual variation? The only legit way to do it is to replicate the experiment (multiple independent times?) on the same participants and try to ensure that any biasing factors are absent during the replications. If the differences between the participants persist, then that is interesting. You can see immediately that establishing individual variation is a LOT more work! This is essentially the point that Senn (2014) tried to make in the context of arguing against the current evidence in favour of personalised medicine.[6]
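
Here is a minimal sketch of what replication buys you. It compares two independent sessions on the same speakers under two scenarios: everyone shares one distribution, or speakers genuinely differ. The helper `runSession`, the 30 speakers, and the between-speaker sd of 10 are all assumptions for illustration.

```r
set.seed(1000)
numSpeakers = 30
numValuesPerSpeaker = 100

# Hypothetical helper: one session's mean per speaker, given each speaker's true mean
runSession = function(trueMeans) {
  sapply(trueMeans, function(m) {
    mean(rnorm(n=numValuesPerSpeaker, mean=m, sd=40))
  })
}

# Scenario A: everyone has the same underlying distribution
sameMeans = rep(60, numSpeakers)
corSame = cor(runSession(sameMeans), runSession(sameMeans))

# Scenario B: speakers genuinely differ in their underlying means
differentMeans = rnorm(n=numSpeakers, mean=60, sd=10)
corDifferent = cor(runSession(differentMeans), runSession(differentMeans))

print(round(c(sameDistribution=corSame, trueDifferences=corDifferent), 2))
```

If the speaker differences are just sampling noise, a speaker’s position in one session tells you little about their position in the next; if the differences are real, the two sessions should line up.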

OK, the above might be obvious, but wait, there is more. The relationship between perception and production is another popular issue in Phonetics and Sociophonetics. The typical experimental question is whether there is a correlation between the production pattern of a speaker for some phonetic measure and the perception of the same by the speaker. This sub-sub-field has a bunch of null results along with some “positive” results. Why might that be the case? Let’s say that the speakers/listeners in an experiment in fact use the same underlying distribution for production and perception. They will still exhibit random variation in each task, and that variation is independent across the two tasks. So if one correlates the production results with the perception results across speakers, the likely outcome is a null result, even though all the speakers have the same distribution and the distribution used in production and perception is identical.
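
A small simulation of this scenario, under the assumption that every speaker uses the same distribution (mean 60, sd 40, as in the earlier simulations) in both tasks. The number of speakers and the function name `oneExperiment` are illustrative choices of mine.

```r
set.seed(1000)
numSpeakers = 30
numValuesPerSpeaker = 10
numExperiments = 2000

# One simulated experiment: correlate speaker means across the two tasks
oneExperiment = function() {
  productionMeans = replicate(n=numSpeakers,
                              mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=40)))
  perceptionMeans = replicate(n=numSpeakers,
                              mean(rnorm(n=numValuesPerSpeaker, mean=60, sd=40)))
  result = cor.test(productionMeans, perceptionMeans)
  c(r=unname(result$estimate), p=result$p.value)
}

# Rows "r" and "p", one column per simulated experiment
results = replicate(n=numExperiments, oneExperiment())

# The correlations scatter around zero
print(round(quantile(results["r", ], probs=c(0.025, 0.5, 0.975)), 2))

# Proportion of experiments that come out "significant" anyway
propSignificant = mean(results["p", ] < 0.05)
print(propSignificant)
```

Most simulated experiments give a null result, and a small proportion come out “positive” by chance, which is at least consistent with the mixed picture in the literature.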

So, how does one go about solving this problem related to the production-perception link? We have two options. First, we can estimate the production and perception distributions (including mean tendencies) for each speaker through multiple independent experiments, and correlate those estimated values. This is the replication approach I mentioned above, but to my knowledge it has not been done. Second, we can look at groups of speakers who we know have different production distributions (different dialects or different languages), and then test whether their perception of the same cue differs. This has been done way too many times, and there is a consistent difference in such cases. So, I am often left wondering why the question “is there a link between perception and production?”[7] is still being debated.
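
The second option can be sketched as a simple two-group comparison. The simulation assumes (rather than demonstrates) that perception tracks each group’s production target; the helper `groupPerceptionMeans`, the group size, and the 30-unit difference between groups are hypothetical values.

```r
set.seed(1000)
numSpeakersPerGroup = 20
numValuesPerSpeaker = 10

# Hypothetical helper: perception means for speakers whose category is centred on groupMean
groupPerceptionMeans = function(groupMean) {
  replicate(n=numSpeakersPerGroup,
            mean(rnorm(n=numValuesPerSpeaker, mean=groupMean, sd=40)))
}

# Assumption: the two groups' underlying distributions differ by 30 units
groupA = groupPerceptionMeans(60)
groupB = groupPerceptionMeans(90)

# Compare the two groups' speaker means
groupTest = t.test(groupA, groupB)
print(groupTest$p.value)
```

The design works because it compares group means, which are much less noisy than individual speaker means, and the difference between groups is known in advance rather than inferred from the same noisy sample.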

Bottom line: the next time you see variation between speakers in an experiment, ask yourself, “Is this random variation, variation due to other factors, or true variation in the underlying system under study, and how could I tell them apart?” And the next time you think about the production-perception link, ask yourself, “What is the right experiment to run, and do we really need to?”

Of course, this problem, like every other problem, is worsened by the multiple comparisons problem: if you look at your data in multiple ways and then present only the analyses that “work out”, i.e., have a sufficiently low p-value, spurious patterns become even easier to find. But more on that another time.

Senn, Stephen. 2014. “Mastering Variation: Variance Components and Personalised Medicine.” Statistics in Medicine 35 (7): 966–77. https://doi.org/10.1002/sim.6739.

  1. The issues I discuss here logically apply to all linguistic sub-disciplines, as it is a general methodological point, but I will stick to domains I know something about.

  2. Then is it really a random variable anymore?

  3. Thank you sampling distribution!

  4. These are sometimes even labelled outliers without any independent basis.

  5. Which could just be a proxy for our ignorance about other factors of course.

  6. The proper experiments have simply not been run to establish the need for personalised medicine in his opinion.

  7. I take this question here to mean: is the distribution for any random variable or phonetic measure used in production the same as that in perception for each speaker?

Citation

For attribution, please cite this work as

Durvasula (2024, Feb. 14). Karthik Durvasula: Individual Variation?. Retrieved from https://karthikdurvasula.gitlab.io/posts/2024-02-14-individual-variation/

BibTeX citation

@misc{durvasula2024individual,
  author = {Durvasula, Karthik},
  title = {Karthik Durvasula: Individual Variation?},
  url = {https://karthikdurvasula.gitlab.io/posts/2024-02-14-individual-variation/},
  year = {2024}
}