In previous blog posts (Durvasula 2025b, 2025c, 2025d), I presented a variety of parallel and serial categorical-sampling perception models that were able to derive crucial facts about perceptual behaviour related to observed gradience and also predict an important set of reaction time patterns. The latter have been (nearly?) completely ignored in the literature debating gradience vs. not in perceptual representations.
In the most recent blog post on the topic (Durvasula 2025d), I presented the results of a parallel-serial model that captures the benefits of independent parallel and serial categorical sampling models, while avoiding their pitfalls.[1] The parallel-serial model involves an initial percept of the audio input in terms of N categorical representations (even just two is enough) — this term is called reps in the functions below. Then, there is repeated sampling from that initial set till one hits a run of M identical parses (a run of fewer than 5 is enough) — this term is called runSize in the functions below. The relevant functions are pasted below for the reader, but really, I encourage you to look at the previous blog posts for detailed discussion.
library(tidyverse) # for the pipes, dplyr verbs, tidyr reshaping, and ggplot used throughout
library(broom)     # for tidy(), used on the glm fits later
categoricalPerceiver = function(input, boundary, boundarySD, reps=2){
  # Getting the categorical boundaries (one per repetition)
  CategoricalBoundaries = rnorm(reps, mean=boundary,
                                sd = boundarySD)
  # Creating a data.frame with repetition information
  Results = data.frame(Reps = 1:reps)
  # Assigning a percept for each repetition based on its sampled boundary
  Results %>%
    mutate(Percept = ifelse(input<CategoricalBoundaries,"Cat1","Cat2"))
}
# Example output: the categorisation event for each repetition
# categoricalPerceiver(30, 30, 10, 5)
RepeatedSamplerTillRunAchieved = function(input=30,
boundary, boundarySD,
outputTotalCount=T,
reps=2,
runSize=4,
PerceiverModel=categoricalPerceiver){
# input=30
# boundary=30
# boundarySD=5
# reps = 2
# runSize = 5
# PerceiverModel=categoricalPerceiver
# Getting the initial set of percepts
initialParse = PerceiverModel(input,boundary,boundarySD,reps)$Percept
#Initialising the counter and the resampling sequence, and then resampling
counter=0
resampledSequence = c()
repeat{
counter = counter + 1
# Resampling one secondary percept from the initial set of percepts
resampledSequence[counter]=sample(x=initialParse,size=1,replace=T)
# Was there a sufficiently long run?
# but first there needs to be at least as many samples
if(counter>=runSize){
resampledSequenceRun = resampledSequence[(counter-(runSize-1)):counter]
# If there is indeed a run of categories of the desired length, then stop
if(sum(resampledSequenceRun[1]==resampledSequenceRun)==runSize){
# return(list(ResampleSequence=resampledSequence,
return(data.frame(TotalNumberOfIteration=counter,
                  FinalPercept=resampledSequenceRun[1]))
}
}
}
}
# Example output
# RepeatedSamplerTillRunAchieved(input=5,boundary=30,boundarySD=100,reps=2,runSize=10)
SubjectAverager = function(NumSubjects=30,input=30,
boundary, boundarySD,
reps=2,runSize=4,
PerceiverModel=categoricalPerceiver,
SecondaryPerceiverModel=RepeatedSamplerTillRunAchieved){
SubjectSimulation=data.frame(Sub=1:NumSubjects) %>%
group_by(Sub) %>%
nest() %>%
mutate(Results=SecondaryPerceiverModel(input,boundary,boundarySD,reps,
runSize=runSize,
# Passing the PerceiverModel argument through (rather than hard-coding categoricalPerceiver)
PerceiverModel=PerceiverModel)) %>%
select(-data) %>%
unnest()
RTAvg = mean(SubjectSimulation$TotalNumberOfIteration)
Cat2PerceptProp = sum(SubjectSimulation$FinalPercept=="Cat2")/nrow(SubjectSimulation)
# Returning the info
data.frame(Cat2PerceptProp,RTAvg)
}
# SubjectAverager(input=5,boundary=30,boundarySD=5,reps=2,runSize=4)
The original thrust of the blog posts was to focus on important phenomenological predictions, i.e., having to do with predicting general patterns or phenomena in (behavioural) data. But in the most recent blog post on the topic (Durvasula 2025d), I also tried to see how well the model fit the gross identification curve based on actual data from Caplan, Hafri, and Trueswell (2021). While the model did a decent job with reaction times, it didn’t get a great fit with the actual perceptual contour in the study — the model’s slope was steeper than actual human performance. To simulate the results from the model, I had used a global categorical boundary mean (43 ms) and a global categorical boundary standard deviation (5.92 ms) — that’s it. So, in a sense, it is quite impressive that such a simple model did so well in the first place, even when compared to actual human data! Today’s objective is to see how much participant-level information we need to improve the fit between the predicted and observed identification functions — as we will see, not much!
OK, an important caveat: in my opinion, the primary hope of developing theoretical models is to capture phenomena, and not all the nuances (twists and turns) of the actual observed data in experiments. If we start focussing on fitting models to observed data, we are likely to end up simply curve-fitting. Having said that, if the fit is good with minimal additional information added to such general models, then that’s to be appreciated. I will say that I am perhaps veering into curve-fitting territory with this blog post, and that makes me a bit uncomfortable. But that’s what blog posts are for: exploration.
I will compare three types of parallel-serial models. First, a parallel-serial model with just a global mean and a global standard deviation for the categorical boundary. Such a model claims that, really, all the participants have the same perceptual boundary and behave exactly the same way in the experiment (with the same cognitive load, …), and hence have the same mean and standard deviation for the categorical boundary. On its face, this is unrealistic.
Second, a parallel-serial model with a global mean and a global variance, along with a participant-specific variance, for the categorical boundary. Such a model claims that all the participants have the same perceptual boundary, but are not cognitively burdened by the task in exactly the same way (so, they might have different cognitive loads, amongst other performance factors, on the day of the experiment due to a variety of uncontrolled factors). Given that, as discussed in previous blog posts, cognitive burden can be cashed out as more vs. less variance, we can implement this intuition in the parallel-serial model as participants having the same mean, but different effective variances, for the categorical boundary. This one is better than the previous model, but the more I think about it, the less sure I am of why participants would have exactly the same category boundary mean all the time, even if they had the same underlying systems.
Third, a parallel-serial model with a participant-specific mean, along with a global standard deviation and a participant-specific standard deviation for the categorical boundary. Such a model claims that all the participants have different perceptual boundaries, and also don’t behave exactly the same way in the experiment (i.e., they don’t necessarily have the same cognitive load, …), and could perhaps be affected by different experimental performance factors. Therefore, participants have different means and variances for the categorical boundary. On the face of it, this implementation seems the most likely.
Important note about the third model: the fact that there are participant-level means in the model by itself doesn’t tell us that we are committing to different underlying systems for speakers. As I noted in previous blog posts on variation (Durvasula 2024, 2025a), this participant-level variation in means could just be an experiment-day effect. To show true participant-level variation, one needs to show that the participant-level means are consistent across independent experimental attempts — something that I have not really seen in such work.
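To make the three parameterisations concrete before getting to the data, here is a minimal sketch of what would get handed to the boundary sampler in each case. The global mean (43 ms) and standard deviation (5.92 ms) are the values used in the previous post; the participant-specific numbers are made up purely for illustration.
# Illustrative only: the three ways the boundary parameters are set up in the models below.
# The participant-specific values (subjMean, subjSD) are made-up placeholders.
globalMean = 43      # global categorical boundary mean (ms)
globalSD   = 5.92    # global categorical boundary sd (ms)
subjMean   = 47      # hypothetical participant-specific boundary mean (ms)
subjSD     = 3       # hypothetical participant-specific boundary sd (ms)
# Model 1: everyone gets the same mean and sd
model1 = c(mean = globalMean, sd = globalSD)
# Model 2: same mean, but a participant-specific variance added to the global variance
# (independent variances add, so sds combine as a root-sum-of-squares)
model2 = c(mean = globalMean, sd = sqrt(globalSD^2 + subjSD^2))
# Model 3: participant-specific mean, with the same combined sd as Model 2
model3 = c(mean = subjMean, sd = sqrt(globalSD^2 + subjSD^2))
rbind(model1, model2, model3)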
Caplan, Hafri, and Trueswell (2021) studied, in short, the effect of the timing of lexical information on perceptual retuning. Handily, their paper had identification data on t~d that we can use to compare our model against. They ran a binary t~d identification task on stimuli varying in VOT along with a variety of other conditions. They had a huge number of participants (N=154), so the dataset forms a great way to establish some results without worrying about accidental patterns (at least, for the broad generalisations).
Here is some more info about their stimuli and experiment: “consisting of 162 trials. Participants received new instructions telling them to press a button to decide whether the audio they heard was ta or da. The side of the screen on which the ta and da choices appeared was consistent within each participant but randomized between participants. On each trial, participants were exposed to audio of a CV syllable edited along a continuum between ta and da. After listening to the audio, participants were asked to judge whether the syllable contained t or d. The 162 test trials were divided between two exemplar ta/da tokens and nine VOT levels (20, 30, 40, 45, 50, 55, 60, 70, and 80 ms), with nine repetitions for each exemplar and level (2 × 9 × 9). The order of test items was randomized within a set of nine blocks such that every stimulus was heard once before it was repeated.”
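As a quick sanity check on those numbers, the test-trial design multiplies out to the stated 162 trials per participant. The little sketch below just reconstructs that grid (the column names are mine, not theirs).
# Reconstructing the test-trial grid from the description above
# (2 exemplars x 9 VOT levels x 9 repetitions = 162 trials)
testGrid = expand.grid(exemplar = c("token1", "token2"),
                       VOT = c(20, 30, 40, 45, 50, 55, 60, 70, 80),
                       repetition = 1:9)
nrow(testGrid)
# [1] 162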
You can see the general identification function later in this post. Here, I plan to jump into the specifics.
As before, let’s first get the actual slopes that participants had for the continua. The distribution of the slopes is shown in the figure below. We can also calculate the slopes for each block of data, so that we can estimate the additional within-participant variance on the day of the experiment.
# Fitting logistic regression models to each participant
# And then plotting the slope estimates
IndividualFits=Data %>%
group_by(uniqueid) %>%
nest() %>%
# Didn't use glmer cos I wasn't sure of the r.e. structure
mutate(Results=map(data,function(x=data){tidy(glm(data=x,t.d_choice~VOT,family="binomial"))})) %>%
select(-data) %>%
unnest()
# To get participant-internal variance
IndividualFitsByBlock=Data %>%
group_by(uniqueid,blockNum) %>%
nest() %>%
# Didn't use glmer cos I wasn't sure of the r.e. structure
mutate(Results=map(data,function(x=data){tidy(glm(data=x,t.d_choice~VOT,family="binomial"))})) %>%
select(-data) %>%
unnest()
# Plotting
IndividualFits %>%
filter(term=="VOT") %>%
filter(estimate<1) %>% #Just seeing what the data looks like without this person
ggplot(aes(x=estimate))+
geom_density()+
xlab("Slope")+
ylab("Distribution of slopes of the identification function")
As before, the slopes can be used to identify the categorical boundary: the boundary is the VOT value at which the fitted identification function crosses 0.5, i.e., where intercept + slope*VOT = 0, so the boundary is -intercept/slope. So, using the fit information obtained above, one can calculate the overall categorical boundary, the overall variance in the categorical boundary, the by-participant categorical boundaries, and the within-participant variance in the categorical boundary.
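As a quick check on that logic, here is a small self-contained simulation (independent of the Caplan, Hafri, and Trueswell data): responses are generated from a perceiver with a known boundary distribution, and -intercept/slope from the logistic fit should come out close to the true boundary mean.
# Quick check: -intercept/slope from a logistic fit recovers the category boundary
set.seed(1)
VOTToy = rep(c(20, 30, 40, 45, 50, 55, 60, 70, 80), each = 50)
# Generate "t" responses from a perceiver whose boundary is drawn from N(43, 5.92)
responseToy = ifelse(VOTToy > rnorm(length(VOTToy), mean = 43, sd = 5.92), 1, 0)
toyFit = glm(responseToy ~ VOTToy, family = "binomial")
# The boundary is where the fitted log-odds cross 0: intercept + slope*VOT = 0
-coef(toyFit)[1] / coef(toyFit)[2]
# Should be close to 43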
# Getting the overall category boundary for each participant
IndividualCategoricalBoundaries=IndividualFits %>%
select(uniqueid,term,estimate) %>%
pivot_wider(names_from=term,values_from=estimate) %>%
mutate(categoricalBoundary = -`(Intercept)`/VOT)
# Mean, median, and sd of the categorical boundary across participants
CategoricalBoundarySummaries = IndividualCategoricalBoundaries %>%
group_by() %>% # group_by() with no arguments drops the existing by-participant grouping
summarise(meanCategoricalBoundary=mean(categoricalBoundary),
medianCategoricalBoundary=median(categoricalBoundary),
sdCategoricalBoundary=sd(categoricalBoundary))
# Getting category boundaries for each block for each participant
# To get within participant variance and by-participant mean of the categorical boundary
IndividualCategoricalBoundariesByBlock=IndividualFitsByBlock %>%
select(uniqueid,term,estimate,blockNum) %>%
pivot_wider(names_from=term,values_from=estimate) %>%
mutate(categoricalBoundary = -`(Intercept)`/VOT) %>%
# ungroup() %>%
# Getting by-participant means and sds
group_by(uniqueid) %>%
summarise(meanCategoricalBoundaryBySubject = mean(categoricalBoundary),
sdCategoricalBoundaryBySubject = sd(categoricalBoundary))
Now, we have all the information we need to simulate the relevant models. Again, as a reminder, I will simulate multiple parallel-serial sampling models that differ in how much participant-level information about the categorical boundary (in its mean and/or its standard deviation, as laid out above) is entered into the simulations.
nSim=200
Since the simulations involve random sampling, any single simulation might accidentally have a good fit to the observed identification function. In order to avoid that issue, we need to simulate the data multiple times and then check the results. Below, I repeated the simulations 200 times. The simulations took about 200 minutes, so I didn’t think doing more was wise, particularly for a blog post! Note, the reason it takes so long is that the original data has about 24K rows, and each row is a single response by a participant to a single token presentation. Therefore, in the simulation, I have to simulate a percept for each of the 24K stimuli. Plus, we are considering 4 models. So, that’s about 100K percept simulations. Now, that’s a single repetition. If you do 200 repetitions, you need about 20 million percept simulations! OK, the real reason is that tidyverse functions are slow, and my MacBook ain’t no super computer! Maybe, if I wrote all of this with data.table in R or all of it in C++, I could have done a lot more. Shoulda, coulda, woulda!
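For the curious, the back-of-the-envelope arithmetic behind those numbers is just this (the 24K row count is approximate):
# Back-of-the-envelope count of how many percepts get simulated
rowsInData = 24000   # approx. number of trials (rows) in the original dataset
nModels    = 4       # number of model variants simulated for each trial
nReps      = 200     # number of simulation repetitions (the nSim above)
rowsInData * nModels           # ~96,000 percepts per repetition
rowsInData * nModels * nReps   # ~19.2 million percepts in total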
The figure below has the average identification curve over multiple simulations for each model. As seen in a previous blog post, the first model, with just an overall mean and variance, was not such a great fit. Since increasing the variance flattens the curve, we could simply increase the variance used and get a flatter identification function. In fact, we can say that there exists a variance such that the fit to actual data is amazing — one could easily find that variance through maximum likelihood estimation or some such jiggery-pokery. And of course, given the complexity of the experiment design in Caplan, Hafri, and Trueswell (2021) (blocks, the priming bias, source of the stimuli, …), one could easily add up the sources of variance to get to a close approximation of the value that maximises fit. But, if we are trying to explain actual participant-level performance, the real challenge is in independently justifying and measuring the relevant parameters. One could potentially estimate those parameters from the data. However, here, I will take an easier tack.
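(As an aside, here is a rough sketch of what that maximum likelihood estimation could look like. It leans on a simplifying assumption: that the model’s marginal identification function at a given VOT is approximately pnorm(VOT, boundary mean, boundary sd), which is exact for a single categorical sample and, by a symmetry argument, should also hold for the parallel-serial model with reps = 2. The Data, VOT, and t_choice names are the ones used elsewhere in this post.)
# A sketch: find the boundary sd that maximises the likelihood of the observed choices,
# assuming P("t" | VOT) is approximately pnorm(VOT, mean = boundary, sd = sigma)
negLogLik = function(sigma, votValues, responses, boundary = 43){
  p = pnorm(votValues, mean = boundary, sd = sigma)
  p = pmin(pmax(p, 1e-10), 1 - 1e-10)  # guard against log(0)
  -sum(responses*log(p) + (1 - responses)*log(1 - p))
}
# bestSigma = optimize(negLogLik, interval = c(1, 50),
#                      votValues = Data$VOT, responses = Data$t_choice)$minimum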
The other models described above satisfy this hope (of independently motivated parameters) without being awfully complex, and it’s great that they still offer improved fits over the simple global mean/variance model. If these models were indeed better than the original, then they should also produce participant slopes that are more correlated with the observed participant slopes. The original (first) model will not capture any participant-specific patterns, because there is no participant-specific information in it; however, it will still be able to describe the general shape of the distribution of participant slopes — and it does pretty well in this regard. Again, we could have just ended there, but I am trying to do something more fun here, cos why not!
SimulatorDifferentModels = function(data=Data,individualCategoricalBoundariesByBlock=IndividualCategoricalBoundariesByBlock,summaryCB=CategoricalBoundarySummaries){
data %>%
select(uniqueid,VOT) %>%
# So that there's a common variable with summaryCB
mutate(All="all") %>%
left_join(summaryCB %>% mutate(All="all")) %>%
left_join(individualCategoricalBoundariesByBlock %>%
select(uniqueid,
meanCategoricalBoundaryBySubject,
sdCategoricalBoundaryBySubject)) %>%
mutate(SimNum = row_number()) %>%
# combinedSD: the global and by-participant sds combined as independent sources (sqrt of summed variances)
# combinedSD2: the same two sds simply added together (the "incorrect" formula discussed later)
mutate(combinedSD=sqrt(sdCategoricalBoundary^2+sdCategoricalBoundaryBySubject^2),
combinedSD2=sdCategoricalBoundary+sdCategoricalBoundaryBySubject) %>%
# Grouping by row (SimNum), so that the sampler below runs once per stimulus presentation
group_by(SimNum) %>%
mutate(
# Both means and variances are the same across all participants
FinalPercept = RepeatedSamplerTillRunAchieved(input=VOT,boundary=meanCategoricalBoundary,boundarySD=sdCategoricalBoundary),
# Same mean for everyone, but the sds depend on the participant (correctly combined: combinedSD)
FinalPerceptDiffSDs = RepeatedSamplerTillRunAchieved(input=VOT,boundary=meanCategoricalBoundary,boundarySD=combinedSD),
# Same mean for everyone, but the sds depend on the participant (combined as a straight sum: combinedSD2)
FinalPerceptDiffSDs2 = RepeatedSamplerTillRunAchieved(input=VOT,boundary=meanCategoricalBoundary,boundarySD=combinedSD2),
# Both means and variances depend on the participant
FinalPerceptDiffSDsMeans = RepeatedSamplerTillRunAchieved(input=VOT,boundary=meanCategoricalBoundaryBySubject,boundarySD=combinedSD)
) %>%
unnest() %>%
ungroup() %>%
mutate(Type="Simulated") %>%
rename(Sim_SameMeanSD=FinalPercept,
Sim_SameMeanDiffSDs=FinalPercept1,
Sim_SameMeanDiffSDs2=FinalPercept2,
Sim_DiffMeanDiffSDs=FinalPercept3) %>%
select(uniqueid,VOT,Sim_SameMeanSD,Sim_SameMeanDiffSDs,Sim_SameMeanDiffSDs2,Sim_DiffMeanDiffSDs) %>%
pivot_longer(Sim_SameMeanSD:Sim_DiffMeanDiffSDs,names_to="Type",values_to="FinalPercept") %>%
mutate(Type = factor(Type,levels=c("Sim_SameMeanSD","Sim_SameMeanDiffSDs","Sim_SameMeanDiffSDs2","Sim_DiffMeanDiffSDs"))) %>%
mutate(t_choice = ifelse(FinalPercept=="Cat2",1,0))
}
# nSim=20
Simulations=data.frame(SimulationRep = 1:nSim) %>%
group_by(SimulationRep) %>%
nest() %>%
mutate(Results=map(data,function(.x=data){SimulatorDifferentModels()})) %>%
unnest() %>%
ungroup()
# Plotting the average perceptual contours
Simulations %>%
group_by(uniqueid,VOT,Type,SimulationRep) %>%
summarise(t_choice = mean(t_choice)) %>%
# ungroup() %>%
# filter(SimulationRep==1) %>%
select(uniqueid,VOT,t_choice,Type) %>%
rbind(Data %>%
mutate(Type="Actual") %>%
select(uniqueid,VOT,t_choice,Type)) %>%
mutate(Type = factor(Type,levels=c("Actual","Sim_SameMeanSD","Sim_SameMeanDiffSDs","Sim_SameMeanDiffSDs2","Sim_DiffMeanDiffSDs"))) %>%
group_by(Type, VOT) %>%
summarise(tProp = mean(t_choice)) %>%
ggplot(aes(VOT,tProp,colour=Type,group=Type))+
geom_line()
Now, we could go a bit further. For each set of simulated data, we can re-calculate the slopes for each participant and see how well the different models fit each participant’s actual slope. Below are the distributions of the predicted slopes (one per participant) for each of the models. As you can see, the last three models (as listed above) have slope distributions with similar modes.
# Fitting logistic regression models to each participant
# And then plotting the slope estimates
IndividualFitsGenerated=Simulations %>%
group_by(uniqueid,Type,SimulationRep) %>%
nest() %>%
# Didn't use glmer cos I wasn't sure of the r.e. structure
summarise(Results=map(data,function(x=data){tidy(glm(data=x,t_choice~VOT,family="binomial"))})) %>%
unnest(Results)
# Plotting
IndividualFitsGenerated %>%
select(uniqueid,Type,term,estimate) %>%
rbind(IndividualFits %>% select(uniqueid,term,estimate) %>% mutate(Type="Actual")) %>%
filter(term=="VOT") %>%
group_by(uniqueid,Type) %>%
summarise(meanEstimate=mean(estimate)) %>%
# filter(estimate<1) %>% #Just seeing what the data looks like with this person
ggplot(aes(x=meanEstimate))+
# geom_density(aes(x=estimate,col="red"))+
geom_density()+
xlab("Slopes")+
facet_wrap(~Type)
Finally, we can calculate the correlations between the predicted and observed participant-level slopes. The code below does this. I plotted the slopes extracted from each model against the actual participant slopes below. Next, I calculated the Pearson correlation coefficient. Furthermore, while the relationship appears to be generally monotonic, it was not clearly linear in some of the cases where I simulated the data[4]; therefore, I also calculated Kendall’s tau for each of the models. Finally, I added a column with the standard deviations to give you a sense of the range of variation in the correlations for each type. The sophisticates can now calculate confidence intervals for the correlations if they are interested – as a rule of thumb, one could calculate \(\pm\) 2 sd from the correlation estimates to get rough 95% confidence intervals (I give a quick illustration of this right after the correlation table below).
# Getting predicted slopes
FitsActualGenerated=IndividualFitsGenerated %>%
select(uniqueid,term,estimate,Type,SimulationRep) %>%
rename(estimateSimulated=estimate) %>%
filter(term=="VOT") %>%
left_join(IndividualFits %>% select(uniqueid,term,estimate),by=c("uniqueid","term")) %>%
# This eliminates a participant who is just quite extreme compared to the rest.
# This needs to be thought about more.
filter(estimate<1)
# Plot
FitsActualGenerated %>%
group_by(uniqueid,Type) %>%
summarise(Sim_meanEstimate = mean(estimateSimulated),
# Since all the values are the same within the grouping,
# the mean will be the same
estimate = mean(estimate)) %>%
ggplot(aes(Sim_meanEstimate,estimate))+
geom_point()+
geom_smooth(method="lm")+
facet_wrap(Type~.,scales="free")
# Calculation of correlations
FitsActualGenerated %>%
# filter(estimateSimulated>0.01) %>%
group_by(Type,SimulationRep) %>%
summarise(corrPearson = cor(estimateSimulated,estimate),
corrKendall = cor(estimateSimulated,estimate,method="kendall")) %>%
group_by(Type) %>%
summarise(meanPearson = mean(corrPearson),
sdPearson = sd(corrPearson),
meanKendall = mean(corrKendall),
sdKendall = sd(corrKendall))
# A tibble: 4 × 5
Type meanPearson sdPearson meanKendall sdKendall
<fct> <dbl> <dbl> <dbl> <dbl>
1 Sim_SameMeanSD 0.0151 0.0843 0.00864 0.0539
2 Sim_SameMeanDiffSDs 0.322 0.0669 0.257 0.0440
3 Sim_SameMeanDiffSDs2 0.365 0.0530 0.293 0.0368
4 Sim_DiffMeanDiffSDs 0.303 0.0803 0.262 0.0428
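To make the \(\pm\) 2 sd rule of thumb from above concrete, here are the rough 95% intervals for the Pearson correlations, computed straight from the table above:
# Rough 95% intervals (mean +/- 2 sd) for the Pearson correlations in the table above
data.frame(Type = c("Sim_SameMeanSD", "Sim_SameMeanDiffSDs",
                    "Sim_SameMeanDiffSDs2", "Sim_DiffMeanDiffSDs"),
           meanPearson = c(0.0151, 0.322, 0.365, 0.303),
           sdPearson   = c(0.0843, 0.0669, 0.0530, 0.0803)) %>%
  mutate(lower = meanPearson - 2*sdPearson,
         upper = meanPearson + 2*sdPearson)
The intervals for the last three models overlap quite a bit, which is worth keeping in mind for the discussion that follows.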
Let’s stop and consider this for a second. With just a participant-level variance adjustment and a global categorical boundary mean, we are able to get an improved (good) fit to the observed identification function! The addition of a participant-level mean doesn’t really improve the predictions for the participants, once confidence intervals are taken into consideration. So, the additional subject-level mean parameters aren’t getting us anything useful.
There is some improvement in fit[5] in both the overall identification function and the individual participant slopes when we use the wrong formula to calculate the combined standard deviation (namely, SD(A+B) = SD(A) + SD(B)). I can’t think of a reason for why this would happen that I can confirm right now. The only thing I can think of is that the wrong formula gets us closer to the sum of the different sources of variance in the experiment that I hinted at earlier in the post. Checking this will require estimating the variance associated with the relevant experimental manipulations (blocks, priming bias, …), and I might return to it in a later blog post. If the reader has any other thoughts on the matter, please share away.
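(To make the two formulas concrete, here is a tiny simulation with two independent noise sources. The 5.92 ms is the global boundary sd from earlier; the 3 ms participant-level sd is made up for illustration.)
# Two independent noise sources: the sd of their sum is sqrt(sd1^2 + sd2^2),
# not sd1 + sd2 (the straight sum always overestimates it)
set.seed(1)
A = rnorm(1e5, mean = 0, sd = 5.92)  # global boundary noise
B = rnorm(1e5, mean = 0, sd = 3)     # hypothetical participant-level noise
sd(A + B)             # empirical sd of the combined noise
sqrt(5.92^2 + 3^2)    # correct combination (cf. combinedSD): ~6.64
5.92 + 3              # "incorrect" straight sum (cf. combinedSD2): 8.92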
Along with giving a succinct explanation of the observed identification function, parallel-serial categorical models of perception make fantastic and novel predictions about behavioural data (particularly, the patterns of reaction time data). Such models should be pursued just for this very reason.
But the itch to curve-fit is there in all of us. And in this blog post, I have really scratched that itch and observed that, with just a participant-level variance adjustment and a global categorical boundary mean, we are able to get a good fit to the observed overall identification function and a not-bad fit to individual-level performance. The participant-level differences in the variance associated with the categorical boundaries can themselves be attributed to participants not coming into the experiment with the same attention, focus, working memory capacity, … So there is a coherence to this model that I enjoy.
Old rant alert! A lot of people make a big deal about observed variation between participants in experiments and leap to the conclusion that the underlying cognitive systems (grammars, parsers, parametric knowledge, …) are somehow different (see Durvasula 2024; Durvasula 2025a, for more discussion). In reality, even if all the participants had exactly the same system, we could still get this variation simply as a by-product of a certain amount of noise in the generation process that is the same across all the participants, plus variance due to experiment-day performance factors that a participant might face. Add to that the fact that participants can also have random fluctuations in their means over different days, and, heck, showing evidence of true individual-level variation is quite tough.
One note of caution: in generating participant-level slopes, I first fitted participant-level regressions to the actual observed data; used those to extract category boundary information; used that to generate the predicted percept for each stimulus; and finally calculated the predicted slopes for each participant. I believe this is not circular reasoning, but a little part of me still wonders if it is, and whether I am missing a nuance of the issue. If a reader has some thoughts on the issue, please let me know.
[1] Reminder, this last model was Spencer Caplan’s wonderful innovation over the original (independently, serial or parallel) categorical models I presented in previous blog posts.
[2] If there are two independent random variables, here overall noise (A) and participant-level noise (B), then the total variance of their sum is Var(A+B) = Var(A) + Var(B). Furthermore, for any random variable, \(Var(X) = SD(X)^{2}\). Therefore, \(SD(A+B) = \sqrt{(SD(A))^{2} + (SD(B))^{2}}\). This is the reason for the calculation of combinedSD in the model.
[3] This uses the incorrect formula SD(A+B) = SD(A) + SD(B) for the calculation of combinedSD2 in the modelling. I first chanced upon this through an error. But the calculation maintains the same intuition, and remarkably, for some reason I don’t understand, this model does better than the rest!
[4] Note, since this is written in RMarkdown, the final simulations may not have this issue.
[5] Consistently across every set of simulations I ran, in fact.