Casey L. Roark, PhD, University of Pittsburgh, Department of Communication Science & Disorders; Center for the Neural Basis of Cognition
Accurate and precise measurement of behavior is critical for understanding human cognition. In experimental contexts, we collect behavioral measures from participants such as reaction times of button presses. My work investigates how we learn to group complex objects in the sensory world into different categories. I leverage computational models of behavior that capitalize on reaction time information to reveal psychologically meaningful and distinct cognitive processes. Critically, these computational models rely on accurate and precise measurements of reaction time. Over the past two years, my lab has leveraged the Gorilla Experiment Builder and online recruitment strategies to better understand individual differences in learning and decision making. In this talk, I will discuss how online research has enabled examination of the mechanisms of time-sensitive learning and decision making in a diverse, global population.
Casey Roark 0:00
Okay. Oh, I’m sure…there we go. Okay. So all right, so I’m gonna talk to you a little bit about my research on learning and decision making, and how I have leveraged online methods to study that. So first, I’m going to give a brief overview of the type of work that I do given this broad audience that we have here. And then I’m going to give you details about a specific study that I first conducted in person, and then used online research methods to conduct a replication in a wider sample.
Right. So my work focuses on how we learn about the sensory world, and particularly how we organise that world into categories. So your knowledge of the category dog enables you to quickly identify this creature here as a dog. And similarly, you know that this creature is also a dog, even though it has this fancy little zebra print coat on it. Alright, so what about this creature? What is this? That might have taken you just a split second longer to recognise that this is actually a horse and not a zebra, because even though it has this pattern on its coat, it is in fact a horse underneath there. And finally, you can quickly identify this last creature as a zebra, based on its coat and other features.
So you’re able to leverage your existing category knowledge to generalise to things that you’ve probably never seen before, like this dog or a horse in a zebra print jacket. And this seems maybe slightly trivial, but a computer trying to solve this problem, or a child, infant, might have trouble telling us that this is a horse or a dog instead of a zebra because of their other visual similarities. And so as humans, we can achieve these remarkable feats of generalisation that machines, for example, find very difficult.
And so categorization is also not limited to only the visual modality; we use categories in sounds as well. And so categorization allows you to listen to my voice, even if you’ve never heard me speak before. And when I say the words bear and pair, you can recognise these as different words that map onto these different meanings, even though they really only differ in this first sound, the buh versus puh sound. And we’re able to do this remarkably flexibly across different speakers and across different contexts. And so categorization is really at the heart of these fundamental processes like object recognition in the visual modality, and speech perception in the auditory modality.
And I’m particularly interested in how we learn about new categories. So for example, if you wanted to take up a bird watching hobby, you might need to learn to distinguish between these two different species of birds, which are a house finch and a purple finch. Similarly, if you’re learning a new language, you need to learn about the sounds of that language, which might be different from your own. So for instance, native speakers of non-tonal languages like English would need to learn to distinguish between tonal pitch patterns to distinguish words in tonal languages, like Mandarin Chinese. So for example, in Mandarin Chinese, you have the same syllable, here I’m showing /ma/, mapped with four different tone patterns, which completely changes the meaning of that underlying word. And just to give you an example of what this sounds like, I hope the sound is coming through now, here is an example of this first tone, it’s just a high and stable tone over time, Ma. And then the second tone is a rising tone over time, Ma.
Alright, so how do we study this in an experimental context? So in a kind of very pared down version of this kind of interest, it’s not as gamified as some other tasks we’ve heard about today, we would play, in an auditory task for example, a sound from a particular kind of category. So I use these kind of alien-like sounds that are kind of interesting for people to hear, and again, hoping the sound is coming through like that. And people make these overt choices about what category they think that belongs to. So in this case, deciding is this category one or two, and then they get some kind of feedback about the response. So correct or incorrect. And people might also do this in a visual kind of task, which I’ll talk about more today. So seeing an image like this kind of arbitrary image I’m showing you on the screen that varies in the width and orientation of these lines. And then they’re making these overt decisions and getting that feedback.
Alright, so then what we can do is look at people’s ability to learn categories in these contexts. And so they’re learning to make more accurate decisions, given the feedback that they’re getting. So here I’m showing you the proportion correct, or accuracy, across blocks of a training task, where we train people on these auditory and visual categories. So the auditory is in red and visual in blue. You can see that overall on average, in the darker line here, people are able to learn these categories. And then in the lighter lines, what I’m showing you is individual participant performance. So you can see there’s lots of variability in how well people are able to learn, with some people up here at really high levels of performance, and others around this dashed line, which reflects chance levels of performance.
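As a rough illustration of the learning curves described here, the following Python sketch computes proportion correct per block for each participant, plus the group average. The trial format, participant labels, and data are made up for illustration and are not from the study.

```python
from collections import defaultdict

CHANCE = 0.5  # two categories, so guessing gives 50% correct


def accuracy_by_block(trials):
    """Return {participant: [proportion correct per block, in block order]}.

    `trials` is an iterable of (participant, block, correct) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    # counts[participant][block] = [number correct, number of trials]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for participant, block, correct in trials:
        cell = counts[participant][block]
        cell[0] += int(correct)
        cell[1] += 1
    return {
        p: [blocks[b][0] / blocks[b][1] for b in sorted(blocks)]
        for p, blocks in counts.items()
    }


def group_mean(curves):
    """Average the individual curves block by block (the darker line)."""
    n_blocks = min(len(c) for c in curves.values())
    return [
        sum(c[i] for c in curves.values()) / len(curves)
        for i in range(n_blocks)
    ]


# Made-up toy data: (participant, block, correct)
trials = [
    ("p1", 1, True), ("p1", 1, False), ("p1", 2, True), ("p1", 2, True),
    ("p2", 1, False), ("p2", 1, False), ("p2", 2, True), ("p2", 2, False),
]
curves = accuracy_by_block(trials)   # individual curves (the lighter lines)
mean_curve = group_mean(curves)      # group average
```

Comparing each participant's final-block value against `CHANCE` is one simple way to flag learners who stay near the dashed line.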
And so we can also look at other aspects of their behaviour to understand the psychological processing that is going on as people are learning. So one of these is their reaction time, or how fast they respond. So this is measured in milliseconds here. And it’s just the time it takes them to actually push the button to identify what category they think that either sound or image belongs to. And so we can see here that our participants were slightly slower in the visual task in these kind of early blocks. So the blue line here is higher than the red line. But these kind of converge over time as the learning task goes on.
Okay, so what we really want to understand is what this information about people’s choices and their reaction times can tell us about what’s going on psychologically in learners’ minds as they’re doing these tasks. So to understand this, we leverage computational models called drift diffusion models that take into account both how accurate decisions are, and also how fast these decisions are, to estimate separable psychological processes in decision making. So I’ll give you a sort of toy example here to kind of just explain the logic of these models. So when you saw this creature earlier, you made, again, this probably split second decision about whether this was a horse or a zebra, but it was still a decision that you had to make. And so we can think of this as, as soon as you see this image, this decision process starts unfolding across time. So we start kind of accumulating evidence towards either deciding whether this is a horse or a zebra.
So let’s say you probably start a little closer to making the zebra sort of decision, because I just showed you the dog in the zebra print jacket, maybe I primed you slightly. But then as you get more and more information from this image, seeing, okay, maybe it’s just got this, like, weird flap going on here, this is not a real zebra, this has to be a horse, you’re going to shoot up and evid- you accumulate the evidence towards making that decision. This is definitely a horse, not a zebra. And so we see this process through these models as the accumulation of evidence towards these contrasting choices. So here horse or zebra; in the categorization context, category one or category two. And then you make a decision when you cross a threshold of evidence that you need to actually accumulate. So once you get enough information that this was a horse, that’s when you make your decision.
So again, just being really explicit about how this works in our kind of more arbitrary tasks, where we either play a sound or show an image where people are deciding this category, we see this process unfolding across time. So they’re accumulating evidence towards a particular decision, let’s say category one, in this case, at a particular rate. So basically, how fast they’re getting information from that stimulus represents kind of how easy it is for them to kind of get information to inform their decision. And then again, they’re going to try to reach this decision threshold. And once they reach that threshold in this evidence accumulation process, that’s when they’re actually going to initiate their response, actually start the process of pressing the button, which is reflected in this dashed line here.
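The evidence accumulation process described above can be sketched as a small simulation. This is a minimal, textbook-style drift diffusion simulation in Python, not the model-fitting procedure used in the study: the parameter values, the unbiased starting point, and the fixed `non_decision` time (standing in for encoding and the button press itself) are all assumptions made for illustration.

```python
import random


def simulate_trial(drift, threshold, non_decision=0.3, noise=1.0,
                   dt=0.001, max_t=10.0, rng=random):
    """Simulate one two-choice decision.

    Evidence starts midway between the lower bound (0, category two)
    and the upper bound (`threshold`, category one), then drifts at
    rate `drift` with Gaussian noise until it crosses a bound.
    Returns (choice, reaction_time_in_seconds); choice is None if no
    bound is reached within `max_t` seconds.
    """
    evidence = threshold / 2.0
    t = 0.0
    while 0.0 < evidence < threshold and t < max_t:
        # Euler step: deterministic drift plus scaled random noise
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    if evidence >= threshold:
        choice = 1
    elif evidence <= 0.0:
        choice = 2
    else:
        choice = None  # timed out without a decision
    return choice, t + non_decision


# Hypothetical parameter values; positive drift favours category one.
rng = random.Random(0)
results = [simulate_trial(drift=2.0, threshold=1.5, rng=rng)
           for _ in range(2000)]
accuracy = sum(choice == 1 for choice, _ in results) / len(results)
mean_rt = sum(rt for _, rt in results) / len(results)
```

In this sketch a larger `drift` corresponds to more efficient evidence accumulation, and a larger `threshold` corresponds to a more cautious decision maker.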
So this is that underlying process that we are trying to estimate using these modelling approaches. And we’re going to look at how participants learn this, to distinguish between two different categories. We can estimate these kind of parameters here at the individual subject level, and also longitudinally across blocks as they are learning. Right, so then, let me show you what we found here for this auditory and visual task in the lab. So here at first, I’ll show you this parameter of evidence accumulation rate, again, how fast they’re able to get the information they need to make the decision about that stimulus. And so higher values here are representing kind of more efficient evidence accumulation. So you’re getting information a lot more quickly and efficiently as a process. And then here’s what that looks like for the auditory and visual tasks. So we see here sort of this crossover, where initially in our visual task, participants are less efficient at getting information than they are in the auditory task, but this crosses over across time. And by the end of training, they’re more efficient in the visual domain than the auditory domain.
We can also look at this other parameter we’ve talked about, this decision threshold. And so here we can define these parameters based on whether they were more cautious or less cautious in their responses. So higher values here are reflecting times where participants are waiting to gather enough information. So for example, they’re looking at that horse longer and longer to make sure that they really have it right that it’s a horse and not a zebra. So here again, we’re seeing the sort of crossover between the two modalities, where initially participants are more cautious with the auditory modality and they show this sort of steep decline in how cautious they are about that process as their accuracy increases across these different blocks.
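To make the idea of caution concrete, here is a hedged Python sketch using the standard closed-form predictions for the simplest drift diffusion model (unbiased starting point, constant drift, no trial-to-trial variability). That simplified model and the parameter values are assumptions for illustration; the talk does not specify which model variant was fit.

```python
import math


def predicted_accuracy(drift, threshold, noise=1.0):
    """Probability of reaching the correct bound, starting midway."""
    return 1.0 / (1.0 + math.exp(-drift * threshold / noise ** 2))


def predicted_decision_time(drift, threshold, noise=1.0):
    """Mean time (seconds) to reach either bound; drift must be nonzero."""
    return (threshold / (2.0 * drift)) * math.tanh(
        drift * threshold / (2.0 * noise ** 2)
    )


# Hypothetical values: same drift rate, increasingly cautious thresholds.
drift = 1.5
thresholds = [0.5, 1.0, 2.0]
accuracies = [predicted_accuracy(drift, a) for a in thresholds]
decision_times = [predicted_decision_time(drift, a) for a in thresholds]

for a, acc, dt in zip(thresholds, accuracies, decision_times):
    print(f"threshold={a:.1f}  accuracy={acc:.3f}  decision time={dt:.3f}s")
```

With the drift rate held fixed, raising the threshold increases both predicted accuracy and predicted decision time, which is the speed-accuracy trade-off that the decision threshold parameter is meant to capture.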
All right, so all of this is really about online research, right. And I’ve just talked to you about in person research. So I want to show you now how we have leveraged online data collection through Gorilla to rapidly and efficiently collect data to replicate this in person study in a wider online sample. So in person, we ran this study on 30 participants, and online, we were able to run nearly 100 participants. And just to give you a sense of how much time this took us: in person, with a dedicated person there to run the study, it took about a month to collect this data, versus data collected via Gorilla in under 48 hours, so extremely, extremely fast. And in the in person study, we were limited to our local population in Pittsburgh, Pennsylvania in the US, whereas in our online replication, we were able to get a global population, through Prolific specifically.
And then finally, in the lab, in person, we ran participants on our kind of controlled laboratory computers and professional level headphones, in environments we could ensure were extremely quiet, whereas online, sorry, online, we ran participants on their own computers and using their own headphones, which obviously have more variety in quality compared to our in person study. Right, then we can talk about what actually happened in this online replication. So as a reminder, this is what our in person study looked like with accuracy, and our individual differences across participants. And then this is what the online study looked like. So you can see here, there’s more participants and more of these lighter lines here. But generally, we’re seeing the same kind of pattern of accuracy, we don’t see a lot of differences between modalities. And people generally are able to learn.
Then we can also look at this measure of reaction time that we looked at. So this is our in person study. And then this is what it looks like online. So immediately, I’ll note that the scale here has changed for reaction time. So whereas here we’re in the sub-second sort of range in our in person study, on average, online we have some folks who are kind of getting up above one second, and even in this case above two seconds to respond, on average on a trial. And so we’re seeing a lot more variability in how the reaction times look over time, but we still have plenty of folks here who are responding very quickly.
All right, then what do our decision processes, as assessed by these drift diffusion models, what do those look like? So again, we’re looking at our evidence accumulation rate in our in person sample. And this is what our online sample looks like. So seeing still that crossover across modalities, and really a very similar pattern across in person and online. And then we have our decision threshold, again, our in person seeing a different sort of pattern of crossover here between the modalities. And then this is what we see online.
So we effectively perfectly replicated these results. And this is really exciting and meaningful, because these folks were different from our in person sample. Again, this is a global population, people were using their own machines, their own headphones, and we saw that overall, they were slower in a lot of cases. But they learned just as well as people who were seated in their kind of quiet, controlled lab environment. And yet, we’re still seeing the same patterns of the psychological processes through these drift diffusion models. And this also really tells us that using Gorilla to collect these reaction time measures is capturing information about the psychological processes that we see inside the lab as well, using just different software that we’ve used across the years.
So I’ll just briefly kind of summarise the benefits of collecting data online that we saw both in this experiment and what I’ve seen in my research in general. So first, as we’ve talked about, I’ve seen this ability to replicate in samples outside of psychology subject pools, or otherwise homogenous samples that we see often inside of the lab, or with a limited ability to kind of collect the data across, you know, a broad sample population. And so this is both in the study that I’ve discussed in detail today, but also another study looking at incidental category learning in multiple experiments in this other citation that I have here.
Online data collection has also really enabled seamless collaboration across the world. So I have colleagues in Hong Kong who are able to access both the stimulus materials and also data, and we’re able to access those extremely simply, to be able to collaborate really easily rather than sending files back and forth or sharing it in some other way. That does get a bit clunky. Here, we can just work on it in the same platform.
And then finally, this gives us really the ability to recruit samples with more diverse experiences. So obviously I’ve mentioned just this kind of ability to look at the global population. But something else that we particularly looked at in this specific study that I’ve listed here is looking at people with a diverse array of music experiences. So we just kind of looked at a sample, not really specifically sampling for music experience, but kind of just seeing what happens when you look at kind of just a broad sample of individuals with music experience. And that’s really something that you’re only able to do with online research, because you get a more diverse sample that way. All right. And this is, again, just all really important so that we can examine things like learning and decision making efficiently and using these more general populations. And with that, I really want to thank you for your time, and also the resources that have supported this work. I put my contact information there on the screen, and also the information about my collaborators who are involved with this specific project that I’ve talked about in detail today. And I’d be happy to answer any questions.
Jo Evershed 15:56
Casey, that was absolutely brilliant. Thank you. Anybody who’s got any questions for Casey, can you start putting them in the Q&A now? Hopefully, we get round to them. I’m still processing your talk doing this in real time. It’s getting towards the end of the day. I’m really sorry. But I did have a question. You, you’ve actually been really generous and positive about online research, generally. But there must have been some challenges getting this to work across so many people at scale across so many research groups. What? Yeah, what were the challenges? And what person did you have to become in order to resolve them?
Casey Roark 16:33
Yeah, I really love the phrasing of that question. And it looks like someone has asked that in the chat as well. Yeah, so definitely there are, of course, drawbacks. And I think that Gloria talked about this a lot in her talk earlier, looking at specifically this question of involving sound in these experiments online. So it’s important to have checks of whether or not people are wearing headphones, making sure you have, you know, checks throughout an experiment to make sure people are continually attending to your sounds, and not just like throwing the headphones off to the side and just continually pressing buttons. So those are some kind of real drawbacks. In general, just thinking about how you can see variability in behaviour, for example, I showed a lot of folks who weren’t really able to learn the categories, or were performing at chance levels. And there’s a question kind of always in the back of my mind, which is like, are they actually just struggling to learn and performing at chance? Or did they just completely, like, check out and they’re not interested in learning? And this is something we have to solve both in person and online. But I think it becomes especially hard when you can’t just kind of follow up with them after in person with a little bit of, you know, demand and say, Hey, like, did you really try in this experiment? Yes, it’s a challenge.
Jo Evershed 17:43
Definitely a challenge. Have you ever considered using, like, additional incentives? So I think Prolific allow you to pay a bonus when people perform well, just to incentivize people not to do that and at least see if the data is different when they do?
Casey Roark 17:59
Yeah, that’s a great question. So incentives and rewards are really important in learning, as was discussed in some of the kind of marketing research today. But I think it is important to kind of look at whether these things are different when offering incentives or not. But I’m really also curious about what’s happening when people are learning and when people are struggling to learn. So this is something I’ve done in my in person studies before, saying, Hey, you’re gonna get a bonus, but then offering the bonus regardless of really how they perform, so that it’s kind of more fair across different participants. But you are starting to try to encourage that. So that’s definitely something that can be an incentive, though it brings up questions of fairness that I just want to highlight as well.
Jo Evershed 18:44
Yeah, the fairness question really gets us as researchers, doesn’t it? It also makes me think of something that I presented earlier, that Jenny Rodd said is like, it’s really hard to tell the difference between participants who just suck at your task, versus those that aren’t trying because recognising those four /ma/ tones is actually hard. Like to the Western ear, it’s really not an easy task.
Casey Roark 19:08
Yes, it’s very challenging. And so often I give people a variety of different tasks. So trying to kind of understand, like, if you do really well on one task, but not well in another that’s pretty similar, it might just be an attention kind of level of thing, where you’re just kind of fatigued and tired and you don’t want to do the task anymore. So giving people these kind of multiple ways to measure their behaviour over time and across different tasks could also be a solution to that.
Jo Evershed 19:34