(Reaction) Timing is everything: Investigating learning and decision making in online research



Casey L. Roark, PhD, University of Pittsburgh, Department of Communication Science & Disorders; Center for the Neural Basis of Cognition


Accurate and precise measurement of behavior is critical for understanding human cognition. In experimental contexts, we collect behavioral measures from participants such as reaction times of button presses. My work investigates how we learn to group complex objects in the sensory world into different categories. I leverage computational models of behavior that capitalize on reaction time information to reveal psychologically meaningful and distinct cognitive processes. Critically, these computational models rely on accurate and precise measurements of reaction time. Over the past two years, my lab has leveraged the Gorilla Experiment Builder and online recruitment strategies to better understand individual differences in learning and decision making. In this talk, I will discuss how online research has enabled examination of the mechanisms of time-sensitive learning and decision making in a diverse, global population.

Full Transcript:

Casey Roark 0:00
Okay. Oh, I’m sure… there we go. Okay. So all right, I’m going to talk to you a little bit about my research on learning and decision making, and how I have leveraged online methods to study that. So first, I’m going to give a brief overview of the type of work that I do, given this broad audience that we have here. And then I’m going to give you details about a specific study that I first conducted in person, and then used online research methods to conduct a replication in a wider sample.

Right. So my work focuses on how we learn about the sensory world, and particularly how we organise that world into categories. So your knowledge of the category dog enables you to quickly identify this creature here as a dog. And similarly, you know that this creature is also a dog, even though it has this fancy little zebra print coat on it. Alright, so what about this creature? What is this? That might have taken you just a split second longer to recognise: this is actually a horse and not a zebra, because even though it has this pattern on its coat, it is in fact a horse underneath there. And finally, you can quickly identify this last creature as a zebra, based on its coat and other features.

So you’re able to leverage your existing category knowledge to generalise to things that you’ve probably never seen before, like this dog or a horse in a zebra print jacket. And this seems maybe slightly trivial, but a computer trying to solve this problem, or a child or infant, might have trouble telling us that this is a horse or a dog instead of a zebra because of their other visual similarities. And so as humans, we can achieve these remarkable feats of generalisation that machines, for example, find very difficult.

And so categorization is also not limited to only the visual modality; we use categories in sounds as well. And so categorization allows you to listen to my voice, even if you’ve never heard me speak before. And when I say the words bear and pair, you can recognise these as different words that map onto these different meanings, even though they really only differ in this first sound, the buh versus puh sound. And we’re able to do this remarkably flexibly across different speakers and across different contexts. And so categorization is really at the heart of these fundamental processes like object recognition in the visual modality, and speech perception in the auditory modality.

And I’m particularly interested in how we learn about new categories. So for example, if you wanted to take up a bird watching hobby, you might need to learn to distinguish between these two different species of birds, which are a house finch and a purple finch. Similarly, if you’re learning a new language, you need to learn about the sounds of that language, which might be different from your own. So for instance, native speakers of non-tonal languages like English would need to learn to distinguish between tonal pitch patterns to distinguish words in tonal languages, like Mandarin Chinese. So for example, in Mandarin Chinese, you have the same syllable, here I’m showing /ma/, mapped with four different tone patterns, which completely changes the meaning of that underlying word. And just to give you an example of what this sounds like, I hope the sound is coming through now, here is an example of this first tone, it’s just a high and stable tone over time, Ma. And then the second tone is a rising tone over time, Ma.

Alright, so how do we study this in an experimental context? So in a very pared down version of this, it’s not as gamified as some other tasks we’ve heard about today, we would play, for example, a sound from a particular kind of category. So I use these kind of alien-like sounds that are kind of interesting for people to hear, and again, hoping the sound is coming through like that. And people make these overt choices about what category they think that belongs to. So in this case, deciding is this category one or two, and then they get some kind of feedback about the response. So correct or incorrect. And people might also do this in a visual kind of task, which I’ll talk about more today. So seeing an image like this kind of arbitrary image I’m showing you on the screen, that varies in the width and orientation of these lines. And then they’re making these overt decisions and getting that feedback.

Alright, so then what we can do is look at people’s ability to learn categories in these contexts. And so they’re learning to make more accurate decisions, given the feedback that they’re getting. So here I’m showing you the proportion correct, or accuracy, across blocks of a training task, where we train people on these auditory and visual categories. So the auditory is in red and visual in blue. You can see that overall, on average, in the darker line here, people are able to learn these categories. And then in the lighter lines, what I’m showing you is individual participant performance. So you can see there’s lots of variability in how well people are able to learn, with some people up here at really high levels of performance, and others around this dashed line, which reflects chance levels of performance.

And so we can also look at other aspects of their behaviour to understand what psychological processing is going on as people are learning. So one of these is their reaction time, or how fast they respond. So this is measured in milliseconds here. And it’s just the time it takes them to actually push the button to identify what category they think that either sound or image belongs to. And so we can see here that our participants were slightly slower in the visual task in these early blocks. So the blue line here is higher than the red line. But these kind of converge over time as the learning task goes on.
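The two behavioural measures described above, proportion correct and reaction time per training block, can be sketched as a simple aggregation. This is only an illustrative sketch: the trial records and the field names `block`, `correct`, and `rt_ms` are assumptions made for the example, not the format of the data collected in the study.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_block(trials):
    """Return per-block accuracy (proportion correct) and mean reaction time.

    `trials` is a list of dicts with hypothetical keys:
    "block" (int), "correct" (bool), and "rt_ms" (reaction time in ms).
    """
    by_block = defaultdict(list)
    for trial in trials:
        by_block[trial["block"]].append(trial)

    summary = {}
    for block, block_trials in sorted(by_block.items()):
        summary[block] = {
            # Proportion of trials answered correctly in this block.
            "accuracy": mean(1.0 if t["correct"] else 0.0 for t in block_trials),
            # Average time from stimulus to button press, in milliseconds.
            "mean_rt_ms": mean(t["rt_ms"] for t in block_trials),
        }
    return summary
```

Running this once per participant and once per modality would give the individual (lighter) lines; averaging those summaries across participants would give the group (darker) lines.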

Okay, so what we really want to understand is what this information about people’s choices and reaction times can tell us about what’s going on psychologically in learners’ minds as they’re doing these tasks. So to understand this, we leverage computational models called drift diffusion models that take into account both how accurate decisions are, and also how fast these decisions are, to estimate separable psychological processes in decision making. So I’ll give you a sort of toy example here to just explain the logic of these models. So when you saw this creature earlier, you made, again, this probably split second decision about whether this was a horse or a zebra, but it was still a decision that you had to make. And so we can think of this as: as soon as you see this image, this decision process starts unfolding across time. So we start accumulating evidence towards deciding whether this is a horse or a zebra.

So let’s say you probably start a little closer to making the zebra sort of decision, because I just showed you the dog in the zebra print jacket, so maybe I primed you slightly. But then as you get more and more information from this image, seeing, okay, maybe it’s got this weird flap going on here, this is not a real zebra, this has to be a horse, you accumulate the evidence towards making that decision: this is definitely a horse, not a zebra. And so we see this process through these models as the accumulation of evidence towards these contrasting choices. So here horse or zebra; in the categorization context, category one or category two. And then you make a decision when you cross a threshold of evidence that you need to accumulate. So once you get enough information that this was a horse, that’s when you make your decision.

So again, just being really explicit about how this works in our kind of more arbitrary tasks, where we either play a sound or show an image and people are deciding the category, we see this process unfolding across time. So they’re accumulating evidence towards a particular decision, let’s say category one in this case, at a particular rate. So basically, how fast they’re getting information from that stimulus represents how easy it is for them to get information to inform their decision. And then again, they’re going to try to reach this decision threshold. And once they reach that threshold in this evidence accumulation process, that’s when they’re actually going to initiate their response, actually start the process of pressing the button, which is reflected in this dashed line here.
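The evidence accumulation process described above can be sketched as a small simulation. This is a minimal illustration of a generic drift diffusion process, not the fitting procedure used in the study: the function names, the parameter values, the time step, and the fixed non-decision time (standing in for the time to actually press the button) are all assumptions chosen for the example.

```python
import random


def simulate_ddm_trial(drift, threshold, start_frac=0.5, non_decision=0.3,
                       dt=0.001, noise=1.0, max_time=5.0, rng=random):
    """Simulate one two-choice trial; returns (choice, reaction_time_seconds).

    Evidence starts between the two boundaries (0 and `threshold`) and moves
    by `drift * dt` plus Gaussian noise on every time step. Reaching the upper
    boundary means "category 1", reaching the lower one means "category 2".
    Returns (None, None) if no boundary is reached within `max_time`.
    """
    evidence = start_frac * threshold      # starting point (bias towards a choice)
    step_sd = noise * dt ** 0.5            # noise scales with sqrt of the time step
    t = 0.0
    while t < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if evidence >= threshold:
            return 1, t + non_decision     # add assumed motor/encoding time
        if evidence <= 0.0:
            return 2, t + non_decision
    return None, None


def summarize(drift, threshold, n_trials=300, seed=0):
    """Return (proportion of category-1 choices, mean reaction time in seconds)."""
    rng = random.Random(seed)
    trials = [simulate_ddm_trial(drift, threshold, rng=rng) for _ in range(n_trials)]
    finished = [(choice, rt) for choice, rt in trials if choice is not None]
    p_cat1 = sum(choice == 1 for choice, _ in finished) / len(finished)
    mean_rt = sum(rt for _, rt in finished) / len(finished)
    return p_cat1, mean_rt
```

For instance, comparing `summarize(2.0, 1.5)` with `summarize(2.0, 2.5)` shows the trade-off the models capture: with the same accumulation rate, a higher threshold (more caution) gives slower responses on average. Fitting a drift diffusion model runs this logic in reverse, estimating the rate and threshold from observed choices and reaction times.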

So this is that underlying process that we are trying to estimate using this modelling approach. And we’re going to look at how participants learn to distinguish between two different categories. We can estimate these parameters here at the individual subject level, and also longitudinally across blocks as they are learning.

Right, so then, let me show you what we found here for this auditory and visual task in the lab. So here at first, I’ll show you this parameter of evidence accumulation rate, again, how fast they’re able to get the information they need to make the decision about that stimulus. And so higher values here are representing more efficient evidence accumulation. So you’re getting information a lot more quickly and efficiently as a process. And then here’s what that looks like for the auditory and visual tasks. So we see here sort of this crossover, where initially in our visual task, participants are less efficient at getting information than they are in the auditory task, but this crosses over across time. And by the end of training, they’re more efficient in the visual domain than the auditory domain.

We can also look at this other parameter we’ve talked about, this decision threshold. And so here we can define these parameters based on whether they were more cautious or less cautious in their responses. So higher values here are reflecting times where participants are waiting to gather enough information. So for example, they’re looking at that horse longer and longer to make sure that they really have it right that it’s a horse and not a zebra. So here again, we’re seeing this sort of crossover between the two modalities, where initially participants are more cautious with the auditory modality and they show this sort of steep decline in how cautious they are about that process as their accuracy increases across these different blocks.

All right, so all of this is really about online research, right? And I’ve just talked to you about in-person research. So I want to show you now how we have leveraged online data collection through Gorilla to rapidly and efficiently collect data to replicate this in-person study in a wider online sample. So in person, we ran this study on 30 participants, and online, we were able to run nearly 100 participants. And just to give you a sense of how much time this took us: in person, with a dedicated person there to run the study, it took about a month to collect this data, versus data collected via Gorilla in under 48 hours, so extremely, extremely fast. And in the in-person study, we were limited to our local population in Pittsburgh, Pennsylvania in the US, whereas in our online replication, we were able to get a global population, through Prolific specifically.

And then finally, in the lab, in person, we ran participants on our controlled laboratory computers and professional-level headphones, in environments we could ensure were extremely quiet, whereas online we ran participants on their own computers and using their own headphones, which obviously vary more in quality compared to our in-person study.

Right, then we can talk about what actually happened in this online replication. So as a reminder, this is what our in-person study looked like with accuracy, and our individual differences across participants. And then this is what the online study looked like. So you can see here, there are more participants and more of these lighter lines here. But generally, we’re seeing the same kind of pattern of accuracy, we don’t see a lot of differences between modalities, and people generally are able to learn.

Then we can also look at this measure of reaction time that we looked at. So this is our in-person study. And then this is what it looks like online. So immediately, I’ll note that the scale here has changed for reaction time. So whereas we’re in the sub-second sort of range in our in-person study, on average, online we have some folks who are getting up above one second, and even in this case above two seconds to respond, on average on a trial. And so we’re seeing a lot more variability in how the reaction times look over time, though we still have plenty of folks here who are responding very quickly.

All right, then what do our decision processes, as assessed by these drift diffusion models, look like? So again, we’re looking at our evidence accumulation rate in our in-person sample. And this is what our online sample looks like. So we’re still seeing that crossover across modalities, and really a very similar pattern across in person and online. And then we have our decision threshold, again, in person, seeing a different sort of pattern of crossover here between the modalities. And then this is what we see online.

So we effectively perfectly replicated these results. And this is really exciting and meaningful, because these folks were different from our in-person sample. Again, this is a global population, people were using their own machines, their own headphones, and we saw that overall, they were slower in a lot of cases. But they learned just as well as people who were seated in their kind of quiet, controlled lab environment. And yet, we’re still seeing the same patterns of the psychological processes through these drift diffusion models. And this also really tells us that using Gorilla to collect these reaction time measures is capturing information about the psychological processes that we see inside the lab as well, using the different software that we’ve used across the years.

So I’ll just briefly summarise the benefits of collecting data online that we saw both in this experiment and in my research in general. So first, as we’ve talked about, I’ve seen this ability to replicate in samples outside of psychology subject pools, or otherwise homogenous samples that we see often inside of the lab, where we have a limited ability to collect data across a broad sample population. And so this is both in the study that I’ve discussed in detail today, and also another study looking at incidental category learning in multiple experiments, in this other citation that I have here.

Online data collection has also really enabled seamless collaboration across the world. So I have colleagues in Hong Kong who are able to access both the stimulus materials and also the data, and we’re able to access it extremely simply, to collaborate really easily, rather than sending files back and forth or sharing it in some other way that does get a bit clunky. Here, we can just work on it in the same platform.

And then finally, this really gives us the ability to recruit samples with more diverse experiences. So obviously I’ve mentioned just the ability to look at the global population. But something else that we particularly looked at in this specific study that I’ve listed here is people with a diverse array of music experiences. So we just looked at a sample, not really specifically sampling for music experience, but just seeing what happens when you look at a broad sample of individuals with music experience. And that’s really something we’re only able to do with online research, because you get a more diverse sample that way. All right. And this is, again, all really important so that we can examine things like learning and decision making efficiently and using these more general populations.

And with that, I really want to thank you for your time, and also the resources that have supported this work. I put my contact information there on the screen, and also the information about my collaborators who are involved with this specific project that I’ve talked about in detail today. And I’d be happy to answer any questions.

Jo Evershed 15:56
Casey, that was absolutely brilliant. Thank you. Anybody who’s got any questions for Casey, can you start putting them in the Q&A now? Hopefully, we’ll get round to them. I’m still processing your talk, doing this in real time. It’s getting towards the end of the day. I’m really sorry. But I did have a question. You’ve actually been really generous and positive about online research, generally. But there must have been some challenges getting this to work across so many people at scale, across so many research groups. Yeah, what were the challenges? And what person did you have to become in order to resolve them?

Casey Roark 16:33
Yeah, I really love the phrasing of that question. And it looks like someone has asked that in the chat as well. Yeah, so there definitely are, of course, drawbacks. And I think that Gloria talked about this a lot in her talk earlier, looking at specifically this question of involving sound in these experiments online. So it’s important to have checks of whether or not people are wearing headphones, making sure you have, you know, checks throughout an experiment to make sure people are continually attending to your sounds, and not just throwing the headphones off to the side and continually pressing buttons. So those are some real drawbacks. In general, just thinking about how you can see variability in behaviour: for example, I showed a lot of folks who weren’t really able to learn the categories, or were performing at chance levels. And there’s a question always in the back of my mind: are they actually just struggling to learn and performing at chance? Or did they just completely check out and they’re not interested in learning? And this is something we have to solve both in person and online. But I think it becomes especially hard when you can’t just follow up with them after, in person, with a little bit of, you know, demand, and say, hey, did you really try in this experiment? Yes, it’s a challenge.
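One way to make "performing at chance" concrete is an exact binomial test on a participant's number of correct responses. The sketch below is only an illustration, not a check described in the talk: the function name is hypothetical, and the chance level of 0.5 assumes the two-category task described earlier. As the answer above notes, a result like this cannot separate a participant who struggled from one who disengaged; it only flags who is statistically indistinguishable from guessing.

```python
from math import comb


def p_value_above_chance(n_correct, n_trials, chance=0.5):
    """One-sided exact binomial p-value.

    Probability of getting at least `n_correct` correct responses out of
    `n_trials` if the participant were purely guessing with success
    probability `chance`.
    """
    return sum(
        comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )
```

A small p-value suggests performance above chance; a large one means guessing cannot be ruled out for that participant in that block or task.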

Jo Evershed 17:43
Definitely a challenge. Have you ever considered using, like, additional incentives? So I think Prolific allows you to pay a bonus when people perform well, just to incentivize people not to do that, and at least see if the data is different when they do?

Casey Roark 17:59
Yeah, that’s a great question. So incentives and rewards are really important in learning, as was discussed in some of the marketing research today. But I think it is important to look at whether things are different when offering incentives or not. But I’m also really curious about what’s happening when people are struggling to learn. So this is something I’ve done in my in-person studies before: saying, hey, you’re gonna get a bonus, but then offering the bonus regardless of really how they perform, so that it’s more fair across different participants. But you are starting to try to encourage that. So that’s definitely something that can be an incentive, though it brings up questions of fairness that I just want to highlight as well.

Jo Evershed 18:44
Yeah, the fairness question really gets us as researchers, doesn’t it? It also makes me think of something that I presented earlier, that Jenny Rodd said: it’s really hard to tell the difference between participants who just suck at your task versus those that aren’t trying, because recognising those four /ma/ tones is actually hard. Like, to the Western ear, it’s really not an easy task.

Casey Roark 19:08
Yes, it’s very challenging. And so often I give people a variety of different tasks. So trying to understand, like, if you do really well on one task, but not well in another that’s pretty similar, it might just be an attention kind of thing, where you’re just fatigued and tired and you don’t want to do the task anymore. So giving people these multiple ways to measure their behaviour over time and in different tasks could also be a solution to that.

Jo Evershed 19:34
