(Reaction) Timing is everything: Investigating learning and decision making in online research


Casey L. Roark, PhD, University of Pittsburgh, Department of Communication Science & Disorders; Center for the Neural Basis of Cognition


Accurate and precise measurement of behavior is critical for understanding human cognition. In experimental contexts, we collect behavioral measures from participants such as reaction times of button presses. My work investigates how we learn to group complex objects in the sensory world into different categories. I leverage computational models of behavior that capitalize on reaction time information to reveal psychologically meaningful and distinct cognitive processes. Critically, these computational models rely on accurate and precise measurements of reaction time. Over the past two years, my lab has leveraged the Gorilla Experiment Builder and online recruitment strategies to better understand individual differences in learning and decision making. In this talk, I will discuss how online research has enabled examination of the mechanisms of time-sensitive learning and decision making in a diverse, global population.

Full Transcript:

Okay. Oh, I’m sure…there we go. Okay. So all right, so I’m gonna talk to you a little bit about my research on learning and decision making, and how I have leveraged online methods to study that. So first, I’m going to give a brief overview of the type of work that I do, given this broad audience that we have here. And then I’m going to give you details about a specific study that I first conducted in person, and then used online research methods to conduct a replication in a wider sample.

Right. So my work focuses on how we learn about the sensory world, and particularly how we organise that world into categories. So your knowledge of the category dog enables you to quickly identify this creature here as a dog. And similarly, you know that this creature is also a dog, even though it has this fancy little zebra print coat on it. Alright, so what about this creature? What is this? That might have taken you just a split second longer, to recognise that this is actually a horse and not a zebra, because even though it has this pattern on its coat, it is in fact a horse underneath there. And finally, you can quickly identify this last creature as a zebra, based on its coat and other features.

So you’re able to leverage your existing category knowledge to generalise to things that you’ve probably never seen before, like this dog or a horse in a zebra print jacket. And this seems maybe slightly trivial, but a computer trying to solve this problem, or a child or infant, might have trouble telling us that this is a horse or a dog instead of a zebra because of their other visual similarities. And so as humans, we can achieve these remarkable feats of generalisation that machines, for example, find very difficult.

And so categorization is also not limited to only the visual modality; we use categories in sounds as well. And so categorization allows you to listen to my voice, even if you’ve never heard me speak before. And when I say the words bear and pair, you can recognise these as different words that map onto these different meanings, even though they really only differ in this first sound, the buh versus puh sound. And we’re able to do this remarkably flexibly across different speakers and across different contexts. And so categorization is really at the heart of these fundamental processes like object recognition in the visual modality, and speech perception in the auditory modality.

And I’m particularly interested in how we learn about new categories. So for example, if you wanted to take up a bird watching hobby, you might need to learn to distinguish between these two different species of birds, which are a house finch and a purple finch. Similarly, if you’re learning a new language, you need to learn about the sounds of that language, which might be different from your own. So for instance, native speakers of non-tonal languages like English would need to learn to distinguish between tonal pitch patterns to distinguish words in tonal languages, like Mandarin Chinese. So for example, in Mandarin Chinese, you have the same syllable, here I’m showing /ma/, mapped with four different tone patterns, which completely changes the meaning of that underlying word. And just to give you an example of what this sounds like, I hope the sound is coming through now, here is an example of this first tone, it’s just a high and stable tone over time, Ma. And then the second tone is a rising tone over time, Ma.

Alright, so how do we study this in an experimental context? So in a very pared-down version of this kind of interest, it’s not as gamified as some other tasks we’ve heard about today, we would play, for example, a sound from a particular kind of category. So I use these kind of alien-like sounds that are kind of interesting for people to hear, and again, hoping the sound is coming through like that. And people make these overt choices about what category they think that belongs to. So in this case, deciding is this category one or two, and then they get some kind of feedback about the response. So correct or incorrect. And people might also do this in a visual kind of task, which I’ll talk about more today. So seeing an image like this kind of arbitrary image I’m showing you on the screen, that varies in the width and orientation of these lines. And then they’re making these overt decisions and getting that feedback.

Alright, so then what we can do is look at people’s ability to learn categories in these contexts. And so they’re learning to make more accurate decisions, given the feedback that they’re getting. So here I’m showing you the proportion correct, or accuracy, across blocks of a training task, where we train people on these auditory and visual categories. So the auditory is in red and visual in blue. You can see that overall, on average, in the darker line here, people are able to learn these categories. And then in the lighter lines, what I’m showing you is individual participant performance. So you can see there’s lots of variability in how well people are able to learn, with some people up here at really high levels of performance, and others around this dashed line, which reflects chance levels of performance.

And so we can also look at other aspects of their behaviour to understand the psychological processing that is going on as people are learning. So one of these is their reaction time, or how fast they respond. So this is measured in milliseconds here. And it’s just the time it takes them to actually push the button to identify what category they think that either sound or image belongs to. And so we can see here that our participants were slightly slower in the visual task in these kind of early blocks. So the blue line here is higher than the red line. But these kind of converge over time as the learning task goes on.
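As a rough illustration of the two summary measures just described, the following Python sketch computes proportion correct and mean reaction time per training block. It is not the analysis code used in the study; the function name `summarize_by_block`, the record fields `block`, `correct` and `rt_ms`, and the trial values are all hypothetical and made up for this example.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_block(trials):
    """Return {block: (proportion_correct, mean_rt_ms)} from trial records.

    Each trial is a dict with hypothetical keys: "block", "correct", "rt_ms".
    """
    by_block = defaultdict(list)
    for trial in trials:
        by_block[trial["block"]].append(trial)

    summary = {}
    for block in sorted(by_block):
        rows = by_block[block]
        # Accuracy is the fraction of trials answered correctly in this block.
        accuracy = mean(1.0 if row["correct"] else 0.0 for row in rows)
        # Reaction time is averaged over all trials in the block.
        mean_rt = mean(row["rt_ms"] for row in rows)
        summary[block] = (accuracy, mean_rt)
    return summary


# Made-up trials for one participant, for illustration only.
example_trials = [
    {"block": 1, "correct": False, "rt_ms": 900},
    {"block": 1, "correct": True, "rt_ms": 700},
    {"block": 2, "correct": True, "rt_ms": 600},
    {"block": 2, "correct": True, "rt_ms": 500},
]
block_summary = summarize_by_block(example_trials)
```

Running the same function once per participant and once per modality would give the kind of individual and average learning curves described above.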

Okay, so what we really want to understand is what this information about people’s choices and their reaction times can tell us about what’s going on psychologically in learners’ minds as they’re doing these tasks. So to understand this, we leverage computational models called drift diffusion models that take into account both how accurate decisions are, and also how fast these decisions are, to estimate separable psychological processes in decision making. So I’ll give you a sort of toy example here to kind of just explain the logic of these models. So when you saw this creature earlier, you made, again, this probably split-second decision about whether this was a horse or a zebra, but it was still a decision that you had to make. And so we can think of this as: as soon as you see this image, this decision process starts unfolding across time. So we start kind of accumulating evidence towards deciding whether this is a horse or a zebra.

So let’s say you probably start a little closer to making the zebra sort of decision, because I just showed you the dog in the zebra print jacket, so maybe I primed you slightly. But then as you get more and more information from this image, seeing, okay, maybe it’s just got this, like, weird flap going on here, this is not a real zebra, this has to be a horse, you’re going to shoot up and accumulate the evidence towards making that decision: this is definitely a horse, not a zebra. And so we see this process through these models as the accumulation of evidence towards these contrasting choices. So here horse or zebra; in the categorization context, category one or category two. And then you make a decision when you cross a threshold of evidence that you need to actually accumulate. So once you get enough information that this was a horse, that’s when you make your decision.

So again, just being really explicit about how this works in our kind of more arbitrary tasks, where we either play a sound or show an image and people are deciding the category, we see this process unfolding across time. So they’re accumulating evidence towards a particular decision, let’s say category one, in this case, at a particular rate. So basically, how fast they’re getting information from that stimulus represents kind of how easy it is for them to get information to inform their decision. And then again, they’re going to try to reach this decision threshold. And once they reach that threshold in this evidence accumulation process, that’s when they’re actually going to initiate their response, actually start the process of pressing the button, which is reflected in this dashed line here.
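The evidence accumulation logic described above can be sketched as a small forward simulation in Python. This is only a minimal sketch under stated assumptions, not the fitting procedure used in the study: it simulates decisions from chosen parameter values rather than estimating parameters from data. The function names and defaults are hypothetical, and the `non_decision` term (time for stimulus encoding and the button press itself) is a standard component of drift diffusion models that is assumed here rather than taken from the talk.

```python
import math
import random


def simulate_ddm_trial(drift, threshold, start=0.5, non_decision=0.3,
                       dt=0.001, noise=1.0, max_time=5.0, rng=random):
    """Simulate one two-choice decision as noisy evidence accumulation.

    Evidence starts at start * threshold, between a lower boundary at 0
    (category two) and an upper boundary at `threshold` (category one).
    Returns (choice, reaction_time_seconds); choice is None on timeout.
    """
    evidence = start * threshold
    elapsed = 0.0
    step_sd = noise * math.sqrt(dt)  # noise scales with sqrt of the time step
    while elapsed < max_time:
        # Drift pushes evidence toward category one; noise perturbs it.
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
        if evidence >= threshold:
            return 1, elapsed + non_decision
        if evidence <= 0.0:
            return 2, elapsed + non_decision
    return None, max_time + non_decision


def simulate_block(n_trials, drift, threshold, seed=0):
    """Return (proportion choosing category one, mean RT) over decided trials."""
    rng = random.Random(seed)
    results = [simulate_ddm_trial(drift, threshold, rng=rng)
               for _ in range(n_trials)]
    decided = [(choice, rt) for choice, rt in results if choice is not None]
    accuracy = sum(1 for choice, _ in decided if choice == 1) / len(decided)
    mean_rt = sum(rt for _, rt in decided) / len(decided)
    return accuracy, mean_rt


# Category one is treated as the correct answer (positive drift).
acc_low, rt_low = simulate_block(200, drift=2.0, threshold=1.0)
acc_high, rt_high = simulate_block(200, drift=2.0, threshold=2.0)
```

In this sketch, raising `threshold` makes the simulated participant more cautious, so responses get slower but more accurate, while raising `drift` makes responses both faster and more accurate. That separation is why choices and reaction times together can distinguish the two processes.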

So this is that underlying process that we are trying to estimate using these modelling approaches. And we’re going to look at how participants learn to distinguish between two different categories. We can estimate these kind of parameters here at the individual subject level, and also longitudinally across blocks as they are learning. Right, so then, let me show you what we found here for this auditory and visual task in the lab. So here at first, I’ll show you this parameter of evidence accumulation rate, again, how fast they’re able to get the information they need to make the decision about that stimulus. And so higher values here are representing kind of more efficient evidence accumulation. So you’re getting information a lot more quickly and efficiently as a process. And then here’s what that looks like for the auditory and visual tasks. So we see here sort of this crossover, where initially in our visual task, participants are less efficient at getting information than they are in the auditory task, but this crosses over across time. And by the end of training, they’re more efficient in the visual domain than the auditory domain.

We can also look at this other parameter we’ve talked about, this decision threshold. And so here we can define these parameters based on whether they were more cautious or less cautious in their responses. So higher values here are reflecting times where participants are waiting to gather enough information. So for example, they’re looking at that horse longer and longer to make sure that they really have it right, that it’s a horse and not a zebra. So here again, we’re seeing this sort of crossover between the two modalities, where initially participants are more cautious with the auditory modality, and they show this sort of steep decline in how cautious they are about that process as their accuracy increases across these different blocks.

All right, so all of this is really about online research, right? And I’ve just talked to you about in-person research. So I want to show you now how we have leveraged online data collection through Gorilla to rapidly and efficiently collect data to replicate this in-person study in a wider online sample. So in person, we ran this study on 30 participants, and online, we were able to run nearly 100 participants. And just to give you a sense of how much time this took us: in person, with a dedicated person there to run the study, it took about a month to collect this data, versus data collected via Gorilla in under 48 hours, so extremely, extremely fast. And in the in-person study, we were limited to our local population in Pittsburgh, Pennsylvania in the US, whereas in our online replication, we were able to get a global population, through Prolific specifically.

And then finally, in the lab, in person, we ran participants on our kind of controlled laboratory computers and professional-level headphones, in environments we could ensure were extremely quiet, whereas online we ran participants on their own computers and using their own headphones, which obviously have more variety in quality compared to our in-person study. Right, then we can talk about what actually happened in this online replication. So as a reminder, this is what our in-person study looked like, with accuracy and our individual differences across participants. And then this is what the online study looked like. So you can see here, there’s more participants and more of these lighter lines here. But generally, we’re seeing the same kind of pattern of accuracy; we don’t see a lot of differences between modalities. And people generally are able to learn.

Then we can also look at this measure of reaction time that we looked at. So this is our in-person study. And then this is what it looks like online. So immediately, I’ll note that the scale here has changed for reaction time. So whereas here we’re in the sub-second sort of range in our in-person study, on average, online we have some folks who are kind of getting up above one second, and even in this case above two seconds, to respond on average on a trial. And so we’re seeing a lot more variability in how the reaction times look over time, but we still have plenty of folks here who are responding very quickly.

All right, then what about our decision processes assessed by these drift diffusion models? What do those look like? So again, we’re looking at our evidence accumulation rate in our in-person sample. And this is what our online sample looks like. So still seeing that crossover across modalities, and really a very similar pattern across in person and online. And then we have our decision threshold, again, in person, seeing a different sort of pattern of crossover here between the modalities. And then this is what we see online.

So we effectively perfectly replicated these results. And this is really exciting and meaningful, because these folks were different from our in-person sample. Again, this is a global population, people were using their own machines, their own headphones, and we saw that overall, they were slower in a lot of cases. But they learned just as well as people who were seated in their kind of quiet, controlled environment. And yet, we’re still seeing the same patterns of the psychological processes through these drift diffusion models. And this also really tells us that using Gorilla to collect these reaction time measures is capturing information about the psychological processes that we see inside the lab as well, using just different software that we’ve used across the years.

So I’ll just briefly kind of summarise the benefits of collecting data online that we saw both in this experiment and what I’ve seen in my research in general. So first, as we’ve talked about, I’ve seen this ability to replicate in samples outside of psychology subject pools, or otherwise homogeneous samples that we see often inside of the lab, or where we have a limited ability to kind of collect the data across, you know, a broad sample population. And so this is both in the study that I’ve discussed in detail today, but also another study looking at incidental category learning in multiple experiments, in this other citation that I have here.

Online data collection has also really enabled seamless collaboration across the world. So I have colleagues in Hong Kong who are able to access both the stimulus materials and also the data, and we’re able to access it extremely simply and collaborate really easily, rather than sending files back and forth or sharing it in some other way that does get a bit clunky. Here, we can just work on it in the same platform.

And then finally, this gives us really the ability to recruit samples with more diverse experiences. So obviously I’ve mentioned just the ability to look at the global population. But something else that we particularly looked at in this specific study that I’ve listed here is looking at people with a diverse array of music experiences. So we just kind of looked at a sample, not really specifically sampling for music experience, but kind of just seeing what happens when you look at a broad sample of individuals with music experience. And that’s really something that we’re only able to do with online research, because you get a more diverse sample that way. All right. And this is, again, just all really important so that we can examine things like learning and decision making efficiently and using these more general populations. And with that, I really want to thank you for your time, and also the resources that have supported this work. I put my contact information there on the screen, and also the information about my collaborators who are involved with this specific project that I’ve talked about in detail today. And I’d be happy to answer any questions.

Casey, that was absolutely brilliant. Thank you. Anybody who’s got any questions for Casey, can you start putting them in the Q&A now? Hopefully, we get round to them. I’m still processing your talk, doing this in real time. It’s getting towards the end of the day. I’m really sorry. But I did have a question. You’ve actually been really generous and positive about online research, generally. But there must have been some challenges getting this to work across so many people at scale, across so many research groups. What, yeah, what were the challenges? And what person did you have to become in order to resolve them?

Yeah, I really love the phrasing of that question. And it looks like someone has asked that in the chat as well. Yeah, so definitely there are, of course, drawbacks. And I think that Gloria talked about this a lot in her talk earlier, looking at specifically this question of involving sound in these experiments online. So it’s important to have checks of whether or not people are wearing headphones, making sure you have, you know, checks throughout an experiment to make sure people are continually attending to your sounds, and not just, like, throwing the headphones off to the side and just continually pressing buttons. So those are some kind of real drawbacks. In general, just thinking about how you can see variability in behaviour: for example, I showed a lot of folks who weren’t really able to learn the categories, or were performing at chance levels. And there’s a question always in the back of my mind, which is, like, are they actually just struggling to learn and performing at chance? Or did they just completely, like, check out and they’re not interested in learning? And this is something we have to solve both in person and online. But I think it becomes especially hard when you can’t just kind of follow up with them after, in person, with a little bit of, you know, demand, and say, hey, like, did you really try in this experiment? Yes, it’s a challenge.

Definitely a challenge. Have you ever considered using, like, additional incentives? So I think Prolific allows you to pay a bonus when people perform well, just to incentivize people not to do that, and at least see if the data is different when they do?

Yeah, that’s a great question. So incentives and rewards are really important in learning, as was discussed in some of the kind of marketing research today. But I think it is important to look at whether these things are different when offering incentives or not. But I’m really also curious about learning, about what’s happening when people are struggling to learn. So this is something I’ve done in my in-person studies before: saying, hey, you’re gonna get a bonus, but then offering the bonus regardless of really how they perform, so that it’s kind of more fair across different participants. But you are starting to try to encourage that. So that’s definitely something that can be an incentive, though it brings up questions of fairness that I just want to highlight as well.

Yeah, the fairness question really gets us as researchers, doesn’t it? It also makes me think of something that I presented earlier, that Jenny Rodd said: it’s really hard to tell the difference between participants who just suck at your task versus those that aren’t trying, because recognising those four /ma/ tones is actually hard. Like, to the Western ear, it’s really not an easy task.

Yes, it’s very challenging. And so often I give people a variety of different tasks. So trying to kind of understand, like, if you do really well on one task, but not well on another that’s pretty similar, it might just be an attention kind of thing, where you’re just kind of fatigued and tired and you don’t want to do the task anymore. So giving people these kind of multiple ways to measure their behaviour over time and across different tasks could also be a solution to that.

