Jason Geller:
So first off, I want to thank Rachel for inviting me to speak on this panel. For the past couple of years, I’ve been an observer of the BeOnline Conference, so it’s really awesome to actually be a participant and to be talking about auditory research online. So today, I’m going to be talking to you about a project that I started working on when I was a postdoc at the University of Iowa. While there, my colleagues and I developed a task called the Iowa Test of Consonant Perception, and what we did is we tried to validate it. So that is what I’m going to be talking to you about today.

Jason Geller:
So to start, I want you to imagine that you’re at a bar, pre-pandemic, and you’re having a conversation. You’re trading conversation back and forth, and while you’re doing this there’s traffic coming into your domain, there’s blaring music, and there are other people talking. So a critical question for speech perception is how are we able to attend to the conversation that we’re having with the people close to us while ignoring all this extraneous noise that’s occurring while we’re trying to have this conversation? So this is kind of the classic cocktail party problem.

Jason Geller:
So one way that we can kind of assess this speech-in-noise issue is by using speech-in-noise tasks. This is what audiologists and also laboratory researchers use, and they come in two flavors: there are open-set tasks and then there are closed-set tasks, and underneath those dimensions there are single-word recognition tasks as well as sentence-based tasks. And what would a participant see in an open-set task? So let me play you an example of that.

[crosstalk 00:01:50].

Jason Geller:
So a word or a sen­tence would be spliced into that mul­ti-speak­er bab­ble and indi­vid­u­als would have to kind of search their men­tal lex­i­con, choose the word they think they heard, and then they have to pro­duce it. And if you weren’t able to hear what that word was in the mul­ti-speak­er bab­ble, it was ball.

Jason Geller:
In contrast, closed-set tasks don’t have a production element the way open-set tasks do. Instead, they usually have a forced-choice task where participants are presented with several options and they have to choose which one they think it is. And like I said before, it was ball that was interspersed into that speech in noise.

Jason Geller:
So generally speaking, sentence-based, open-set tasks are preferred as they’re more representative of everyday listening situations, so they’re more ecologically valid. However, open-set tasks are difficult to use experimentally, right? So a sentence-based open-set task would engage a whole host of processes that are not directly related to speech perception. So as I said before, open-set tasks require production, so if individuals have a language impairment such as aphasia, they wouldn’t be able to do that task. Sentence-based tasks require working memory depending on how hard or syntactically complex the sentences are, and they also rely on context. So individuals can use context to infer maybe upcoming words. So again, it’s not directly tapping speech perception.

Jason Geller:
So what we need is a closed-set task that better approximates everyday listening situations. So in everyday listening situations, there’s lexical competition, so representations are battling each other for selection, and then there’s also talker variability. So different talkers, and also speech might be accented or not, so we have to take that into account. With those goals in mind, we set out to create a task called the Iowa Test of Consonant Perception that would hopefully meet those goals. This particular task is a four-alternative word choice closed-set task. There are 120 target words, and each target word belongs to a set, and within that set, it appears both as a target and a foil. We recorded each target word with four speakers, so two women, two men, and all of the foils are minimal pairs differing by the first consonant. And for the noise, we use a multi-speaker babble. So this is an example of the multi-speaker babble.
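As a rough illustration of the design just described, the sketch below builds a four-alternative trial list. It assumes that each set holds four words, that every word in a set serves once as the target per talker, and that the other three words act as foils. The example word set, the talker labels, and the function name `build_trials` are hypothetical and are not taken from the ITCP materials.

```python
import random


def build_trials(word_sets, talkers, seed=0):
    """Build a shuffled four-alternative forced-choice trial list.

    word_sets: list of 4-word lists whose members differ in the first consonant.
    talkers:   labels of the talkers used in this session.
    """
    rng = random.Random(seed)
    trials = []
    for word_set in word_sets:
        for target in word_set:
            for talker in talkers:
                options = list(word_set)  # the target plus its three foils
                rng.shuffle(options)      # randomize on-screen position
                trials.append(
                    {"target": target, "talker": talker, "options": options}
                )
    rng.shuffle(trials)  # randomize trial order
    return trials


# Hypothetical minimal-pair set, for illustration only.
example_sets = [["ball", "fall", "gall", "tall"]]
example_trials = build_trials(example_sets, talkers=["talker_A", "talker_B"])
```

Under these assumptions, 30 sets of four words and two talkers per session give 30 × 4 × 2 = 240 trials, which matches the trial count reported for each session.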

[crosstalk 00:04:19]

Jason Geller:
What I want to point out here is that all of the analy­sis scripts, mate­ri­als and data for the Iowa Test of Con­so­nant Per­cep­tion are avail­able at our OSF page, so we’re hop­ing that indi­vid­u­als could use this to repli­cate our results here or roll their own Iowa Test of Con­so­nant Perception.

Jason Geller:
So, when we start­ed this val­i­da­tion project, we weren’t in a pan­dem­ic, so data col­lec­tion was going pret­ty well. And then the pan­dem­ic hap­pened and metaphor­i­cal­ly speak­ing, peo­ple left the bar. We could­n’t have peo­ple in the lab any­more, so we kind of had to decide on an alter­na­tive. And I decid­ed that we should try to val­i­date this online. So as Bob Dylan said, “The times they are a‑changing.” And more and more researchers are putting their exper­i­ments online. And a lot of audi­to­ry researchers, as we have heard today are also tak­ing the research online. So I thought that it would be per­fect to try to val­i­date this online.

Jason Geller:
So for the procedure, we had two sessions and these were spaced one week apart, and we used Gorilla as our experimental and hosting platform. And we used Prolific as our recruitment platform.

Jason Geller:
So in session one, we had 199 participants, and individuals first did a headphone screener. So we used the Woods et al. headphone screener that Rachel talked about. Then after that, they did the Iowa Test of Consonant Perception, and this was 240 trials with two speakers. Then after that, they did the Consonant-Nucleus-Consonant test, which is a hundred words in noise. And the reason why we chose this particular test is because it’s what’s being used in University of Iowa Hospitals. So we wanted to look at correlations between this and another test.

Jason Geller:
In session two, 98 participants returned. The attrition rate is not the greatest, but it is what it is. For session two, individuals had to complete a headphone screener again. Then they were given the Iowa Test of Consonant Perception again. This is 240 trials and we chose two different speakers. And the reason why we had two different speakers is so there weren’t any learning effects. After this, they did the AZbio, which is just 20 sentences in noise. And again, we’re using this AZbio test because it’s what’s being used at the University of Iowa Hospitals and Clinics. Then after this, they did some demographics.

Jason Geller:
So what did the par­tic­i­pants actu­al­ly see? So all of these are avail­able on open mate­ri­als, so why don’t I just show you? So first, let’s look at the CNC task and what they did.

[crosstalk 00:06:55] talk [crosstalk 00:06:55].

Jason Geller:
Yeah. So there’s a fix­a­tion cross, and then there’s a word inter­spersed in that noise and you just have to type in what you thought you heard.

[crosstalk 00:07:05] cake [crosstalk 00:07:07].

Jason Geller:
Again. And the AZbio is very similar, but instead of a word, there’s a sentence and they had to type out the sentence that they thought that they heard. For the ITCP, which we have a code name for, this is very similar.

[crosstalk 00:07:25]

Jason Geller:
So they hear the word in noise and then there are four choices for them to choose from. And this is the practice trial, so there’s feedback, but they would pick maybe that they heard gone, and that’s incorrect. So, that is what these tasks look like online.

Jason Geller:
Okay, so back to the presentation. So before I get into the validation piece, what we wanted to do was pilot the stimuli. So what we did is we ran a study with 50 participants and we assessed all of these words just in silence so we could get kind of an overall intelligibility of these stimuli. And overall accuracy was about 95%, so that’s good.

Jason Geller:
Now let’s get into the validation piece. So what we really wanted to know was, what is the reliability of the ITCP? And we did this by looking at test-retest. So we had individuals come in during session one to do the ITCP and then a week later they did the ITCP again. So using the intraclass correlation, which is a measure of agreement, we get high reliability. So 0.8, which is good. And this is kind of just a scatter cloud of session one of the ITCP and session two of the ITCP, and we can see that there’s kind of this large positive correlation.
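For readers who want to see what a test-retest ICC involves, here is a minimal sketch. The talk does not say which ICC form was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single measure) is an assumption, and the function name `icc_agreement` and the accuracy values are made up for illustration.

```python
def icc_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per participant, one column per session,
            e.g. [[session1_acc, session2_acc], ...].
    """
    n = len(scores)     # number of participants
    k = len(scores[0])  # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up proportion-correct scores for five participants in two sessions.
made_up_scores = [[0.62, 0.65], [0.71, 0.70], [0.55, 0.60], [0.80, 0.78], [0.67, 0.72]]
retest_icc = icc_agreement(made_up_scores)
```

Because this form measures absolute agreement, a systematic shift between sessions (for example, everyone improving in session two) lowers the value, which a plain Pearson correlation would not capture.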

Jason Geller:
We were also interested in just looking at how the ITCP correlates with the other tasks that we had them do. So for this, we looked at session one of the ITCP and the CNC, and what we observed is a correlation of about 0.54. This is actually a robust measure of correlation, so it’s the percentage bend correlation, which takes into account some of these outliers. While it’s positive and fairly large by conventional standards, it’s not really psychometrically where we wanted it to be, which is unfortunate.
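To make the idea of a robust correlation concrete, the following sketch implements a percentage bend correlation along the lines of Wilcox’s description: each variable is centered and scaled robustly, extreme values are clipped to ±1, and a Pearson-style ratio is taken on the clipped scores. The bending constant `beta=0.2` is a common default and an assumption here, since the talk does not state the value used, and the function names are illustrative.

```python
import math
from statistics import median


def _pb_scale(values, beta):
    """Robust scale: the (1 - beta) quantile of absolute deviations from the median."""
    med = median(values)
    deviations = sorted(abs(v - med) for v in values)
    return deviations[math.floor((1 - beta) * len(values)) - 1]


def _pb_location(values, beta):
    """Percentage bend measure of location (one-step estimate)."""
    omega = _pb_scale(values, beta)
    med = median(values)
    psi = [(v - med) / omega for v in values]
    low = sum(1 for p in psi if p < -1)   # points far below the median
    high = sum(1 for p in psi if p > 1)   # points far above the median
    inner_sum = sum(v for v, p in zip(values, psi) if -1 <= p <= 1)
    return (inner_sum + omega * (high - low)) / (len(values) - low - high)


def percentage_bend_correlation(x, y, beta=0.2):
    """Correlation computed on robustly standardized scores clipped to [-1, 1]."""

    def bent(values):
        omega = _pb_scale(values, beta)
        loc = _pb_location(values, beta)
        return [max(-1.0, min(1.0, (v - loc) / omega)) for v in values]

    a, b = bent(x), bent(y)
    numerator = sum(p * q for p, q in zip(a, b))
    return numerator / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
```

The clipping step is what limits the influence of outliers: a single extreme participant contributes at most ±1 to each variable instead of dominating the sum. This sketch does not guard against a zero scale, which can occur when a variable has many tied values.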

Jason Geller:
And then we also did the same thing for AZbio. So again, we see the scat­ter plot here. We see that there’s a pos­i­tive cor­re­la­tion and it’s fair­ly large, so it’s 0.59. But again, it’s not where we want it psychometrically.

Jason Geller:
In addition to this validation piece, we also did some exploratory work where we looked at how things like talker and vowel context and manner and place affect accuracy. And unfortunately, I can’t talk about that research today, but what I do want to talk a little bit about is kind of this IRT one-parameter Rasch model that we fit, which we extracted all of the item easiness estimates from. So we can see here. So the palette is not as nice as Violet’s, but I still like the palettes here. And we can see that all these items kind of fall within kind of the sweet spot of one to negative one. So there aren’t really items that are too hard or too easy, which is something that we want. And I want to stress this, that we wanted to provide something like this so researchers could use this and roll their own ITCP, so maybe exclude or include certain items. So hopefully that will be useful to folks that want to do some of the speech-in-noise work.
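To show what an easiness estimate between negative one and one means in practice, here is a small sketch of the one-parameter (Rasch) response function in its easiness parameterization, where the log-odds of a correct response equal participant ability plus item easiness. It assumes the standard logistic form with no guessing parameter; the function name `p_correct` and the example values are illustrative and are not estimates from the study.

```python
import math


def p_correct(ability, easiness):
    """Rasch (1PL) probability of a correct response.

    Uses the easiness parameterization: logit(P) = ability + easiness.
    """
    return 1.0 / (1.0 + math.exp(-(ability + easiness)))


# For a participant of average ability (0 on the logit scale), compare an
# item at each edge of the -1 to 1 range with an item in the middle.
for easiness in (-1.0, 0.0, 1.0):
    print(f"easiness {easiness:+.1f}: P(correct) = {p_correct(0.0, easiness):.2f}")
```

Under this model, items with easiness between -1 and 1 give an average participant roughly a 27% to 73% chance of a correct response, so none of them sit at floor or ceiling.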

Jason Geller:
So to kind of sum up, we see that the ITCP is highly reliable. So we had an ICC of about 0.8. The validity measures, I think that’s an open question and I think we need to do more work. So as kind of next steps, we want to look at the validation in the lab. As I mentioned earlier, we were already starting to validate this in the lab and then we had to stop doing that. But the data looks pretty good and it’s pretty comparable to what we’re observing online, so that’s what we want to see.

Jason Geller:
One thing that I would be real­ly inter­est­ed in look­ing at is doing a val­i­da­tion of this study with indi­vid­u­als with hear­ing impair­ment, so hear­ing aid users and cochlear implant users. I think that’d be real­ly inter­est­ing if we can actu­al­ly have them stay home, they don’t have to come into the clin­ic, and they can just do this task online and we can use their infor­ma­tion like that.

Jason Geller:
And then last­ly, we want to use this exper­i­men­tal­ly. So we want to do eye track­ing research, EEG and PET research. And that’s all being planned out right now at the Uni­ver­si­ty of Iowa. So we’re real­ly look­ing for­ward to the results that are going to come out from this.

Jason Geller:
So, I want to end this by giving some advice that I wish I had when I first started these multi-day studies. So, it’s really, really hard to do these multi-day studies. There’s lots of attrition. So I wish I would’ve known these things going into it, which I did not. So one kind of piece of advice is to give bonuses for completing the second session. So you need to set up separate studies on your recruitment platform and then just offer bonuses for them to finish the second task. I think that really incentivizes folks to come back for the second test. I first did this with just having everything as one session and it ended horribly. There were lots of people taking it and not coming back for the second session, so that really hurt my numbers.

Jason Geller:
It’s very important that you’re explicit in your study description. So you need to lay out exactly what you want the participants to do, so there’s no ambiguity when participants email you and say that there were some issues with the experiment or they didn’t do the second part, or can I do the second part? You just need to be explicit. Also very important is to email subjects multiple times to remind them of an upcoming session. I don’t know if Prolific fixed this, but it was very hard to email just the participants that you wanted to email separately. You had to email everyone that participated in your study, which is not ideal.

Jason Geller:
And then lastly, just try to make your experiment a reasonable length. So for this particular project, each session took about 40 minutes and really that’s not ideal. You want to make sure that it’s manageable for them to complete, and they’re not bored, or they don’t lose motivation. So maybe if I had to do this again, I probably wouldn’t have it be so long, or I’d spread it out over multiple days so it’s a reasonable length. So that’s kind of my advice or things that I wish I knew when I first started these multi-day experiments. And with that, thank you. And I look forward to your questions.

Speak­er 3:
That was fantastic, Jason, thank you so much. As always with your work, I’m just impressed with such top-notch empirical methods, and what a deep commitment to open materials as well. It’s just wonderful. We might have time for one quick question. Again, we can also use the chat, the Q&A forum, and time in Gather Town.

Speak­er 3:
Okay. Christi­na, you can share the slides. One thing that struck me dur­ing your talk, Jason, and some­thing that I think all of us say, we talk about val­i­dat­ing what we see online with what we see in the lab and to some degree, I think it’s inter­est­ing that that isn’t reversed. That we’re not kind of refram­ing the nar­ra­tive that why should­n’t we be val­i­dat­ing what we see in the lab to a bit more nat­ur­al envi­ron­ment? Real­ly great work.


