What Did You Say? A Web-Based Validation of a Speech-In-Noise Task.

Jason Geller, Rutgers University
@jgeller_phd


Full Tran­script:

Jason Geller:
So first off, I want to thank Rachel for inviting me to speak on this panel. For the past couple of years, I’ve been an observer of the BeOnline Conference, so it’s really awesome to actually be a participant and to be talking about auditory research online. So today, I’m going to be talking to you about a project that I started working on when I was a postdoc at the University of Iowa. While there, my colleagues and I developed a task called the Iowa Test of Consonant Perception, and we tried to validate it. So that is what I’m going to be talking to you about today.

Jason Geller:
So to start, I want you to imagine that you’re at a bar, pre-pandemic, and you’re having a conversation. You’re trading conversation back and forth, and while you’re doing this there’s traffic coming into your domain, there’s blaring music, and there are other people talking. So a critical question for speech perception is how we are able to attend to the conversation that we’re having with the people close to us while ignoring all the extraneous noise that’s occurring at the same time. So this is kind of the classic cocktail party problem.

Jason Geller:
So one way that we can kind of assess this speech-in-noise issue is by using speech-in-noise tasks. This is what audiologists and also laboratory researchers use, and they come in two flavors: there are open-set tasks and there are closed-set tasks, and underneath those dimensions there are single-word recognition tasks as well as sentence-based tasks. And what would a participant hear in an open-set task? So let me play you an example of that.

record­ing:
[crosstalk 00:01:50].

Jason Geller:
So a word or a sen­tence would be spliced into that mul­ti-speak­er bab­ble and indi­vid­u­als would have to kind of search their men­tal lex­i­con, choose the word they think they heard, and then they have to pro­duce it. And if you weren’t able to hear what that word was in the mul­ti-speak­er bab­ble, it was ball.

Jason Geller:
In contrast, closed-set tasks don’t have a production element the way open-set tasks do. Instead, they usually have a forced-choice task where participants are presented with several options and they have to choose which one they think it is. And like I said before, it was ball that was interspersed into that noise.

Jason Geller:
So generally speaking, sentence-based, open-set tasks are preferred as they’re more representative of everyday listening situations, so they’re more ecologically valid. However, open-set tasks are difficult to use experimentally, right? So a sentence-based open-set task would engage a whole host of processes that are not directly related to speech perception. So as I said before, open-set tasks require production, so if individuals have a language impairment such as aphasia, they wouldn’t be able to do that task. Sentence-based tasks require working memory, depending on how hard or syntactically complex the sentences are, and they also rely on context. So individuals can use context to infer maybe upcoming words. So again, it’s not directly tapping speech perception.

Jason Geller:
So what we need is a closed-set task that better approximates everyday listening situations. So in everyday listening situations, there’s lexical competition, so representations are battling each other for selection, and then there’s also talker variability. So there are different talkers, and also speech might be accented or not, so we have to take that into account. With those goals in mind, we set out to create a task called the Iowa Test of Consonant Perception that would hopefully meet those goals. This particular task is a four-alternative forced-choice closed-set word task. There are 120 target words, and each target word belongs to a set, and within that set, it appears both as a target and as a foil. We recorded each target word with four speakers, so two women and two men, and all of the foils are minimal pairs differing in the first consonant. And for the noise, we use a multi-speaker babble. So this is an example of the multi-speaker babble.

record­ing:
[crosstalk 00:04:19]
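
As a rough illustration of the trial structure just described, the sketch below builds a four-alternative trial list in which every word in a set serves once as the target for each talker, with the other members of its set as foils. It is only a sketch under stated assumptions: the set size of four is inferred from the four-alternative format rather than stated in the talk, and the example words, talker labels, and the `build_trials` function are hypothetical, not the actual ITCP materials.

```python
import random


def build_trials(word_sets, speakers, seed=0):
    """Build a shuffled list of closed-set trials.

    word_sets: list of minimal-pair sets (assumed here to hold four words each).
    speakers:  list of talker labels (hypothetical names).
    Every word is the target once per speaker; the rest of its set are foils.
    """
    rng = random.Random(seed)
    trials = []
    for word_set in word_sets:
        for target in word_set:
            for speaker in speakers:
                options = list(word_set)  # target plus its foils
                rng.shuffle(options)      # randomize on-screen order
                trials.append(
                    {"target": target, "speaker": speaker, "options": options}
                )
    rng.shuffle(trials)  # randomize trial order
    return trials


# Hypothetical set of words differing only in the first consonant.
example_sets = [["ball", "call", "fall", "tall"]]
trials = build_trials(example_sets, speakers=["talker_1", "talker_2"])
print(len(trials))
```

Under these assumptions, 30 sets of four words presented with two talkers would give the 240 trials per session mentioned in the talk.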

Jason Geller:
What I want to point out here is that all of the analy­sis scripts, mate­ri­als and data for the Iowa Test of Con­so­nant Per­cep­tion are avail­able at our OSF page, so we’re hop­ing that indi­vid­u­als could use this to repli­cate our results here or roll their own Iowa Test of Con­so­nant Perception.

Jason Geller:
So, when we started this validation project, we weren’t in a pandemic, so data collection was going pretty well. And then the pandemic happened and, metaphorically speaking, people left the bar. We couldn’t have people in the lab anymore, so we kind of had to decide on an alternative. And I decided that we should try to validate this online. So as Bob Dylan said, “The times they are a-changing.” More and more researchers are putting their experiments online, and a lot of auditory researchers, as we have heard today, are also taking their research online. So I thought that it would be perfect to try to validate this online.

Jason Geller:
So for the procedure, we had two sessions, and these were spaced one week apart. We used Gorilla as our experimental and hosting platform, and we used Prolific as our recruitment platform.

Jason Geller:
So in session one, we had 199 participants, and individuals first did a headphone screener. So we used the Woods et al. headphone screener that Rachel talked about. Then after that, they did the Iowa Test of Consonant Perception, and this was 240 trials with two speakers. Then after that, they did the Consonant-Nucleus-Consonant test, which is a hundred words in noise. And the reason why we chose this particular test is because it’s what’s being used at the University of Iowa Hospitals. So we wanted to look at correlations between this and another test.

Jason Geller:
In session two, 98 participants returned. The attrition rate is not the greatest, but it is what it is. For session two, individuals had to complete a headphone screener again. Then they were given the Iowa Test of Consonant Perception again. This was 240 trials, and we chose two different speakers. And the reason why we had two different speakers is so there weren’t any learning effects. After this, they did the AZbio, which is just 20 sentences in noise. And again, we’re using this AZbio test because it’s what’s being used at the University of Iowa hospitals and clinics. Then after this, they did some demographics.

Jason Geller:
So what did the par­tic­i­pants actu­al­ly see? So all of these are avail­able on open mate­ri­als, so why don’t I just show you? So first, let’s look at the CNC task and what they did.

record­ing:
[crosstalk 00:06:55] talk [crosstalk 00:06:55].

Jason Geller:
Yeah. So there’s a fix­a­tion cross, and then there’s a word inter­spersed in that noise and you just have to type in what you thought you heard.

record­ing:
[crosstalk 00:07:05] cake [crosstalk 00:07:07].

Jason Geller:
Again. And the AZbio is very similar, but instead of a word, there’s a sentence and they had to type out the sentence that they thought they heard. For the ITCP, which is the short name we have for the Iowa Test of Consonant Perception, this is very similar.

record­ing:
[crosstalk 00:07:25]

Jason Geller:
So they hear the word in noise and then there are four choices for them to choose from. And this is the practice trial, so there’s feedback, but they would pick maybe that they heard gone, and that’s incorrect. So, that is what these tasks look like online.

Jason Geller:
Okay, so back to the presentation. So before I get into the validation piece, what we wanted to do was pilot the stimuli. So what we did is we ran a study with 50 participants and we assessed all of these words just in silence so we could get kind of an overall intelligibility of these stimuli. And overall accuracy was about 95%, so that’s good.

Jason Geller:
Now let’s get into the validation piece. So what we really wanted to know was, what is the reliability of the ITCP? And we did this by looking at test-retest. So we had individuals come in during session one to do the ITCP and then a week later they did the ITCP again. So using the intraclass correlation, which is a measure of agreement, we get high reliability. So 0.8, which is good. And this is kind of just a scatter cloud of session one of the ITCP and session two of the ITCP, and we can see that there’s kind of this large positive correlation.
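
For readers who want to compute a test-retest intraclass correlation on their own data, here is a minimal sketch in Python. The talk does not say which ICC form was used; this assumes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). The function name `icc_2_1` and the example scores are illustrative only.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: array of shape (n_participants, k_sessions), one row per
    participant and one column per session.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-participant means
    col_means = x.mean(axis=0)  # per-session means

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return float(
        (ms_rows - ms_err)
        / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    )


# Made-up accuracy scores: session 1 in column 0, session 2 in column 1.
example = np.array([[0.70, 0.72], [0.55, 0.60], [0.81, 0.78], [0.64, 0.66]])
print(icc_2_1(example))
```

Because this form measures absolute agreement, a constant shift between sessions (everyone improving by the same amount) lowers the value even when the rank order of participants is perfectly preserved.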

Jason Geller:
We were also interested in just looking at how the ITCP correlates with the other tasks that we had them do. So for this, we looked at session one of the ITCP and the CNC, and what we observed is a correlation of about 0.54. This is actually a robust measure of correlation, the percentage bend correlation, which takes into account some of these outliers. While it’s positive and fairly large by conventional standards, it’s not really psychometrically where we wanted it to be, which is unfortunate.
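
As an illustration of how a percentage bend correlation down-weights outliers, here is a minimal NumPy sketch following the usual description of the estimator (Wilcox, 1994). The bending constant `beta=0.2` is a common default and an assumption here, since the talk does not say what value was used; the function name is also illustrative.

```python
import numpy as np


def percentage_bend_correlation(x, y, beta=0.2):
    """Percentage bend correlation between two 1-D score vectors.

    Scores far from a robust center are "bent" (clipped) to -1 or 1
    before correlating, which limits the influence of outliers.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")

    def bent_scores(v):
        n = v.size
        med = np.median(v)
        # omega: the m-th smallest absolute deviation from the median
        # (undefined if omega is 0, e.g. when most values are identical).
        m = int(np.floor((1 - beta) * n))
        omega = np.sort(np.abs(v - med))[m - 1]
        psi = (v - med) / omega
        low, high = psi < -1, psi > 1
        i1, i2 = low.sum(), high.sum()
        # Percentage bend measure of location.
        phi = (v[~low & ~high].sum() + omega * (i2 - i1)) / (n - i1 - i2)
        return np.clip((v - phi) / omega, -1.0, 1.0)

    a, b = bent_scores(x), bent_scores(y)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))


# Made-up scores with one extreme value in y.
x_demo = np.arange(1.0, 11.0)
y_demo = np.array([2.0, 3.5, 3.0, 5.0, 6.5, 6.0, 8.0, 9.5, 9.0, 40.0])
print(percentage_bend_correlation(x_demo, y_demo))
```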

Jason Geller:
And then we also did the same thing for AZbio. So again, we see the scat­ter plot here. We see that there’s a pos­i­tive cor­re­la­tion and it’s fair­ly large, so it’s 0.59. But again, it’s not where we want it psychometrically.

Jason Geller:
In addition to this validation piece, we also did some exploratory work where we looked at how things like talker and vowel context and manner and place affect accuracy. And unfortunately, I can’t talk about that research today, but what I do want to talk a little bit about is this IRT one-parameter Rasch model that we fit, from which we extracted all of the item easiness estimates. So we can see here. So the palette is not as nice as Violet’s, but I still like the palette here. And we can see that all these items kind of fall within the sweet spot of one to negative one. So there aren’t really items that are too hard or too easy, which is something that we want. And I want to stress this, that we wanted to provide something like this so researchers could use it and roll their own ITCP, so maybe exclude or include certain items. So hopefully that will be useful to folks that want to do some of this speech-in-noise work.
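
To make the item easiness idea concrete, here is a small sketch of the one-parameter (Rasch) response function in its easiness parameterization, plus a helper for keeping only items inside a chosen easiness band. The item names and easiness values are made up for illustration and are not the published ITCP estimates, and both function names are hypothetical.

```python
import math


def rasch_p_correct(ability, easiness):
    """Probability of a correct response under a Rasch (1PL) model.

    Uses the easiness parameterization: logit(p) = ability + easiness.
    """
    return 1.0 / (1.0 + math.exp(-(ability + easiness)))


def select_items(item_easiness, low=-1.0, high=1.0):
    """Keep items whose easiness estimate falls within [low, high]."""
    return [item for item, b in item_easiness.items() if low <= b <= high]


# Hypothetical easiness estimates on the logit scale.
demo_items = {"ball": -0.5, "gone": 1.4, "cake": 0.9}
print(select_items(demo_items))
print(rasch_p_correct(ability=0.0, easiness=0.9))
```

With estimates like these in hand, a shorter or harder version of the task could be assembled by tightening or shifting the `low` and `high` bounds.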

Jason Geller:
So to kind of sum up, we see that the ITCP is highly reliable. So we had an ICC of about 0.8. As for the validity measures, I think that’s an open question and I think we need to do more work. So as kind of next steps, we want to look at the validation in the lab. As I mentioned earlier, we were already starting to validate this in the lab and then we had to stop doing that. But the data looks pretty good and it’s pretty comparable to what we’re observing online, so that’s what we want to see.

Jason Geller:
One thing that I would be real­ly inter­est­ed in look­ing at is doing a val­i­da­tion of this study with indi­vid­u­als with hear­ing impair­ment, so hear­ing aid users and cochlear implant users. I think that’d be real­ly inter­est­ing if we can actu­al­ly have them stay home, they don’t have to come into the clin­ic, and they can just do this task online and we can use their infor­ma­tion like that.

Jason Geller:
And then last­ly, we want to use this exper­i­men­tal­ly. So we want to do eye track­ing research, EEG and PET research. And that’s all being planned out right now at the Uni­ver­si­ty of Iowa. So we’re real­ly look­ing for­ward to the results that are going to come out from this.

Jason Geller:
So, I want to end this by giving some advice that I wish I had when I first started these multi-day studies. So, it’s really, really hard to do these multi-day studies. There’s lots of attrition. So I wish I had known these things going into it, which I did not. So one piece of advice is to give bonuses for completing the second session. So you need to set up separate studies on your recruitment platform and then just offer bonuses for them to finish the second task. I think that really incentivizes folks to come back for the second session. I first did this with just having everything as one study, and it ended horribly. There were lots of people taking it and not coming back for the second session, so that really hurt my numbers.

Jason Geller:
It’s very important that you’re explicit in your study description. So you need to lay out exactly what you want the participants to do, so there’s no ambiguity when participants email you and say that there were some issues with the experiment, or that they didn’t do the second part, or ask, can I do the second part? You just need to be explicit. It’s also very important to email subjects multiple times to remind them of an upcoming session. I don’t know if Prolific fixed this, but it was very hard to email just the participants that you wanted. You had to email everyone that participated in your study, which is not ideal.

Jason Geller:
And then lastly, just try to make your experiment a reasonable length. So for this particular project, each session took about 40 minutes, and really that’s not ideal. You want to make sure that it’s manageable for them to complete, and that they’re not bored and don’t lose motivation. So if I had to do this again, I probably wouldn’t have it be so long, or I’d spread it out over multiple days so it’s a reasonable length. So that’s kind of my advice, or things that I wish I knew when I first started these multi-day experiments. And with that, thank you. And I look forward to your questions.

Speak­er 3:
That was fan­tas­tic, Jason, thank you so much. As always with your work, I’m just impressed with such top-notch empir­i­cal meth­ods, and what a deep com­mit­ment to open mate­ri­als as well. It’s just won­der­ful. We might have time for one quick ques­tion. Again, we can also use the chat and the Q&A forum and time and Gath­er Town.

Speak­er 3:
Okay. Christi­na, you can share the slides. One thing that struck me dur­ing your talk, Jason, and some­thing that I think all of us say, we talk about val­i­dat­ing what we see online with what we see in the lab and to some degree, I think it’s inter­est­ing that that isn’t reversed. That we’re not kind of refram­ing the nar­ra­tive that why should­n’t we be val­i­dat­ing what we see in the lab to a bit more nat­ur­al envi­ron­ment? Real­ly great work.

 
