Jason Geller, Rutgers University
@jgeller_phd
Full Transcript:
Jason Geller:
So first off, I want to thank Rachel for inviting me to speak on this panel. For the past couple of years, I’ve been an observer of the BeOnline Conference, so it’s really awesome to actually be a participant and talk about auditory research online. Today, I’m going to be talking to you about a project that I started working on when I was a postdoc at the University of Iowa. While there, my colleagues and I developed a task called the Iowa Test of Consonant Perception, and we set out to validate it. That is what I’m going to be talking to you about today.
Jason Geller:
So to start, I want you to imagine that you’re at a bar, pre-pandemic, having a conversation. You’re trading conversation back and forth, and while you’re doing this there’s traffic noise, there’s blaring music, and there are other people talking. So a critical question for speech perception is: how are we able to attend to the conversation we’re having with the people close to us while ignoring all this extraneous noise occurring at the same time? This is the classic cocktail party problem.
Jason Geller:
So one way we can assess this is by using speech-in-noise tasks. This is what audiologists and also laboratory researchers use, and they come in two flavors: there are open-set tasks and there are closed-set tasks, and within each of those there are single-word recognition tasks as well as sentence-based tasks. So what would a participant see in an open-set task? Let me play you an example of that.
recording:
[crosstalk 00:01:50].
Jason Geller:
So a word or a sentence would be spliced into that multi-speaker babble, and individuals would have to search their mental lexicon, choose the word they think they heard, and then produce it. If you weren’t able to hear what that word was in the multi-speaker babble, it was “ball.”
Jason Geller:
In contrast, closed-set tasks don’t have a production element the way open-set tasks do. Instead, they usually use a forced-choice format where participants are presented with several options and have to choose the one they think they heard. And like I said before, it was “ball” that was spliced into that noise.
Jason Geller:
So generally speaking, sentence-based, open-set tasks are preferred because they’re more representative of everyday listening situations; they’re more ecologically valid. However, open-set tasks are difficult to use experimentally, right? A sentence-based open-set task engages a whole host of processes that are not directly related to speech perception. As I said before, open-set tasks require production, so individuals with a language impairment such as aphasia wouldn’t be able to do the task. Sentence-based tasks require working memory, depending on how hard or syntactically complex the sentences are, and they also rely on context: individuals can use context to infer upcoming words. So again, they’re not directly tapping speech perception.
Jason Geller:
So what we need is a closed-set task that better approximates everyday listening situations. In everyday listening, there’s lexical competition, so representations are battling each other for selection, and there’s also talker variability: different talkers, and speech that might be accented or not, so we have to take that into account. With those goals in mind, we set out to create the Iowa Test of Consonant Perception. This particular task is a four-alternative, closed-set word-choice task. There are 120 target words, each target word belongs to a set, and within that set it appears both as a target and as a foil. We recorded each target word with four speakers, two women and two men, and all of the foils are minimal pairs differing from the target by the first consonant. For the noise, we used multi-speaker babble. This is an example of the multi-speaker babble.
recording:
[crosstalk 00:04:19]
Jason Geller:
What I want to point out here is that all of the analysis scripts, materials, and data for the Iowa Test of Consonant Perception are available at our OSF page, so we’re hoping that individuals can use them to replicate our results or roll their own Iowa Test of Consonant Perception.
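(For anyone who does want to roll their own, here is a minimal sketch of how one item set might be represented. This is not the code from the OSF page; the words, talker labels, and field names are illustrative assumptions that simply follow the description above: four minimal-pair words differing only in the initial consonant, each recorded by four talkers and serving as both target and foil.)

```python
# A minimal, hypothetical sketch of one ITCP item set; not the actual
# OSF materials. Each word in a set is the target once per talker, with
# the remaining set members acting as foils on that trial.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Trial:
    target: str                 # word spliced into the babble
    choices: tuple[str, ...]    # the four response options shown on screen
    talker: str                 # which recorded talker is heard

word_set = ("ball", "call", "fall", "tall")   # hypothetical minimal-pair set
talkers = ("F1", "F2", "M1", "M2")            # two women, two men

trials = [Trial(target=w, choices=word_set, talker=t)
          for w, t in product(word_set, talkers)]

print(len(trials))  # 4 words x 4 talkers = 16 trials for this set
```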
Jason Geller:
So, when we started this validation project, we weren’t in a pandemic, and data collection was going pretty well. Then the pandemic happened and, metaphorically speaking, people left the bar. We couldn’t have people in the lab anymore, so we had to decide on an alternative, and I decided that we should try to validate the task online. As Bob Dylan said, “The times they are a-changin’.” More and more researchers are putting their experiments online, and a lot of auditory researchers, as we have heard today, are also taking their research online. So I thought it would be perfect to try to validate this online.
Jason Geller:
So for the procedure, we had two sessions spaced one week apart. We used Gorilla as our experimental and hosting platform, and we used Prolific as our recruitment platform.
Jason Geller:
So in session one, we had 199 participants. Individuals first did a headphone screener; we used the Woods et al. headphone screener that Rachel talked about. After that, they did the Iowa Test of Consonant Perception, which was 240 trials with two speakers. Then they did the Consonant-Nucleus-Consonant (CNC) test, which is a hundred words in noise. The reason we chose this particular test is that it’s what’s being used at the University of Iowa Hospitals, so we wanted to look at correlations between it and our test.
Jason Geller:
In session two, 98 participants returned. The attrition rate is not the greatest, but it is what it is. For session two, individuals completed a headphone screener again, and then they were given the Iowa Test of Consonant Perception again: 240 trials, but with two different speakers. The reason we used different speakers is so there wouldn’t be any learning effects. After this, they did the AZBio, which is 20 sentences in noise. Again, we used the AZBio test because it’s what’s being used at the University of Iowa Hospitals and Clinics. Then they completed some demographics.
Jason Geller:
So what did the participants actually see? All of these are available as open materials, so why don’t I just show you? First, let’s look at the CNC task and what they did.
recording:
[crosstalk 00:06:55] talk [crosstalk 00:06:55].
Jason Geller:
Yeah. So there’s a fixation cross, and then there’s a word interspersed in that noise and you just have to type in what you thought you heard.
recording:
[crosstalk 00:07:05] cake [crosstalk 00:07:07].
Jason Geller:
Again. The AZBio is very similar, but instead of a word there’s a sentence, and they had to type out the sentence that they thought they heard. And the ITCP, which we have a code name for, is very similar as well.
recording:
[crosstalk 00:07:25]
Jason Geller:
So they hear the word in noise and then there are four choices to choose from. This is the practice trial, so there’s feedback: they might pick “gone,” say, and see that that’s incorrect. So, that is what these tasks look like online.
Jason Geller:
Okay, so back to the presentation. Before I get into the validation piece, we wanted to pilot the stimuli. So we ran a study with 50 participants and assessed all of these words in silence, so we could get an overall intelligibility measure for the stimuli. Overall accuracy was about 95%, so that’s good.
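(An intelligibility check like this is straightforward to script. Here is a hedged sketch, assuming pilot responses live in a hypothetical long-format file with one row per response and columns named word and correct; this is not the project’s actual analysis code.)

```python
# Sketch of the pilot intelligibility check. 'pilot.csv' and its columns
# 'word' (the target) and 'correct' (1 = identified, 0 = not) are assumed.
import pandas as pd

pilot = pd.read_csv("pilot.csv")
print(f"Overall intelligibility: {pilot['correct'].mean():.2%}")  # ~95% here

# Per-item accuracy flags any words that are hard to identify even in silence.
by_word = pilot.groupby("word")["correct"].mean()
print(by_word.sort_values().head(10))
```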
Jason Geller:
Now let’s get into the validation piece. So what we really wanted to know was, what is the reliability of the ITCP? And we did this by looking at test-retest. So we had individuals come in during session one to do the ITCP and then a week later they did the ITCP again. So using the inter-class correlation, which is a measure of agreement, we get high reliability. So 0.8, which is good. And this is kind of just a scatter cloud of session one of the ITCP and session two of the ITCP, and we can see that there’s kind of this positive large correlation.
Jason Geller:
We were also interested in how the ITCP correlates with the other tasks we had them do. For this, we looked at session one of the ITCP and the CNC, and we observed a correlation of 0.54. This is actually a robust measure of correlation, the percentage bend correlation, which takes outliers into account. While 0.54 is positive and fairly large by conventional standards, it’s not really where we wanted to be psychometrically, which is unfortunate.
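(The percentage bend correlation is available off the shelf in pingouin; a sketch, with an assumed per-subject file and column names, might look like this.)

```python
# Robust correlation sketch between ITCP and CNC accuracy using the
# percentage bend method; 'session1_scores.csv' and its columns are
# hypothetical.
import pandas as pd
import pingouin as pg

df = pd.read_csv("session1_scores.csv")  # one row per subject: itcp, cnc
result = pg.corr(df["itcp"], df["cnc"], method="percbend")
print(result[["n", "r", "CI95%", "p-val"]])
```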
Jason Geller:
And then we also did the same thing for the AZBio. Again, we see in the scatter plot that there’s a positive, fairly large correlation, 0.59. But again, it’s not where we want it psychometrically.
Jason Geller:
In addition to this validation piece, we also did some exploratory work where we looked at how things like talker, vowel context, manner, and place affect accuracy. Unfortunately, I can’t talk about that research today, but what I do want to talk a little bit about is the one-parameter IRT (Rasch) model that we fit, from which we extracted all of the item easiness estimates. We can see them here. The palette is not as nice as Violet’s, but I still like it. And we can see that all these items fall within the sweet spot of negative one to one, so there aren’t really items that are too hard or too easy, which is what we want. And I want to stress that we wanted to provide something like this so researchers could use it and roll their own ITCP, maybe excluding or including certain items. So hopefully that will be useful to folks who want to do some of this speech-in-noise work.
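(To make the model concrete, here is a from-scratch sketch of a one-parameter Rasch fit by joint maximum likelihood, where the probability of a correct response is sigmoid(ability + easiness). This is not the authors’ analysis script, the data layout is assumed, and a real analysis would typically use a dedicated IRT package; it is only meant to show where item easiness estimates like these come from.)

```python
# From-scratch Rasch (1PL) sketch: P(correct) = sigmoid(ability[s] + easiness[i]).
# 'itcp_responses.csv' and its columns (subject, item, correct) are hypothetical.
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.special import expit

resp = pd.read_csv("itcp_responses.csv")
subj = pd.Categorical(resp["subject"]).codes
item = pd.Categorical(resp["item"]).codes
y = resp["correct"].to_numpy(dtype=float)
n_subj, n_item = subj.max() + 1, item.max() + 1

def nll(params):
    ability, easiness = params[:n_subj], params[n_subj:]
    p = expit(ability[subj] + easiness[item])
    p = np.clip(p, 1e-9, 1 - 1e-9)           # guard against log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()

fit = minimize(nll, np.zeros(n_subj + n_item), method="L-BFGS-B")
# The model is identified only up to a constant shift, so center easiness.
easiness = fit.x[n_subj:] - fit.x[n_subj:].mean()
# Items with centered easiness in roughly [-1, 1] are neither too hard
# nor too easy: the "sweet spot" mentioned above.
print(pd.Series(easiness, index=pd.Categorical(resp["item"]).categories))
```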
Jason Geller:
So to sum up, we see that the ITCP is highly reliable; we had an ICC of about 0.8. The validity measures, I think, are an open question, and I think we need to do more work. As next steps, we want to look at validation in the lab. As I mentioned earlier, we had already started to validate this in the lab and then had to stop. But that data looks pretty good, and it’s pretty comparable to what we’re observing online, which is what we want to see.
Jason Geller:
One thing that I would be really interested in is doing a validation study with individuals with hearing impairment, so hearing aid users and cochlear implant users. I think it would be really interesting if they could stay home, not have to come into the clinic, and just do this task online, and we could still collect usable data from them.
Jason Geller:
And then lastly, we want to use this experimentally. We want to do eye-tracking, EEG, and PET research, and that’s all being planned out right now at the University of Iowa. So we’re really looking forward to the results that will come out of this.
Jason Geller:
So, I want to end by giving some advice that I wish I’d had when I first started these multi-day studies, because they’re really, really hard to do; there’s lots of attrition. One piece of advice is to give bonuses for completing the second session. You need to set up separate studies on your recruitment platform and then offer bonuses for finishing the second session; I think that really incentivizes folks to come back. I first did this with everything posted as one study, and it ended horribly: lots of people took the first session and didn’t come back for the second, which really hurt my numbers.
Jason Geller:
It’s very important that you’re explicit in your study description. So you need to lay out exactly what you want the participants to do. And also, so there’s no ambiguity when participants email you and say that there was some issues with the experiment or they didn’t do the second part, or can I do the second part? You just need to be explicit. Very important is to email subjects multiple times to remind them of an upcoming session. I don’t know if Prolific fixed this, but it was very hard to just let participants that you wanted to email separately. You had to email everyone that participated in your study, which is not ideal.
Jason Geller:
And then lastly, try to make your experiment a reasonable length. For this particular project, each session took about 40 minutes, and really that’s not ideal. You want to make sure it’s manageable for participants to complete, so they don’t get bored or lose motivation. If I had to do this again, I probably wouldn’t make it so long, or I’d spread it out over more days so each session is a reasonable length. So that’s my advice, the things I wish I knew when I first started these multi-day experiments. And with that, thank you. I look forward to your questions.
Speaker 3:
That was fantastic, Jason, thank you so much. As always with your work, I’m just impressed with such top-notch empirical methods and a deep commitment to open materials as well. It’s just wonderful. We might have time for one quick question. Again, we can also use the chat, the Q&A forum, and find time in Gather Town.
Speaker 3:
Okay. Christina, you can share the slides. One thing that struck me during your talk, Jason, and something that I think applies to all of us: we talk about validating what we see online against what we see in the lab, and to some degree I think it’s interesting that that isn’t reversed. Why shouldn’t we be validating what we see in the lab against a bit more natural environment? Really great work.