Face mask type affects audiovisual speech intelligibility and subjective listening effort in young and older adults.

Violet A. Brown, Washington University in St. Louis
@violetsarebrown


Full Transcript:

Violet:
Hi everyone. So today I’m excited to tell you about some neat, very relevant data that we collected online using Gorilla where we looked at how face masks affect speech intelligibility. I think the clearest way to introduce this topic is to simply say, speech is hard. When you’re listening to continuous speech, as we all are right now, it really seems like these words I’m saying are these beautiful discrete units that go together like beads on a string or words on a page. But in reality, that is not at all the case. In reality, the acoustic input is really messy. For example, there aren’t clear pauses between words when we speak, which you’ve probably noticed if you’ve ever heard people speak a language you’re unfamiliar with, right? It just sounds like this continuous stream, and you’d be hard pressed to pick out particular words. But making this even more complicated, speech often occurs in background noise. Whether it’s the whirring of your computer fan or the sound of people talking in the other room, background noise makes it much harder to understand what’s being said.

Violet:
Now you might be looking at this visual and thinking, wait, that’s not that hard. I can totally read that. But that’s obviously because I wrote out what’s being said in white text, and I’ve distinguished between these competing inputs by labeling them with different colors. But the real input that hits your ears is more like this. It’s hard, right? You can’t read that.

Violet:
So I could talk about how speech is hard all day, but in the interest of time, I’m going to move on to one of the cues we can use to help us deal with this messy acoustic input. And that is visual cues provided by the talking face. So for many people, being able to see the person who’s talking can help us deal with this messy acoustic input. I’m going to play a clip for you that includes babbly background noise, like what you’d hear in a restaurant. That’ll play for a few seconds and then you’ll hear a woman’s voice reading a passage, and then you’ll be able to see her face as well. And I just want you to watch and listen and notice how your experience of hearing her changes when you can see her face.

Violet:
It makes a big difference, right? Given that we’re online, there actually might’ve been some audiovisual asynchrony, but if the signals are lined up, seeing the talker really helps a ton. So I’ve talked about how speech is hard because the acoustic input is messy and how we can use visual cues to help us overcome that messy bottom-up input. But you’ve probably noticed that it’s especially difficult to understand what someone is saying when they’re wearing a face mask. The reason face masks make it so much harder is that they interfere with the clarity of both the auditory and the visual signals, which I just told you are integral in understanding speech. As I’m sure everyone’s experienced, face masks make speech sound really muffled and they reduce the amplitude of the speech across frequencies. So here’s a plot from our experiment. It’s a bit of a spoiler in terms of which masks we tested, but we’re going to roll with it anyway.

Violet:
So this is showing the average amplitude of speech produced in five different face mask conditions across a range of frequencies. The colored boxes on the right side are me wearing those face masks. That top line is the no mask condition. The next line is the surgical mask. The next two are a cloth mask: the top one is without a filter and the bottom one is with a paper filter. And the bottom line is a transparent mask. This has a clear plastic window, so you can see the talker’s mouth. What I want you to take away from this figure is that all of the face masks are attenuating some frequencies, especially those high frequencies, but the particular masks differ a lot in how they affect the acoustics. The other thing I want you to notice from this figure is that the face masks occlude my mouth.

Violet:
So you can’t use those visual speech cues that I just told you can be really helpful. Okay, so those are the five face mask conditions we tested. And so here’s what we did. We presented online samples of 180 young and 180 older adults with 150 sentences each, and each sentence contains four keywords that are scored for accuracy. So people just type in a text box what they think the sentence was.

Violet:
The sentences occurred in one of five face masks, in either quiet, moderate levels of noise, or a lot of noise. And for this we used pink noise, which is like white noise, but different frequencies. And then after every block of 10 sentences, participants were asked to rate their subjective listening effort. Basically, this is a question that asks them how hard they had to work to achieve whatever level of performance they achieved.
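As a rough illustration of the design just described, the sketch below crosses the five mask conditions with the three noise levels and checks the arithmetic. The condition labels and the even split of sentences across conditions are assumptions made for illustration; the talk states only the totals (150 sentences, four keywords each, five masks, three noise levels, an effort rating after every block of 10 sentences).

```python
from itertools import product

# Labels are illustrative; the talk names the masks but not these identifiers.
MASKS = ["no_mask", "surgical", "cloth_no_filter", "cloth_with_filter", "transparent"]
NOISE_LEVELS = ["quiet", "moderate_noise", "high_noise"]

SENTENCES_PER_PARTICIPANT = 150
KEYWORDS_PER_SENTENCE = 4
BLOCK_SIZE = 10  # an effort rating follows every block of 10 sentences

# Every mask is paired with every noise level.
conditions = list(product(MASKS, NOISE_LEVELS))

# Number of effort ratings each participant would provide.
n_blocks = SENTENCES_PER_PARTICIPANT // BLOCK_SIZE

# Assumption: sentences are divided evenly across the crossed conditions.
sentences_per_condition = SENTENCES_PER_PARTICIPANT // len(conditions)

# Total scorable keywords per participant.
keywords_per_participant = SENTENCES_PER_PARTICIPANT * KEYWORDS_PER_SENTENCE
```

Under these assumptions, one block of 10 sentences would correspond to one mask-by-noise condition, though the talk does not say that blocks were organized this way.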

Violet:
So before moving on, I want to note here that as Rachel was already saying, when you’re conducting speech research online, you want to ensure that your participants can actually hear what’s being said. When you bring people in the lab, you have a ton of control over their listening environment. And that’s not the case online. You have no idea what kind of headphones they’re using, if any, whether there’s background noise, how loud their volume is and so on.

Violet:
So what researchers often do is precede the main speech task with a headphone check. And as we already talked about, that’s a task that’s really difficult to pass if you’re not wearing headphones. But we actually opted not to do that in this experiment because I’ve used them before, and they’re awesome, but they often catch people who are wearing headphones. So they’re a little bit conservative, not necessarily too conservative depending on what you need, but it’s a huge hassle to deal with correspondence from participants.

Violet:
And given that we were collecting data from 400 people, I didn’t want to deal with that. So instead we made it clear from the beginning that participants should wear headphones to complete the task. And then after they completed the experiment, we asked them what kind of output device they used. We told them it would not affect their payment. And then we just excluded people who reported using external speakers rather than headphones. I haven’t actually crunched the numbers to see what proportion of people it excluded as opposed to that more traditional headphone check. But in my experience, it seemed about the same, if not smaller, and a huge plus was that I didn’t have to deal with a million emails and allowing people to restart the experiment.
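The exclusion rule described above amounts to a simple filter on the post-experiment questionnaire. The sketch below is a minimal illustration, not the authors' analysis code; the field name `output_device` and the response labels are hypothetical.

```python
def exclude_speaker_users(participants):
    """Drop participants who self-reported using external speakers.

    `participants` is a list of dicts; the 'output_device' key and its
    values are hypothetical stand-ins for the questionnaire response.
    """
    return [p for p in participants if p["output_device"] != "external_speakers"]


# Hypothetical questionnaire responses, for illustration only.
sample = [
    {"id": "p01", "output_device": "headphones"},
    {"id": "p02", "output_device": "external_speakers"},
    {"id": "p03", "output_device": "earbuds"},
]

kept = exclude_speaker_users(sample)
kept_ids = [p["id"] for p in kept]
```

Because the check relies on self-report, it only works if participants have no incentive to misreport, which is why the questionnaire stated that the answer would not affect payment.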

Violet:
Okay. So I mentioned, we conducted this study on young and older adults. And the reason we did that is we expected that older adults might be more affected by face masks than young adults. So they would have a harder time dealing with background noise and with face masks, but we actually found no evidence for that. So these are the mean intelligibility scores and subjective effort ratings collapsed across all conditions for young and older adults. And you can see that older adults had slightly poorer intelligibility and more subjectively rated effort, but that effect is tiny. And the important thing to note, which you can’t see from these numbers, of course, because it’s collapsed, is that there were no interactions with age. So older adults were not more affected by face masks or background noise, which is surprising. So in theory, we could have pooled the data from the age groups, but I’m going to show it to you separately because that’s what we pre-registered we would do regardless of any interactions.

Violet:
But first I want to show you the ladies of Inauguration Day, because I used the Inauguration Day color palette in R, which is awesome. You should check it out if you haven’t done it. So take note of their outfits. Okay. So here are the intelligibility data for the young adults. The key thing to point out here is that in quiet, face masks don’t do much for intelligibility, but as soon as you add even a moderate amount of background noise intelligibility gets worse in all of the mask conditions, especially the clear plastic window transparent mask. And that effect is even larger as you add more background noise. I also want to note that we’re getting a ton of separation across both mask type and noise level, despite not doing that traditional headphone check. And so this is showing us that maybe we don’t need to be quite as formal about our headphone checks to get insights about some of the effects we’re interested in, in speech research.

Violet:
That’s of course not to say that there aren’t situations that warrant more control over presentation of auditory stimuli, but at least for a straightforward intelligibility study, like this one, this seems really promising to me for online auditory research. And here is the intelligibility data for older adults. If you blinked, you might’ve missed it. And that’s because the pattern of results is strikingly similar. Intelligibility is a tiny bit worse overall, as I mentioned a minute ago, but it’s not much. And the pattern of results across mask type and noise level is consistent. So again, here’s young adults and here’s the older adults. Here is the same type of plot. But this time, instead of showing you intelligibility, I’m showing you subjective listening effort ratings. These data mirror the intelligibility data really nicely. For the most part, the conditions in which people performed the worst are the same ones in which people rated the task as being subjectively more difficult.

Violet:
But there’s one key difference here between the intelligibility data and the effort data that I think is worth pointing out. So on the left side here, I didn’t change anything. This is the subjective effort data and on the right I’ve overlaid the intelligibility data. This is for young adults and this is just in quiet. So what you can see is that intelligibility didn’t differ in quiet, right? So everyone is performing basically at a hundred percent, at ceiling. But if you look at the effort ratings, we are seeing a little bit of separation across mask types there. So even though people were performing at the same level in quiet, regardless of face masks, they rated some masks, particularly the transparent mask and the cloth masks, as effortful to process.

Violet:
So this is a nice demonstration that accuracy and subjective effort aren’t necessarily the same. And that’s an important point for the listener’s experience and for clinicians who might be trying to figure out what to do if somebody is having a difficult time recognizing speech. Here is that corresponding data for the older adults. These effort ratings across noise levels look really similar to the ratings provided by the young adults. And here are the effort ratings side-by-side with the intelligibility. And again, these results are really similar to the pattern in the young adults. Effort ratings differ even when intelligibility does not. And this is for quiet.

Violet:
So to recap what I’ve gone over so far, we found that face masks have little effect on intelligibility in quiet, but they can impair intelligibility by as much as 30% relative to speech produced without a face mask if you just add a little bit of background noise, and those impairments get even larger in large amounts of background noise. People rated the speech produced in face masks as more effortful to process than speech produced without a face mask, even in quiet. And again, those are the conditions in which intelligibility was largely unaffected by face masks. The transparent mask and the cloth mask with a filter tended to impair intelligibility the most, and they resulted in the highest subjective effort ratings. The finding about the transparent mask is interesting because I spent that time at the beginning telling you that seeing the talker helps, but as the person who recorded these stimuli, I’m here to tell you that condensation is no joke.

Violet:
You really can’t see my mouth very clearly in that thing. It’s a little gross and foggy. And so it seems like the sound attenuation caused by that plastic window is outweighing any benefit you might get from being able to see the talker’s mouth. And the last thing to note is that this pattern of results was similar across age groups. This is somewhat surprising, but it might be partly because these older adults could adjust the volume on their output devices. So the signal-to-noise ratio is the same, but if people are hard of hearing, they still might have turned up the volume.

Violet:
All participants had self-reported normal hearing. And these older adults weren’t very old. The range was 59 to 71. And that’s because it’s hard to collect data from much older people online. So this is an instance where it’s possible we would have seen a different pattern of results in a more controlled lab setting. But the fact that older adults are able to do this task almost as well as young adults, again, means that maybe we don’t need to be quite as restrictive about who we sample for this kind of basic speech intelligibility study online.

Violet:
I’d like to thank my collaborators on this project, Kristin Van Engen, who’s my advisor, and Jonathan Peelle. I also want to thank Gorilla, of course, and Prolific for making online data collection possible, the people who paid the bills, and all of you for listening. I also want to note that all of our stimuli, data, code for analysis, and pre-registration are available at that link, if you’re interested. So I’m happy to take any questions you might have.

Speaker 2:
Excellent, Violet, thank you so much. Attendees, feel free to drop questions in the chat. And while we wait for that, I have one for you, Violet: how did you go about ensuring ages? It seems like you might’ve used Prolific filters. Did you have any secondary checks?

Violet:
Yes, we used Prolific filters to only include certain age groups. And then we also had a questionnaire at the end. So in that same questionnaire where we said, what kind of output device did you use? It will not affect your payment, we also asked them their age and some other demographic information. Yeah. And we found one instance where somebody appeared to have lied. We just removed them.

Speaker 2:
Excellent. And you had mentioned at the beginning that there might be some issues in syncing audio and video in web-based delivery compared to in the lab. On the scale of minor to fatal, what was your experience with getting that to sync up?

Violet:
That’s a really good question. I think everything worked out pretty well. I’ve talked to Gorilla a little bit about this and it seems okay. I got some messages from participants saying that it would load for a long time, but I think what it’s doing is it’s loading the video and then it plays it and it should be synced up. Also, it’s not like there are separate audio and video files. They have been combined beforehand. So I think it should be okay. I mean, at least we’re seeing a separation across these masks and from the no mask condition. So clearly it’s synced up enough that you’re getting visual benefit from that.

Speaker 2:
Yeah. Great tip for that type of research to embed the audio and video signals as opposed to presenting separate ones. Excellent. There’s another question here. Were there any issues with typos or spelling mistakes in the keyword typing? Did this affect how you could interpret the results?

Violet:
Yeah. People are really messy typers. We told them to just try to type the whole sentence and I wrote an R script that does some of the cleaning up. I know there’s an R package that does that for you. And I didn’t use it for this just because this project was my baby and I’ve done this kind of thing before. So I wanted to stick with it before switching to a new method, but we had pre-registered certain kinds of typos that we would change. So, like homophones, and common misspellings that are an addition, deletion, or substitution away from a word, as long as that doesn’t itself form another word. So if it’s a non-word, say the word was stick and they just missed the k, that’s fine. But these are all things we pre-registered, and I went through any incorrect response and hand-scanned it to make sure that my R script didn’t miss anything. It was a lot of work and I really need to switch to the other method, I think.
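The scoring rules described in this answer can be sketched roughly as follows. This is not the speaker's R script; it is a minimal Python illustration of the stated rules (exact matches, homophones, and near-misses one addition, deletion, or substitution away from the keyword, provided the typed string is not itself a real word). The lexicon, the homophone table, and all function names are hypothetical.

```python
import string


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # a single substitution
    return a[i:] == b[i + 1:]  # a single missing character in `a`


def keyword_correct(token, keyword, lexicon, homophones):
    """Apply the rules to one typed token and one target keyword."""
    if token == keyword:
        return True
    if keyword in homophones.get(token, set()):
        return True
    # A near-miss counts only if the typed string is not itself a real word.
    return token not in lexicon and within_one_edit(token, keyword)


def score_sentence(response, keywords, lexicon, homophones):
    """Count how many keywords are matched by any token in the typed response."""
    tokens = [t.strip(string.punctuation) for t in response.lower().split()]
    return sum(
        any(keyword_correct(t, k.lower(), lexicon, homophones) for t in tokens)
        for k in keywords
    )


# Hypothetical lexicon and homophone table, for illustration only.
LEXICON = {"the", "boy", "threw", "through", "a", "big", "stick", "sick"}
HOMOPHONES = {"through": {"threw"}}
KEYWORDS = ["boy", "threw", "big", "stick"]

typo_score = score_sentence("The boy through a big stic.", KEYWORDS, LEXICON, HOMOPHONES)
real_word_score = score_sentence("the boy threw a big sick", KEYWORDS, LEXICON, HOMOPHONES)
```

In this sketch, "stic" is credited as "stick" because it is a non-word one deletion away, while "sick" is not credited because it is a real word. A hand check of incorrect responses, as described above, would still be needed for cases the rules miss.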

Speaker 2:
Excellent. Thank you so much.

BeOnline is the conference to learn all about online behavioral research. It's the ideal place to discover the challenges and benefits of online research and to learn from pioneers. If that sounds interesting to you, then click the button below to register for the 2023 conference on Thursday July 6th. You will be the first to know when we release new content and timings for BeOnline 2023.