Face mask type affects audio­vi­su­al speech intel­li­gi­bil­i­ty and sub­jec­tive lis­ten­ing effort in young and old­er adults.

Vio­let A. Brown, Wash­ing­ton Uni­ver­si­ty in St. Louis
@violetsarebrown

Full Tran­script:

Vio­let:
Hi everyone. So today I’m excited to tell you about some neat, very relevant data that we collected online using Gorilla where we looked at how face masks affect speech intelligibility. I think the clearest way to introduce this topic is to simply say, speech is hard. When you’re listening to continuous speech, as we all are right now, it really seems like these words I’m saying are these beautiful discrete units that go together like beads on a string or words on a page. But in reality, that is not at all the case. In reality, the acoustic input is really messy. For example, there aren’t clear pauses between words when we speak, which you’ve probably noticed if you’ve ever heard people speak a language you’re unfamiliar with, right? It just sounds like this continuous stream, and you’d be hard pressed to pick out particular words. But making this even more complicated, speech often occurs in background noise. Whether it’s the whirring of your computer fan or the sound of people talking in the other room, background noise makes it much harder to understand what’s being said.

Vio­let:
Now you might be look­ing at this visu­al and think­ing, wait, that’s not that hard. I can total­ly read that. But that’s obvi­ous­ly because I wrote out what’s being said in white text, and I’ve dis­tin­guished between these com­pet­ing inputs by label­ing them with dif­fer­ent col­ors. But the real input that hits your ears is more like this. It’s hard, right? You can’t read that.

Vio­let:
So I can talk about why speech is hard all day, but in the interest of time, I’m going to move on to one of the cues we can use to help us deal with this messy acoustic input. And that is visual cues provided by the talking face. So for many people, being able to see the person who’s talking can help us deal with this messy acoustic input. I’m going to play a clip for you that includes babbly background noise, like what you’d hear in a restaurant. That’ll play for a few seconds and then you’ll hear a woman’s voice reading a passage, and then you’ll be able to see her face as well. And I just want you to watch and listen and notice how your experience of hearing her changes when you can see her face.

Vio­let:
It makes a big difference, right? Given that we’re online, there actually might’ve been some audiovisual asynchrony, but if the signals are lined up, seeing the talker really helps a ton. So I’ve talked about how speech is hard because the acoustic input is messy and how we can use visual cues to help us overcome that messy auditory input. But you’ve probably noticed that it’s especially difficult to understand what someone is saying when they’re wearing a face mask. The reason face masks make it so much harder is that they interfere with the clarity of both the auditory and the visual signals, which I just told you are integral in understanding speech. As I’m sure everyone’s experienced, face masks make speech sound really muffled and they reduce the amplitude of the speech across frequencies. So here’s a plot from our experiment. It’s a bit of a spoiler in terms of which masks we tested, but we’re going to roll with it anyway.

Vio­let:
So this is showing the average amplitude of speech produced in five different face mask conditions across a range of frequencies. The colored boxes on the right side are me wearing those face masks. That top line is the no mask condition. The next line is the surgical mask. The next two are a cloth mask. The top one is without a filter and the bottom one is with a paper filter. And the bottom one is a transparent mask. This one has a clear plastic window, so you can see the talker’s mouth. What I want you to take away from this figure is that all of the face masks are attenuating some frequencies, especially those high frequencies, but the particular masks differ a lot in how they affect the acoustics. The other thing I want you to notice from this figure is that the face masks occlude my mouth.

Vio­let:
So you can’t use those visual speech cues that I just told you can be really helpful. Okay, so those are the five face mask conditions we tested. And so here’s what we did. We presented online samples of 180 young and 180 older adults with 150 sentences each, and each sentence contains four keywords that are scored for accuracy. So people just type in a text box what they think the sentence was.
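
As a rough illustration of the keyword scoring described above, here is a minimal sketch in Python. It is not the authors' scoring code (which, per the Q&A, was an R script). The example sentence, the keyword list, and the function names `normalize`, `score_response`, and `proportion_correct` are all hypothetical, and this version uses exact matching only, with no typo correction.

```python
import re


def normalize(text):
    """Lowercase the text, replace anything other than letters,
    apostrophes, and spaces with a space, then split into words."""
    return re.sub(r"[^a-z' ]", " ", text.lower()).split()


def score_response(response, keywords):
    """Count how many of the sentence's keywords appear in the typed
    response. Exact matching only (no typo correction)."""
    typed = set(normalize(response))
    return sum(1 for kw in keywords if kw.lower() in typed)


def proportion_correct(trials):
    """trials is a list of (response, keywords) pairs. Returns the
    proportion of keywords identified across all trials."""
    hits = sum(score_response(resp, kws) for resp, kws in trials)
    total = sum(len(kws) for _, kws in trials)
    return hits / total


# Hypothetical sentence with four keywords (not from the study's stimuli).
keywords = ["gray", "mouse", "ate", "cheese"]
trials = [
    ("The gray mouse ate the cheese.", keywords),
    ("the grey mouse eat cheese", keywords),
]
accuracy = proportion_correct(trials)
```

In this sketch the second response gets credit only for "mouse" and "cheese", which shows why a separate, pre-registered cleaning step for misspellings matters before scoring.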

Vio­let:
The sen­tences occurred in one of five face masks, in either qui­et, mod­er­ate lev­els of noise, or a lot of noise. And for this we used pink noise, which is like white noise, but dif­fer­ent fre­quen­cies. And then after every block of 10 sen­tences, par­tic­i­pants were asked to rate their sub­jec­tive lis­ten­ing effort. Basi­cal­ly, this is a ques­tion that asks them how hard they had to work to achieve what­ev­er lev­el of per­for­mance they achieved.
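
For readers unfamiliar with how noise levels are set in this kind of study, the sketch below shows one common way to mix speech and noise at a target signal-to-noise ratio (SNR). It is an illustration under stated assumptions, not the study's stimulus pipeline: the talk does not give the SNR values used, so the -5 dB figure is a placeholder, the "speech" is a synthetic tone, and the noise here is white Gaussian noise rather than the pink noise used in the experiment.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that 20*log10(rms(speech)/rms(noise)) equals
    snr_db, then add it to the speech sample by sample.
    Returns (mixed signal, scaled noise)."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Placeholder signals: a 220 Hz tone standing in for speech and white
# Gaussian noise standing in for the pink noise used in the study.
random.seed(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

# -5 dB is an assumed value; the talk does not state the SNRs used.
mixed, scaled = mix_at_snr(speech, noise, snr_db=-5.0)
achieved_snr_db = 20 * math.log10(rms(speech) / rms(scaled))
```

Because the SNR is fixed in the file itself, a participant who turns up the volume raises speech and noise together; the ratio between them does not change.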

Vio­let:
So before moving on, I want to note here that as Rachel was already saying, when you’re conducting speech research online, you want to ensure that your participants can actually hear what’s being said. When you bring people into the lab, you have a ton of control over their listening environment. And that’s not the case online. You have no idea what kind of headphones they’re using, if any, whether there’s background noise, how loud their volume is and so on.

Vio­let:
So what researchers often do is precede the main speech task with a headphone check. And as we already talked about, that’s a task that’s really difficult to pass if you’re not wearing headphones, but we actually opted not to do that in this experiment because I’ve used them before. And they’re awesome, but they often catch people who are wearing headphones. So they’re a little bit conservative, not necessarily too conservative depending on what you need, but it’s a huge hassle to deal with correspondence from participants.

Vio­let:
And given that we were collecting data from 400 people, I didn’t want to deal with that. So instead we made it clear from the beginning that participants should wear headphones to complete the task. And then after they completed the experiment, we asked them what kind of output device they used. We told them it would not affect their payment. And then we just excluded people who reported using external speakers rather than headphones. I haven’t actually crunched the numbers to see what proportion of people it excluded as opposed to that more traditional headphone check. But in my experience, it seemed about the same, if not smaller, and a huge plus was that I didn’t have to deal with a million emails and allowing people to restart the experiment.
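
The self-report exclusion rule described above is simple enough to sketch in a few lines. This is only an illustration: the `Participant` record, the field names, and the device labels (such as "external speakers") are assumptions, since the talk does not give the wording of the questionnaire options.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    participant_id: str
    output_device: str  # self-reported after the experiment


# Assumed label for the excluded category; the actual questionnaire
# wording is not given in the talk.
EXCLUDED_DEVICES = {"external speakers"}


def is_excluded(participant):
    """True if the participant reported an excluded output device."""
    return participant.output_device.strip().lower() in EXCLUDED_DEVICES


def apply_device_exclusion(participants):
    """Split participants into (kept, excluded) based on self-report."""
    kept = [p for p in participants if not is_excluded(p)]
    excluded = [p for p in participants if is_excluded(p)]
    return kept, excluded


def proportion_excluded(participants):
    """Share of the sample removed by the self-report rule."""
    _, excluded = apply_device_exclusion(participants)
    return len(excluded) / len(participants)


# Hypothetical sample of four participants.
sample = [
    Participant("p01", "headphones"),
    Participant("p02", "External speakers"),
    Participant("p03", "earbuds"),
    Participant("p04", "headphones"),
]
kept, excluded = apply_device_exclusion(sample)
```

Computing `proportion_excluded` on the real data would give the number the speaker says she has not yet crunched, which could then be compared against the exclusion rate from a formal headphone check.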

Vio­let:
Okay. So I mentioned, we conducted this study on young and older adults. And the reason we did that is we expected that older adults might be more affected by face masks than young adults. So they would have a harder time dealing with background noise and with face masks, but we actually found no evidence for that. So these are the mean intelligibility scores and subjective effort ratings collapsed across all conditions for young and older adults. And you can see that older adults had slightly poorer intelligibility and higher subjectively rated effort, but that effect is tiny. And the important thing to note, which you can’t see from these numbers, of course, because they’re collapsed, is that there were no interactions with age. So older adults were not more affected by face masks or background noise, which is surprising. So in theory, we could have pooled the data from the age groups, but I’m going to show it to you separately because that’s what we pre-registered we would do regardless of any interactions.

Vio­let:
But first I want to show you the ladies of Inau­gu­ra­tion Day, because I used the Inau­gu­ra­tion Day col­or palette in R, which is awe­some. You should check it out if you haven’t done it. So take note of their out­fits. Okay. So here are the intel­li­gi­bil­i­ty data for the young adults. The key thing to point out here is that in qui­et, face masks don’t do much for intel­li­gi­bil­i­ty, but as soon as you add even a mod­er­ate amount of back­ground noise intel­li­gi­bil­i­ty gets worse in all of the mask con­di­tions, espe­cial­ly the clear plas­tic win­dow trans­par­ent mask. And that effect is even larg­er as you add more back­ground noise. I also want to note that we’re get­ting a ton of sep­a­ra­tion across both mask type and noise lev­el, despite not doing that tra­di­tion­al head­phone check. And so this is show­ing us that maybe we don’t need to be quite as for­mal about our head­phone checks to get insights about some of the effects we’re inter­est­ed in, in speech research.

Vio­let:
That’s of course not to say that there aren’t situations that warrant more control over presentation of auditory stimuli, but at least for a straightforward intelligibility study, like this one, this seems really promising to me for online auditory research. And here is the intelligibility data for older adults. If you blinked, you might’ve missed it. And that’s because the pattern of results is strikingly similar. Intelligibility is a tiny bit worse overall, as I mentioned a minute ago, but it’s not much. And the pattern of results across mask type and noise level is consistent. So again, here’s young adults and here’s the older adults. Here is the same type of plot. But this time, instead of showing you intelligibility, I’m showing you subjective listening effort ratings. These data mirror the intelligibility data really nicely. For the most part, the conditions in which people performed the worst are the same ones in which people rated the task as being subjectively more difficult.

Vio­let:
But there’s one key difference here between the intelligibility data and the effort data that I think is worth pointing out. So on the left side here, I didn’t change anything. This is the subjective effort data and on the right I’ve overlaid the intelligibility data. This is for young adults and this is just in quiet. So what you can see is that intelligibility didn’t differ in quiet, right? So everyone is performing basically at a hundred percent, at ceiling. But if you look at the effort ratings, we are seeing a little bit of separation across mask types there. So even though people were performing at the same level in quiet, regardless of face masks, they rated some masks, particularly the transparent mask and the cloth masks, as effortful to process.

Vio­let:
So this is a nice demonstration that accuracy and subjective effort aren’t necessarily the same. And that’s an important point for the listener, for the listener’s experience and for clinicians who might be trying to figure out what to do if somebody is having a difficult time recognizing speech. Here is that corresponding data for the older adults. These effort ratings across noise levels look really similar to the ratings provided by the young adults. And here is the effort rating side-by-side with the intelligibility. And again, these results are really similar to the pattern in the young adults. Effort ratings differ even when intelligibility does not. And this is for quiet.

Vio­let:
So to recap what I’ve gone over so far, we found that face masks have little effect on intelligibility in quiet, but they can impair intelligibility by as much as 30% relative to speech produced without a face mask if you just add a little bit of background noise, and those impairments get even larger in large amounts of background noise. People rated the speech produced in face masks as more effortful to process than speech produced without a face mask, even in quiet. And again, those are the conditions in which intelligibility was largely unaffected by face masks. The transparent mask and the cloth mask with a filter tended to impair intelligibility the most, and they resulted in the highest subjective effort ratings. The finding about the transparent mask is interesting because I spent that time at the beginning telling you that seeing the talker helps, but as the person who recorded these stimuli, I’m here to tell you that condensation is no joke.

Vio­let:
You real­ly can’t see my mouth very clear­ly in that thing. It’s a lit­tle gross and fog­gy. And so what hap­pens, it seems like the sound atten­u­a­tion caused by that plas­tic win­dow is out­weigh­ing any ben­e­fit you might get from being able to see the talk­er’s mouth. And the last thing to note is that this pat­tern of results was sim­i­lar across age groups. This is some­what sur­pris­ing, but it might be part­ly because these old­er adults could adjust the vol­ume on their out­put devices. So the sig­nal to noise ratio is the same, but if peo­ple are hard of hear­ing, they still might have turned up the volume.

Vio­let:
All participants had self-reported normal hearing. And these older adults weren’t very old. The range was 59 to 71. And that’s because it’s hard to collect data from much older people online. So this is an instance where it’s possible we would have seen a different pattern of results in a more controlled lab setting, but the fact that older adults are able to do this task almost as well as young adults, again, means that maybe we don’t need to be quite as restrictive about who we sample for this kind of basic speech intelligibility study online.

Vio­let:
I’d like to thank my collaborators on this project, Kristin Van Engen, who’s my advisor, and Jonathan Peelle. I also want to thank Gorilla of course, and Prolific for making online data collection possible, the people who paid the bills, and all of you for listening. I also want to note that all of our stimuli, data, code for analysis, and pre-registration are available at that link, if you’re interested. So I’m happy to take any questions you might have.

Speak­er 2:
Excellent, Violet, thank you so much. Attendees, feel free to drop questions in the chat. And while we wait for that, I have one for you, Violet. How did you go about ensuring ages? It seems like you might’ve used Prolific filters. Did you have any secondary checks?

Vio­let:
Yes, we used Prolific filters to only include certain age groups. And then we also had a questionnaire at the end. So that same questionnaire we included where we said, what kind of output device did you use? It will not affect your payment. We also asked them their age and some other demographic information. Yeah. And we found one instance where somebody appeared to have lied. We just removed them.

Speak­er 2:
Excel­lent. And you had men­tioned at the begin­ning that there might be some issues in sync­ing audio and video in web-based deliv­ery com­pared to in the lab. On the scale of minor to fatal, what was your expe­ri­ence with get­ting that to sync up?

Vio­let:
That’s a real­ly good ques­tion. I think every­thing worked out pret­ty well. I’ve talked to Goril­la a lit­tle bit about this and it seems okay. I got some mes­sages from par­tic­i­pants say­ing that it would load for a long time, but I think what it’s doing is it’s load­ing the video and then it plays it and it should be synced up. Also, the audio, it’s not like there is sep­a­rate audio and video files. They have been com­bined before­hand. So I think it should be okay. I mean, at least we’re see­ing a sep­a­ra­tion across these masks and from the no mask con­di­tion. So clear­ly it’s synced up enough that you’re get­ting visu­al ben­e­fit from that.

Speak­er 2:
Yeah. Great tip for that type of research to embed the audio and video signals as opposed to presenting separate ones. Excellent. There’s another question here. Were there any issues with typos or spelling mistakes in the keyword typing? Did this affect how you could interpret the results?

Vio­let:
Yeah. People are really messy typers. We told them to just try to type the whole sentence and I wrote an R script that does some of the cleaning up. I know there’s an R package that does that for you. And I didn’t use it for this just because this project was my baby and I’ve done this kind of thing before. So I wanted to stick with it before switching to a new method, but we had pre-registered certain kinds of typos that we would change. So, like homophones, and common misspellings that are one addition, deletion, or substitution away from a word, as long as that doesn’t itself form another word. So if it’s a non-word, say if the word was stick and they just missed the k, that’s fine. But these are all things we pre-registered, and I went through any incorrect response, just hand scanned and made sure that my R script didn’t miss anything. It was a lot of work and I really need to switch to the other method I think.
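
The typo rule described in this answer can be sketched roughly as follows. This is a Python illustration, not the speaker's R script or the R package she mentions: the tiny `LEXICON` and `HOMOPHONES` tables and the function names are hypothetical stand-ins, and a real analysis would need a full word list plus the exact pre-registered criteria.

```python
def within_one_edit(a, b):
    """True if a can be turned into b with at most one insertion,
    deletion, or substitution of a single character."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra character in b at position i


# Hypothetical stand-ins for a real dictionary and homophone list.
LEXICON = {"stick", "stack", "sail", "sale"}
HOMOPHONES = {"sail": {"sale"}}


def counts_as_correct(response, target, lexicon=LEXICON, homophones=HOMOPHONES):
    """Accept exact matches, listed homophones, and non-word responses
    that are one edit away from the target keyword."""
    response, target = response.lower(), target.lower()
    if response == target:
        return True
    if response in homophones.get(target, set()):
        return True
    if response in lexicon:
        return False  # a different real word counts as an error
    return within_one_edit(response, target)
```

Under these assumptions, "stic" is accepted for "stick" (a non-word one deletion away), while "stack" is rejected because it is itself a word. Hand-checking the remaining incorrect responses, as described in the answer, would still be needed to catch cases the rule misses.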

Speak­er 2:
Excel­lent. Thank you so much.
