Violet A. Brown, Washington University in St. Louis
@violetsarebrown
Full Transcript:
Violet:
Hi everyone. So today I’m excited to tell you about some neat, very relevant data that we collected online using Gorilla where we looked at how face masks affect speech intelligibility. I think the clearest way to introduce this topic is to simply say, speech is hard. When you’re listening to continuous speech, as we all are right now, it really seems like these words I’m saying are these beautiful discrete units that go together like beads on a string or words on a page. But in reality, that is not at all the case. In reality, the acoustic input is really messy. For example, there aren’t clear pauses between words when we speak, which you’ve probably noticed if you’ve ever heard people speak a language you’re unfamiliar with, right? It just sounds like this continuous stream, and you’d be hard pressed to pick out particular words. But making this even more complicated, speech often occurs in background noise. Whether it’s the whirring of your computer fan or the sound of people talking in the other room, background noise makes it much harder to understand what’s being said.
Violet:
Now you might be looking at this visual and thinking, wait, that’s not that hard. I can totally read that. But that’s obviously because I wrote out what’s being said in white text, and I’ve distinguished between these competing inputs by labeling them with different colors. But the real input that hits your ears is more like this. It’s hard, right? You can’t read that.
Violet:
So I can talk about how speech is hard all day, but in the interest of time, I’m going to move on to one of the cues we can use to help us deal with this messy acoustic input. And that is visual cues provided by the talking face. So for many people, being able to see the person who’s talking can help us deal with this messy acoustic input. I’m going to play a clip for you that includes babble background noise, like what you’d hear in a restaurant. That’ll play for a few seconds and then you’ll hear a woman’s voice reading a passage, and then you’ll be able to see her face as well. And I just want you to watch and listen and notice how your experience of hearing her changes when you can see her face.
Violet:
It makes a big difference, right? Given that we’re online, there actually might’ve been some audiovisual asynchrony, but if the signals are lined up, seeing the talker really helps a ton. So I’ve talked about how speech is hard because the acoustic input is messy and how we can use visual cues to help us overcome that messy acoustic input. But you’ve probably noticed that it’s especially difficult to understand what someone is saying when they’re wearing a face mask. And the reason face masks make it so much harder is that they interfere with the clarity of both the auditory and the visual signals, which I just told you are integral in understanding speech. As I’m sure everyone’s experienced, face masks make that speech sound really muffled and they reduce the amplitude of the speech across frequencies. So here’s a plot from our experiment. It’s a bit of a spoiler in terms of which masks we tested, but we’re going to roll with it anyway.
Violet:
So this is showing the average amplitude of speech produced in five different face mask conditions across a range of frequencies. The colored boxes on the right side are me wearing those face masks. That top line is the no mask condition. The next line is the surgical mask. The next two are a cloth mask. The top one is without a filter and the bottom one is with a paper filter. And the bottom one is a transparent mask. This has a clear plastic window, so you can see the talker’s mouth. What I want you to take away from this figure is that all of the face masks are attenuating some frequencies, especially those high frequencies, but the particular masks differ a lot in how they affect the acoustics. The other thing I want you to notice from this figure is that the face masks occlude my mouth.
Violet:
So you can’t use those visual speech cues that I just told you can be really helpful. Okay, so those are the five face mask conditions we tested. And so here’s what we did. We presented online samples of 180 young and 180 older adults with 150 sentences each, and each sentence contains four keywords that are scored for accuracy. So people just type into a text box what they think the sentence was.
Violet:
The sentences occurred in one of five face mask conditions, in either quiet, moderate levels of noise, or a lot of noise. And for this we used pink noise, which is like white noise, but with more energy at lower frequencies. And then after every block of 10 sentences, participants were asked to rate their subjective listening effort. Basically, this is a question that asks them how hard they had to work to achieve whatever level of performance they achieved.
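To make the design concrete, here is a minimal Python sketch of how such a trial list could be laid out. It is not the study's actual code. It assumes that each block of 10 sentences holds one mask-by-noise condition (the talk does not say how sentences were assigned to blocks), and the condition labels and the function name `build_blocks` are hypothetical.

```python
import itertools
import random

# Hypothetical labels for the five mask conditions and three noise levels.
MASKS = ["none", "surgical", "cloth", "cloth_filter", "transparent"]
NOISE_LEVELS = ["quiet", "moderate", "high"]


def build_blocks(n_sentences=150, seed=0):
    """Assign sentences to blocks, one block per mask-by-noise condition.

    ASSUMPTION: blocks are condition-pure. With 5 x 3 = 15 conditions and
    150 sentences, that gives 10 sentences per block.
    """
    conditions = list(itertools.product(MASKS, NOISE_LEVELS))
    per_block = n_sentences // len(conditions)

    rng = random.Random(seed)
    sentence_ids = list(range(n_sentences))
    rng.shuffle(sentence_ids)   # random sentence-to-condition assignment
    rng.shuffle(conditions)     # random block order

    blocks = []
    for i, (mask, noise) in enumerate(conditions):
        ids = sentence_ids[i * per_block:(i + 1) * per_block]
        blocks.append({
            "mask": mask,
            "noise": noise,
            "sentence_ids": ids,
            # An effort rating is collected after every block of 10 sentences.
            "effort_rating_after": True,
        })
    return blocks
```

Calling `build_blocks()` returns 15 blocks of 10 sentences each, so every sentence is heard exactly once and an effort rating follows each block.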
Violet:
So before moving on, I want to note here that as Rachel was already saying, when you’re conducting speech research online, you want to ensure that your participants can actually hear what’s being said. When you bring people in the lab, you have a ton of control over their listening environment. And that’s not the case online. You have no idea what kind of headphones they’re using, if any, whether there’s background noise, how loud their volume is and so on.
Violet:
So what researchers often do is precede the main speech task with a headphone check. And as we already talked about, that’s a task that’s really difficult to pass if you’re not wearing headphones. But we actually opted not to do that in this experiment because I’ve used them before, and they’re awesome, but they often catch people who are wearing headphones. So they’re a little bit conservative, not necessarily too conservative depending on what you need, but it’s a huge hassle to deal with correspondence from participants.
Violet:
And given that we were collecting data from 400 people, I didn’t want to deal with that. So instead we made it clear from the beginning that participants should wear headphones to complete the task. And then after they completed the experiment, we asked them what kind of output device they used. We told them it would not affect their payment. And then we just excluded people who reported using external speakers rather than headphones. I haven’t actually crunched the numbers to see what proportion of people it excluded as opposed to that more traditional headphone check. But in my experience, it seemed about the same, if not smaller, and it was a huge plus that I didn’t have to deal with a million emails and allowing people to restart the experiment.
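The exclusion rule described here is simple enough to sketch in a few lines of Python. This is an illustration only, not the study's analysis code: the field name `output_device`, the response label `"external_speakers"`, and the function name are all hypothetical stand-ins for whatever the post-experiment questionnaire actually recorded.

```python
def apply_device_exclusion(participants):
    """Split participants by self-reported output device.

    ASSUMPTION: each participant is a dict with an "output_device" field,
    and external speakers are coded as "external_speakers".
    Returns (kept, excluded, exclusion_rate).
    """
    kept = [p for p in participants
            if p["output_device"] != "external_speakers"]
    excluded = [p for p in participants
                if p["output_device"] == "external_speakers"]
    rate = len(excluded) / len(participants) if participants else 0.0
    return kept, excluded, rate
```

Returning the exclusion rate alongside the two lists makes it straightforward to compare this self-report approach against a traditional headphone check, which is the number the talk says had not yet been crunched.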
Violet:
Okay. So I mentioned, we conducted this study on young and older adults. And the reason we did that is we expected that older adults might be more affected by face masks than young adults. So they would have a harder time dealing with background noise and with face masks, but we actually found no evidence for that. So these are the mean intelligibility scores and subjective effort ratings collapsed across all conditions for young and older adults. And you can see that older adults had slightly poorer intelligibility and slightly higher subjective effort ratings, but that effect is tiny. And the important thing to note, which you can’t see from these numbers, of course, because it’s collapsed, is that there were no interactions with age. So older adults were not more affected by face masks or background noise, which is surprising. So in theory, we could have pooled the data from the age groups, but I’m going to show it to you separately because that’s what we pre-registered we would do regardless of any interactions.
Violet:
But first I want to show you the ladies of Inauguration Day, because I used the Inauguration Day color palette in R, which is awesome. You should check it out if you haven’t done it. So take note of their outfits. Okay. So here are the intelligibility data for the young adults. The key thing to point out here is that in quiet, face masks don’t do much to intelligibility, but as soon as you add even a moderate amount of background noise, intelligibility gets worse in all of the mask conditions, especially the clear plastic window transparent mask. And that effect is even larger as you add more background noise. I also want to note that we’re getting a ton of separation across both mask type and noise level, despite not doing that traditional headphone check. And so this is showing us that maybe we don’t need to be quite as formal about our headphone checks to get insights about some of the effects we’re interested in in speech research.
Violet:
That’s of course not to say that there aren’t situations that warrant more control over presentation of auditory stimuli, but at least for a straightforward intelligibility study like this one, this seems really promising to me for online auditory research. And here is the intelligibility data for older adults. If you blinked, you might’ve missed it. And that’s because the pattern of results is strikingly similar. Intelligibility is a tiny bit worse overall, as I mentioned a minute ago, but it’s not much. And the pattern of results across mask type and noise level is consistent. So again, here’s young adults and here’s the older adults. Here is the same type of plot. But this time, instead of showing you intelligibility, I’m showing you subjective listening effort ratings. These data mirror the intelligibility data really nicely. For the most part, the conditions in which people performed the worst are the same ones in which people rated the task as being subjectively more difficult.
Violet:
But there’s one key difference here between the intelligibility data and the effort data that I think is worth pointing out. So on the left side here, I didn’t change anything. This is the subjective effort data, and on the right I’ve overlaid the intelligibility data. This is for young adults and this is just in quiet. So what you can see is that intelligibility didn’t differ in quiet, right? So everyone is performing basically at a hundred percent, at ceiling. But if you look at the effort ratings, we are seeing a little bit of separation across mask types there. So even though people were performing at the same level in quiet, regardless of face masks, they rated some masks, particularly the transparent mask and the cloth masks, as more effortful to process.
Violet:
So this is a nice demonstration that accuracy and subjective effort aren’t necessarily the same. And that’s an important point for the listener’s experience and for clinicians who might be trying to figure out what to do if somebody is having a difficult time recognizing speech. Here is that corresponding data for the older adults. These effort ratings across noise levels look really similar to the ratings provided by the young adults. And here are the effort ratings side-by-side with the intelligibility. And again, these results are really similar to the pattern in the young adults. Effort ratings differ even when intelligibility does not. And this is for quiet.
Violet:
So to recap what I’ve gone over so far, we found that face masks have little effect on intelligibility in quiet, but they can impair intelligibility by as much as 30% relative to speech produced without a face mask if you just add a little bit of background noise, and those impairments get even larger in large amounts of background noise. People rated the speech produced in face masks as more effortful to process than speech produced without a face mask, even in quiet. And again, those are the conditions in which intelligibility was largely unaffected by face masks. The transparent mask and the cloth mask with a filter tended to impair intelligibility the most, and they resulted in the highest subjective effort ratings. The finding about the transparent mask is interesting because I spent that time at the beginning telling you that seeing the talker helps, but as the person who recorded these stimuli, I’m here to tell you that condensation is no joke.
Violet:
You really can’t see my mouth very clearly in that thing. It’s a little gross and foggy. And so what happens, it seems, is that the sound attenuation caused by that plastic window is outweighing any benefit you might get from being able to see the talker’s mouth. And the last thing to note is that this pattern of results was similar across age groups. This is somewhat surprising, but it might be partly because these older adults could adjust the volume on their output devices. So the signal-to-noise ratio is the same, but if people are hard of hearing, they still might have turned up the volume.
Violet:
All participants had self-reported normal hearing. And these older adults weren’t very old. The range was 59 to 71. And that’s because it’s hard to collect data from much older people online. So this is an instance where it’s possible we would have seen a different pattern of results in a more controlled lab setting, but the fact that older adults are able to do this task almost as well as young adults, again, means that maybe we don’t need to be quite as restrictive about who we sample for this kind of basic speech intelligibility study online.
Violet:
I’d like to thank my collaborators on this project, Kristin Van Engen, who’s my advisor, and Jonathan Peelle. I also want to thank Gorilla of course, and Prolific for making online data collection possible, the people who paid the bills, and all of you for listening. I also want to note that all of our stimuli, data, code for analysis and pre-registration are available at that link, if you’re interested. So I’m happy to take any questions you might have.
Speaker 2:
Excellent, Violet, thank you so much. Attendees, feel free to drop questions in the chat. And while we wait for that, I have one for you, Violet: how did you go about ensuring ages? It seems like you might’ve used Prolific filters. Did you have any secondary checks?
Violet:
Yes, we used Prolific filters to only include certain age groups. And then we also had a questionnaire at the end. So that same questionnaire we included where we said, what kind of output device did you use? It will not affect your payment. We also asked them their age and some other demographic information. Yeah. And we found one instance where somebody appeared to have lied. We just removed them.
Speaker 2:
Excellent. And you had mentioned at the beginning that there might be some issues in syncing audio and video in web-based delivery compared to in the lab. On the scale of minor to fatal, what was your experience with getting that to sync up?
Violet:
That’s a really good question. I think everything worked out pretty well. I’ve talked to Gorilla a little bit about this and it seems okay. I got some messages from participants saying that it would load for a long time, but I think what it’s doing is it’s loading the video and then it plays it and it should be synced up. Also, it’s not like there are separate audio and video files. They have been combined beforehand. So I think it should be okay. I mean, at least we’re seeing a separation across these masks and from the no mask condition. So clearly it’s synced up enough that you’re getting visual benefit from that.
Speaker 2:
Yeah. Great tip for that type of research to embed the audio and video signals as opposed to presenting separate ones. Excellent. There’s another question here. Were there any issues with typos or spelling mistakes in the keyword typing? Did this affect how you could interpret the results?
Violet:
Yeah. People are really messy typers. We told them to just try to type the whole sentence and I wrote an R script that does some of the cleaning up. I know there’s an R package that does that for you. And I didn’t use it for this just because this project was my baby and I’ve done this kind of thing before. So I wanted to stick with it before switching to a new method, but we had pre-registered certain kinds of typos that we would change. So, like homophones, and common misspellings that are an addition, deletion, or substitution away from a word, as long as that doesn’t itself form another word. So if it’s a non-word, say if the word was stick and they just missed the k, that’s fine. But these are all things we pre-registered, and I went through any incorrect response and just hand scanned to make sure that my R script didn’t miss anything. It was a lot of work and I really need to switch to the other method, I think.
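The typo rules described in this answer can be sketched in Python as follows. This is not the R script mentioned in the talk, and it is not the R package either; it is a minimal illustration of the stated rules only (exact matches, homophones, and non-word responses one addition, deletion, or substitution away from a keyword). The `lexicon` and `homophones` arguments, the function names, and the example sentence are all hypothetical.

```python
import re


def within_one_edit(a, b):
    """True if a becomes b with at most one insertion, deletion, or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra character in b at position i


def keyword_correct(keyword, response_words, lexicon, homophones):
    """Apply the rules: exact match, homophone, or a one-edit non-word typo."""
    for word in response_words:
        if word == keyword or word in homophones.get(keyword, ()):
            return True
        # A one-edit typo only counts if the typed string is not itself a word.
        if word not in lexicon and within_one_edit(word, keyword):
            return True
    return False


def score_sentence(keywords, response, lexicon, homophones):
    """Return how many of the sentence's keywords were reported correctly."""
    words = re.findall(r"[a-z']+", response.lower())
    return sum(keyword_correct(k, words, lexicon, homophones) for k in keywords)
```

With a toy lexicon containing "stick" and "stuck", a response of "stic" would be credited for the keyword "stick" because it is a non-word one deletion away, whereas "stuck" would not, because it forms another real word. A hand check of responses scored as incorrect, as described in the talk, would still be a sensible safeguard for any automated pass like this.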
Speaker 2:
Excellent. Thank you so much.


