Online speech per­cep­tion exper­i­ments: Democ­ra­tiz­ing sci­ence and teaching.

Christi­na Y. Tzeng, San José State University


Full Tran­script:

Christi­na Y. Tzeng:
All right. Thank you, Rachel, for the introduction at the beginning of the session and to both you and Joshua for gathering us in this space. I also want to say thank you upfront to everyone who is still here at the last talk of the day. I am excited to share about my experience conducting speech perception experiments online, highlighting the power of these online experiments to democratize science and teaching. I’ll aim to achieve two objectives in my talk today. The first is to share some findings that add to what we now know is a growing body of evidence that online speech perception experiments are highly efficient and do yield robust data. The second objective is to share some thoughts on how online experiments, more broadly, can make science more accessible for both researchers and participants.

Christi­na Y. Tzeng:
In my work, I study how we, as lis­ten­ers, over­come the enor­mous amount of vari­a­tion that we encounter when we lis­ten to dif­fer­ent voic­es and utter­ances. Our exper­i­ments typ­i­cal­ly require par­tic­i­pants to lis­ten to audi­to­ry stim­uli and make sub­se­quent respons­es on a com­put­er to each one. For in-per­son or in-lab exper­i­ments, this would typ­i­cal­ly require what’s pic­tured on the left: a sound-atten­u­at­ed, dis­trac­tion-free booth, high-qual­i­ty head­phones, and spe­cial­ized soft­ware, as well as hardware.

Christi­na Y. Tzeng:
As a disclaimer, I have to state that my first real dive into the world of online experiments was in late 2019, which makes me a relatively novice user of these online experimental methods, but this is when I started to wonder, “Is such a highly controlled listening environment really necessary?”

Christi­na Y. Tzeng:
In the inter­est of achiev­ing this first objec­tive, I’d like to share what are now pub­lished find­ings from my first for­ay into the online exper­i­ment world. This is work done in col­lab­o­ra­tion with my col­leagues, Dr. Lynne Nygaard and Dr. Rachel Theodore, where we exam­ined the time course of a phe­nom­e­non called lex­i­cal­ly guid­ed per­cep­tu­al learning.

Christi­na Y. Tzeng:
We know that listeners use a whole host of cues to map the acoustics of the speech signal onto linguistic units. One of these cues is lexical knowledge. Imagine hearing a fricative sound that’s between an S and an SH sound. If that ambiguous sound is embedded into this word on the left, the listener hears that sound as an S as in dinosaur. But if that same ambiguous sound is instead embedded in the word on the right, the listener hears that sound instead as an SH as in efficient. But if listeners are exposed to these ambiguous sounds in stable lexical contexts that bias them to hear either an S or an SH sound, what we then see are changes in the listener’s representations of their S and SH categories.

Christina Y. Tzeng:
These changes in sound category representation are what we call lexically guided perceptual learning. In both the online and in-person versions of this task, the lexically guided perceptual learning paradigm takes about 20 minutes to complete. Here, listeners complete an exposure phase followed by a test phase. In the exposure phase, they complete a lexical decision task where they hear an ambiguous sound such as a fricative between S and SH. One group hears this ambiguous sound embedded in words that bias them to hear it as an S, whereas another group is biased to hear that same sound as an SH. After exposure, the listeners complete a phonetic categorization task where they identify ambiguous sounds on a non-word continuum, here either as asi or ashi.

Christi­na Y. Tzeng:
We drew our samples from Prolific and executed the experiments in Gorilla. We completed a total of six experiments in this publication, but in the interest of time, I’ll share the findings from one. What will appear here are the results of the phonetic categorization task at test, where, as participants heard ambiguous sounds on the asi/ashi continuum, we measured the likelihood that they heard those sounds as asi. Here, we see robust evidence for lexically guided perceptual learning, as listeners were more likely to hear the ambiguous sounds as asi when they were biased to hear S during exposure, indicated by the red line, than when they were biased to hear the sounds as SH during exposure, shown here by the green line.

Christi­na Y. Tzeng:
To showcase the high level of data quality that we see at the individual level, here are separate plots for each of the 70 participants at test, where we can see the expected psychometric curves for every single participant. We only excluded 5% of our participants across the six experiments due to failure to perform the task. We did have to exclude 16% of the total number of participants due to failure to pass the Woods et al. headphone check that Dr. Theodore described at the beginning of the session. But this was a small price to pay, given the speed of data collection. For example, we collected data from the 70 participants presented in Experiment 1 in under a single hour.

Christi­na Y. Tzeng:
I hope what I’ve shared has supported the idea that online speech perception experiments are highly efficient and yield robust findings, even with auditory tasks that require fine-grained phonetic discriminations like the one I presented.

Christi­na Y. Tzeng:
I now want to turn to the idea that online exper­i­ments can pro­vide us with two things in par­tic­u­lar: access to a larg­er and more diverse pool of par­tic­i­pants and also more user-friend­ly exper­i­ment build­ing inter­faces for our stu­dents and research mentees.

Christi­na Y. Tzeng:
This is the figure I showed earlier. We replicated the finding with another N of 70 participants using a second stimulus set, shown here on the right, meaning we ran a total of 150 participants within the span of about an hour and a half, which using in-person methods would have taken us weeks or even months.

Christi­na Y. Tzeng:
For his master’s thesis, one of my student collaborators, Ulises Quintero, is interested in recruiting participants who speak English and a second language. In Prolific, if we use our standard inclusion criteria, including this criterion of speaking English plus another language, we automatically have access to over 3,000 participants, which is orders of magnitude greater than what we would have access to using in-person methods. For his undergraduate honors thesis, Justin Au built a talker ID task in Gorilla on his own, using primarily the tutorial support that is on Gorilla’s website as a guide.

Christi­na Y. Tzeng:
I’ll end by addressing the two questions about auditory research more broadly that Rachel shared at the beginning of the session. The first is, “What do you think is the biggest challenge for auditory research online, and how do you overcome it?” As Jason mentioned, due to the pandemic, we have all been forced to some extent to embrace online methods more readily, but I think we are still very much in the process of establishing both the validity and the reliability of these methods. One way for us to do this is to run online and in-person experiments in parallel so that we, not just as individual researchers but as a field, can be reassured that our tasks can be successfully transferred across these different platforms.

Christi­na Y. Tzeng:
And the second question, “What can auditory research gain most from online methods?” My take on this is that, with how quickly we can collect data from a whole number of different populations, we’ve essentially eliminated the data collection bottleneck. Adapting in-person experiments to the online world takes a lot of trial and error, and I’m still very much in that learning phase, but I think that the reduction of this bottleneck drastically changes the pace of auditory research and science more broadly.

Christi­na Y. Tzeng:
With that, I’d like to extend my grat­i­tude to my recent col­lab­o­ra­tors as well as to all of you for your atten­tion. I look for­ward to your ques­tions and comments.

Rachel Theodore:
Excellent, Christina. Thank you so much for those really careful thoughts. Questions. Yeah, here’s one for you, Christina. “I was wondering if, in your work, you’ve observed the use of different exposure phase methods besides lexical decision in an online world, such as story listening or cloze sentences, and if you’ve noticed any differences at test as a function of those exposure phase methods?”

Christi­na Y. Tzeng:
Thanks for that question. So again, coming back to this disclaimer that I’m a relatively novice user of online methodology in general for auditory research, we’ve only done some pilot work using other kinds of exposure methodology. We’re in the process of piloting a talker identification task, where during the exposure phase, listeners will hear utterances spoken by specific talkers and have to indicate which talker they think they’re hearing, with the ultimate goal of being able to identify the different voices in the task. So far, we haven’t seen any kind of noticeable difference in performance between in-person/in-lab versions of that and online versions. What we do notice is that, sometimes, participants will take self-inflicted breaks. So one lesson we’ve learned is that, in addition to keeping the task relatively short, we will build in some breaks so that they’re not leaving the computer for an extended period of time. But the short response to that question is that, at least with talker identification tasks and this lexically guided perceptual learning task, we haven’t seen reasons not to transfer these into the online world.


