Using web-based testing to reevaluate the left ear, right hemisphere processing advantage for phonetic cues to talker identification

Lee Drown, University of Connecticut
@LeeDrown


Full Transcript:

Lee Drown:
All right, good afternoon or morning or evening everyone. Again, my name is Lee Drown, and I'm a PhD student at the University of Connecticut. So, it's well-known that speech signals contain both indexical and phonetic cues, which allow people to recognize voices and meaning from the same signal. However, a strict delineation between indexical and phonetic cues isn't possible, given that talkers show systematic differences in their phonetic cues, and that listeners are sensitive to these differences.

Lee Drown:
Today, I'm going to discuss how listeners use phonetic cues, such as voice-onset time, to identify talkers, as well as present a reevaluation of evidence that suggests that learning to use phonetic cues induces a right-hemisphere processing advantage for talker identification.

Lee Drown:
The current study is a replication and extension of work by Francis and Driscoll in 2006. Their study trained participants to use voice-onset time, or VOT, as a cue to identify talkers. VOT is a temporal property of stop consonants, and it's indicated by this red line here. This cue lets listeners distinguish the word "gain," which is produced with short VOTs, from "cane," which is produced with relatively longer VOTs.

Lee Drown:
Though VOT does mark the voicing distinction, talkers show stable individual differences in their characteristic VOTs, even for the same stop consonant. So some talkers have longer VOTs than others, and listeners are sensitive to these differences.

Lee Drown:
Francis and Driscoll also examined whether a left ear, right hemisphere advantage would emerge for participants who were successfully able to learn to use VOT as a marker for talker identification. They used a dichotic stimulus manipulation to examine hemisphere contributions to task performance, building on neuroimaging research that suggests hemispheric optimization for different aspects of signal processing, with right hemisphere temporal regions dominant for voice processing.

Lee Drown:
To investigate hemispheric contributions to using phonetic cues for talker identification, Francis and Driscoll set up this talker identification task. So listeners heard two talkers, and were asked to identify which talker they heard. The listeners heard Jared, who produced tokens with VOTs in the 30-millisecond range, and they heard Dave, who produced tokens in the 50-millisecond VOT range. So there was only a 20-millisecond difference between the short and long VOT characteristics of these two talkers.
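As a toy sketch of this cue structure (the function and boundary value are my own illustration, not part of the study), a token could in principle be attributed to a "talker" from its VOT alone:

```python
def likely_talker(vot_ms, boundary_ms=40.0):
    """Guess which 'talker' produced a token from its VOT alone.
    Jared's tokens cluster near 30 ms and Dave's near 50 ms, so a
    midpoint boundary separates them (illustrative values only)."""
    return "Jared" if vot_ms < boundary_ms else "Dave"
```

A fixed boundary like this is only an idealization of what a successful learner would internalize; the study itself measured whether listeners could learn the mapping from trial-by-trial feedback.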

Lee Drown:
In actuality, all these tokens were produced by the same talker. So the participants in this experiment heard the same fundamental frequency and other indexical properties associated with the talker's voice. These two talkers only differed in their characteristic VOTs. The experiment consisted of a pre-test, a training phase, and a post-test phase. The task in every phase was the talker identification task, and feedback was provided during the training phase, but not the test phases.

Lee Drown:
As Francis and Driscoll were interested in examining hemispheric contributions to talker identification, a dichotic listening task was employed. During the pre-test and post-test phases, stimuli were presented to either the left or right ear on each trial. Stimuli were presented binaurally during training. Francis and Driscoll found evidence for learning between pre- and post-tests for eight subjects. For these subjects, they also identified a left ear, right-hemisphere advantage at the group level in the talker identification task at post-test, but not at pre-test, which does suggest that learning to process VOT as a cue to talker identity induced re-lateralization of hemisphere dominance.

Lee Drown:
However, the sample size in this experiment was small, at only 18 participants, and only roughly 50% of the participants were able to meet the learning criterion, defined as a 5% improvement in talker identification accuracy between pre- and post-test. Additionally, the statistical evidence for the interaction between phase and ear was weak, at p = 0.04.
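The learning criterion just described amounts to a simple threshold check on the pre-to-post accuracy change (a sketch; the function name and example numbers are mine, not the study's):

```python
def meets_learning_criterion(pre_accuracy, post_accuracy, threshold=0.05):
    """Return True if talker identification accuracy improved by at
    least the criterion (5 percentage points) from pre-test to post-test."""
    return (post_accuracy - pre_accuracy) >= threshold

# A participant who goes from 55% to 62% accuracy meets the criterion;
# one who goes from 55% to 58% does not.
```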

Lee Drown:
For these reasons, we decided to conduct a replication and an extension of this study. Specifically, the goal of our current work was to answer two questions. First, do the results of the Francis and Driscoll study replicate with a larger sample? And second, what makes someone a better vs. poorer learner in this talker identification task?

Lee Drown:
To answer these questions, listeners participated in two experimental sessions. Session 1 was a replication of the Francis and Driscoll talker identification task, and Session 2 consisted of four individual difference measures, intended to give insight into what measures predict success in using phonetic cues for talker identification. Both sessions were deployed using Gorilla. Participants were recruited using the Prolific participant pool, and in Prolific, we recruited participants to match the sample demographics of the original study.

Lee Drown:
Headphone compliance for this task was paramount, due to the dichotic listening required for the talker identification task; therefore, participants were required to pass three headphone screens, all of which were programmed and deployed via Gorilla. These tasks included the Woods and colleagues and the Milne and colleagues tasks, which have already been described by Dr. Theodore at the beginning of this panel. These two headphone screens, however, cannot determine whether the participant has actually placed the left headphone channel on the left ear, and vice versa. Therefore, we created a novel channel detection task, to ensure that the left headphone was on the left ear, and vice versa for the right headphone. Listeners had to show ceiling performance in all three of these headphone screens to be included in the study.

Lee Drown:
The stimuli for the first session were drawn from two VOT continua: one that ranged from gain to cane, and one that ranged from goal to coal. Both continua were created from natural productions of the voiced endpoint elicited from a single female monolingual speaker of American English. Token durations were equated across all stimuli.

Lee Drown:
In order to increase the portion of the sample able to complete this task, we increased the difference between the short and long VOTs from 20 milliseconds, as used in the original Francis and Driscoll study, to 80 milliseconds. By doing this, we aimed to increase the number of participants who could learn to use VOT as a marker for talker identification.

Lee Drown:
We named the long VOT talker Sheila, and the short VOT talker Joanne. Both of these talkers had three unique tokens in each of their respective long and short VOT ranges. Listeners heard three tokens in these VOT spaces throughout the experiment. Specifically, they heard two tokens from both Joanne and Sheila for each word during training, and a different token for each talker for each word during pre- and post-test.

Lee Drown:
Just as in Francis and Driscoll, in our Session 1, listeners first completed a pre-test, followed by a training phase, and a post-test. Only listeners who met inclusion criteria for Session 1 were then invited to participate in Session 2. The criteria were: first, that they passed all three headphone screens, thus showing headphone compliance; and second, that they performed above chance during the training session. It should be noted that during the training session, listeners received feedback on responses, so performance above chance indicates adequate effort on the task. It's important to note as well that we did not exclude participants who did not meet the Francis and Driscoll criterion for learning, as we were interested in examining how individual difference measures tracked with talker identification for all listeners.

Lee Drown:
Of the 140 participants tested in Session 1, 28 were excluded on the first criterion, and 15 were excluded on the second criterion, leaving a final sample of 97 participants in Session 1, who were invited back to complete Session 2. Again, Session 2 examined individual difference measures, to delineate what made certain listeners good at the Francis and Driscoll talker identification task. Since the Francis and Driscoll study did not examine individual differences among participants, and only showed performance at the group level, it is unknown what factors contribute to a person's ability to use phonetic cues, such as voice-onset time, for talker identification.

Lee Drown:
The four individual difference constructs are shown here, as well as the tasks used to assess these constructs, and how we quantified an individual's behavior in these tasks. So a flanker task was used to measure an individual's inhibition. In the pitch perception task, listeners heard two tone sequences, and were asked to identify if the tone sequences were the same or different. For the category identification task, listeners categorized the first sound of a VOT continuum as either "g" or "c." And for the within-category discrimination task, listeners heard pairs of tokens from a VOT continuum, and identified whether the two tokens were the same or different.

Lee Drown:
Critically, the VOT continuum used for the category identification task and the within-category discrimination task was produced by a different talker than the one used in Session 1, in order to minimize any transfer of learning between Session 1 and Session 2. We used these measures based on past work that suggests that these constructs may be linked to an individual's ability to recognize talkers.

Lee Drown:
Here, I highlight our main findings from Session 1. There was a significant increase in accuracy between pre- and post-test, as shown in Panel A, and people were faster at post-test compared to pre-test, as shown in Panel B. However, we found no evidence of a left ear, right-hemisphere advantage for this task.

Lee Drown:
The same patterns held when we examined only listeners who showed learning in this task. Here is performance in the four individual difference measures in Session 2 for the 59 participants who returned for this session. As you can see by the box plots for each task, we did elicit a wide range of individual variation for each construct.

Lee Drown:
Now to the main question, which is, "What individual difference factors predict performance in the talker identification task?" To answer this question, we correlated performance in each individual difference task with four measures of talker identification: accuracy during training, accuracy at pre-test, accuracy at post-test, and the difference in accuracy between post- and pre-tests, with higher values indicating greater learning.
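The core of that analysis is a set of pairwise correlations, which can be sketched with a stdlib Pearson correlation over per-participant scores (a minimal illustration with made-up numbers; the study's actual statistical pipeline is not described here):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-participant scores: an individual difference measure
# (e.g., pitch perception accuracy) against post-test talker ID accuracy.
pitch = [0.60, 0.72, 0.81, 0.90]
talker_id_post = [0.55, 0.65, 0.70, 0.85]
r = pearson_r(pitch, talker_id_post)  # positive r means the measures covary
```

In the full analysis, each of the four individual difference measures would be correlated with each of the four talker identification measures, including the post-minus-pre learning score.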

Lee Drown:
So first, inhibition was not related to any measure of talker identification. In contrast, pitch perception was positively associated with talker identification accuracy at pre-test, training, and post-test, but it did not predict the magnitude of learning. Category identification slope was not related to any measure of talker identification performance, but within-category discrimination was positively associated with talker identification at pre-test, training, and post-test.

Lee Drown:
Overall, although we did not replicate the original Francis and Driscoll study, we were able to extend the original study to include individual difference measures, in order to better understand the mechanisms behind using phonetic cues, such as VOT, for talker identification. Specifically, pitch perception and within-category discrimination were found to be predictors of performance at pre-test, training, and post-test, but not of learning overall. So these findings suggest that a person's auditory acuity plays a strong role in their ability to use phonetic variation as a cue to talker identity.

Lee Drown:
To conclude, I want to highlight some best practices we employed for web-based testing. First, we only invited participants back for Session 2 who showed that they were following task instructions. Specifically, we tested 140 people overall, but only 97 met headphone and training accuracy criteria. Therefore, we saved valuable lab resources by only testing compliant participants in Session 2.

Lee Drown:
Lastly, we employed multiple checks to discourage the presence of automated enrollment in online studies by software applications, otherwise known as bots. In this study, no suspected bots remained after we excluded participants based on headphone compliance and training accuracy.

Lee Drown:
So I'd like to acknowledge my collaborators and funding sources for this work. And I will also direct you to our OSF Repository for additional resources. Thank you so much for your attention. And I will now address any immediate questions.

Speaker 2:
Excellent, Lee. Thank you so much. Attendees? Ah, yes. Lee, there's a question here. Did you redo headphone checks at Session 2?

Lee Drown:
We did. Yep. So we retested headphone compliance as well, and it was a great indicator that including participants in Session 2 who met headphone compliance in Session 1 was a good idea: of the returning participants, I believe out of 57, only one did not meet criteria for headphone compliance in Session 2. Which shows that if an individual is compliant with headphones to begin with, that compliance will most likely carry over to future sessions. So yes, we did re-examine headphone compliance, and it showed that we were making good decisions, as far as including those who were compliant to begin with.

Speaker 2:
Excellent. There are a few more questions coming in. Lee, you can address these in the chat, or we can keep this dialogue going online.

 
