Using web-based testing to reevaluate the left ear, right hemisphere processing advantage for phonetic cues to talker identification

Lee Drown, University of Connecticut


Full Transcript:

Lee Drown:
All right, good afternoon or morning or evening, everyone. Again, my name is Lee Drown, and I’m a PhD student at the University of Connecticut. So, it’s well known that speech signals contain both indexical and phonetic cues, which allow people to recognize both voices and meaning from the same signal. However, a strict delineation between indexical and phonetic cues isn’t possible, given that talkers show systematic differences in their phonetic cues and that listeners are sensitive to these differences.

Lee Drown:
Today, I’m going to discuss how listeners use phonetic cues, such as voice-onset time, to identify talkers, and I’ll present a reevaluation of evidence suggesting that learning to use phonetic cues induces a right-hemisphere processing advantage for talker identification.

Lee Drown:
The current study is a replication and extension of work by Francis and Driscoll in 2006. Their study trained participants to use voice-onset time, or VOT, as a cue to identify talkers. VOT is a temporal property of stop consonants, and it’s indicated by this red line here. This cue lets listeners distinguish the word “gain,” which is produced with short VOTs, from “cane,” which is produced with relatively longer VOTs.

Lee Drown:
Although VOT marks the voicing distinction, talkers also show stable individual differences in their characteristic VOTs, even for the same stop consonant. So some talkers have longer VOTs than others, and listeners are sensitive to these differences.

Lee Drown:
Francis and Driscoll also examined whether a left ear, right hemisphere advantage would emerge for participants who successfully learned to use VOT as a marker for talker identification. They used a dichotic stimulus manipulation to examine hemispheric contributions to task performance, building on neuroimaging research that suggests hemispheric optimization for different aspects of signal processing, with right-hemisphere temporal regions dominant for voice processing.

Lee Drown:
To investigate hemispheric contributions to using phonetic cues for talker identification, Francis and Driscoll set up this talker identification task. Listeners heard two talkers and were asked to identify which talker they heard. One talker, Jared, produced tokens with VOTs in the 30-millisecond range, and the other, Dave, produced tokens in the 50-millisecond range. So there was only a 20-millisecond difference between the characteristic VOTs of these two talkers.
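To make the logic of this task concrete, here is a minimal sketch of an idealized listener who identifies the talker from VOT alone by comparing each token with each talker’s characteristic VOT. The 30 ms and 50 ms prototypes come from the description above; the function name `identify_talker` and the nearest-prototype decision rule are illustrative assumptions, not the procedure used by Francis and Driscoll.

```python
def identify_talker(vot_ms: float, prototypes: dict[str, float]) -> str:
    """Return the talker whose characteristic VOT is closest to vot_ms.

    Hypothetical nearest-prototype rule, used only for illustration.
    """
    return min(prototypes, key=lambda talker: abs(vot_ms - prototypes[talker]))


# Characteristic VOTs (ms) described for the original study.
PROTOTYPES = {"Jared": 30.0, "Dave": 50.0}

for vot in (28.0, 38.0, 42.0, 55.0):
    print(vot, identify_talker(vot, PROTOTYPES))
```

With only 20 ms between the prototypes, the implied boundary sits at 40 ms, so tokens a few milliseconds on either side of it are assigned to different talkers.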

Lee Drown:
In actuality, all of these tokens were produced by the same talker. So the participants in this experiment heard the same fundamental frequency and other indexical properties associated with the talker’s voice; the two “talkers” differed only in their characteristic VOTs. The experiment consisted of a pre-test, a training phase, and a post-test. The talker identification task was used in every phase, and feedback was provided during the training phase but not during the test phases.

Lee Drown:
Because Francis and Driscoll were interested in examining hemispheric contributions to talker identification, they employed a dichotic listening task. During the pre-test and post-test phases, stimuli were presented to either the left or the right ear on each trial, whereas stimuli were presented binaurally during training. Francis and Driscoll found evidence of learning between pre- and post-test for eight subjects. For these subjects, they also identified a left ear, right hemisphere advantage at the group level in the talker identification task at post-test but not at pre-test, which suggests that learning to process VOT as a cue to talker identity induced a re-lateralization of hemispheric dominance.
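The trial-level ear manipulation can be sketched as follows, assuming a mono token stored as a NumPy array and a two-channel (left, right) output buffer. The talk does not say how stimuli were routed; the helper `route_to_ear`, the stand-in tone, and the 44.1 kHz sample rate are hypothetical.

```python
import numpy as np


def route_to_ear(token: np.ndarray, ear: str) -> np.ndarray:
    """Return an (n_samples, 2) stereo buffer carrying the token in the chosen ear(s).

    Column 0 is the left channel and column 1 the right channel.
    ear must be "left", "right", or "both" (binaural presentation).
    """
    if token.ndim != 1:
        raise ValueError("expected a mono (1-D) token")
    if ear not in ("left", "right", "both"):
        raise ValueError(f"unknown ear: {ear!r}")
    stereo = np.zeros((token.shape[0], 2), dtype=token.dtype)
    if ear in ("left", "both"):
        stereo[:, 0] = token
    if ear in ("right", "both"):
        stereo[:, 1] = token
    return stereo


# Hypothetical stand-in token: 441 samples (10 ms at 44.1 kHz) of a 1 kHz tone.
fs = 44_100
t = np.arange(441) / fs
token = 0.5 * np.sin(2 * np.pi * 1000 * t)

left_trial = route_to_ear(token, "left")
print(left_trial.shape)  # (441, 2)
print(np.abs(left_trial[:, 1]).max())  # right channel is silent
```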

Lee Drown:
However, the sample in this experiment was small, only 18 participants, and only roughly 50% of them met the learning criterion, defined as a 5% improvement in talker identification accuracy between pre- and post-test. Additionally, the statistical evidence for the interaction between phase and ear was weak, at p = 0.04.
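The learning criterion reduces to a simple check. This sketch assumes accuracy is a proportion correct and reads “a 5% improvement” as five percentage points (post-test minus pre-test of at least 0.05); a relative 5% gain would be a different rule. The function name is hypothetical.

```python
def met_learning_criterion(pre_acc: float, post_acc: float, threshold: float = 0.05) -> bool:
    """Return True if post-test accuracy exceeds pre-test accuracy by at least threshold.

    Accuracies are proportions correct in [0, 1]; threshold uses the same units
    (0.05 = five percentage points, an assumed reading of "5% improvement").
    """
    if not (0.0 <= pre_acc <= 1.0 and 0.0 <= post_acc <= 1.0):
        raise ValueError("accuracies must be proportions in [0, 1]")
    # Small tolerance guards against floating-point error:
    # 0.60 - 0.55 evaluates to slightly less than 0.05.
    return post_acc - pre_acc >= threshold - 1e-9


print(met_learning_criterion(0.55, 0.60))  # True
print(met_learning_criterion(0.55, 0.58))  # False
```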

Lee Drown:
For these reasons, we decided to conduct a replication and extension of this study. Specifically, our current work aimed to answer two questions. First, do the results of the Francis and Driscoll study replicate with a larger sample? And second, what makes someone a better versus a poorer learner in this talker identification task?

Lee Drown:
To answer these questions, listeners participated in two experimental sessions. Session 1 was a replication of the Francis and Driscoll talker identification task, and Session 2 consisted of four individual difference measures, intended to give insight into what predicts success in using phonetic cues for talker identification. Both sessions were deployed using Gorilla. Participants were recruited from the Prolific participant pool, and within Prolific, we recruited participants so that the sample demographics matched those of the original study.

Lee Drown:
Headphone compliance was paramount because the talker identification task required dichotic listening; therefore, participants were required to pass three headphone screens, all of which were programmed and deployed via Gorilla. These included the Woods and colleagues and the Milne and colleagues tasks, which Dr. Theodore described at the beginning of this panel. These two headphone screens, however, cannot determine whether the participant has actually placed the left headphone channel on the left ear, and vice versa. Therefore, we created a novel channel detection task to ensure that the left headphone was on the left ear and the right headphone on the right ear. Listeners had to show ceiling performance on all three headphone screens to be included in the study.
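The talk does not detail how the channel detection task was scored, so the following is only a hypothetical sketch: on each trial a sound plays in one channel, the listener reports which ear heard it, and a participant is retained only when this check and the two published screens are all passed. Treating ceiling as every trial correct, and the names `channel_check_passed` and `passes_all_screens`, are assumptions.

```python
from typing import Sequence


def channel_check_passed(presented: Sequence[str], responses: Sequence[str]) -> bool:
    """Hypothetical scoring for a left/right channel detection check.

    presented[i] is the channel ("left" or "right") that played on trial i and
    responses[i] is the ear the listener reported. Ceiling is assumed to mean
    every trial correct; a reversed headset yields all-wrong responses.
    """
    if len(presented) != len(responses) or not presented:
        raise ValueError("need one response per trial and at least one trial")
    return all(p == r for p, r in zip(presented, responses))


def passes_all_screens(woods_ok: bool, milne_ok: bool, channel_ok: bool) -> bool:
    """Inclusion requires passing all three headphone screens."""
    return woods_ok and milne_ok and channel_ok


trials = ["left", "right", "right", "left"]
print(channel_check_passed(trials, ["left", "right", "right", "left"]))  # True
print(channel_check_passed(trials, ["right", "left", "left", "right"]))  # False: headset reversed
```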

Lee Drown:
The stimuli for the first session were drawn from two VOT continua: one that ranged from “gain” to “cane,” and one that ranged from “goal” to “coal.” Both continua were created from natural productions of the voiced endpoint elicited from a single female monolingual speaker of American English. Token duration was equated across all stimuli.

Lee Drown:
To increase the number of participants who could learn to use VOT as a marker for talker identification, we increased the difference between the short and long VOTs from 20 milliseconds, as in the original Francis and Driscoll study, to 80 milliseconds.
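One way to see why a larger separation should help is a standard equal-variance signal detection sketch: if a token’s perceived VOT is Gaussian around the talker’s characteristic value with standard deviation σ, and the listener places the criterion halfway between the two talkers, expected accuracy is Φ(Δ / 2σ), where Δ is the VOT separation. The value σ = 15 ms below is an arbitrary assumption for illustration, not an estimate from either study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_accuracy(separation_ms: float, noise_sd_ms: float) -> float:
    """Accuracy of an ideal two-talker classifier with a midpoint criterion.

    Assumes equal-variance Gaussian perceptual noise (sd = noise_sd_ms) around
    each talker's characteristic VOT; the two means differ by separation_ms.
    """
    return normal_cdf(separation_ms / (2.0 * noise_sd_ms))


SIGMA_MS = 15.0  # arbitrary illustrative noise level, not a measured value
for delta in (20.0, 80.0):
    print(f"{delta:.0f} ms separation -> {predicted_accuracy(delta, SIGMA_MS):.3f}")
```

Under this assumed noise level, a 20 ms separation predicts roughly 75% accuracy, while an 80 ms separation predicts above 99%.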

Lee Drown:
We named the long-VOT talker Sheila and the short-VOT talker Joanne. Each talker had three unique tokens within her respective VOT range. Specifically, for each word, listeners heard two tokens from each of Joanne and Sheila during training, and a different, third token from each talker during pre- and post-test.
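The token assignment can be sketched as a simple split, assuming each talker-by-word cell holds exactly three tokens. Which two tokens served as training items is not stated, so taking the first two, and the function and token names, are hypothetical.

```python
def split_tokens(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Split one talker-by-word token set into training and test tokens.

    Assumes exactly three tokens per cell: two for training (with feedback)
    and a held-out third used at both pre-test and post-test.
    """
    if len(tokens) != 3:
        raise ValueError("expected three tokens per talker and word")
    return tokens[:2], tokens[2:]


# Hypothetical token labels for one talker-by-word cell.
cell = ["sheila_tok1", "sheila_tok2", "sheila_tok3"]
train, test = split_tokens(cell)
print(train)  # ['sheila_tok1', 'sheila_tok2']
print(test)   # ['sheila_tok3']
```

Holding one token out of training means that any post-test improvement has to generalize beyond the exact items that received feedback.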

Lee Drown:
Just as in Francis and Driscoll, in our Session 1, listeners first completed a pre-test, followed by a training phase and a post-test. Only listeners who met the inclusion criteria for Session 1 were invited to participate in Session 2. The criteria were, first, that they passed all three headphone screens, thus showing headphone compliance, and second, that they performed above chance during the training phase. Because listeners received feedback on their responses during training, above-chance performance indicates adequate effort on the task. It’s also important to note that we did not exclude participants who failed to meet the Francis and Driscoll learning criterion, as we were interested in examining how individual difference measures tracked with talker identification for all listeners.
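How “above chance” was assessed is not specified in the talk. One common option, shown here purely as an assumption, is a one-sided exact binomial test against the 50% chance rate of a two-talker task. The trial count and alpha level in the example are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance(n_correct: int, n_trials: int, chance: float = 0.5, alpha: float = 0.05) -> bool:
    """One-sided exact binomial test of accuracy against a chance rate."""
    if not 0 <= n_correct <= n_trials:
        raise ValueError("n_correct must be between 0 and n_trials")
    return binomial_upper_tail(n_correct, n_trials, chance) < alpha


# Hypothetical training phase with 100 two-alternative trials.
print(above_chance(60, 100))  # True
print(above_chance(55, 100))  # False
```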

Lee Drown:
Of the 140 participants tested in Session 1, 28 were excluded on the first criterion and 15 on the second, leaving a final Session 1 sample of 97 participants, who were invited back to complete Session 2. Again, Session 2 examined individual difference measures to delineate what made certain listeners good at the Francis and Driscoll talker identification task. Because the Francis and Driscoll study did not examine individual differences among participants and reported performance only at the group level, it is unknown what factors contribute to a person’s ability to use phonetic cues, such as voice-onset time, for talker identification.

Lee Drown:
The four individual difference constructs are shown here, along with the task used to assess each construct and how we quantified an individual’s behavior in each task. A flanker task was used to measure inhibition. In the pitch perception task, listeners heard two tone sequences and were asked to identify whether the sequences were the same or different. In the category identification task, listeners categorized the first sound of tokens from a VOT continuum as either “g” or “c.” And in the within-category discrimination task, listeners heard pairs of tokens from a VOT continuum and identified whether the two tokens were the same or different.
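One common way to quantify category identification, assumed here, is the slope of a two-parameter logistic function fitted to the proportion of “c” responses at each continuum step. The sketch below fits it by maximum likelihood using Newton-Raphson updates on aggregated counts; the function name, continuum steps, and response counts are hypothetical.

```python
import numpy as np


def fit_logistic_slope(steps, n_c, n_trials, iters=50):
    """Fit P("c") = 1 / (1 + exp(-(b0 + b1 * step))) by maximum likelihood.

    steps: continuum step values; n_c: count of "c" responses per step;
    n_trials: trials per step. Returns (intercept b0, slope b1).
    """
    x = np.asarray(steps, dtype=float)
    k = np.asarray(n_c, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix [1, step]
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (k - n * p)          # gradient of the log-likelihood
        w = n * p * (1.0 - p)             # binomial variance weights
        info = X.T @ (X * w[:, None])     # Fisher information matrix
        delta = np.linalg.solve(info, grad)
        beta = beta + delta               # Newton-Raphson update
        if np.max(np.abs(delta)) < 1e-10:
            break
    return beta[0], beta[1]


steps = np.arange(1, 8)         # hypothetical 7-step "g"-to-"c" continuum
n_c = [0, 1, 2, 5, 8, 9, 10]    # hypothetical "c" responses out of 10 per step
b0, b1 = fit_logistic_slope(steps, n_c, [10] * 7)
print(f"slope = {b1:.2f}")
```

A steeper slope indicates a sharper transition between the two categories along the continuum.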

Lee Drown:
Critically, the VOT continuum used for the category identification and within-category discrimination tasks was produced by a different talker than the one used in Session 1, in order to minimize any transfer of learning between Session 1 and Session 2. We chose these measures based on past work suggesting that these constructs may be linked to an individual’s ability to recognize talkers.

Lee Drown:
Here, I highlight our main findings from Session 1. There was a significant increase in accuracy between pre- and post-test, as shown in Panel A, and listeners were faster at post-test than at pre-test, as shown in Panel B. However, we found no evidence of a left ear, right hemisphere advantage for this task.

Lee Drown:
The same patterns held when we examined only listeners who showed learning in this task. Here is performance on the four individual difference measures in Session 2 for the 59 participants who returned for this session. As you can see from the box plots for each task, we elicited a wide range of individual variation for each construct.

Lee Drown:
Now to the main question: what individual difference factors predict performance in the talker identification task? To answer this question, we correlated performance on each individual difference task with four measures of talker identification: accuracy during training, accuracy at pre-test, accuracy at post-test, and the difference in accuracy between post- and pre-test, with higher values indicating greater learning.
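This analysis can be sketched as follows, assuming one value per participant for pre-test, training, and post-test accuracy, and Pearson correlation as the association measure (the talk does not name the statistic). The learning score is post-test minus pre-test accuracy. All function names and numbers are hypothetical placeholders, not study data.

```python
import numpy as np


def talker_id_measures(pre, training, post):
    """Return the four talker identification measures used as outcomes."""
    pre, training, post = (np.asarray(a, dtype=float) for a in (pre, training, post))
    return {
        "pre_test": pre,
        "training": training,
        "post_test": post,
        "learning": post - pre,  # higher values = greater learning
    }


def correlate(predictor, measures):
    """Pearson r between one individual difference score and each outcome."""
    x = np.asarray(predictor, dtype=float)
    return {name: float(np.corrcoef(x, y)[0, 1]) for name, y in measures.items()}


# Hypothetical scores for five listeners.
measures = talker_id_measures(
    pre=[0.52, 0.60, 0.55, 0.70, 0.65],
    training=[0.58, 0.66, 0.60, 0.78, 0.72],
    post=[0.60, 0.70, 0.58, 0.82, 0.75],
)
pitch_score = [0.61, 0.72, 0.64, 0.88, 0.80]  # hypothetical pitch perception accuracy
for name, r in correlate(pitch_score, measures).items():
    print(f"{name}: r = {r:.2f}")
```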

Lee Drown:
First, inhibition was not related to any measure of talker identification. In contrast, pitch perception was positively associated with talker identification accuracy at pre-test, during training, and at post-test, but it did not predict the magnitude of learning. Category identification slope was not related to any measure of talker identification performance, but within-category discrimination was positively associated with talker identification at pre-test, during training, and at post-test.

Lee Drown:
Overall, although we did not replicate the left ear, right hemisphere advantage reported by Francis and Driscoll, we were able to extend the original study to include individual difference measures, in order to better understand the mechanisms behind using phonetic cues, such as VOT, for talker identification. Specifically, pitch perception and within-category discrimination predicted performance at pre-test, during training, and at post-test, but not the magnitude of learning. These findings suggest that a person’s auditory acuity plays a strong role in their ability to use phonetic variation as a cue to talker identity.

Lee Drown:
To conclude, I want to highlight some best practices we employed for web-based testing. First, we invited back for Session 2 only those participants who showed that they were following task instructions. Specifically, we tested 140 people overall, but only 97 met the headphone and training accuracy criteria. We therefore saved valuable lab resources by testing only compliant participants in Session 2.

Lee Drown:
Lastly, we employed multiple checks to guard against automated enrollment in online studies by software applications, otherwise known as bots. In this study, no suspected bots remained after we excluded participants based on headphone compliance and training accuracy.

Lee Drown:
So I’d like to acknowledge my collaborators and funding sources for this work. And I will also direct you to our OSF repository for additional resources. Thank you so much for your attention. And I will now address any immediate questions.

Speak­er 2:
Excellent, Lee. Thank you so much. Attendees? Ah, yes. Lee, there’s a question here. Did you redo headphone checks at Session 2?

Lee Drown:
We did. Yep. So we retested headphone compliance as well, and it actually showed that inviting back to Session 2 only participants who met headphone compliance in Session 1 was a great idea: out of 57, I believe only one did not meet criteria for headphone compliance in Session 2. That suggests that if an individual is compliant with headphones to begin with, that compliance will most likely carry over to future sessions. So yes, we did re-examine headphone compliance, and it showed that we were making good decisions in including only those who were compliant to begin with.

Speak­er 2:
Excellent. There are a few more questions coming in. Lee, you can address these in the chat, or we can keep this dialogue going online.


