Lee Drown, University of Connecticut
@LeeDrown
Full Transcript:
Lee Drown:
All right, good afternoon or morning or evening, everyone. Again, my name is Lee Drown, and I’m a PhD student at the University of Connecticut. So, it’s well-known that speech signals contain both indexical and phonetic cues, which allow people to recognize voices and meaning from the same signal. However, a strict delineation between indexical and phonetic cues isn’t possible, given that talkers show systematic differences in their phonetic cues, and that listeners are sensitive to these differences.
Lee Drown:
Today, I’m going to discuss how listeners use phonetic cues, such as voice-onset time, to identify talkers, as well as present a reevaluation of evidence that suggests that learning to use phonetic cues induces a right-hemisphere processing advantage for talker identification.
Lee Drown:
The current study is a replication and extension of work by Francis and Driscoll in 2006. Their study trained participants to use voice-onset time, or VOT, as a cue to identify talkers. VOT is a temporal property of stop consonants, and it’s indicated by this red line here. This cue lets listeners distinguish the word “gain,” which is produced with short VOTs, from “cane,” which is produced with relatively longer VOTs.
Lee Drown:
While relative VOT does mark the voicing distinction, talkers show stable individual differences in their characteristic VOTs, even for the same stop consonant. So some talkers have longer VOTs than others, and listeners are sensitive to these differences.
Lee Drown:
Francis and Driscoll also examined whether a left-ear, right-hemisphere advantage would emerge for participants who were successfully able to learn to use VOT as a marker for talker identification. They used a dichotic stimulus manipulation to examine hemisphere contributions to task performance, building on neuroimaging research that suggests hemispheric optimization for different aspects of signal processing, with right hemisphere temporal regions dominant for voice processing.
Lee Drown:
To investigate hemispheric contributions to using phonetic cues for talker identification, Francis and Driscoll set up this talker identification task. So listeners heard two talkers, and were asked to identify which talker they heard. So the listeners heard Jared, who produced tokens with VOTs in the 30-millisecond range, and they heard Dave, who produced tokens in the 50-millisecond VOT range. So there was only a 20-millisecond difference between the short and long VOT characteristics of these two talkers.
Lee Drown:
In all actuality, all these tokens were produced by the same talker. So the participants in this experiment heard the same fundamental frequency, and other indexical properties associated with the talker’s voice. So these two talkers only differed in their characteristic VOTs. This experiment consisted of a pre-test, a training phase, and a post-test phase. The task in every phase was the talker identification task, and feedback was provided during the training phase, but not the test phases.
Lee Drown:
As Francis and Driscoll were interested in examining hemispheric contributions to talker identification, a dichotic listening task was employed. During the pre-test and post-test phases, stimuli were presented to either the left or right ear on each trial. Stimuli were presented binaurally during training. Francis and Driscoll found evidence for learning between pre- and post-tests for eight subjects. For these subjects, they also identified a left-ear, right-hemisphere advantage at the group level in the talker identification task at post-test, but not at pre-test, which does suggest that learning to process VOT as a cue to talker identity induced re-lateralization of hemispheric dominance.
Lee Drown:
However, the sample size in this experiment was small, at only 18 participants, and only roughly 50% of the participants were able to meet the learning criterion, defined as a 5% improvement in talker identification accuracy between pre- and post-test. Additionally, the statistical evidence for the interaction between phase and ear was weak, at p = 0.04.
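As a minimal sketch of the learning criterion just described: the talk does not say whether the 5% improvement is an absolute gain in accuracy or a relative one, so the absolute reading (a gain of 0.05 on a 0-to-1 accuracy scale) is an assumption here. The function names and the example accuracies are hypothetical and are not data from either study.

```python
from typing import Dict, Tuple


def met_learning_criterion(pre_acc: float, post_acc: float,
                           threshold: float = 0.05) -> bool:
    """Return True if accuracy rose by at least `threshold` from pre- to post-test.

    Accuracies are proportions between 0 and 1. Reading the 5% criterion as an
    absolute gain of 0.05 is an assumption, not something stated in the talk.
    """
    # A small tolerance keeps floating-point rounding (e.g. 0.60 - 0.55)
    # from rejecting a gain that is exactly at the threshold.
    return (post_acc - pre_acc) >= threshold - 1e-9


def proportion_of_learners(scores: Dict[str, Tuple[float, float]]) -> float:
    """Share of participants whose (pre, post) accuracies meet the criterion."""
    if not scores:
        raise ValueError("no participants supplied")
    learners = sum(met_learning_criterion(pre, post) for pre, post in scores.values())
    return learners / len(scores)


# Hypothetical (pre-test, post-test) accuracies for illustration only.
example_scores = {
    "p01": (0.50, 0.58),
    "p02": (0.52, 0.53),
    "p03": (0.55, 0.60),
    "p04": (0.61, 0.59),
}
print(proportion_of_learners(example_scores))
```

Under the relative reading, the comparison would instead be against `pre_acc * 1.05`, which sets a lower bar for listeners who start near chance.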
Lee Drown:
For these reasons, we decided to conduct a replication and an extension of this study. Specifically, the goal of our current work was to answer two questions. First, do the results of the Francis and Driscoll study replicate with a larger sample? And second, what makes someone a better vs. poorer learner in this talker identification task?
Lee Drown:
To answer these questions, listeners participated in two experimental sessions. Session 1 was a replication of the Francis and Driscoll talker identification task, and Session 2 consisted of four individual difference measures, intended to give insight into what measures predict success in using phonetic cues for talker identification. Both sessions were deployed using Gorilla. Participants were recruited using the Prolific participant pool, and in Prolific, we recruited participants to match the sample demographics of the original study.
Lee Drown:
Headphone compliance for this task was paramount, due to the dichotic listening required for the talker identification task; therefore, participants were required to pass three headphone screens, all of which were programmed and deployed via Gorilla. These tasks included the Woods and colleagues and the Milne and colleagues tasks, which have already been described by Dr. Theodore at the beginning of this panel. These two headphone screens, however, cannot determine whether the participant has actually placed the left headphone channel on the left ear, and vice versa. Therefore, we created a novel channel detection task, to ensure that the left headphone was on the left ear, and vice versa for the right headphone. Listeners had to show ceiling performance in all three of these headphone screens to be included in the study.
Lee Drown:
The stimuli for the first session were drawn from two VOT continua: one that ranged from gain to cane, and one that ranged from goal to coal. Both continua were created from natural productions of the voiced endpoint elicited from a single female monolingual speaker of American English. Token durations were equated across all stimuli.
Lee Drown:
In order to increase the portion of the sample able to complete this task, we increased the difference between the short and long VOTs from 20 milliseconds, as was used in the original Francis and Driscoll study, to 80 milliseconds. By doing this, we aimed to increase the number of participants who could learn to use VOT as a marker for talker identification.
Lee Drown:
We named the long VOT talker Sheila, and the short VOT talker Joanne. Both of these talkers have three unique tokens in each of their respective long and short VOT ranges. Listeners heard three tokens in these VOT spaces throughout the experiment. Specifically, they heard two tokens from both Joanne and Sheila for each word during training, and a different token for each talker for each word during pre- and post-test.
Lee Drown:
Just as in Francis and Driscoll, in our Session 1, listeners first completed a pre-test, followed by a training phase, and a post-test. Only listeners who met inclusion criteria for Session 1 were then invited to participate in Session 2. The criteria were: first, that they passed all three headphone screens, thus showing headphone compliance; and second, that they performed above chance during the training session. And it should be noted that during the training session, listeners received feedback on responses, so a performance above chance indicates adequate effort on the task. It’s important to note as well that we did not exclude participants who did not meet the Francis and Driscoll criteria for learning, as we were interested in examining how individual difference measures tracked with talker identification for all listeners.
Lee Drown:
Of the 140 participants tested in Session 1, 28 were excluded on the first criterion, and 15 were excluded on the second criterion, leaving a final sample of 97 participants in Session 1, who were invited back to complete Session 2. Again, Session 2 examined individual difference measures, to delineate what made certain listeners good at the Francis and Driscoll talker identification task. Since the Francis and Driscoll study did not examine individual differences among participants, and only showed performance at the group level, it is unknown what factors contribute to a person’s ability to use phonetic cues, such as voice-onset time, for talker identification.
Lee Drown:
The four individual difference constructs are shown here, as well as the task used to assess these constructs, and how we quantified an individual’s behavior in these tasks. So a flanker task was used to measure an individual’s inhibition. In the pitch perception task, listeners heard two tone sequences, and were asked to identify if the tone sequences were the same or different. For the category identification task, listeners categorized the first sound of a VOT continuum as either “g” or “c.” And for the within-category discrimination task, listeners heard pairs of tokens from a VOT continuum, and identified whether the two tokens were the same or different.
Lee Drown:
Critically, the VOT continuum used for the category identification task and the within-category discrimination task was produced by a different talker than the one used in Session 1, in order to minimize any transfer of learning between Session 1 and Session 2. We chose these measures based on past work that suggests that these constructs may be linked to an individual’s ability to recognize talkers.
Lee Drown:
Here, I highlight our main findings from Session 1. There was a significant increase in accuracy between pre- and post-test, as shown in Panel A, and people were faster at post-test compared to pre-test, as shown in Panel B. However, we found no evidence of a left-ear, right-hemisphere advantage for this task.
Lee Drown:
The same patterns held when we examined only listeners who showed learning in this task. Here is performance in the four individual difference measures in Session 2 for the 59 participants who returned for this session. As you can see by the box plots for each task, we did elicit a wide range of individual variation for each construct.
Lee Drown:
Now to the main question, which is, “What individual difference factors predict performance in the talker identification task?” To answer this question, we correlated performance in each individual difference task with four measures of talker identification: accuracy during training, accuracy at pre-test, accuracy at post-test, and the difference in accuracy between post- and pre-tests, with higher values indicating greater learning.
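The analysis just described can be sketched as follows. The talk does not name the correlation coefficient, so Pearson's r is an assumption here, and the function names and scores are hypothetical values for illustration, not data from the study.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def correlate_with_talker_id(predictor: Sequence[float],
                             pre: List[float],
                             training: List[float],
                             post: List[float]) -> Dict[str, float]:
    """Correlate one individual difference score with the four talker ID measures."""
    # Learning is the post-test minus pre-test difference, so higher values
    # indicate greater learning, as in the talk.
    learning = [b - a for a, b in zip(pre, post)]
    measures = {"training": training, "pre-test": pre,
                "post-test": post, "learning": learning}
    return {name: pearson_r(predictor, values) for name, values in measures.items()}


# Hypothetical scores for five listeners (illustration only).
pitch_score = [0.60, 0.70, 0.75, 0.85, 0.90]
pre_acc = [0.50, 0.55, 0.53, 0.62, 0.66]
training_acc = [0.58, 0.63, 0.66, 0.72, 0.80]
post_acc = [0.62, 0.60, 0.68, 0.67, 0.78]

for measure, r in correlate_with_talker_id(pitch_score, pre_acc,
                                           training_acc, post_acc).items():
    print(f"{measure}: r = {r:.2f}")
```

Running the same function once per individual difference task gives the full set of predictor-by-measure correlations described above.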
Lee Drown:
So first, inhibition was not related to any measure of talker identification. But in contrast, pitch perception was positively associated with talker identification accuracy at pre-test, training, and post-test, but it did not predict the magnitude of learning. Category identification slope was not related to any measure of talker identification performance, but within-category discrimination was positively associated with talker identification at pre-test, training, and post-test.
Lee Drown:
Overall, although we did not replicate the original Francis and Driscoll study, we were able to extend the original study to include individual difference measures, in order to better understand the mechanisms behind using phonetic cues, such as VOT, for talker identification. Specifically, pitch perception and within-category discrimination were found to be predictors of performance at pre-test, training, and post-test, but not learning overall. So these findings suggest that a person’s auditory acuity plays a strong role in their ability to use phonetic variation as a cue to talker identity.
Lee Drown:
To conclude, I want to highlight some best practices we employed for web-based testing. First, for Session 2, we only invited back participants who showed that they were following task instructions. Specifically, we tested 140 people overall, but only 97 met headphone and training accuracy criteria. Therefore, we saved valuable lab resources by only testing compliant participants in Session 2.
Lee Drown:
Lastly, we employed multiple checks to discourage the presence of automated enrollment in online studies by software applications, otherwise known as bots. In this study, no suspected bots remained after we excluded participants based on headphone compliance and training accuracy.
Lee Drown:
So I’d like to acknowledge my collaborators and funding sources for this work. And I will also direct you to our OSF repository for additional resources. Thank you so much for your attention. And I will now address any immediate questions.
Speaker 2:
Excellent, Lee. Thank you so much. Attendees? Ah, yes. Lee, there’s a question here. Did you redo headphone checks at Session 2?
Lee Drown:
We did. Yep. So we retested headphone compliance as well, and it actually was a great indicator that including participants in Session 2 who met headphone compliance in Session 1 was a great idea, as the vast majority of our participants met criteria; I believe, out of 57, only one did not meet criteria for headphone compliance in Session 2. Which shows that if an individual is compliant with headphones to begin with, that will most likely persist in future sessions. So yes, we did re-examine headphone compliance, and it showed that we were making good decisions, as far as including those who were compliant to begin with.
Speaker 2:
Excellent. There’s a few more questions coming in. Lee, you can address these in the chat, or we can keep this dialogue going on, and be online.


