Lee Drown, University of Connecticut
@LeeDrown
Full Transcript:
Lee Drown:
All right, good afternoon or morning or evening everyone. Again, my name is Lee Drown, and I’m a PhD student at the University of Connecticut. So, it’s well-known that speech signals contain both indexical and phonetic cues, which allow people to recognize voices and meaning from the same signal. However, a strict delineation between indexical and phonetic cues isn’t possible, given that talkers show systematic differences in their phonetic cues, and that listeners are sensitive to these differences.
Lee Drown:
Today, I’m going to discuss how listeners can use phonetic cues, such as voice-onset time, to identify talkers, as well as present a reevaluation of evidence suggesting that learning to use phonetic cues induces a right-hemisphere processing advantage for talker identification.
Lee Drown:
The current study is a replication and extension of work by Francis and Driscoll in 2006. Their study trained participants to use voice-onset time, or VOT, as a cue to identify talkers. VOT is a temporal property of stop consonants, and it’s indicated by this red line here. This cue lets listeners distinguish the word “gain,” which is produced with short VOTs, from “cane,” which is produced with relatively longer VOTs.
Lee Drown:
Beyond marking the voicing distinction, VOT also carries talker information: talkers show stable individual differences in their characteristic VOTs, even for the same stop consonant. So some talkers have longer VOTs than others, and listeners are sensitive to these differences.
Lee Drown:
Francis and Driscoll also examined whether a left-ear, right-hemisphere advantage would emerge for participants who successfully learned to use VOT as a marker for talker identification. They used a dichotic stimulus manipulation to examine hemispheric contributions to task performance, building on neuroimaging research that suggests hemispheric specialization for different aspects of signal processing, with right-hemisphere temporal regions dominant for voice processing.
Lee Drown:
To investigate hemispheric contributions to the use of phonetic cues for talker identification, Francis and Driscoll set up the following talker identification task. Listeners heard two talkers and were asked to identify which talker they heard. They heard Jared, who produced tokens with VOTs in the 30-millisecond range, and Dave, who produced tokens in the 50-millisecond VOT range. So there was only a 20-millisecond difference between the short and long VOT characteristics of these two talkers.
Lee Drown:
In actuality, all of these tokens were produced by the same talker. So the participants in this experiment heard the same fundamental frequency and other indexical properties associated with the talker’s voice; the two talkers differed only in their characteristic VOTs. The experiment consisted of a pre-test, a training phase, and a post-test phase. Every phase used the talker identification task, and feedback was provided during the training phase, but not during the test phases.
Lee Drown:
As Francis and Driscoll were interested in examining hemispheric contributions to talker identification, a dichotic listening task was employed. During the pre-test and post-test phases, stimuli were presented to either the left or the right ear on each trial; during training, stimuli were presented binaurally. Francis and Driscoll found evidence for learning between pre- and post-test for eight subjects. For these subjects, they also identified a left-ear, right-hemisphere advantage at the group level in the talker identification task at post-test, but not at pre-test, which suggests that learning to process VOT as a cue to talker identity induced re-lateralization of hemispheric dominance.
Lee Drown:
However, the sample size in this experiment was small, with only 18 participants, and only roughly 50% of the participants met the learning criterion, defined as a 5% improvement in talker identification accuracy between pre- and post-test. Additionally, the statistical evidence for the interaction between phase and ear was weak, at p = 0.04.
Lee Drown:
For these reasons, we decided to conduct a replication and extension of this study. Specifically, the goal of our current work was to answer two questions. First, do the results of the Francis and Driscoll study replicate with a larger sample? And second, what makes someone a better versus poorer learner in this talker identification task?
Lee Drown:
To answer these questions, listeners participated in two experimental sessions. Session 1 was a replication of the Francis and Driscoll talker identification task, and Session 2 consisted of four individual difference measures, intended to give insight into which measures predict success in using phonetic cues for talker identification. Both sessions were deployed using Gorilla. Participants were recruited through the Prolific participant pool, and in Prolific, we recruited participants so as to match the sample demographics to those of the original study.
Lee Drown:
Headphone compliance was paramount in this study, given the dichotic listening manipulation in the talker identification task; therefore, participants were required to pass three headphone screens, all of which were programmed and deployed via Gorilla. These included the Woods and colleagues task and the Milne and colleagues task, which Dr. Theodore described at the beginning of this panel. These two headphone screens, however, cannot determine whether the participant has actually placed the left headphone channel on the left ear, and vice versa. Therefore, we created a novel channel detection task to ensure that the left channel was on the left ear, and vice versa for the right channel. Listeners had to show ceiling performance on all three headphone screens to be included in the study.
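To make the logic of that channel detection check concrete, here is a minimal offline sketch in Python of how a stereo trial with a signal in only one channel could be generated. The actual task was built in Gorilla, so the tone frequency, duration, and file name below are hypothetical choices for illustration; the sketch assumes the numpy and soundfile packages.

```python
import numpy as np
import soundfile as sf  # assumed dependency for writing WAV files

SAMPLE_RATE = 44100

def make_channel_trial(target_ear, freq=440.0, dur=0.5):
    """Return a stereo signal with a tone in only one channel.

    A participant wearing headphones correctly should report hearing
    the tone on the side named by target_ear.
    """
    t = np.arange(int(SAMPLE_RATE * dur)) / SAMPLE_RATE
    tone = 0.5 * np.sin(2 * np.pi * freq * t)
    silence = np.zeros_like(tone)
    if target_ear == "left":
        stereo = np.column_stack([tone, silence])  # sound in left channel only
    else:
        stereo = np.column_stack([silence, tone])  # sound in right channel only
    return stereo

# Example: write one left-ear trial to disk as a WAV file.
sf.write("channel_trial_left.wav", make_channel_trial("left"), SAMPLE_RATE)
```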
Lee Drown:
The stimuli for the first session were drawn from two VOT continua: one that ranged from gain to cane, and one that ranged from goal to coal. Both continua were created from natural productions of the voiced endpoints, elicited from a single female monolingual speaker of American English. Token durations were equated across all stimuli.
Lee Drown:
In order to increase the proportion of the sample able to complete this task, we increased the difference between the short and long VOTs from the 20 milliseconds used in the original Francis and Driscoll study to 80 milliseconds. By doing this, we aimed to increase the number of participants who could learn to use VOT as a marker for talker identification.
Lee Drown:
We named the long-VOT talker Sheila, and the short-VOT talker Joanne. Each talker had three unique tokens in her respective long or short VOT range. Listeners heard all three tokens from each VOT space over the course of the experiment: specifically, they heard two tokens from both Joanne and Sheila for each word during training, and a different token for each talker for each word during pre- and post-test.
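As a sketch of this stimulus design, the token inventory can be thought of as a mapping from talker name to a small set of VOTs. The specific millisecond values below are invented for illustration; the talk specifies only the 80-millisecond separation between the talkers’ VOT ranges, not individual token values.

```python
import random

# Hypothetical token inventory: each "talker" is defined only by a VOT range.
# Exact values are illustrative; only the 80 ms separation between the
# short- and long-VOT ranges reflects the design described in the talk.
TOKENS = {
    "Joanne": [10, 15, 20],   # short-VOT talker (VOTs in ms)
    "Sheila": [90, 95, 100],  # long-VOT talker (VOTs in ms)
}

def make_trial():
    """Sample one token; the correct response is the talker's name."""
    talker = random.choice(list(TOKENS))
    vot = random.choice(TOKENS[talker])
    return {"talker": talker, "vot_ms": vot}

print(make_trial())  # e.g. {'talker': 'Sheila', 'vot_ms': 95}
```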
Lee Drown:
Just as in Francis and Driscoll, in our Session 1, listeners first completed a pre-test, followed by a training phase and a post-test. Only listeners who met the inclusion criteria for Session 1 were then invited to participate in Session 2. The criteria were: first, that they passed all three headphone screens, thus showing headphone compliance; and second, that they performed above chance during the training phase. It should be noted that during training, listeners received feedback on their responses, so performance above chance indicates adequate effort on the task. It’s important to note as well that we did not exclude participants who failed to meet the Francis and Driscoll criterion for learning, as we were interested in examining how the individual difference measures tracked with talker identification for all listeners.
Lee Drown:
Of the 140 participants tested in Session 1, 28 were excluded on the first criterion and 15 on the second, leaving a final sample of 97 participants in Session 1, who were invited back to complete Session 2. Again, Session 2 examined individual difference measures to delineate what made certain listeners good at the Francis and Driscoll talker identification task. Since the Francis and Driscoll study did not examine individual differences among participants, and only reported performance at the group level, it is unknown what factors contribute to a person’s ability to use phonetic cues, such as voice-onset time, for talker identification.
Lee Drown:
The four individual difference constructs are shown here, along with the task used to assess each construct and how we quantified an individual’s behavior in each task. A flanker task was used to measure an individual’s inhibition. In the pitch perception task, listeners heard two tone sequences and were asked to identify whether the sequences were the same or different. For the category identification task, listeners categorized the initial sound of tokens from a VOT continuum as either “g” or “c.” And for the within-category discrimination task, listeners heard pairs of tokens from a VOT continuum and identified whether the two tokens were the same or different.
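Because performance on the category identification task is later summarized as a slope, a standard way to derive that number is to fit a logistic psychometric function to the proportion of “c” responses across VOT steps and take its slope parameter. The talk does not describe the fitting procedure, so the sketch below, including the response proportions and starting values, is an assumed illustration using scipy.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, midpoint, slope):
    """Psychometric function: probability of a voiceless ('c') response."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - midpoint)))

# Hypothetical responses: proportion of 'c' responses at each VOT step (ms).
vot_steps = np.array([0, 10, 20, 30, 40, 50, 60, 70], dtype=float)
prop_c = np.array([0.02, 0.05, 0.10, 0.35, 0.70, 0.90, 0.97, 0.99])

# Fit the category boundary (midpoint) and slope; a steeper slope
# indicates a sharper, more categorical boundary.
(midpoint, slope), _ = curve_fit(logistic, vot_steps, prop_c, p0=[35.0, 0.1])
print(f"category boundary = {midpoint:.1f} ms, slope = {slope:.3f}")
```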
Lee Drown:
Critically, the VOT continuum used for the category identification and within-category discrimination tasks was produced by a different talker than the one used in Session 1, in order to minimize any transfer of learning between Session 1 and Session 2. We chose these measures based on past work suggesting that these constructs may be linked to an individual’s ability to recognize talkers.
Lee Drown:
Here, I highlight our main findings from Session 1. There was a significant increase in accuracy between pre- and post-test, as shown in Panel A, and people were faster at post-test compared to pre-test, as shown in Panel B. However, we found no evidence of a left-ear, right-hemisphere advantage for this task.
Lee Drown:
The same patterns held when we examined only the listeners who showed learning in this task. Turning to Session 2, here is performance on the four individual difference measures for the 59 participants who returned for this session. As you can see from the box plots for each task, we elicited a wide range of individual variation for each construct.
Lee Drown:
Now to the main question: what individual difference factors predict performance in the talker identification task? To answer this question, we correlated performance on each individual difference task with four measures of talker identification: accuracy during training, accuracy at pre-test, accuracy at post-test, and the difference in accuracy between post- and pre-test, with higher values indicating greater learning.
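The talk does not name the correlation statistic, so as an assumed illustration, here is how that grid of correlations might be computed with Pearson’s r in Python; the file name and column names are hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical file and column names: one row per participant, holding the
# four individual difference scores and the four talker ID measures.
df = pd.read_csv("session_data.csv")

predictors = ["inhibition", "pitch_perception",
              "category_slope", "within_category_discrimination"]
outcomes = ["acc_pretest", "acc_training", "acc_posttest", "learning"]

# Correlate every individual difference measure with every outcome.
for pred in predictors:
    for out in outcomes:
        r, p = pearsonr(df[pred], df[out])
        print(f"{pred} vs {out}: r = {r:.2f}, p = {p:.3f}")
```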
Lee Drown:
First, inhibition was not related to any measure of talker identification. In contrast, pitch perception was positively associated with talker identification accuracy at pre-test, during training, and at post-test, but it did not predict the magnitude of learning. Category identification slope was not related to any measure of talker identification performance, but within-category discrimination was positively associated with talker identification at pre-test, during training, and at post-test.
Lee Drown:
Overall, although we did not replicate the original Francis and Driscoll study, we were able to extend it to include individual difference measures, in order to better understand the mechanisms behind using phonetic cues, such as VOT, for talker identification. Specifically, pitch perception and within-category discrimination predicted performance at pre-test, during training, and at post-test, but not the magnitude of learning. These findings suggest that a person’s auditory acuity plays a strong role in their ability to use phonetic variation as a cue to talker identity.
Lee Drown:
To conclude, I want to highlight some best practices we employed for web-based testing. First, we only invited back for Session 2 those participants who showed that they were following task instructions. Specifically, we tested 140 people overall, but only 97 met the headphone and training accuracy criteria. Therefore, we saved valuable lab resources by only testing compliant participants in Session 2.
Lee Drown:
Lastly, we employed multiple checks to guard against automated enrollment in online studies by software applications, otherwise known as bots. In this study, no suspected bots remained after we excluded participants based on headphone compliance and training accuracy.
Lee Drown:
So I’d like to acknowledge my collaborators and funding sources for this work. And I will also direct you to our OSF Repository for additional resources. Thank you so much for your attention. And I will now address any immediate questions.
Speaker 2:
Excellent, Lee. Thank you so much. Attendees? Ah, yes. Lee, there’s a question here. Did you redo headphone checks at Session 2?
Lee Drown:
We did, yep. We retested headphone compliance as well, and it turned out to be a great indicator that including in Session 2 only participants who met headphone compliance in Session 1 was a good decision: of the participants who returned, I believe only one out of 57 did not meet the criteria for headphone compliance in Session 2. This shows that if an individual is compliant with headphones to begin with, that compliance will most likely carry over to future sessions. So yes, we did re-examine headphone compliance, and it confirmed that we made good decisions by including only those who were compliant to begin with.
Speaker 2:
Excellent. There are a few more questions coming in. Lee, you can address these in the chat, or we can keep this dialogue going online.