Alex Fraser, University of Oxford
As online research has become more prevalent, researchers are investigating the possibility of replicating techniques that go beyond simple behavioural measurements. One method that has captured the imagination of researchers is leveraging the webcam to collect eye-tracking data. Several packages have been developed for collecting such data but have significant limitations due to extensive and potentially frustrating calibration procedures. Unfortunately, this can limit the accessibility of these packages when collecting data with specific populations, such as children and participants with neuro-developmental difficulties. To overcome this, we have looked at how gaze detection studies are conducted with infants, where researchers will manually score gaze direction from videos to minimise data loss.
Using these methods, we have developed GazeScorer, an automated gaze-scoring package that can distinguish Left, Right, and Central gaze locations using basic image processing. Using videos collected through a Gorilla-hosted experiment, we have demonstrated a good level of inter-rater reliability between GazeScorer and a manual scorer. This opens the possibility of a hybrid scoring system with minimal manual intervention in the short term. Future development will focus on utilising live webcam footage for data collection through the browser. This software would provide a potential resource for researchers who would benefit from gaze-based responses but do not require high spatial resolution.
Full Transcript:
Alex Fraser:
Great. So thanks for having me. And I’m really excited to tell you about the project that we’ve been working on for the last year and a bit. But I thought I’d start by discussing my journey into online research and where I started.
Alex Fraser:
So in 2017 my department shut down without any real notice, and we lost access to lab space right in the middle of my PhD. I moved a lot of my research online and we were really impressed by the amount of data we were able to collect in such a short period of time. But we also wanted to look at how we could take this research into the homes of participants who are not able to get to the lab quite so easily, such as small children and diverse populations.
Alex Fraser:
But we were also interested in how we could move beyond the reaction times and the accuracy scores that we were getting very reliably in the browser. So we contacted Gorilla, who had been working on this at the same time, and they built an experiment for us to work with. We started doing some piloting, but we found that the calibration was quite long and involved, and it made it quite difficult for us to get the kind of eye tracking that we'd hoped to be able to do with these populations.
Alex Fraser:
So we took a bit of a shot in the dark and started a new project with the Oxford Research Software Engineering team to see what we could do about making our own pipeline for analyzing webcam data. When we sat down with them, we established what we wanted: the main thing we needed was something with a very limited calibration. In fact, we really wanted to reduce the calibration to as little as possible, to make it as simple as possible for us to collect data.
Alex Fraser:
And as we discussed it more, we started thinking more about what we wanted. We had been so focused on trying to replicate a lab-based eye tracker, but we thought we might take a step back and consider doing something a bit more simplistic, but perhaps more reliable. We looked to the manual scoring that is often done in infant research, and we wondered how we could do that more efficiently and with less labor. So we decided to focus on gaze orientation, codifying a left and a right look, rather than trying for a precise gaze location.
Alex Fraser:
Doing this meant collecting a lot of video footage of participants following the target stimuli. Sylvia discussed how she did that with children in the previous talk, but with the adults it was a lot simpler. We could just send them the same procedure and they were generally able to complete it themselves without us supervising them. We did this in Gorilla and collected a lot of footage online, and we ended up with a series of videos like this. We could then trim these videos down and synchronize them with the target stimuli, so we know approximately where participants are looking as they watch the stimuli.
Alex Fraser:
Then we needed to break the videos down into individual frames. Once we had those individual frames, we were able to do more image processing than [inaudible 00:03:12] we were using the webcam footage independently. But we also needed to establish a baseline, a ground truth, that we could compare our automatic scorer against. To do this we went back to the manual scoring that we were trying to replicate: we put all of these images online into another Gorilla experiment, and an independent, naive researcher came in and manually scored all of the videos for their gaze orientation, which took a fair amount of time and a lot of effort, but we got that done.
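As a rough illustration of this step, the frame extraction could be done in Python with OpenCV, something like the sketch below. The file and folder names are hypothetical, and this is not necessarily how GazeScorer itself implements it.

import cv2
from pathlib import Path

def extract_frames(video_path, out_dir):
    """Write every frame of the video to out_dir as a numbered PNG."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = capture.read()
        if not ok:                       # end of video (or read error)
            break
        cv2.imwrite(f"{out_dir}/frame_{count:05d}.png", frame)
        count += 1
    capture.release()
    return count

# e.g. extract_frames("participant_01_trimmed.mp4", "frames/participant_01")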
Alex Fraser:
This meant we were able to do a good comparison with our automated scorer. Looking at how we actually did our automated scoring, the first thing we needed to do was identify the face in the image. Once we detected the face, we were able to crop it down and plot landmarks onto each image of every face. What we specifically needed was the eye location, so we could isolate the eye and work with it independently.
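A minimal sketch of how face detection, landmarking, and eye isolation could look, assuming dlib's frontal face detector and the standard pre-trained 68-point landmark model. This is an illustration rather than GazeScorer's actual code, which may use a different detector.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Standard pre-trained 68-point landmark model, downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def isolate_eye(frame):
    """Return a crop around one eye, or None if no face is detected."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    landmarks = predictor(gray, faces[0])
    # Points 36-41 outline one eye in the 68-point landmark scheme.
    xs = [landmarks.part(i).x for i in range(36, 42)]
    ys = [landmarks.part(i).y for i in range(36, 42)]
    pad = 5                              # small margin around the eye
    return frame[min(ys) - pad:max(ys) + pad, min(xs) - pad:max(xs) + pad]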
Alex Fraser:
And as you can see, the eye itself is actually very small, only about 30 to 40 pixels, so there's not a lot of space to work with. But what we were able to do is identify the iris by looking for essentially the darkest region within the target eye area. When we processed this we ended up with a shape like this, and from that shape we can identify its middle, which we classify as the middle of the iris.
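A sketch of how the iris could be found as the darkest region of the eye crop, with its centroid taken as the iris center. The threshold value below is an arbitrary illustrative choice, not the one GazeScorer uses.

import cv2

def iris_center(eye_crop):
    """Estimate the (x, y) center of the iris within a small eye image."""
    gray = cv2.cvtColor(eye_crop, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)             # smooth out noise
    # Keep only the darkest pixels, which should belong to the iris/pupil.
    _, dark = cv2.threshold(gray, 50, 255, cv2.THRESH_BINARY_INV)
    moments = cv2.moments(dark)
    if moments["m00"] == 0:                              # nothing dark enough found
        return None
    return moments["m10"] / moments["m00"], moments["m01"] / moments["m00"]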
Alex Fraser:
Now, I said we [inaudible 00:04:39] wanted to try and minimize any calibration, and this is where we replace what a traditional eye-tracking calibration would be. Instead of doing the traditional calibration you would expect, we just specify where a central gaze is, so we know where the participant is looking when they're looking straight ahead. We do this within the first frame of any video, and then we assign a buffer around that central position. This is what we do instead of a calibration: any movement outside of the buffer region is considered a codified look to the left or the right.
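In code, that minimal "calibration" might look something like this sketch, where the buffer width is an invented value for illustration and the mapping of image coordinates to the participant's left or right depends on whether the webcam image is mirrored.

def classify_gaze(iris_x, center_x, buffer_px=4.0):
    """Code a single frame as 'Left', 'Right', or 'Central'."""
    if iris_x < center_x - buffer_px:
        return "Left"
    if iris_x > center_x + buffer_px:
        return "Right"
    return "Central"

# center_x would come from the iris position in the first frame of the video,
# e.g. center_x, _ = iris_center(first_eye_crop)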
Alex Fraser:
So now that we have automated scoring and manual scoring, we can actually compare the two. In these visual plots, we can see how the top row, which is the manual scorer, is codifying the gaze, and the bottom row, the automated scorer, is codifying at the same time. There's a little bit of a lag, but generally they follow the same pattern and converge on the same gaze orientation. To quantify this a little more we did a Cohen's kappa comparison between the two, and we set the minimum value we would accept at 0.6, which is a generally agreed threshold for Cohen's kappa.
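The agreement check itself is straightforward, for example with scikit-learn's cohen_kappa_score; the frame-by-frame labels below are made-up examples.

from sklearn.metrics import cohen_kappa_score

manual    = ["Left", "Left", "Central", "Right", "Right", "Central"]
automated = ["Left", "Left", "Central", "Right", "Central", "Central"]

kappa = cohen_kappa_score(manual, automated)
print(f"Cohen's kappa = {kappa:.2f}")   # participant accepted if kappa >= 0.6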
Alex Fraser:
What we found when we look at the data from the adults is that the vast majority of participants got above a Cohen's kappa value of 0.6. If anything, a lot of the participants are scoring well above 0.8, approaching near-perfect agreement. This is only focusing on the static frames where the target is at its most extreme position, but it still shows very good agreement. There are a couple of participants where one eye under-performed compared to the other, but in general we are doing very well. Looking at the sample that Sylvia collected before me, we see very similar patterns of results in the children: the majority get very good agreement in both eyes, but there are a couple where the agreement is lower in one eye than the other. Even so, we're still seeing very high agreement in these optimal conditions we're looking to work with.
Alex Fraser:
So, to give you a bit of a summary of where we hope to take this moving forward. Basically, how did we perform? I think we did very well. There was generally quite good agreement between the automatic and the manual scorer under the optimal conditions that we set out. When we look at the more difficult parts, where the eye is actually in motion because it's following a target, the performance isn't quite as good. We're looking at how we can improve that and which elements may affect performance during movement, and hopefully we can improve the algorithm. But this is basically a first pass at the problem, and hopefully we'll be able to improve it before we give people access in the near future.
Alex Fraser:
The other thing I wanted to discuss is where we fit within current online research. We heard a lot of great work yesterday using WebGazer, and from the team working with MouseView. In fact, we saw Kat Ellis, who showed she managed to get some good results with kids with Fragile X who were able to go through the calibration that we were having trouble with. So this is all very impressive, and we're hoping that we can just be another resource that will fit into this new environment of online research, and hopefully people will be able to do good things with it in the future.
Alex Fraser:
Yeah, thank you to everyone on the team who’s worked with us. And our pre-print is available with a bit more detail about the data that I presented here. Feel free to email me if you have any questions.
Speaker 2:
Thanks very much, Alex. We've got a couple of questions in the chat, so I'm going to ask you one of them.
Alex Fraser:
Okay.
Speaker 2:
But if you could then go through them in the Q&A, that'd be really kind. Thank you. So the first one is from Catherine Ellis, and she says, "How still do participants have to be?" So, for example, if you were working with children, how careful would you need to be about that?
Alex Fraser:
We still need participants to keep relatively still, so we do need to minimize movement as much as possible. But because we're capturing all the features of the face as we are, one of the things we hope to be able to do in the future is compensate for movement more.
Alex Fraser:
So yeah, as I said, this is still very early, very preliminary stuff. Once we can work more with the face landmarks and account for movement, we'll be able to establish ways of compensating for it a little bit better. So that's the goal for the future.
Speaker 2:
Brilliant. Thank you very much.