GazeScorer: Automated gaze direction scoring from videos collected online through conventional webcam.

Alex Fraser, University of Oxford


As online research has become more prevalent, researchers are investigating the possibility of replicating techniques that go beyond simple behavioural measurements. One method that has captured the imagination of researchers is leveraging the webcam to collect eye-tracking data. Several packages have been developed for collecting such data but have significant limitations due to extensive and potentially frustrating calibration procedures. Unfortunately, this can limit the accessibility of these packages when collecting data with specific populations, such as children and participants with neuro-developmental difficulties. To overcome this, we have looked at how gaze detection studies are conducted with infants, where researchers will manually score gaze direction from videos to minimise data loss.

Using these methods, we have developed GazeScorer, an automated gaze scoring package that can distinguish a Left, Right, and Central gaze location using basic image processing. Using videos collected through a Gorilla-hosted experiment, we have demonstrated a good level of inter-rater reliability between GazeScorer and a manual scorer. This opens the possibility of a hybrid-scoring system with minimal manual intervention in the short term. Future development will focus on utilising live webcam footage for data collection through the browser. This software would provide a potential resource for researchers who would benefit from gaze-based responses, but do not require high spatial resolution.

Full Transcript:

Alex Fraser:
Great. So thanks for having me. And I’m really excited to tell you about the project that we’ve been working on for the last year and a bit. But I thought I’d start by discussing my journey into online research and where I started.

Alex Fraser:
So in 2017 my department shut down without any real notice, and we lost access to lab space right in the middle of my PhD. I moved a lot of my research online and we were really impressed by the amount of data we were able to collect in such a short period of time. But we also wanted to look at how we could take this data collection into the homes of participants who are not able to get to the lab quite so easily, such as small children and diverse populations.

Alex Fraser:
But we were also interested in how we could move beyond the reaction times and the accuracy scores that we were getting very reliably in the browser. And so we contacted Gorilla and they had been working on this at the same time, and they built an experiment for us to work with. And we started doing some piloting, but we found that the calibration was quite long and involved, and it made it quite difficult for us to collect … get the kind of eye tracking that we’d hoped to be able to do with these populations.

Alex Fraser:
So we took a bit of a shot in the dark and we started a new project with the Oxford Research Software Engineering team to see what we could do about making our own pipeline for analyzing webcam data. And when we sat down with them and established what we wanted, the main thing that we needed was something that had a very limited calibration. So we really wanted to minimize the calibration to as little as possible, to make it as simple as possible for us to collect data.

Alex Fraser:
And as we discussed it more, we started thinking more about what we wanted. We had been focused so much on trying to replicate an eye tracker, a lab-based eye tracker, but we thought we might take a step back and actually consider doing something maybe a bit more simplistic, but maybe more reliable. And we looked to the manual scoring that is often done in infant research, and we wondered how we could do that a bit more efficiently and make it a bit less labor-intensive. And so we decided to focus more on gaze orientation, codifying a left or right look, rather than trying for a precise gaze location.

Alex Fraser:
So doing this meant collecting a lot of video footage of participants following the target stimuli. And Sylvia discussed how she did that with children in the previous talk, but with the adults it was a lot simpler. We could just send them the same procedure and they generally were able to comply themselves, and we didn’t have to supervise them as they were doing it. And we did this in Gorilla and collected a lot of footage online, and we ended up with a series of videos like this. And what we were able to do is trim these videos down and synchronize them with the target stimuli. So we know approximately where they’re looking as they are watching the stimuli.

Alex Fraser:
Then we needed to break down the videos into individual frames. And once we had those individual frames, we were able to do more image processing than [inaudible 00:03:12] we were using the webcam footage independently. But also, we needed to set down a baseline, a ground truth, that we could compare to our automatic scorer. So to do this we went back to the manual scoring that we were trying to replicate. And so we put all of these images online into another Gorilla experiment, and we had an independent naive researcher who came in and manually scored all of the videos for their gaze orientation. Which took a fair amount of time and a lot of effort, but we got that done.

Alex Fraser:
And this meant that we were able to do a good comparison to our automated scorer. And looking at how we actually did our automated scoring, the first thing we needed to do was identify the face in the image. So once we detected the face, we were able to cut it down and we could plot landmarks onto each image of every face. And specifically what we needed was the eye location, and we could actually isolate the eye and work with that independently.
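As an illustration of the eye-isolation step described above, here is a minimal sketch in Python. It assumes that a face-landmark detector (not shown, and not specified in the talk) has already returned the pixel coordinates of the landmarks around one eye. The function names, the `padding` parameter, and the plain list-of-rows image format are hypothetical choices for this example, not GazeScorer's actual code.

```python
def eye_bounding_box(landmarks, padding=2):
    """Return (left, top, right, bottom) enclosing the eye landmarks.

    landmarks: list of (x, y) pixel coordinates around one eye, assumed to
    come from some external landmark detector.
    padding: hypothetical margin in pixels added on every side.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    return (min(xs) - padding, min(ys) - padding,
            max(xs) + padding, max(ys) + padding)


def crop(image, box):
    """Crop a grayscale image (list of rows) to box, clamped to the image."""
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width - 1), min(bottom, height - 1)
    # The box is inclusive on both ends, hence the + 1 in the slices.
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Example: a blank 20x10 frame and four made-up landmarks around one eye.
frame = [[0] * 20 for _ in range(10)]
eye_landmarks = [(5, 4), (8, 3), (11, 4), (8, 6)]
eye_patch = crop(frame, eye_bounding_box(eye_landmarks, padding=1))
```

The cropped patch can then be processed on its own, which matters because the eye covers only a few dozen pixels of the frame.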

Alex Fraser:
And as you can see, the eye itself is actually very small, only about 30 to 40 pixels, so there’s not a lot of space for us to work with. But what we were able to do is identify the iris by looking for essentially the darkest space within the target. And when we processed this we ended up with a shape like this. And when we have this shape we can then identify the middle of the shape, and we classify this as being the middle of the iris.
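The idea of taking the darkest region of the eye patch and using its middle as the iris centre can be sketched as follows. This is only an illustration under stated assumptions: the relative-darkness threshold (`darkness_ratio`) and the use of a simple centroid are choices made for this example, since the talk does not say how the dark shape is segmented or how its middle is computed.

```python
def iris_centre(eye, darkness_ratio=0.25):
    """Estimate the iris centre as the centroid of the darkest pixels.

    eye: grayscale eye patch as a list of rows (0 = black, 255 = white).
    darkness_ratio: hypothetical parameter; pixels within this fraction of
    the intensity range above the darkest value are treated as iris.
    """
    values = [v for row in eye for v in row]
    darkest, brightest = min(values), max(values)
    cutoff = darkest + darkness_ratio * (brightest - darkest)

    xs, ys = [], []
    for y, row in enumerate(eye):
        for x, value in enumerate(row):
            if value <= cutoff:
                xs.append(x)
                ys.append(y)
    # Centroid of the selected pixels, in patch coordinates (x, y).
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Example: a bright 10x5 patch with a dark 3x3 blob centred on (7, 2).
patch = [[200] * 10 for _ in range(5)]
for y in range(1, 4):
    for x in range(6, 9):
        patch[y][x] = 20
centre = iris_centre(patch)
```

On real footage, eyelashes and shadows are also dark, so some cleanup of the selected shape would probably be needed before taking its middle.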

Alex Fraser:
Now I said we [inaudible 00:04:39] wanted to try and minimize any calibration, and this is where we replace what a traditional eye tracking calibration would be. So instead of doing a traditional calibration that you would expect, we just specify where a central gaze is. So we know where the participant is looking when they’re looking straight ahead. We do this within the first frame of any video, and so then we can assign a buffer around the center of that image. And this is what we do instead of a calibration: any movement outside of the buffer region is considered a codified look towards the left or the right.
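The buffer-based scoring described above can be sketched like this. It assumes one horizontal iris position per frame, takes the first frame as the central baseline, and labels anything outside a fixed buffer as a left or right look. The buffer width (`buffer_px`) is a made-up value, and "left" and "right" here refer to image coordinates; which side that is for the participant depends on whether the webcam footage is mirrored.

```python
def score_video(iris_xs, buffer_px=2.0):
    """Label each frame as 'left', 'central' or 'right'.

    iris_xs: horizontal iris position per frame, in pixels. The first frame
    is assumed to show the participant looking straight ahead.
    buffer_px: hypothetical half-width of the central buffer region.
    """
    baseline = iris_xs[0]
    labels = []
    for x in iris_xs:
        offset = x - baseline
        if offset < -buffer_px:
            labels.append("left")      # left in image coordinates
        elif offset > buffer_px:
            labels.append("right")     # right in image coordinates
        else:
            labels.append("central")
    return labels


# Example: made-up iris positions for five frames.
labels = score_video([15.0, 15.5, 11.0, 14.0, 19.0])
```

Because the only reference is the first frame, head movement after that frame shifts the baseline, which is consistent with the need to keep participants relatively still.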

Alex Fraser:
And so now that we have an automated scoring and a manual scoring, we can actually compare the two. So in these visual plots, we can see how the top row, which is the manual scorer, is codifying the gaze, and the bottom row is also codifying at the same time. There’s a little bit of a lag, but generally they are following the same pattern, they’re converging in their gaze orientation. So to quantify this a little bit more we did a Cohen’s kappa comparison between the two. And we set the minimum value that we wanted to accept as 0.6, which is a commonly used threshold for Cohen’s kappa scores.
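For reference, Cohen's kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two per-frame label sequences. The example labels are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same, non-empty set of frames")
    n = len(rater_a)

    # Observed agreement: share of frames given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' label proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example: manual and automated labels for ten frames.
manual = ["left", "left", "central", "central", "right",
          "right", "left", "central", "right", "central"]
automated = ["left", "left", "left", "central", "right",
             "right", "left", "central", "right", "central"]
kappa = cohens_kappa(manual, automated)
meets_threshold = kappa >= 0.6  # the minimum value mentioned in the talk
```

Kappa is preferable to raw percentage agreement here because a scorer that labelled most frames "central" could agree often with the manual scorer purely by chance.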

Alex Fraser:
And what we found when we look at the data among the adults is that the vast majority of participants got above a Cohen’s kappa value of 0.6. And if anything, a lot of the participants are scoring well above 0.8, approaching near-perfect agreement. This is only focusing on the static frames where the target is at its most extreme position, but this is still showing very good agreement. There are a couple of participants where we see one eye under-performed compared to the other, but in general we are doing very well. And looking at the sample that Sylvia collected before me, we see very similar patterns of results in the children, where the majority are getting very good agreement in both eyes, but there are a couple where the agreement is lower in one eye than the other. But this being said, we’re still seeing very high agreement in these optimal conditions we’re looking to work with.

Alex Fraser:
So to give you a bit of a summary, and what we hope to do with this moving forward. Basically, how did we perform? I think we did very well. There was generally quite good agreement between the automatic and the manual scorer in these optimal conditions that we put down. When we look at more difficult bits where the eye is actually in movement because it’s following a target, the performance isn’t quite as good. And we’re looking at how we can improve that and what elements may impact the movement. And hopefully we can improve the algorithm. But it’s basically just a first pass at the problem in the first instance. And hopefully we’ll be able to improve this before we can give access to people in the near future.

Alex Fraser:
But the other thing I wanted to discuss is where do we fit in within the current online research? And we heard a lot of great work done yesterday using WebGazer and from the team working with MouseView. And in fact, we saw Kat Ellis, who showed that she managed to get some good results with kids with Fragile X, who were able to go through the calibration that we were having trouble with. And so this is all very impressive, and we’re hoping that we can just be another resource that will fit into this new environment of online research. And hopefully people will be able to do good things with this in the future.

Alex Fraser:
Yeah, thank you to everyone on the team who’s worked with us. And our pre-print is available with a bit more detail about the data that I presented here. Feel free to email me if you have any questions.

Speaker 2:
Thanks very much, Alex. We have got a couple of questions in the chat, so I’m going to ask you one of them.

Alex Fraser:
Okay.

Speaker 2:
But then if you could go down to them in the Q&A, that’d be really kind. Thank you. So the first one is from Catherine Ellis and she says, “How still do participants have to be?” So for example, if you were working with children, how careful would you need to be about that?

Alex Fraser:
We still need to maintain relatively little movement, so we do need to minimize movement as much as possible. But because we’re capturing all the features of the face as we are, one of the future things that we hope to be able to do is compensate for movement more.

Alex Fraser:
So yeah, as I said, this is still very early, very preliminary stuff. Once we can work more with the face landmarks and accounting for movement, we’ll be able to establish ways of compensating for movement a little bit better. So that’s the goal in the future.

Speaker 2:
Brilliant. Thank you very much.

 

