Reproducibility 2.0

BeOnline Panel

Host: Professor Sophie Scott
Jo Evershed, Gorilla
Professor Uri Simonsohn, AsPredicted
Professor Marcus Munafò, University of Bristol (UKRN)
Dr Ekaterina Damer, Prolific
Dr David Rothschild, Microsoft Research

Full Transcript:

Jo Evershed:
Now we're going to move on to the final session of the day, Reproducibility 2.0, and I'm going to invite Sophie Scott back, who opened our session today. Hi, Sophie. She's going to be chairing our discussion. And also here we have got Katya, David, Uri and Marcus.

Jo Evershed:
So from that, I'll hand over to Sophie.

Sophie Scott:
No problem at all. Thank you very much, Jo.

Sophie Scott:
So very generally, we have a panel of experts who are going to talk about different aspects of reproducibility. There's going to be a great discussion about the future of reproducible science. Everybody's going to speak for five minutes, and we aren't going to take questions in between; then we'll move on to a discussion. Do put your questions in the chat, because I will bring those into the group discussion at the end.

Sophie Scott:
Is that okay?

Sophie Scott:
So our first speaker, who's going to give us a nice short presentation, is Dr. Ekaterina Damer from Prolific. Hi Ekaterina, are you here?

Ekaterina Damer:
Yes.

Sophie Scott:
Excellent, over to you for your five minutes.

Ekaterina Damer:
I cannot turn on the video. Oh, here we go.

Ekaterina Damer:
Okay.

Ekaterina Damer:
Hi. I didn't know I'd go first, but okay, I'll go now.

Ekaterina Damer:
All right. Hi, everyone. Today I'm going to argue that science needs revolution, not reform.

Ekaterina Damer:
10 years ago the replication crisis started, propelled by a paper by Simmons, Nelson, and Simonsohn from 2011 that showed that anything can be presented as statistically significant if only the scientist wants it to be. That means that science can be cheated very easily; it's very gameable. And you can cheat on so many different levels: how you develop your theory and hypothesis, how you design your study, how you collect the data, how you analyze the data, how you interpret it, and how you write up and frame your paper.

Ekaterina Damer:
So how and why is this possible? Well, the paper by Simmons and colleagues showed how common questionable research practices are; things like p-hacking or optional stopping. But I'm actually going to say that there's a deeper layer to this problem, which is around incentives, fundamentally, because in academia incentives aren't aligned. We essentially have a broken system: it's publish or perish, and the people who publish the best are typically the ones who make it to the very top.

Ekaterina Damer:
We can, of course, continue building tools like registered reports, infrastructure like the Open Science Framework, or even international collaborations like the Psychological Science Accelerator. But as long as the system is incentivizing the wrong behaviors, our efforts are basically pointless, in my opinion.

Ekaterina Damer:
So here's my pitch. I think we need to rethink and reimagine academia from scratch. We need academia 2.0, and we need a proper credibility revolution. This is a term that Simine Vazire from the University of Melbourne has coined.

Ekaterina Damer:
So I think the gradual progress that we've seen in the past 10 years has been way too slow. We're wasting taxpayer money, we're wasting our own time and energy, and we continue publishing rubbish research.

Ekaterina Damer:
So what's the difference between reform and revolution? A reform is typically gradual improvement; a revolution is a more fundamental, profound or sudden change.

Ekaterina Damer:
So how do we revolutionize academia? I'm going to borrow some ideas from the startup world. We need to approach it from first principles. We need to start with the basic building blocks: how do you build a system that works?

Ekaterina Damer:
So this is how I would do it if I had the choice. For academia 2.0, we would need, one, the right rewards and incentives. For example, we'd have to offer tenure based on the rigor of research, not on the prestige of the journals or publications. Two, we need better accountability and feedback mechanisms. For example, there should be a performance review process for professors; otherwise, they'll become complacent and just publish papers that might not even be rigorous. And three, we need a much stronger and more transparent publishing and peer review system. For example, preprints are now emerging as really good alternatives to journal articles. And I also think that peer review should be paid (why are so many scientists doing work for free?) and a lot more transparent.

Ekaterina Damer:
Can we accomplish all of this through reform? I don't think so. I predict that a startup will come along in the future and rebuild academia 2.0 from scratch. And in fact, we're already seeing something like that in education. There's a startup called Lambda School that is disrupting the way education is financed.

Ekaterina Damer:
So this is my pitch.

Sophie Scott:
Thank you very much.

Sophie Scott:
Big news for everybody who doesn't realize professors are annually reviewed, certainly at UCL.

Sophie Scott:
We are now going to our next speaker, and our next speaker is going to be Uri Simonsohn from Barcelona.

Sophie Scott:
Uri, do we have you?

Uri Simonsohn:
Yes, I'm trying to get it right.

Sophie Scott:
Hi, Uri. Hi.

Uri Simonsohn:
Hi. Just sorting out my screen.

Sophie Scott:
Over to you.

Uri Simonsohn:
I do have some slides. Can you confirm if you see them?

David Rothschild:
Yes. Yeah.

Uri Simonsohn:
Okay. Great.

Uri Simonsohn:
So I'm Uri Simonsohn. I'm in Barcelona and also still have a foot at Wharton, where I was for many years. They told me to speak, at most, five minutes. As little as possible; one minute would be great. So I'll keep it short.

Uri Simonsohn:
And this is an unusual presentation for me. It's kind of like an ad for stuff that I have been doing when I'm not doing my research: we're building infrastructure for conducting research, with a focus on ResearchBox, the newest one in our set. But to give you some background on the Credibility Lab, our goal is to make it easier for people to conduct more credible research. And so far we have three products, to give them a name.

Uri Simonsohn:
AsPredicted, which is for pre-registration. To give you a sense of how common pre-registration has become: we were going to have a conference in Barcelona in 2020, but COVID got in the way, though not before we got all the submissions, and about half the submissions that were sent were pre-registered. So this is about 307 submissions of empirical work, mostly experiments. About half of them were pre-registered, most of them on AsPredicted.

Uri Simonsohn:
This is the academic world that's closest to me, so I suspect that's why we have a high market share. I suspect as we go further from judgment and decision-making, OSF will be more important, gradually speaking. I don't think this is representative. But that half the submissions are pre-registered would have been unthinkable five years ago. And this shows the growth since we launched AsPredicted, how many new pre-registrations we're receiving per month. We're getting about 2,000 per month now, which is incredible. When we launched it, we decided that if we got a hundred a year, we would keep it running.

Uri Simonsohn:
Our second product, which we launched recently, is simply an R package that makes reproducible R code easier.

Uri Simonsohn:
So there's this problem, and I won't talk about details now, but the problem with …

Uri Simonsohn:
Oh, do you have me? I guess my video is off. Sorry about that.

Uri Simonsohn:
A problem that we have with R is that the packages get updated, and when they get updated your existing code can break. So we created Groundhog so that your code will run like in the movie: it will always be the day that you write down. And all you have to do to make your article reproducible is, instead of using the library command, you now use the Groundhog library command, and the package will always be loaded as it was available on that day.
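The pattern Uri describes can be sketched in a couple of lines of R (a minimal sketch; the package being loaded and the date below are illustrative, only `groundhog.library()` itself comes from the talk):

```r
# Plain library(dplyr) loads whatever version happens to be installed.
# With groundhog, you pin the load to a date, so the script keeps
# loading the package versions that were on CRAN on that day:
library(groundhog)
groundhog.library("dplyr", "2021-06-01")  # package and date are illustrative
```

Re-running the script years later should then reproduce the original environment for that package, rather than breaking on a newer release.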

Uri Simonsohn:
This is just a preamble to the third product, so to speak, which is ResearchBox, and I'll give you a little bit more information about it. It seeks to make open research easier by making file structures standardized and findable.

Uri Simonsohn:
So to give you an example of that, here's a box. You can do this now while I'm speaking if you go to ResearchBox 15. That's a box that I created, and it shows all the files that are available in this standardized table. And it's also a structured table. We call it a bingo table because it resembles a bingo card, and because it should make it easy (as in American English, you say bingo when you find something) to find anything you're looking for.

Uri Simonsohn:
So these are all the files, and I think this will work. Can you see my browser as well? Do you see it?

David Rothschild:
Yeah.

Uri Simonsohn:
Okay. So it has instantaneous previews and it's very easy to navigate. For example, let's say you want to look at the data for study one: if you click here, it immediately opens up a preview, very quickly. And one thing I think we do well is that every single data file has a codebook. The website helps you create the codebook for it. So if you want to know what each variable is, because, for example, what does "check" mean? It may be hard to figure out. Or what "rent order" means, et cetera. You can just click the book and it shows you the codebook for each of the variables.

Uri Simonsohn:
And every single dataset in ResearchBox has this structure for codebooks. You can also preview code in the same easy manner. So it's all instantaneous. And if you want to download, you can select the specific files you want, or you can download everything in one click. So we seek to make that as easy as possible.

Uri Simonsohn:
So I'm returning to my presentation.

Uri Simonsohn:
So in terms of the bigger vision, this was a pitch for this new product of ours, but we were asked to think: replicability 2.0, how is it going to be different? I think the main vision that drove us in creating ResearchBox, in comparison to the OSF or Dataverse or other solutions, is that right now, it's relatively easy to sort of dump your files somewhere. So if you want to be open, you can just dump them, and people can find them if they go to the effort. But my view is that, if we make the files easy enough to use, a whole lot of potential opens up that right now is not really being tapped.

Uri Simonsohn:
So I'll just give you concrete examples. Often a paper under review has a link to open materials or open data, but it is often so difficult to actually find what you're looking for that it just imposes an extra burden. As Katya was saying, we're not being paid to be reviewers, so anything we can do to make it easier for them would be good. The premise is, if we make it easy to look at the open research files, people will actually look at them without any extra work.

Uri Simonsohn:
Then sometimes a methodologist writes papers on how to do better analysis of data, and a lot of methodology papers rely on simulated data that may or may not reflect real data, so they're solving problems that real researchers may or may not be facing. If it's very easy for methodologists to find data and see what people are doing and how it's being analyzed, we believe this openness will lead to more relevant methodological work.

Uri Simonsohn:
Another thing is, think about all the effort that goes into generating data, only for it to be stored somewhere and never used again. If it's easy to find … For example, on ResearchBox, you can search datasets by variable descriptions. So you can easily find any data that uses happiness or reaction time or particular stimuli. You can look for code: any posted box that has a particular package, or a function within a package, you could find it and use it. This should give more value to all the work that we're producing.

Uri Simonsohn:
If you are building on existing work, nothing beats being able to easily reproduce it, or to see the kind of data they got and the materials they have.

Uri Simonsohn:
And last, but not least, in terms of learning how to run particular research studies or how to analyze data [inaudible 00:13:11], if everything is very easy to find … Imagine you're trying to learn a new function that you don't know how to use. You can find a paper that's relevant to you and just search for that function. We believe this is going to dramatically increase the benefits of making things public.

Uri Simonsohn:
And in my assessment, which is of course biased, I'm [inaudible 00:13:30] to believe ResearchBox is better, but I believe no existing platform allows for any of this potential to be materialized. All we have is somewhat difficult-to-obtain stored files. And we're hoping that this new platform will reach the potential that we believe open science has.

Uri Simonsohn:
And that's my presentation.

Sophie Scott:
Thank you very much. Thank you, Uri.

Sophie Scott:
Now, our next speaker is David Rothschild from Microsoft Research. Have we got you, David?

David Rothschild:
Yep. Can you guys hear me?

Sophie Scott:
I can indeed. Over to you.

David Rothschild:
Okay. So what I'm going to talk about is another concept of replicability, which is thinking about this in terms of external validity.

David Rothschild:
So one thing is to be able to replicate something in the laboratory arena or, one thinks, to be able to replicate it with whatever tools you're using; another thing is, does it actually replicate in the outcome space that you care about?

David Rothschild:
And I'll start with a warning. This is from a paper that's currently under review, where we simply asked people, in a bunch of these various tools from which people get respondents, "How much do the respondents actually use these various services? What is the frequency of hours spent on these types of sites?" And you're going to see, this is a lot of time responding. And of course, Prolific and MTurk (and CR, which is a filtered version of Mechanical Turk) are places where people are going to kind of work and do tasks. In earlier work, we'd also asked this on Qualtrics and other online panels, where you still have the majority of people spending several hours per week answering questions in various audiences.

David Rothschild:
And so that should kind of baseline affect your understanding of the [inaudible 00:15:22] and understanding of various respondents that may be used in online laboratory experiments or surveys in order to better understand the world.

David Rothschild:
And I can move forward into kind of my main area of work. I'll just kind of jump through different hoops on it. One is ad effectiveness. And this is from a paper that is going to be sent out soon. I apologize; I had to, at the last moment, cover the names of the companies that are in there. But the point is, we're looking at ad effectiveness for 50 brands. Pretty big brands, but not crazy.

David Rothschild:
And if you just go to the far left panel, main brand impressions per household. What we noticed is, we were doing observational work here, followed up by some laboratory work, and in our observational data, the median brand was hitting any household in our study about 39 to 40 times. So we're looking at how much any individual ad is affecting people, but the average household was getting hit by 39 ads from a given brand. And so that's a lot of baseline exposure if you're trying to look at the marginal impact of a given ad.

David Rothschild:
In work that we're doing right now in politics, we're looking at this kind of paradox about the ineffectiveness of advertising in some ways. What we're going to report soon is that the average American, even though they consume very little news, is getting hit by more earned media. So even just on TV, they're getting hit by more actual news than they are by ads. A lot more. And so if you're worried about the marginal effect, you have to worry about where people are coming from originally.

David Rothschild:
And this really plays a lot into some really interesting research around vaccine hesitancy. Again, there are a lot of laboratory experiments on how you can shift people's minds on taking vaccines, but you have to put it into the context of the massive amount of understanding and thought people have already put into the process.

David Rothschild:
I'll jump quickly into questions around market design, which I spend a lot of time on as well. This is a very common question. It's asking for confidence intervals. It's a very heavily replicated result that people are overconfident when they're answering this type of question, where they're trying to give their 80% range of when something's going to happen. Working with Dan Goldstein and others, we created a lot of user interfaces that can essentially just eliminate that type of error. People are even able to reproduce crazy distributions of numbers they saw, and things like that.

David Rothschild:
But then the question is, who's right and who's wrong? We don't really know here. It depends on thinking a lot about, under what context do we care about this type of interaction or this type of thought process about confidence? Surely, if the type of work we're doing is making people [inaudible 00:18:08]-

David Rothschild:
… Helping with the user interface. And at the end of the day, we looked at a simple question: are people making money effectively? And basically, the short of it is that people who understood the user interface, and actually made the most cost-effective trade for any trade they were doing, were making more money than other people, because the vast majority of people weren't even doing the trade properly. They were below the 45-degree line, which means here that they are, at any given time, purchasing the exact same asset for more money than they could have if they fully understood the interface.

David Rothschild:
When we tested these things in the lab, we taught people too well. People are much quicker and lazier, busier, in the real world, and actually continue to make mistakes that we were able to eliminate over and over again in a laboratory setting.

David Rothschild:
I'll jump quickly to public opinion. This is a really cool table from a paper by Jon Krosnick. And what I really love about it is that it kind of shows the opposite direction. He used this table, specifically the fourth line of data, to show, "Hey, the errors on these non-probability internet samples are one to two percentage points higher than if you use probability samples."

David Rothschild:
So this is saying, if you go on the internet and you get people coming from panels, you're not going to be as accurate in describing basically census data as you are on telephones.

David Rothschild:
Interesting, meaningful maybe. But the point is that one to two percentage points is actually not that bad for a lot of things people care about. And actually, it's well within the range that most people would accept for something that was a lot cheaper.

David Rothschild:
And there are a lot of questions that go into public opinion where there's no underlying value. So it's very tricky to understand what a few percentage points really mean, especially when you start comparing. Even a bunch of really hardcore ground truth has a lot of error when it comes to our differences, when it comes to sentiment. And so if you move away from questions like, "What's your age, gender, marital status?", to questions of what you care about in public policy, well, people don't have very stable opinions. And so, to understand what that means in an externally valid state is super tricky.

David Rothschild:
We know that people love infrastructure and basically the Democratic agenda. We know that the vote's a lot tighter than that when people go out and vote. And so it's a question of: maybe if they were answering this question truthfully, maybe we actually have a very replicable thing, but maybe it doesn't translate, for various reasons that we're still trying to learn.

David Rothschild:
And I'll leave you with one more thought, which is that there are a lot of studies about the effect of news, and people want to understand, how much does being treated with news affect your baseline public opinion? And we can replicate over and over and over again people saying that somewhere around 35 to 40% of people say they're regular Fox News viewers, but what the bottom line here shows is that only about 14% of people ever consume Fox News for a six-minute spell in a given month.

David Rothschild:
And so we can replicate, in polling and in the laboratory, people claiming to be Fox News viewers, but they're actually not if you look at the data. And so it raises all sorts of questions about, what does it mean when people who watch Fox News are different from people who don't watch Fox News? And we can go into that later, but the point is, some questions are just too hard, and so we need to use passive data. But also, these claims fully replicate over and over again, and that doesn't necessarily make them true.

David Rothschild:
Thank you.

Sophie Scott:
Thank you very much. And I have to say, thank you for bringing your top bow tie game to this talk.

David Rothschild:
I thought it was a formal presentation.

Sophie Scott:
It's very good. I like it.

Sophie Scott:
And now our next speaker is Marcus Munafò from the University of Bristol.

Sophie Scott:
Marcus, have we got you?

Marcus Munafò:
We have, hopefully. I hope you can hear me all right and see my slides.

Marcus Munafò:
Thanks for the invitation to speak. Thanks, Sophie, for introducing me.

Marcus Munafò:
There's a lot going on at the moment in the UK nationally around research culture and research incentives, and Katya spoke to that. We don't have to think too hard to see the ways in which our culture is, in many ways, very old in terms of the ways in which we work, and the tension that that's creating in terms of the incentives that exist and shape our behavior.

Marcus Munafò:
For example, the way in which we disseminate knowledge via journal articles is still predicated on paper being expensive. All of those constraints that you have to meet, that mean you can only have so many words and tables and figures and so on, are all a vestige of that way of disseminating knowledge on dead trees.

Marcus Munafò:
So it's an important question to ask: how can we do better? How can we improve the culture within which we work, the incentives that shape our behavior, and as a result, the quality of the work that we produce?

Marcus Munafò:
And that last comment that David made is, I think, worth bearing in mind. Replicable does not mean that we get the right answer. We can end up with very large studies that give us very precise, and very precisely wrong, answers to our underlying question if we're not careful. So we need to think about more than just replicability.

Marcus Munafò:
Katya mentioned this study, which is very important in terms of the flexibility that we have in our studies, and the extent to which we can leverage that flexibility to generate a spurious finding and be led astray by our own cognitive biases, because we want to find something.

Marcus Munafò:
But that's not news. In 1988, Richard Peto published a trial of aspirin and its protective effects on heart disease. He was asked by reviewers to include a post-hoc subgroup analysis, and he said, "I'll only do it if I can add my own post-hoc subgroup analysis to demonstrate the ease with which you can generate spurious results if you do that." And with his own subgroup analysis he showed that, if you're born under Capricorn, the effect of aspirin on heart disease risk is much more beneficial. In other words, he used astrological signs for his subgroup analysis to demonstrate how rolling the dice multiple times gets you the wrong answer.

Marcus Munafò:
I think the real message of this paper is that, when you read something in the published literature, you have no way of knowing whether you're reading a full account of everything that happened, which is the full abstract that's shown here, or a redacted, curated, storytelling version that's intended to sell the paper, which is the compact version shown in bold. You simply don't know, because part of our culture leads us to a model of research which relies on trust: trusting individual researchers to give a full and complete account of everything that they did. What we need to move toward, as many others have said, including Simine Vazire, but also David Spiegelhalter at Cambridge, is a system, a process, that is inherently trustworthy, rather than one that relies on trust in individuals, because people are fallible, have their own cognitive biases, and have their behavior shaped by the incentive structures that we work within. We're all human, in other words.

Marcus Munafò:
So how can we introduce approaches to working that create a trustworthy system? One insight, from W. Edwards Deming, the statistician who advised the Japanese automobile industry in the 1970s, is that if you introduce quality control checks throughout a process, then of course you create high-quality outputs. The Japanese automobile industry started producing reliable cars for the first time (because previously cars were unreliable), dominated the market, and still has a reputation for reliability today.

Marcus Munafò:
So the analogy is that we produce scientific papers to be fixed later, a bit like the automobile industry in the US in the 1970s produced cars to be fixed later; the era of the lemon, the irredeemably badly-built car. But the less intuitive insight Deming had was that, yes, if you focus on quality throughout a process, you produce higher quality outputs at the end of that process, but you also improve efficiency, because you're not investing time fixing cars that broke down later, or correcting claimed findings that turn out to be false. So if we want to advance knowledge efficiently, if we want to translate that knowledge into societal impact more rapidly, we need to focus on quality.

Marcus Munafò:
So how can we improve quality? The problem is that we have an interconnected system with lots of stakeholders, each of which has a part to play. This article describes some of the threats to the scientific process, and some of the potential changes to that process that could improve the way in which we work and the quality of what we produce, but this requires the coordination of journals, funders, institutions, and researchers themselves.

Marcus Munafò:
One area where researchers can do a great deal is by working more transparently, by making as much of their research process available to scrutiny as possible, to create more external quality control. People will spot the mistakes that we make in our work, because we will make mistakes, as in any human endeavor. We've all read journal articles that have been read by multiple authors, that have been through peer review, that have been read by an editor, that have been copy edited, that have been proofread, and they still have typos in them. So our code will, and our data will, and we need to create processes that allow those honest errors to be caught and corrected, to improve our quality and our efficiency.

Marcus Munafò:
External checking is part of that, but making our workflows available for scrutiny, in the knowledge that others may check our work, also creates an incentive for greater internal quality control. In other words, you check your dataset four or five times before you post it, rather than two or three times, because you don't want someone to spot an error.

Marcus Munafò:
But that coordination is key. You can have funders that are mandating data sharing, for example, and you can have researchers that want to do that, but you need the incentives to motivate them to do it, because not all researchers will be equally motivated, and you need the infrastructure to support it. So that's why we've set up the UK Reproducibility Network, which is a peer-led organization that has, at its base, local networks of researchers: self-organizing groups of researchers motivated to engage with these issues. But we also have institutions that have joined.

Marcus Munafò:
So in the UK at the moment, we have 57 local networks at different institutions. We have 20 institutions themselves that have joined, working at a different level: the level of things like promotion and hiring criteria, and how you can use those to incentivize open research practices and other changes that we might want to incentivize. And then we have the external stakeholders: the funders, the publishers, the learned societies, the professional bodies and the other sector organizations, because we need to be creating those linkages and making sure that we coordinate our efforts across and between those different levels.

Marcus Munafò:
And of course, this is an international effort; science is global. And so we're now starting to see reproducibility networks modeled on that same structure, which gives the flexibility to tailor solutions locally while still coordinating across and between levels. We're starting to see these emerge in other countries. We have them in Australia, Germany, Switzerland, and Slovakia at the moment, and several other countries are interested in developing their own reproducibility networks. And so that coordination can then extend to a global scale.

Sophie Scott:
Thank you very much, Marcus.

Sophie Scott:
And now we are over to our last speaker, who's Jo; now not chairing a session, but giving us a talk.

Sophie Scott:
Hi, Jo.

Jo Evershed:
Hello. Thank you. Can you see my screen?

Sophie Scott:
Yes.

Jo Evershed:
Yes. Great.

Jo Ever­shed:
Hi, I’m Jo Ever­shed from Goril­la, and we help behav­ioral sci­en­tists cre­ate and host online exper­i­ments quick­ly and easily.

Jo Ever­shed:
I want to talk about where this jour­ney towards bet­ter repro­ducibil­i­ty leads. As the Cheshire cat says, “If you don’t know where you’re going, it does­n’t much mat­ter which way you go.” And I think it does very much matter.

Jo Ever­shed:
Reproducibility of findings is important to ensure that science is robust. I believe we’ve come a long way in the last 10 years. We’ve analyzed the problem, we’ve proposed and developed and tested several solutions, and now what’s left is to get the incentives right and implement those solutions. From that perspective, reproducibility is increasingly a solved problem, although there is still plenty of work to be done to implement the solutions.

Jo Ever­shed:
As an example, when I was a student 10 years ago, underpowered studies were the norm. Online behavioral research has changed all that. With Gorilla Open Materials, you can read a paper, access its protocol, and clone it for replication in just three mouse clicks. With the recruitment service, you can launch it online and get a representative sample of 500 people in a lunch break. And with ResearchBox, you can then store the data and analysis for posterity. That’s dramatic progress. I’m not sure exactly where we are on this journey, but I think we can see the path ahead.

Jo Ever­shed:
But repro­ducibil­i­ty is not every­thing. It’s only part of the jour­ney. So what’s the ulti­mate des­ti­na­tion? In oth­er words, what’s at the top of the moun­tain? Know­ing what we’re aim­ing for will ensure we get the right process­es and safe­guards in place.

Jo Ever­shed:
So I think this is the full journey. We want our science to be reproducible, so that our findings are robust; then generalizable, so that we’re confident they work in novel contexts; and then impactful, so that we’re confident they can be used in the real world.

Jo Ever­shed:
And then we have the prize: evi­dence-based prod­ucts and ser­vices of the future. I’d like to see the behav­ioral sci­ences inform­ing the prod­ucts and ser­vices of the future and improv­ing lives. At Goril­la, we talk about how the behav­ioral sci­ences can be lever­aged to improve health, wealth, hap­pi­ness, and edu­ca­tion. For me, that’s what’s at the top of the moun­tain. And to do this, we need more path­ways out of acad­e­mia and into innovation.

Jo Ever­shed:
For every­one in the audi­ence that can see a prob­lem in the world with a behav­ioral solu­tion, we want to give you the tools to research that prob­lem, devel­op and test that solu­tion, and take that with you out of acad­e­mia and make it happen.

Jo Ever­shed:
Perhaps surprisingly, behavioral science academics make ideal entrepreneurs. We understand human behavior, we’re good with numbers, we’re tech savvy. A lot of the skills developed in academia translate well to startups. Pitching for grant funding isn’t that different to pitching for investment, and running a lab isn’t that different to running a startup; in both, you’re improvising and experimenting and thinking deeply about what works. And as we’ve seen from today, academics are great at standing up in front of a crowd and sharing what they know.

Jo Ever­shed:
But entre­pre­neurs have the added bonus of being able to cre­ate a sus­tain­able fund­ing mod­el. And as a case in point, mov­ing into entre­pre­neur­ship is what me and Katya from Pro­lif­ic did instinctively.

Jo Ever­shed:
So we’re try­ing to help build this pipeline so that the impact sec­tion of your grant pro­pos­al isn’t the end, but the spring­board for the next step of pro­duc­tiz­ing your research. Right now, we stop at the third cir­cle and then go back to the start, but we’ve come all this way. Why are we stop­ping now?

Jo Ever­shed:
At Gorilla, we’ve been developing new tools to make it easier to design and test the products and services of the future by making them more ecologically valid. With Game Builder, we hope to inspire a generation of education and development researchers to create games that can be used in classrooms to train and develop students, and with Shop Builder, we’re giving researchers the tools needed to nudge behavior. These tools are designed to enable much more rollout-ready findings, so that the leap to the fourth circle here is smaller.

Jo Ever­shed:
I’m sad­dened when I see prod­ucts or poli­cies based on ide­olo­gies or per­son­al opin­ion, but that’s what risks hap­pen­ing if we leave a void. Instead, we could cre­ate path­ways from acad­e­mia to indus­try so we can take our find­ings and roll them out respon­si­bly. That way we can ensure that the sci­ence isn’t cor­rupt­ed as it’s trans­lat­ed into practice.

Jo Ever­shed:
One excit­ing idea that I want to leave you with is that prod­uct devel­op­ment does­n’t stop once you launch it. Imag­ine what you can find out when you have 2000 peo­ple using your prod­uct every day. What about with 200,000? Imag­ine the pow­er of that study. Imag­ine how much more robust your find­ings will be when you can run micro exper­i­ments every day and see what works.

Jo Ever­shed:
Once tak­en to mar­ket, an edu­ca­tion­al [maths 00:34:20] game, like Diana’s game that we saw yes­ter­day, could and should con­tin­ue to ana­lyze play­er learn­ing and behav­ior to fur­ther improve the game. This sort of fly­wheel, where sci­en­tif­ic advance­ment for the bet­ter­ment of soci­ety is self-fund­ing, will mean bet­ter prod­ucts and ser­vices. It’s a dif­fer­ent way of dis­sem­i­nat­ing knowl­edge than print­ing it on bits of dead wood. And over the last two days, we’ve heard lots of ideas that could turn into prod­ucts or ser­vices, from games to life-sav­ing interventions.

Jo Ever­shed:
To me, this is what it would mean to be a behav­ioral sci­en­tist in indus­try and to work on prod­ucts with a strong evi­dence base. Yes, there are com­plex­i­ties around how to use end user data respon­si­bly, but I believe we can get there.

Jo Ever­shed:
Many of us want to have a pos­i­tive impact on the world and leave a lega­cy. Giv­en that so many of the chal­lenges that face soci­ety are behav­ioral, you’re ide­al­ly posi­tioned to do this. We’re mak­ing the tools to make that pos­si­ble, but we need you to want it and dream it and to then go out into the world and do it.

Sophie Scott:
Thank you very much, Jo. And thank you to all the speakers.

Sophie Scott:
We’ve now got about 25 min­utes for ques­tions and dis­cus­sion. So if you have any­thing that you’d like to say, please put it in the Q&A and I will bring those to the panel.

Sophie Scott:
In the mean­time, I sup­pose I’d like to ask the oth­er mem­bers of the pan­el, what did you think about Jo’s point about, it’s not just where we’ve come from and where we are, but where we’re going to? Could I start with you, Katya?

Eka­te­ri­na Damer:
I could not agree more. I don’t real­ly know what else to say. I mean, we should be think­ing about the future, right? What do we want to see in the future? Yeah, I just could­n’t agree more basically.

Sophie Scott:
Thank you.

Sophie Scott:
David?

David Roth­schild:
Well, I think it touch­es on what a lot of peo­ple have been talk­ing about, which is the evo­lu­tion of aca­d­e­m­ic pub­lish­ing. And we’ve talked a lot on the repli­ca­tion side about the data that goes in.

David Roth­schild:
I think one thing that hasn’t been noted enough, and is worth noting, is that there’s an incredible cost on researchers to reach the replication standard that we want, and there are also a lot of questions about privacy and data that simply can’t be shared for various reasons. And I think it’s a really costly time for those of us who are trying to meet the standards, because we’re in a transition phase. And to Jo’s point and others’ points, I really look towards startups. I look towards people who can monetize in many ways in order to make it a more efficient system, so that there’s more vertical integration and the costs aren’t so incredibly high, because it is a super frustrating experience for many of us.

David Roth­schild:
And on the out­put side, I look for­ward to the move­ment past dead trees. You know, we have been work­ing on build­ing HTML ver­sions of a lot of our papers, live ver­sions where data con­tin­ues to flow, easy repli­ca­tion, but again, super, super cost­ly. Com­ing from an aca­d­e­m­ic lab, we’re not going to design the soft­ware that’s ulti­mate­ly going to take over, and so we’re build­ing exper­i­men­ta­tion on it, again, in order to look towards indus­try to kind of get that right in order to low­er the costs to make it happen.

David Roth­schild:
And I just want to give a shout out to especially the younger researchers who are just slammed because of this transition period. It’s tough. It’s tough to meet replication standards on a lot of papers, and I know that there are a lot of late nights involved in doing it all right, and in making sure it reaches the standards we all want to be at.

Sophie Scott:
Thank you.

Sophie Scott:
Uri, did you have any comments?

Uri Simon­sohn:
No. Actu­al­ly I want­ed to ask just a quick fol­low up to David. When you say you’re exper­i­ment­ing with the HTML papers, is it just for your own papers, not for a plat­form for [crosstalk 00:38:18]?

David Roth­schild:
Yeah. That’s cor­rect. So work­ing out of the lab with Dun­can [Watts 00:38:25] at Uni­ver­si­ty of Penn­syl­va­nia, we’re putting up new dash­boards and we’re tak­ing every one of our papers and try­ing to build kind of HTML ver­sions, where we can take the kind of flat charts that are in the paper and build inter­ac­tive charts where you can click on them and you can see var­i­ous sub­groups, et cetera, et cetera.

David Roth­schild:
And so that for data which is still flow­ing in, and this is espe­cial­ly true for obser­va­tion­al data, that the charts could just con­tin­ue to grow. So if we did a study peri­od from 2016 to 2020 and the data is still flow­ing in, we can actu­al­ly make it so that there’s a set ver­sion of it, but then there’s also a live dash­board ver­sion of it that just con­tin­ues to grow and peo­ple can make their own com­ments and kind of expand from there.

David Roth­schild:
But the idea is to make those. Obvi­ous­ly we’re hir­ing up and hir­ing data engi­neers, but it’s a process.

Sophie Scott:
Thank you.

Sophie Scott:
Mar­cus, did you have any thoughts about Jo’s … ?

Mar­cus Munafò:
Just to agree. I think if you want to get any­where, you have to have a clear sense of where you’re going, and if we were to design from scratch a sec­tor that would deliv­er the things that we ask acad­e­mia to deliv­er, I’m not sure it would look exact­ly like what we have at the moment. And that means that we then need to con­front some of the chal­lenges. Like the fact that we have an over­pro­duc­tion prob­lem, I would say, in acad­e­mia, that we don’t resource what we do well enough. So the whole busi­ness mod­el of acad­e­mia is sort of pred­i­cat­ed on the assump­tion that aca­d­e­mics will work evenings and weekends.

Mar­cus Munafò:
Those are the things that we real­ly need to grap­ple with when we think about what we want the future to look like. And then we can map a path from where we are to where we want to be.

Sophie Scott:
Thank you.

Sophie Scott:
There’s a ques­tion that’s come in on the chat, which I thought was quite inter­est­ing. And it was some­thing that’s kind of been at the back of my mind through a lot of dis­cus­sions around this area.

Sophie Scott:
So these are all very excellent ideas about how to improve many different aspects of what we do in science. But this is from Katya [inaudible 00:40:17], and I’m sorry, I mangled your name there, but she’s worried about where theory is going in this, where theory sits. If we place everything on rigor, we lose a lot, or potentially could lose a lot, in terms of … you know, you made the point, Marcus, about what it is we’re actually doing, and it was there in David’s talk as well. So where do you see that sitting within this, and how can we maintain the same standards of rigor around the theorizing and modeling around our work?

Eka­te­ri­na Damer:
I could com­ment on that.

Sophie Scott:
Please do.

Eka­te­ri­na Damer:
I would say it’s all about col­lab­o­ra­tion and rep­re­sen­ta­tion. If you only have one per­son who’s a thought leader and they get a pres­ti­gious award at a con­fer­ence, I don’t think that’s going to get us any­where. What we need is groups of peo­ple that are diverse work­ing togeth­er in devel­op­ing the­o­ries, and it needs to be a process where they debate and rec­on­cile. And I’m not sure I see this suf­fi­cient­ly. There are still all sorts of pro­fes­sors and senior tenured aca­d­e­mics who just push their pet the­o­ries. I think it’s just total­ly insuf­fi­cient. That’s my perspective.

Jo Ever­shed:
I have a thought on this. It’s not a really well put together thought, so I’d love to hear what other people take from it. But I was struck by the separation in physics between theoretical physicists and experimental physicists.

Jo Ever­shed:
So they have experimentalists going out and collecting data and making that data available, and these are people who are experts at designing studies and getting really interesting data. And then you have theoretical physicists who come and take those data sets and look at them, and then think deeply about the theories and see what might fit these data sets. That’s as much as I understand the physics. And I rather wonder whether we need to see some of that practice come to behavioral science, so that we can look at both sides and be in conversation with each other, because maybe it’s got too much for one person to be able to do all of it. But maybe it does all still need to be in somebody’s head.

Jo Ever­shed:
So it’s real­ly a ques­tion. It was a thought.

Sophie Scott:
I’m going to jump in here and pre­tend to be a pan­elist, not just a chair, then I’ll shut up.

Jo Ever­shed:
Please do.

Sophie Scott:
Even designing your study, you’ll make theoretical assumptions. In everything we’ve heard about today, theory was baked in, even if people didn’t know that that was the theory they were working from. So I think it’s not that easy to separate. When you’re running an experiment, you are theorizing; you’re working from the theory behind it. And actually, more discussion of that, rather than less, is something that I would like to see.

Sophie Scott:
Any­way, we’re not here to hear from me, but would any­body else like to [crosstalk 00:42:55]?

Jo Ever­shed:
No, I think that’s exact­ly what the physi­cists do. It means you’ve got two groups of peo­ple work­ing on it togeth­er and pass­ing it round between them­selves. But yeah, it might be that it still all needs to be in one per­son­’s head at the moment. I don’t know. But yeah …

Sophie Scott:
One of the things I’m always struck by in psy­chol­o­gy and cog­ni­tive neu­ro­science is how lit­tle we spell out our assump­tions when we’re design­ing a study. Actu­al­ly the degree to which we are work­ing from a com­plete­ly unspo­ken set of assump­tions, which are pure the­o­ry, is a real issue.

Sophie Scott:
Any­way, enough from me. Any­body else want to talk about the­o­ry? Come and shut me up.

David Roth­schild:
[crosstalk 00:43:39], which is, and this’ll be a nod to Uri, and AsPre­dict­ed and pre-reg­is­tra­tion, which is that, I’ve noticed in my career, going from a point where things were very expen­sive right when I start­ed out, peo­ple were still using labs, and my advi­sor real­ly made me write out every­thing that I was plan­ning to do and write out all the tables for syn­thet­ic data and have it all per­fect, being like, “You got one shot because this is going to blow your entire budget.”

David Roth­schild:
Then we moved into this time where it was kind of fast and loose and everything felt so cheap, MTurk and other things. You were just running stuff, and then you’d see what happens.

David Roth­schild:
And then you get to this point now with pre-registration. To me, the p-hacking question aside, the key thing is that it gives me a really good way to teach my graduate students to write down the hypothesis and fill out those tables again. The main point of pre-registration, beyond the p-hacking part, is to actually lay out the theory and lay out your assumptions and your hypothesis in a very clean and clear way, so that it provides a nice iteration. And maybe some of that makes it into the paper, maybe some of it doesn’t.

David Roth­schild:
And to Jo’s point, I’d also say that I’m an economist. People still love theory over there; it still dominates. So I’m not worried about theory disappearing from papers so much as about the theory people talking to the empirical people. That remains a problem, but there are still a lot of folks out there in academia who love their theories.

Eka­te­ri­na Damer:
Psy­chol­o­gy too. Loves their the­o­ries. Every­one loves their the­o­ries. I know.

Uri Simon­sohn:
So, on what David was saying, this is actually a survey that we have done, asking a different group of researchers about the benefits for people who do pre-registration.

Uri Simon­sohn:
The number one benefit they mention, something like 75% of researchers, is that it allows them to think ahead better about the research they’re going to be doing, even more than any sort of open science or avoidance of p-hacking.

Uri Simon­sohn:
And another thought on that: there is a concern that there’s a fetishism with methodology in this credibility movement, that we just want to reproduce specific experiments that may not speak to a general theory.

Uri Simon­sohn:
And I think there’s some truth to that concern, but the flip side is that, if you want to test a theory that you may falsify, nothing beats a pre-registration that clearly stipulates that the prediction comes from the theory. And then, when you get a null effect, it’s a lot easier to persuade people that it’s worth publishing if it was both pre-registered and predicted by a clear theory.

Uri Simon­sohn:
So even though there’s some competition between theory and this sort of fetishization of effects, there’s also some synergy.

Eka­te­ri­na Damer:
This reminds me, what you just said, what’s the goal of the­o­ry? Is it expla­na­tion or pre­dic­tion? That’s maybe a ques­tion that’s rel­e­vant. I’d say pre­dic­tion in the end.

Uri Simon­sohn:
Me too.

Mar­cus Munafò:
Sophie, I think you make an impor­tant point that there’s a dis­tinc­tion between for­mal the­o­ry and infor­mal the­o­ry. And a lot of our the­o­riz­ing is rel­a­tive­ly infor­mal, I think, and we cer­tain­ly could spell that out a bit bet­ter, includ­ing stuff that you may or may not think of as the­o­ry, just are sort of the assump­tions that we bring to our work.

Mar­cus Munafò:
And actu­al­ly, you know, most of us here, I think, do quan­ti­ta­tive work. One thing I’ve learned is that every dis­ci­pline does some­thing well, and one area where qual­i­ta­tive research maybe has some­thing to bring is that they real­ly embrace the sub­jec­tiv­i­ty that we bring to our research. And we all do that because our ques­tions come from somewhere.

Mar­cus Munafò:
I think the­o­ry and obser­va­tion go hand in hand. You build the­o­ries on robust obser­va­tions and then you use the­o­ry to make pre­dic­tions that you then test. So you iterate.

Mar­cus Munafò:
You know, a lot of the his­to­ry of physics is just the ever more pre­cise esti­ma­tion of spe­cif­ic para­me­ters. A lot of med­ical research is still effec­tive­ly serendip­i­tous. A lot of the things that the­o­ries were built around-

Eka­te­ri­na Damer:
Real­ly?

Mar­cus Munafò:
Oh, absolute­ly.

Eka­te­ri­na Damer:
Why?

Mar­cus Munafò:
Because it’s messy, because actu­al­ly it’s incred­i­bly hard to the­o­rize about bio­log­i­cal sys­tems because they’re so incred­i­bly com­plex and noisy. And most stuff, we find by chance.

Mar­cus Munafò:
We give a drug to some peo­ple and it’s intend­ed for one thing, but the side effect means we end up with Via­gra. So serendip­i­ty is a huge part of how we progress, and we need to also just rec­og­nize that, I think. And when we have obser­va­tions that are very pre­cise­ly wrong, peo­ple build edi­fices on the back of that, and we end up with a sit­u­a­tion that we’ve had with HDL and LDL cho­les­terol, for exam­ple, or the evi­dence that a small amount of alco­hol con­sump­tion is good for you. Those find­ings are almost cer­tain­ly wrong.

Eka­te­ri­na Damer:
No, I dis­agree. I want to be con­trary now. I think you’re very cyn­i­cal. I think there are plen­ty of very good clin­i­cal tri­als. I mean, just look at-

Mar­cus Munafò:
Oh, the tri­als are good. No, the tri­als are excel­lent. The tri­als are to a real­ly high stan­dard, but what you put into the tri­al or how you arrive at the com­pound is often just by chance. Peo­ple did­n’t theorize-

Eka­te­ri­na Damer:
Oh, I see.

Mar­cus Munafò:
… To get to Viagra.

Eka­te­ri­na Damer:
That’s nor­mal. But that’s nor­mal. It’s nor­mal. That’s science.

Mar­cus Munafò:
That’s a great exam­ple. It was a side effect. So a real­ly good tri­al mon­i­tor­ing side effects showed a sig­nal for some­thing unex­pect­ed, that then went into its own tri­al and it was shown to do this oth­er thing as well. The tri­als were real­ly robust, but arriv­ing at that was serendipity.

Eka­te­ri­na Damer:
But that is part of sci­ence. Sci­ence is [crosstalk 00:48:58].

Mar­cus Munafò:
True. Exact­ly. So we don’t want to obsess about one or the oth­er. We just need to rec­og­nize that there’s this kind of dance between real­ly good obser­va­tions, real­ly good the­o­ry, and mov­ing them for­ward together.

Eka­te­ri­na Damer:
So basi­cal­ly, both explorato­ry and con­fir­ma­to­ry. That’s what Bri­an Nosek always talks about.

Mar­cus Munafò:
And I think one of the problems we have with our culture at the moment is that we have to pretend everything was confirmatory. We have to tell a story as if we hypothesized it, which is the point of that great satirical article about, you know, why all psychologists must have extrasensory perception or precognition, because everything we hypothesize turns out to be true. 95% of published articles in psychology show what we claimed they would show. Some of that is publication bias, but there are other things at play. And actually, being clear that, you know, “This is exploratory research. I’m not going to put a p value on it. I’m just going to describe what I saw,” I think there’s a place for that.

Sophie Scott:
And actually, this goes back to Jo’s point. I remember years ago, when I used to teach for The Open University, they had a really good piece of material. This is The Open University, where anyone can sign up to do a degree. And it was just an interview with Donald Broadbent, who was the guy who really got selective attention studies going in the UK. All of his original work came from really applied work with the air force, looking at people … The original headphones would play one message into one ear and the other message into the other ear, and people kept crashing their planes. Broadbent started looking at this, and a great deal of highly influential and highly replicable work on attention stemmed from that.

Sophie Scott:
And he said it has to be like that: “You can take your problems from the real world, you can feed that back into your theory, and then ideally you take it back out into the real world.”

Sophie Scott:
It actually spins back out to what Jo was suggesting.

Sophie Scott:
We’ve got a few more minutes.

Jo Ever­shed:
Can I ask-

Sophie Scott:
Sor­ry. Go on.

Jo Ever­shed:
Can I ask Marcus a question on that spinning it out? Why are things not going from academia into industry? Is there a gap in the funding model? Because there’s funding for the discovery, and there seems to be funding from UKRI for the entrepreneurial bit, but the roll-out, the scale-up bits … We see so many of our clients who have got something that’s near product-ready, but it needs that transitional element. Is there something missing?

Mar­cus Munafò:
What kind of inter­ac­tion are you talk­ing about specif­i­cal­ly? Are you talk­ing about aca­d­e­mics going into indus­try or are you talk­ing about indus­try com­ing to academia?

Jo Ever­shed:
It’s aca­d­e­mics going into indus­try. So aca­d­e­mics who maybe have designed and test­ed and run an RCT to cre­ate an inter­ven­tion that works in edu­ca­tion. And they’re like, “Well, I’ve done it and I can write my paper and it works. What do I do now?”

Mar­cus Munafò:
Yeah. I mean, that’s a big ques­tion. I think part of it is that our cul­ture leads us to a mod­el where we’re sort of appren­tic­ing under­grad­u­ates, then PhD stu­dents through to becom­ing pro­fes­sors. And most of them won’t, but you nev­er think it’s going to be you that falls off the Ponzi scheme, if you like. And so we define suc­cess in those terms, which then makes peo­ple feel very uncom­fort­able about doing some­thing oth­er than the thing that we’ve defined as suc­cess. I think that’s one of the prob­lems. We need to have a much rich­er vision of what suc­cess looks like, if you have aca­d­e­m­ic skills, research skills and want to pur­sue those.

Mar­cus Munafò:
So I think part of it is that, and snob­bery. I think part of it is the feel­ing that, if you have made it with­in acad­e­mia and you step too far out, it’s kind of a one-way door. It’s very hard to come back in again, which puts peo­ple off. I think we need to, again, think about ways in which we can cre­ate more of a revolv­ing door with peo­ple from indus­try com­ing into acad­e­mia, peo­ple from acad­e­mia going out into indus­try, and back again, so that we exchange knowl­edge in that way.

Mar­cus Munafò:
And I think there’s also, again, you could think of it as a kind of snob­bery thing, to some extent, there are hier­ar­chies every­where and one of the hier­ar­chies is between more fun­da­men­tal research ver­sus more applied research. And in every field that kind of hier­ar­chy exists, I think.

Mar­cus Munafò:
So I think there are lots of reasons, but it’s certainly something that we could do a lot better. And one of the things happening in the UK that I think is healthy is the people and culture strategy that’s developing, linked into all of this other activity around research culture and so on, which is happening at a government level. And I think that’s going to look at many of these things.

Mar­cus Munafò:
One of the challenges is that academics often don’t appreciate the standard that’s required in industry, certainly in the pharmaceutical industry. And I think the pharmaceutical industry is a really interesting microcosm of incentives as you go from the discovery end, where there’s much more of an incentive to get the right answer, to the marketing end, where there’s a huge sunk-cost bias.

Mar­cus Munafò:
So that’s not real­ly answer­ing your ques­tion, except to say it’s com­pli­cat­ed, there are lots of bits to it, and I think each one of those bits is impor­tant and deserves some attention.

Jo Ever­shed:
No. And I thought what was really interesting in what you said was how different industries do it, because, if I’m right, doctors in the UK quite often have dual practice and academic positions, right? They have one foot in each camp. And that massively helps the transfer of innovation from theory into practice, and the other way. And maybe that’s some of what we’re missing in the behavioral sciences and education: dual appointments where you’re expected to do both.

Mar­cus Munafò:
And you are start­ing to see those. There’s one at Oxford, for exam­ple. Again, it’s more in the sort of bio­med­ical space, but I think that’s right. I think that could be [inaudi­ble 00:54:34].

David Roth­schild:
I was going to add, on a good note, that over the last few years there’s been a massive growth in real science being done in the tech industry, when just 10 or 15 years ago there really weren’t options for reasonable academic-standard, or at least academic-style, research outside of academia. People were moving away into places like consulting, et cetera. That’s opening up.

David Roth­schild:
I think Mar­cus’ point, which I think is super inter­est­ing, I had­n’t heard it put this way, is the revolv­ing door prob­lem; being able to go back and forth. It real­ly is a one-way street though. There’s real­ly no ques­tion about that.

David Roth­schild:
And I think part of it is people in academia’s misunderstanding of the type of research that can be done, and the lack of transparency in the research that is done. But part of it has to do with the fact that, at least in the US, the hiring process is still constrained within departments. And so, even when academia recognizes that interdisciplinary and practical types of people are extremely useful to the university or its schools, there aren’t lines available. And with more emphasis on computational social science in general, which cuts across departments, and other types of newer information schools, I hope to see more of that coming.

David Roth­schild:
But that seems to be the main constraint. You have both these constraints fighting against each other, but definitely this main constraint, that hiring lies within the departments, really constrains who can be hired back into academia.

Eka­te­ri­na Damer:
There’s one more point I want to make before we run out of time.

Eka­te­ri­na Damer:
Some­body touched upon, I don’t remem­ber who it was, some­thing about prof­itabil­i­ty and fund­ing. I’m just going to share some­thing that I don’t think I’ve shared pub­licly before.

Eka­te­ri­na Damer:
In the very early days of Prolific, about seven years ago, I actively thought about, “Should I make it a for-profit or a non-profit?” And after some thinking, I was very sure that I wanted it to be a for-profit organization, because I never, ever want to be begging funders for money. This is why academics have a problem. They have to keep begging for money and applying for grants, and it takes forever and it leads nowhere. And the decision process is arbitrary anyway. It’s not even a good process.

Eka­te­ri­na Damer:
So I’d rather be for-prof­it and then put the right checks and bal­ances in place, hav­ing the right kind of board of direc­tors or what­ev­er, and then invest our own rev­enue back into build­ing infra­struc­ture. That seems like a much more pru­dent strat­e­gy than hav­ing to beg The Arnold Foun­da­tion, which is the foun­da­tion that’s putting a lot of mon­ey into the Open Sci­ence Framework.

Sophie Scott:
Thank you.

Sophie Scott:
I think we’re going to need to start to wrap up. So I just want to say thank you to all the speak­ers. I thought that was a real­ly inter­est­ing ses­sion and a lot of dif­fer­ent points to go back into. In fact, I’m going to go find the record­ing of this on YouTube and watch it again.

Sophie Scott:
So I’d just like to fin­ish by say­ing thank you to Uri, to Katya, to Mar­cus, to David, and I’m going to say thank you to Jo, and hand over to Jo, because I think we’re going to wrap up.

 

Jo Evershed

CEO and cofounder of Gorilla Experiment Builder

Jo Evershed loves providing behavioural scientists with tools to liberate their work from the lab and accelerate the creation of evidence-tested interventions. She holds a BSc in Psychology from UCL and a BSc in Combined Studies (Economics and Business) from Oxford Brookes and is an Innovate UK Women in Innovation Award Winner.

Get on the Waitlist

BeOnline is the conference to learn all about online behavioral research. It's the ideal place to discover the challenges and benefits of online research and to learn from pioneers. If that sounds interesting to you, then click the button below and sign up for our newsletter. You will be the first to know when we release new content and open applications for BeOnline 2022.

With thanks to our sponsors!
