Reproducibility 2.0

BeOnline Panel

Host: Professor Sophie Scott
Jo Evershed, Gorilla
Professor Uri Simonsohn, AsPredicted
Professor Marcus Munafò, University of Bristol (UKRN)
Dr Ekaterina Damer, Prolific
Dr David Rothschild, Microsoft Research

Full Transcript:

Jo Evershed:
Now we're going to move on to the final session of the day, Reproducibility 2.0, and I'm going to invite Sophie Scott back, who opened our session today. Hi, Sophie. She's going to be chairing our discussion. And also here we have got Katya, David, Uri and Marcus.

Jo Evershed:
So from that, I'll hand over to Sophie.

Sophie Scott:
No problem at all. Thank you very much, Jo.

Sophie Scott:
So very generally, we have a panel of experts who are going to talk about different aspects of reproducibility. There's going to be a great discussion about the future of reproducible science. Everybody's going to speak for five minutes, and we aren't going to take questions in between that, and then we'll move on to a discussion. Do put your questions in the chat, because I will bring those into the group discussion at the end.

Sophie Scott:
Is that okay?

Sophie Scott:
So our first speaker, who's going to give us a nice short presentation, is Dr. Ekaterina Damer from Prolific. Hi Ekaterina, are you here?

Ekaterina Damer:
Yes.

Sophie Scott:
Excellent, over to you for your five minutes.

Ekaterina Damer:
I cannot turn on the video. Oh, here we go.

Ekaterina Damer:
Okay.

Ekaterina Damer:
Hi. I didn't know I'd go first, but okay, I'll go now.

Ekaterina Damer:
All right. Hi, everyone. Today I'm going to argue that science needs revolution, not reform.

Ekaterina Damer:
Ten years ago, the replication crisis started. It was propelled by a paper by Simmons, Nelson, and Simonsohn from 2011, which showed that anything can be presented as statistically significant if the scientist wants it to be. That means science can be cheated very easily; it's very gameable. And you can cheat on so many different levels: how you develop your theory and hypothesis, how you design your study, how you collect the data, how you analyze the data, how you interpret it, and how you write up and frame your paper.

Ekaterina Damer:
So how and why is this possible? Well, the paper by Simmons and colleagues showed how common questionable research practices are: things like p-hacking or optional stopping. But I'm actually going to say that there's a deeper layer to this problem, which is around incentives, fundamentally, because in academia incentives aren't aligned. It's essentially a broken system: publish or perish. And the people who publish the best are typically the ones who make it to the very top.
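
To make "optional stopping" concrete, here is a minimal simulation sketch (not from the talk; the batch sizes and numbers are illustrative). Both groups are drawn from the same distribution, so every "significant" result is a false positive, yet peeking after every batch of participants pushes the error rate well above the nominal 5%:

```r
# Simulate optional stopping under the null: both groups come from the
# same normal distribution, but we run a t-test after every batch of
# 10 participants per group and stop as soon as p < .05.
set.seed(1)
n_sims <- 2000
false_pos <- 0
for (i in seq_len(n_sims)) {
  a <- c()
  b <- c()
  for (batch in 1:5) {
    a <- c(a, rnorm(10))
    b <- c(b, rnorm(10))
    if (t.test(a, b)$p.value < .05) {
      false_pos <- false_pos + 1
      break
    }
  }
}
false_pos / n_sims  # roughly 0.13, far above the nominal .05
```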

Ekaterina Damer:
So we can, of course, continue building. There are tools, you know, like registered reports, or infrastructure like the Open Science Framework, or even international collaborations like the Psychological Science Accelerator. But as long as the system is incentivizing the wrong behaviors, our efforts are basically pointless, in my opinion.

Ekaterina Damer:
So here's my pitch. I think we need to rethink and reimagine academia from scratch. We need academia 2.0, and we need a proper credibility revolution. This is a term that Simine Vazire from the University of Melbourne has coined.

Ekaterina Damer:
So I think the gradual progress that we've seen in the past 10 years has been way too slow. We're wasting taxpayer money, we're wasting our own time and energy, and we continue publishing rubbish research.

Ekaterina Damer:
So what's the difference between reform and revolution? A reform is typically a gradual improvement; a revolution is a more fundamental, profound, or sudden change.

Ekaterina Damer:
So how do we revolutionize academia? I'm going to borrow some ideas from the startup world. We need to approach it from first principles. So we need to start with the basic building blocks. How do you build a system that works?

Ekaterina Damer:
So this is how I would do it if I had the choice. For academia 2.0, we would need, one, the right rewards and incentives. For example, we'd have to offer tenure based on the rigor of research, not the prestige of the journals or publications. Two, we need better accountability and feedback mechanisms. For example, there should be a performance review process for professors; otherwise, they'll become complacent and just publish papers that might not even be rigorous. And three, we need a much stronger and more transparent publishing and peer review system. For example, preprints are now emerging as really good alternatives to journal articles. And I also think that peer review should be paid (why are so many scientists doing work for free?), and it should also be a lot more transparent.

Ekaterina Damer:
Can we accomplish all of this through reform? I don't think so. I predict that a startup will come along in the future and rebuild academia 2.0 from scratch. And in fact, we're already seeing something like that in education. There's a startup called Lambda School that is disrupting the way education is financed.

Ekaterina Damer:
So this is my pitch.

Sophie Scott:
Thank you very much.

Sophie Scott:
Big news for everybody who doesn't realize professors are annually reviewed, certainly at UCL.

Sophie Scott:
We are now going to our next speaker, and our next speaker is going to be Uri Simonsohn from Barcelona.

Sophie Scott:
Uri, do we have you?

Uri Simonsohn:
Yes, I'm trying to get it right.

Sophie Scott:
Hi, Uri. Hi.

Uri Simonsohn:
Hi. Just sorting out my screen.

Sophie Scott:
Over to you.

Uri Simonsohn:
I do have some slides. Can you confirm if you see them?

David Rothschild:
Yes. Yeah.

Uri Simonsohn:
Okay. Great.

Uri Simonsohn:
So I'm Uri Simonsohn. I'm in Barcelona and also have a foot still at Wharton, where I was for many years. They told me to speak, at most, five minutes. As little as possible. One minute would be great. So I'll keep it short.

Uri Simonsohn:
And this is an unusual presentation for me. It's kind of like an ad for the stuff I've been doing when I'm not doing my research: we're building infrastructure for conducting research, with a focus here on ResearchBox, the newest product in our set. But to give you some background on the Credibility Lab, our goal is to make it easier for people to conduct more credible research. And so far we have three products, to give them a name.

Uri Simonsohn:
AsPredicted, which is for pre-registration. To give you a sense of how common pre-registration has become: we were going to have a conference in Barcelona in 2020, but COVID got in the way, though not before we got all the submissions, and about half the submissions that were sent were pre-registered. This is about 307 submissions of empirical work, mostly experiments. About half of them were pre-registered, most of them on AsPredicted.

Uri Simonsohn:
This is the academic world that's closest to me, so I suspect that's why we have a high market share. I suspect that as we go further from judgment and decision-making, OSF will be more important, relatively speaking. I don't think this is representative. But that half the submissions are pre-registered would have been unthinkable five years ago. And this shows the growth since we launched AsPredicted, in how many new pre-registrations we're receiving per month. We're getting about 2,000 per month now, which is incredible. When we launched it, we decided that if we got a hundred a year, we would keep it running.

Uri Simonsohn:
Our second product, which we launched recently, is simply an R package that makes reproducible R code easier.

Uri Simonsohn:
So there's this problem, and I won't talk about details now, but the problem with …

Uri Simonsohn:
Oh, do you have me? I guess my video is off. Sorry about that.

Uri Simonsohn:
A problem that we have with R is that the packages get updated, and when they get updated your existing code can break. So we created Groundhog so that your code will run like in the movie: it will always be the day that you write down. And all you have to do to make your article reproducible is, instead of using the library command, you now use Groundhog's library command, and each package will always be loaded as it was available on that day.
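
As a minimal sketch of what that looks like in practice (the package and date here are just illustrative, not from the talk; see the Groundhog documentation for details):

```r
# One-time setup: install groundhog itself.
install.packages("groundhog")
library(groundhog)

# Instead of library(dplyr), pin the package to a fixed date.
# Every run loads the version of dplyr that was current on that date,
# so the script keeps producing the same results in the future.
groundhog.library("dplyr", "2021-06-01")
```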

Uri Simonsohn:
This is just a preamble to the third product, so to speak, which is ResearchBox, and I'll give you a little bit more information about it. It seeks to make open research easier by making file structures standardized and findable.

Uri Simonsohn:
So to give you an example, here's a box. You can look at this now while I'm speaking if you go to ResearchBox 15. That's a box that I created, and it shows all the files that are available in a standardized table. It's also a structured table. We call it a bingo table because it resembles a bingo card, and because it should be easy (as in American English, where you say "bingo" when you find something) to find anything you're looking for.

Uri Simonsohn:
So these are all the files, and I think this will work. Can you see my browser as well? Do you see it?

David Rothschild:
Yeah.

Uri Simonsohn:
Okay. So it has instantaneous previews and it's very easy to navigate. For example, let's say you want to look at the data for study one: if you click here, it immediately opens a preview, very quickly. And one thing we do is that every single data file has a codebook; the website helps you create the codebook for it. So if you want to know what each variable is, because it may be hard to figure out what, for example, "check" means, or what "rank order" means, et cetera, you can just click the book and it shows you the codebook for each of the variables.

Uri Simonsohn:
And every single dataset in ResearchBox has this structure for codebooks. You can also preview code in the same easy manner, all instantaneously. And if you want to download, you can select the specific files you want, or you can download everything in one click. So we seek to make that as easy as possible.

Uri Simonsohn:
So I'm returning to my presentation.

Uri Simonsohn:
So in terms of the bigger vision, this was a pitch for this new product of ours, but we were asked to think about Replicability 2.0: how is it going to be different? I think the main vision that drove us to create ResearchBox, in comparison to the OSF or Dataverse or other solutions, is that right now it's relatively easy to sort of dump your files somewhere. So if you want to be open, you can just dump them, and people can find them if they go to the effort. But my view is that if we make the files that we share easy enough to use, a whole lot of potential opens up that right now is not really being tapped.

Uri Simonsohn:
So I'll just give you concrete examples. Often a paper under review has a link to open materials or open data, but it is so difficult to actually find what you're looking for that it just imposes an extra burden. As Katya was saying, we're not being paid to be reviewers, so anything we can do to make things easier for reviewers would be good. The premise is that if we make it easy to look at the open research files, people will actually look at them without extra work.

Uri Simonsohn:
Then, sometimes methodologists write papers on how to do better analysis of data, and a lot of methodology papers rely on simulated data that may or may not reflect real data; they're solving problems that real researchers may or may not be facing. If it's very easy for methodologists to find data and see what people are doing and how it's being analyzed, we believe this openness will lead to more relevant methodological work.

Uri Simonsohn:
Another thing: think about all the effort that goes into generating data, only for it to be stored somewhere and never used again. If it's easy to find, that changes. For example, on ResearchBox you can search datasets by variable descriptions, so you can easily find any data that uses happiness or reaction time or particular stimuli. You can look for code: any posted box that has a particular package, or a function within a package, you can find and use. This should give more value to all the work that we're producing.

Uri Simonsohn:
If you are building on existing work, nothing beats being able to easily reproduce it, or to see the kind of data they got and the materials they have.

Uri Simonsohn:
And last, but not least, in terms of learning how to run particular research studies or how to analyze data [inaudible 00:13:11], if everything is very easy to find … Imagine you're trying to learn a new function that you don't know how to use. You can find a paper that's relevant to you and just search for that function. We believe this is going to dramatically increase the benefits of making things public.

Uri Simonsohn:
And in my assessment, which is of course biased, since I'm [inaudible 00:13:30] to believe ResearchBox is better, no existing platform allows any of this potential to be materialized. All we have is somewhat-difficult-to-obtain stored files. And we're hoping that this new platform will reach the potential that we believe open science has.

Uri Simonsohn:
And that's my presentation.

Sophie Scott:
Thank you very much. Thank you, Uri.

Sophie Scott:
Now, our next speaker is David Rothschild from Microsoft Research. Have we got you, David?

David Rothschild:
Yep. Can you guys hear me?

Sophie Scott:
I can indeed. Over to you.

David Rothschild:
Okay. So what I'm going to talk about is another concept of replicability, which is thinking about it in terms of external validity.

David Rothschild:
It's one thing to be able to replicate something in the laboratory, or to be able to replicate it with whatever tools you're using; it's another thing whether it actually replicates in the outcome space that you care about.

David Rothschild:
And I'll start with a warning. This is from a paper that's currently under review, where we simply asked respondents on a bunch of these various tools, "How much do you actually use these various services? How many hours do you spend on these types of sites?" And you're going to see, this is a lot of time spent responding. And of course Prolific and MTurk and CR (a filtered version of Mechanical Turk) are places where people go to work and do tasks. In earlier work, we also asked this on Qualtrics and other online panels, where you still have the majority of people spending several hours per week answering questions across various audiences.

David Rothschild:
And so that should, as a baseline, inform your understanding of the [inaudible 00:15:22] and of the various respondents that may be used in online laboratory experiments or surveys aimed at better understanding the world.

David Rothschild:
And I can move forward into my main area of work; I'll jump through a few different examples. One is ad effectiveness. This is from a paper that is going to be sent out soon; I apologize, I had to cover the names of the companies in it at the last moment. But the point is, we're looking at ad effectiveness for 50 brands. Pretty big brands, but not crazy.

David Rothschild:
If you just go to the far-left panel, main brand impressions per household: we were doing observational work here, followed up by some laboratory work, and we noticed that in our observational data the median household was getting hit about 39 to 40 times by a given brand. So we're looking at how much any individual ad affects people, but the average household was getting hit by 39 ads from a given brand. That's a lot of baseline exposure if you're trying to look at the marginal impact of a given ad.

David Rothschild:
In work that we're doing right now in politics, we're looking at this kind of paradox about the ineffectiveness of advertising in some ways. What we're going to report soon is that the average American, even though they consume very little news, is getting hit by more earned media. Even just on TV, they're getting hit by more actual news than they are by ads. A lot more. So if you're worried about the marginal effect, you have to worry about where people are coming from originally.

David Rothschild:
And this plays into some really interesting research around vaccine hesitancy. Again, there are a lot of laboratory experiments on how you can shift people's minds on taking vaccines, but you have to put that into the context of the massive amount of understanding and thought people have already put into the process.

David Rothschild:
I'll jump quickly into questions around market design, which I spend a lot of time on as well. This is a very common question; it's asking for confidence intervals. A very heavily replicated result is that people are overconfident when answering this type of question, where they're trying to give an 80% range for when something's going to happen. Working with Dan Goldstein and others, we created user interfaces that can essentially eliminate that type of error. People are even able to reproduce crazy distributions of numbers they saw, and things like that.

David Rothschild:
But then the question is, who's right and who's wrong? We don't really know here. It depends on thinking a lot about the contexts in which we care about this type of interaction or this type of thought process about confidence. Surely, if the type of work we're doing is making people [inaudible 00:18:08]-

David Rothschild:
… helping with the user interface. And at the end of the day, we looked at a simple question: how are people making money effectively? And basically, the short of it is that the people who understood the user interface, and actually made the most cost-effective trade for any trade they were doing, were making more money than other people, because the vast majority of people weren't even doing the trade properly. They were below the 45-degree line, which means that they were, at any given time, purchasing the exact same asset for more money than they could have if they fully understood the interface.

David Rothschild:
When we test these things in the lab, we teach people too well. People are much quicker, lazier, and busier in the real world, and they continue to make mistakes that we were able to eliminate, over and over again, in a laboratory setting.

David Rothschild:
I'll jump quickly to public opinion. This is a really cool table from a paper by Jon Krosnick. What I really love about it is that it points in the opposite direction. He used this table, specifically the fourth line of data, to show, "Hey, the errors on these non-probability internet samples are one to two percentage points higher than if you use probability samples."

David Rothschild:
So this is saying that if you go on the internet and get people coming from panels, you're not going to be as accurate at describing, basically, census data as you are on telephones.

David Rothschild:
Interesting; meaningful, maybe. But the point is that one to two percentage points is actually not that bad for a lot of things people care about. And actually, it's well within the range that most people would accept for something that is a lot cheaper.

David Rothschild:
And there are a lot of questions in public opinion where there's no underlying value, so it's very tricky to understand what a few percentage points really mean, especially when you start comparing. Even a bunch of really hardcore ground truth has a lot of error when it comes to differences, when it comes to sentiment. And if you move away from questions like, "What's your age, gender, marital status?", to questions of what you care about in public policy, well, people don't have very stable opinions. So understanding what that means in an externally valid sense is super tricky.

David Rothschild:
We know that people love infrastructure and, basically, the Democratic agenda. We know that the vote's a lot tighter than that when people go out and vote. So it's a question of: maybe they're answering this question truthfully, and maybe we actually have a very replicable thing, but maybe it doesn't translate, for various reasons that we're still trying to learn.

David Rothschild:
And I'll leave you with one more thought. There are a lot of studies about the effect of news; people want to understand how much being treated with news affects your baseline public opinion. And we can replicate, over and over again, people saying that somewhere around 35 to 40% of them are regular Fox News viewers. But what the bottom line here shows is that only about 14% of people ever consume Fox News for even a six-minute spell in a given month.

David Rothschild:
So we can replicate, in polling and in the laboratory, people claiming to be Fox News viewers, but they're actually not, if you look at the data. And that raises all sorts of questions about what it means when people who watch Fox News are different from people who don't. We can go into that later, but the point is that some questions are just too hard, and so we need to use passive data. These self-reports fully replicate over and over again; that doesn't necessarily make them true.

David Rothschild:
Thank you.

Sophie Scott:
Thank you very much. And I have to say, thank you for bringing your top bow-tie game to this talk.

David Rothschild:
I thought it was a formal presentation.

Sophie Scott:
It's very good. I like it.

Sophie Scott:
And now our next speaker is Marcus Munafò from the University of Bristol.

Sophie Scott:
Marcus, have we got you?

Marcus Munafò:
We have, hopefully. I hope you can hear me all right and see my slides.

Marcus Munafò:
Thanks for the invitation to speak. Thanks, Sophie, for introducing me.

Marcus Munafò:
There's a lot going on at the moment in the UK, nationally, around research culture and research incentives, and Katya spoke to that. We don't have to think too hard to see the ways in which our culture is, in many ways, very old in terms of the ways we work, and the tension that creates with the incentives that exist and shape our behavior.

Marcus Munafò:
For example, the way in which we disseminate knowledge via journal articles is still predicated on paper being expensive. All of those constraints, that you can only have so many words and tables and figures and so on, are a vestige of that way of disseminating knowledge on dead trees.

Marcus Munafò:
So it's an important question to ask: how can we do better? How can we improve the culture within which we work, the incentives that shape our behavior, and, as a result, the quality of the work that we produce?

Marcus Munafò:
And that last comment that David made is, I think, worth bearing in mind. Replicable does not mean that we get the right answer. We can end up with very large studies that give us very precise, and very precisely wrong, answers to our underlying question if we're not careful. So we need to think about more than just replicability.

Marcus Munafò:
Katya mentioned this study, which is very important in terms of the flexibility that we have in our studies, the extent to which we can leverage that flexibility to generate a spurious finding, and how we can be led astray by our own cognitive biases because we want to find something.

Marcus Munafò:
But that's not news. In 1988, Richard Peto published a trial of aspirin and its protective effects on heart disease. He was asked by reviewers to include a post-hoc subgroup analysis, and he said, "I'll only do it if I can add my own post-hoc subgroup analysis to demonstrate the ease with which you can generate spurious results if you do that." And his own subgroup analysis showed that if you're born under Capricorn, the effect of aspirin on heart disease risk is much more beneficial. In other words, he used astrological signs for his subgroup analysis to demonstrate how rolling the dice multiple times gets you the wrong answer.

Marcus Munafò:
I think the real message of this paper is that when you read something in the published literature, you have no way of knowing whether you're reading a full account of everything that happened, which is the full abstract shown here, or a redacted, curated, storytelling version intended to sell the paper, which is the compact version shown in bold. You simply don't know, because part of our culture leads us to a model of research which relies on trust: trusting individual researchers to give a full and complete account of everything they did. What we need to move toward, as many others have said, including Simine Vazire, but also David Spiegelhalter at Cambridge, is a system, a process, that is inherently trustworthy, rather than one that relies on trust in individuals, because people are fallible, have their own cognitive biases, and their behavior is shaped by the incentive structures we work within. We're all human, in other words.

Marcus Munafò:
So how can we introduce ways of working that create a trustworthy system? One insight, from W. Edwards Deming, the statistician to the Japanese automobile industry in the 1970s, is that if you introduce quality control checks throughout a process, you create high-quality outputs. The Japanese automobile industry started producing reliable cars for the first time (previously, cars were unreliable), came to dominate the market, and still has a reputation for reliability today.

Marcus Munafò:
So the analogy is that we produce scientific papers to be fixed later, a bit like the US automobile industry in the 1970s produced cars to be fixed later; the era of the lemon, the irredeemably badly built car. But Deming's less intuitive insight was that if you focus on quality throughout a process, you not only produce higher-quality outputs at the end of that process, you also improve efficiency, because you're not investing time fixing cars that broke down later, or correcting claimed findings that turn out to be false. So if we want to advance knowledge efficiently, and translate that knowledge into societal impact more rapidly, we need to focus on quality.

Marcus Munafò:
So how can we improve quality? The problem is that we have an interconnected system with lots of stakeholders, each of which has a part to play. This article describes some of the threats to the scientific process, and some of the potential changes to that process that could improve the way we work and the quality of what we produce, but this requires the coordination of journals, funders, institutions, and researchers themselves.

Marcus Munafò:
One area where researchers can do a great deal is by working more transparently, by making as much of their research process available to scrutiny as possible. That creates more external quality control: people will spot the mistakes we make in our work, because we will make mistakes in any human endeavor. We've all read journal articles that have been read by multiple authors, been through peer review, been read by an editor, been copy-edited, been proofread, and still have typos in them. So our code will have errors, and our data will, and we need to create processes that allow those honest errors to be caught and corrected, to improve our quality and our efficiency.

Marcus Munafò:
External checking is part of that, but making our workflows available for scrutiny, in the knowledge that others may check our work, also creates an incentive for greater internal quality control. In other words, you check your data set four or five times before you post it, rather than two or three times, because you don't want someone to spot an error.

Marcus Munafò:
But that coordination is key. You can have funders mandating data sharing, for example, and researchers who want to do it, but you need the incentives to motivate them, because not all researchers will be equally motivated, and you need the infrastructure to support it. So that's why we've set up the UK Reproducibility Network, a peer-led organization that has, at its base, local networks of researchers: self-organizing groups of researchers motivated to engage with these issues. But we also have institutions that have joined.

Marcus Munafò:
So in the UK at the moment, we have 57 local networks at different institutions, and 20 institutions that have joined in their own right, working at a different level: the level of things like promotion and hiring criteria, and how you can use those to incentivize open research practices and other changes we might want to incentivize. And then we have the external stakeholders: the funders, the publishers, the learned societies, the professional bodies, and the other sector organizations. We need to be creating those linkages and making sure that we coordinate our efforts across and between those different levels.

Marcus Munafò:
And of course, this is an international effort; science is global. So we're now starting to see reproducibility networks modeled on that same structure, which gives the flexibility to tailor solutions locally while still coordinating across and between levels, emerging in other countries. We have them in Australia, Germany, Switzerland, and Slovakia at the moment, and several other countries are interested in developing their own reproducibility networks. So that coordination can then extend to a global scale.

Sophie Scott:
Thank you very much, Marcus.

Sophie Scott:
And now we are over to our last speaker, who's Jo; now not chairing the session, but giving us a talk.

Sophie Scott:
Hi, Jo.

Jo Evershed:
Hello. Thank you. Can you see my screen?

Sophie Scott:
Yes.

Jo Evershed:
Yes. Great.

Jo Evershed:
Hi, I'm Jo Evershed from Gorilla, and we help behavioral scientists create and host online experiments quickly and easily.

Jo Evershed:
I want to talk about where this journey towards better reproducibility leads. As the Cheshire Cat says, "If you don't know where you're going, it doesn't much matter which way you go." And I think it does very much matter.

Jo Evershed:
Reproducibility of findings is important to ensure that science is robust. I believe we've come a long way in the last 10 years. We've analyzed the problem, we've proposed and developed and tested several solutions, and now what's left is to get the incentives right and implement those solutions. From that perspective, reproducibility is increasingly a solved problem, although there is still plenty of work to be done on implementation.

Jo Evershed:
As an example, when I was a student 10 years ago, underpowered studies were the norm. Online behavioral research has changed all that. With Gorilla Open Materials, you can read a paper, access its protocol, and clone it for replication in just three mouse clicks. With the recruitment services, you can launch it online and get a representative sample of 500 people in a lunch break. And with ResearchBox, you can then store the data and analysis for posterity. That's dramatic progress. I'm not sure exactly where we are on this journey, but I think we can see the path ahead.

Jo Evershed:
But reproducibility is not everything; it's only part of the journey. So what's the ultimate destination? In other words, what's at the top of the mountain? Knowing what we're aiming for will ensure we get the right processes and safeguards in place.

Jo Evershed:
So I think this is the full journey. We want our science to be reproducible, so that our findings are robust; then generalizable, so that we're confident they work in novel contexts; and then impactful, so that we're confident they can be used in the real world.

Jo Evershed:
And then we have the prize: the evidence-based products and services of the future. I'd like to see the behavioral sciences informing the products and services of the future and improving lives. At Gorilla, we talk about how the behavioral sciences can be leveraged to improve health, wealth, happiness, and education. For me, that's what's at the top of the mountain. And to do this, we need more pathways out of academia and into innovation.

Jo Evershed:
For everyone in the audience who can see a problem in the world with a behavioral solution: we want to give you the tools to research that problem, develop and test that solution, and take it with you out of academia and make it happen.

Jo Evershed:
Perhaps surprisingly, behavioral science academics make ideal entrepreneurs. We understand human behavior, we're good with numbers, we're tech-savvy. A lot of the skills developed in academia translate well to startups. Pitching for grant funding isn't that different from pitching for investment; running a lab isn't that different from running a startup. In both, you're improvising and experimenting and thinking deeply about what works. And as we've seen today, academics are great at standing up in front of a crowd and sharing what they know.

Jo Evershed:
But entrepreneurs have the added bonus of being able to create a sustainable funding model. As a case in point, moving into entrepreneurship is what Katya from Prolific and I did instinctively.

Jo Evershed:
So we're trying to help build this pipeline, so that the impact section of your grant proposal isn't the end, but the springboard for the next step: productizing your research. Right now, we stop at the third circle and then go back to the start. But we've come all this way; why are we stopping now?

Jo Evershed:
At Gorilla, we've been developing new tools to make it easier to design and test the products and services of the future by making studies more ecologically valid. With Game Builder, we hope to inspire a generation of education and development researchers to create games that can be used in classrooms to train and develop students, and with Shop Builder, we're giving researchers the tools needed to nudge behavior. These tools are designed to enable much more rollout-ready findings, so that the leap to the fourth circle here is smaller.

Jo Evershed:
I'm saddened when I see products or policies based on ideology or personal opinion, but that's what risks happening if we leave a void. Instead, we could create pathways from academia to industry so we can take our findings and roll them out responsibly. That way we can ensure that the science isn't corrupted as it's translated into practice.

Jo Evershed:
One exciting idea that I want to leave you with is that product development doesn't stop once you launch. Imagine what you can find out when you have 2,000 people using your product every day. What about with 200,000? Imagine the power of that study. Imagine how much more robust your findings will be when you can run micro-experiments every day and see what works.

Jo Evershed:
Once taken to market, an educational [maths 00:34:20] game, like Diana's game that we saw yesterday, could and should continue to analyze player learning and behavior to further improve the game. This sort of flywheel, where scientific advancement for the betterment of society is self-funding, will mean better products and services. It's a different way of disseminating knowledge than printing it on bits of dead wood. And over the last two days, we've heard lots of ideas that could turn into products or services, from games to life-saving interventions.

Jo Evershed:
To me, this is what it would mean to be a behavioral scientist in industry and to work on products with a strong evidence base. Yes, there are complexities around how to use end-user data responsibly, but I believe we can get there.

Jo Evershed:
Many of us want to have a positive impact on the world and leave a legacy. Given that so many of the challenges that face society are behavioral, you're ideally positioned to do this. We're making the tools to make that possible, but we need you to want it and dream it and to then go out into the world and do it.

Sophie Scott:
Thank you very much, Jo. And thank you to all the speakers.

Sophie Scott:
We've now got about 25 minutes for questions and discussion. So if you have anything that you'd like to say, please put it in the Q&A and I will bring those to the panel.

Sophie Scott:
In the meantime, I'd like to ask the other members of the panel: what did you think about Jo's point that it's not just about where we've come from and where we are, but where we're going? Could I start with you, Katya?

Ekaterina Damer:
I could not agree more. I don't really know what else to say. I mean, we should be thinking about the future, right? What do we want to see in the future? Yeah, I just couldn't agree more, basically.

Sophie Scott:
Thank you.

Sophie Scott:
David?

David Rothschild:
Well, I think it touches on what a lot of people have been talking about, which is the evolution of academic publishing. And we've talked a lot, on the replication side, about the data that goes in.

David Rothschild:
I think one thing that was not noted enough, and is worth noting, is that there's an incredible cost on researchers to reach the replication standard that we want, and there are also a lot of questions about privacy, and about data that simply can't be shared for various reasons. It's a really costly time for those of us trying to meet the standards, because we're in a transition phase. And to Jo's point and others' points, I really look towards startups. I look towards people who can monetize, in many ways, in order to make it a more efficient system, so that there's more vertical integration and the costs aren't so incredibly high, because it is a super frustrating experience for many of us.

David Rothschild:
And on the output side, I look forward to the movement past dead trees. You know, we have been working on building HTML versions of a lot of our papers, live versions where data continues to flow, with easy replication, but again, it's super, super costly. Coming from an academic lab, we're not going to design the software that's ultimately going to take over, so we're building experimentation on it and, again, looking towards industry to get that right and lower the costs to make it happen.

David Rothschild:
And I just want to give a shout-out especially to the younger researchers, who are just slammed because of this transition period. It's tough. It's tough to meet replication standards on a lot of papers, and I know that there are a lot of late nights to do it all right, and then also to make it reach the standards we all want to be at.

Sophie Scott:
Thank you.

Sophie Scott:
Uri, did you have any comments?

Uri Simonsohn:
No. Actually, I wanted to ask a quick follow-up to David. When you say you're experimenting with the HTML papers, is it just for your own papers, not a platform for [crosstalk 00:38:18]?

David Rothschild:
Yeah, that's correct. So, working out of the lab with Duncan [Watts 00:38:25] at the University of Pennsylvania, we're putting up new dashboards, and we're taking every one of our papers and trying to build HTML versions, where we take the flat charts that are in the paper and build interactive charts that you can click on to see various subgroups, et cetera.

David Rothschild:
And for data that is still flowing in, and this is especially true for observational data, the charts can just continue to grow. So if we did a study period from 2016 to 2020 and the data is still flowing in, we can make it so that there's a set version of the paper, but also a live dashboard version that just continues to grow, where people can make their own comments and expand from there.

David Rothschild:
But the idea is to make those. Obviously, we're hiring data engineers, but it's a process.

Sophie Scott:
Thank you.

Sophie Scott:
Marcus, did you have any thoughts about Jo's … ?

Marcus Munafò:
Just to agree. I think if you want to get anywhere, you have to have a clear sense of where you're going, and if we were to design from scratch a sector that would deliver the things we ask academia to deliver, I'm not sure it would look exactly like what we have at the moment. And that means we need to confront some of the challenges. Like the fact that we have an overproduction problem in academia, I would say, and that we don't resource what we do well enough. The whole business model of academia is sort of predicated on the assumption that academics will work evenings and weekends.

Marcus Munafò:
Those are the things that we really need to grapple with when we think about what we want the future to look like. And then we can map a path from where we are to where we want to be.

Sophie Scott:
Thank you.

Sophie Scott:
There's a question that's come in on the chat which I thought was quite interesting, and it's something that's been at the back of my mind through a lot of discussions around this area.

Sophie Scott:
There are all these excellent ideas about how to improve many different aspects of what we do in science, but this question is from Katya [inaudible 00:40:17], and I'm sorry, I mangled your name there, and she's worried about where theory sits in all of this. If we place everything on rigor, we lose, or potentially could lose, a lot in terms of … You made the point, Marcus, about what it is we're actually doing, and it was there in David's talk as well. So, where do you see theory sitting within this, and how can we maintain the same standards of rigor around the theorizing and modeling around our work?

Ekaterina Damer:
I could comment on that.

Sophie Scott:
Please do.

Ekaterina Damer:
I would say it's all about collaboration and representation. If you only have one person who's a thought leader and they get a prestigious award at a conference, I don't think that's going to get us anywhere. What we need is diverse groups of people working together to develop theories, and it needs to be a process where they debate and reconcile. And I'm not sure I see this happening sufficiently. There are still all sorts of professors and senior tenured academics who just push their pet theories. I think that's totally insufficient. That's my perspective.

Jo Evershed:
I have a thought on this. It's not a really well-put-together thought, so I'd love to hear what other people take from it. But I was struck by the separation in physics between theoretical and experimental physicists.

Jo Evershed:
So they have experimentalists going out, collecting data, and making that data available; these are people who are experts at designing studies and getting really interesting data. And then you have theoretical physicists who take those data sets, look at them, and think deeply about which theories might fit. That's as much as I understand the physics. And I rather wonder whether we need to see some of that practice come to behavioral science, so that we can look at both sides and be in conversation with each other, because maybe it's become too much for one person to do all of it; but then again, maybe it does all still need to be in somebody's head.

Jo Evershed:
So it's really a question. It was a thought.

Sophie Scott:
I'm going to jump in here and pretend to be a panelist, not just a chair; then I'll shut up.

Jo Evershed:
Please do.

Sophie Scott:
Even in designing your study, you make theoretical assumptions. Everything we've heard about today had theory baked in, even if people don't know that was the theory they were working from. So I think it's not that easy to separate. When you're running an experiment, you are theorizing; you're working from the theory behind it. And actually, more discussion of that, rather than less, is something that I would like to see.

Sophie Scott:
Anyway, we're not here to hear from me, but would anybody else like to [crosstalk 00:42:55]?

Jo Evershed:
No, I think that's exactly what the physicists do. It means you've got two groups of people working on it together and passing it back and forth between them. But yeah, it might be that it still all needs to be in one person's head at the moment. I don't know. But yeah …

Sophie Scott:
One of the things I'm always struck by in psychology and cognitive neuroscience is how little we spell out our assumptions when we're designing a study. The degree to which we work from a completely unspoken set of assumptions, which are pure theory, is a real issue.

Sophie Scott:
Anyway, enough from me. Anybody else want to talk about theory? Come and shut me up.

David Rothschild:
[crosstalk 00:43:39], which is, and this will be a nod to Uri and AsPredicted and pre-registration, that I've noticed a shift in my career. It started at a point where things were very expensive; right when I started out, people were still using labs, and my advisor really made me write out everything I was planning to do, and write out all the tables with synthetic data, and have it all perfect, saying, "You've got one shot, because this is going to blow your entire budget."

David Rothschild:
Then we moved into this time where it was kind of fast and loose and everything felt so cheap, MTurk and other things. You were just running stuff to see what happens.

David Rothschild:
And then you get to this point now with pre-registration. To me, the p-hacking question aside, the key thing is that it gives me a really good way to teach my graduate students to write down the hypothesis and fill out those tables again. The main point of pre-registration, beyond the p-hacking part, is to actually lay out the theory and lay out your assumptions and your hypothesis in a very clean and clear way, so that it provides a nice iteration. And maybe some of that makes it into the paper, maybe some of it doesn't.

David Rothschild:
And to Jo's point, I'd also say that I'm an economist. People still love theory over there; it still dominates. So I'm not worried about theory disappearing from papers so much as about the theory people talking to the empirical people. That remains a problem, but there are still a lot of folks out there in academia who love their theories.

Ekaterina Damer:
Psychology too. Loves its theories. Everyone loves their theories. I know.

Uri Simonsohn:
So, to what David was saying: this is actually a survey that we have run. A different group of researchers asked people who do pre-registration about its benefits.

Uri Simonsohn:
The number one benefit they mention, something like 75% of researchers, is that it allows them to think ahead better about the research they're going to be doing, even more than any sort of open science or lack of p-hacking.

Uri Simonsohn:
And another thought on that: there is a concern that there's a fetishism of methodology in this push for credibility, that we just want to reproduce specific experiments that may not speak to a general theory.

Uri Simonsohn:
And I think there's some truth to that concern, but the flip side is that if you want to test a theory that you may falsify, nothing beats a pre-registration that clearly stipulates that the prediction comes from the theory. And then, when you get a null effect, it's a lot easier to persuade people that it's worth publishing if it was both pre-registered and predicted by a clear theory.

Uri Simonsohn:
So even though there's some competition between theory and the fetishization of effects, there's also some synergy.

Ekaterina Damer:
This reminds me, what you just said: what's the goal of theory? Is it explanation or prediction? That's maybe a relevant question. I'd say prediction, in the end.

Uri Simonsohn:
Me too.

Marcus Munafò:
Sophie, I think you make an important point: there's a distinction between formal theory and informal theory. A lot of our theorizing is relatively informal, I think, and we could certainly spell that out a bit better, including things that you may or may not think of as theory, the sort of assumptions that we bring to our work.

Marcus Munafò:
And actually, you know, most of us here, I think, do quantitative work. One thing I've learned is that every discipline does something well, and one area where qualitative research maybe has something to bring is that it really embraces the subjectivity that we bring to our research. And we all do that, because our questions come from somewhere.

Marcus Munafò:
I think theory and observation go hand in hand. You build theories on robust observations, and then you use theory to make predictions that you then test. So you iterate.

Marcus Munafò:
You know, a lot of the history of physics is just the ever more precise estimation of specific parameters. A lot of medical research is still effectively serendipitous. A lot of the things that theories were built around-

Ekaterina Damer:
Really?

Marcus Munafò:
Oh, absolutely.

Ekaterina Damer:
Why?

Marcus Munafò:
Because it's messy; because it's actually incredibly hard to theorize about biological systems, they're so incredibly complex and noisy. And most stuff we find by chance.

Marcus Munafò:
We give a drug to some people and it's intended for one thing, but a side effect means we end up with Viagra. So serendipity is a huge part of how we progress, and we need to recognize that too, I think. And when we have observations that are very precisely wrong, people build edifices on the back of them, and we end up with the situation we've had with HDL and LDL cholesterol, for example, or the evidence that a small amount of alcohol consumption is good for you. Those findings are almost certainly wrong.

Ekaterina Damer:
No, I disagree. I want to be contrary now. I think you're being very cynical. I think there are plenty of very good clinical trials. I mean, just look at-

Marcus Munafò:
Oh, the trials are good. No, the trials are excellent. The trials are done to a really high standard, but what you put into the trial, or how you arrive at the compound, is often just by chance. People didn't theorize-

Ekaterina Damer:
Oh, I see.

Marcus Munafò:
… to get to Viagra.

Ekaterina Damer:
That's normal. But that's normal. It's normal. That's science.

Marcus Munafò:
That's a great example. It was a side effect. So a really good trial, monitoring side effects, showed a signal for something unexpected, which then went into its own trial and was shown to do this other thing as well. The trials were really robust, but arriving at that was serendipity.

Ekaterina Damer:
But that is part of science. Science is [crosstalk 00:48:58].

Marcus Munafò:
True. Exactly. So we don't want to obsess about one or the other. We just need to recognize that there's this kind of dance between really good observations and really good theory, moving them forward together.

Ekaterina Damer:
So basically both exploratory and confirmatory. That's what Brian Nosek always talks about.

Marcus Munafò:
And I think one of the problems we have with our culture at the moment is that we have to pretend everything was confirmatory. We have to tell a story as if we hypothesized it all along, which is the point of the great satirical article about why all psychologists must have extrasensory perception or precognition, because everything we hypothesize turns out to be true: 95% of published articles in psychology show what the authors claimed they would show. Some of that is publication bias, but there are other things at play. And actually, being clear that "this is exploratory research; I'm not going to put a p-value on it; I'm just going to describe what I saw", I think there's a place for that.

Sophie Scott:
And actually, this goes back to Jo's point. I remember, years ago, when I used to teach for The Open University, they had a really good piece of course material. (This is The Open University; anyone can sign up to do a degree.) It was an interview with Donald Broadbent, who was the person who really got selective attention studies going in the UK. All of his original work came from really applied work with the air force. The original headphones would play one message into one ear and another message into the other ear, and people kept crashing their planes. Broadbent started looking at this, and a great deal of highly influential and highly replicable work on attention stemmed from that.

Sophie Scott:
And he said it has to be that way: "You can take your problems from the real world, you can feed that back into your theory, and then ideally you take it back out into the real world."

Sophie Scott:
It's actually something that spins back out to what Jo was suggesting.

Sophie Scott:
We've got a few more minutes.

Jo Evershed:
Can I ask-

Sophie Scott:
Sorry. Go on.

Jo Evershed:
Can I ask Marcus a question on that spinning out? Why are things not going from academia into industry? Is there a gap in the funding model? Because there's funding for the discovery, and there seems to be funding from UKRI for the entrepreneurial bit, but the roll-out, the scale-up bits … We see so many of our clients who have something that's near product-ready, but it needs that transitional element. Is there something missing?

Marcus Munafò:
What kind of interaction are you talking about specifically? Are you talking about academics going into industry, or are you talking about industry coming to academia?

Jo Evershed:
It's academics going into industry. So academics who maybe have designed and tested and run an RCT to create an intervention that works in education. And they're like, "Well, I've done it and I can write my paper and it works. What do I do now?"

Marcus Munafò:
Yeah. I mean, that's a big question. I think part of it is that our culture leads us to a model where we're sort of apprenticing undergraduates, then PhD students, through to becoming professors. Most of them won't get there, but you never think it's going to be you that falls off the Ponzi scheme, if you like. And so we define success in those terms, which then makes people feel very uncomfortable about doing something other than the thing we've defined as success. I think that's one of the problems. We need a much richer vision of what success looks like if you have academic skills and research skills and want to pursue those.

Marcus Munafò:
So I think part of it is that, and part of it is snobbery: the feeling that if you have made it within academia and you step too far out, it's a one-way door. It's very hard to come back in again, which puts people off. We need to think about ways to create more of a revolving door, with people from industry coming into academia and people from academia going out into industry and back again, so that we exchange knowledge in that way.

Marcus Munafò:
And again, you could think of it as a kind of snobbery to some extent: there are hierarchies everywhere, and one of them is between more fundamental research and more applied research. That kind of hierarchy exists in every field, I think.

Marcus Munafò:
So I think there are lots of reasons, but it's certainly something we could do a lot better. And one of the things happening in the UK that I think is healthy is the people and culture strategy that's developing, linked into all of this other activity around research culture and so on, which is happening at a government level. I think that's going to look at many of these things.

Marcus Munafò:
One of the challenges is that academics often don't appreciate the standard that's required in industry, certainly in the pharmaceutical industry. And I think the pharmaceutical industry is a really interesting microcosm of incentives as you go from the discovery end, where there's much more of an incentive to get the right answer, to the marketing end, where there's a huge sunk-cost bias.

Marcus Munafò:
So that's not really answering your question, except to say it's complicated, there are lots of bits to it, and I think each one of those bits is important and deserves some attention.

Jo Evershed:
No. And I thought what you said about how different industries do it was really interesting, because, if I'm right, doctors in the UK quite often hold dual clinical and academic positions, right? They do have one foot in each camp. And that massively helps the transfer of innovation from theory into practice, and the other way. Maybe that's some of what we're missing in the behavioral sciences and education: dual appointments where you're expected to do both.

Marcus Munafò:
And you are starting to see those. There's one at Oxford, for example. Again, it's more in the sort of biomedical space, but I think that's right. I think that could be [inaudible 00:54:34].

David Rothschild:
I was going to add, on a good note, that over the last few years there's been massive growth in real science being done in the tech industry, when just 10 or 15 years ago there really weren't options for doing academic-standard, or at least academic-style, research outside of academia. People were moving away into places like consulting. That's opening up.

David Rothschild:
I think Marcus's point, which I think is super interesting and which I hadn't heard put this way before, is the revolving-door problem: being able to go back and forth. It really is a one-way street, though. There's really no question about that.

David Rothschild:
And I think part of it is people in academia misunderstanding the type of research that can be done, and the lack of transparency in the research that is done. But part of it is simply that, at least in the US, the hiring process is still constrained within departments. So even when academia recognizes that interdisciplinary and practical types of people are extremely useful to the university or its schools, there aren't lines available. And I hope that, with the growing emphasis on computational social science in general, which cuts across departments, and on other, newer kinds of information schools, there may be more of that coming.

David Rothschild:
But that seems to be the main constraint. You have all these constraints fighting against each other, but the main one is that hiring lies within the departments, which really constrains who can be hired back into academia.

Ekaterina Damer:
There’s one more point I want to make before we run out of time.

Ekaterina Damer:
Somebody touched upon, I don't remember who it was, something about profitability and funding. I'm just going to share something that I don't think I've shared publicly before.

Ekaterina Damer:
In the very early days of Prolific, about seven years ago, I actively thought about, "Should I make it a for-profit or a non-profit?" And after some thinking, I was very sure that I wanted it to be a for-profit organization, because I never, ever want to be begging funders for money. This is why academics have a problem: they have to keep begging for money and applying for grants, and it takes forever and it leads nowhere. And the decision process is arbitrary anyway. It's not even a good process.

Ekaterina Damer:
So I'd rather be for-profit and then put the right checks and balances in place, having the right kind of board of directors or whatever, and then invest our own revenue back into building infrastructure. That seems like a much more prudent strategy than having to beg The Arnold Foundation, which is the foundation that's putting a lot of money into the Open Science Framework.

Sophie Scott:
Thank you.

Sophie Scott:
I think we're going to need to start to wrap up. So I just want to say thank you to all the speakers. I thought that was a really interesting session, with a lot of different points to go back to. In fact, I'm going to go and find the recording of this on YouTube and watch it again.

Sophie Scott:
So I'd just like to finish by saying thank you to Uri, to Katya, to Marcus, to David, and I'm going to say thank you to Jo, and hand over to Jo, because I think we're going to wrap up.

Jo Evershed

CEO and cofounder of Gorilla Experiment Builder

Jo Evershed loves providing behavioural scientists with tools to liberate their work from the lab and accelerate the creation of evidence-tested interventions. She holds a BSc in Psychology from UCL and a BSc in Combined Studies (Economics and Business) from Oxford Brookes and is an Innovate UK Women in Innovation Award Winner.
