Here’s a comment on this post, originally posted on Facebook by Daniel Handwerker (copied here with permission). See next comment for my reply…
“I disagree.
You say the goal of science is to understand what can be "accepted as fully established knowledge." You do not claim or explain why that goal has to be reached to publish a peer reviewed article. I'd go farther and say that if the criterion for publication is that reviewers think a finding should be accepted as established knowledge, then nothing would ever get through peer review. A replication might increase the likelihood that a finding is reliable, but it wouldn't remotely cross the high bar you're setting.
Replication, particularly the split-half replications you're using in examples, only focuses on two causes of irreproducibility: statistical power and methodological clarity/processing consistency. It might help in some situations, but it won't fundamentally change the core issues underlying seeking truths about a complex system.
This is still my favorite piece on this topic: https://drugmonkey.scientopia.org/2014/07/08/the-most-replicated-finding-in-drug-abuse-science/ The core message is that publishing and sharing inconsistent findings from fundamentally the same study design is what advanced scientific understanding. This example isn't a replication failure; it's advancing science through conversations about why things aren't replicating.
Fundamentally, the question is: what is a peer reviewed scientific publication? If we treat scientific publications as markers of scientific truth, then we have a replication crisis every time the findings in a specific paper aren't replicated. If we treat scientific publications as part of a conversation that advances science, then the purpose of each paper should be to clearly explain what was done (so that replication by others is possible), explain why the results are plausible (i.e., the statistics and methods are sufficiently robust to support the claims), and state the limitations of the interpretations. Some papers might present more support for their interpretation, and some might present an interesting observation paired with a clear discussion of interpretability limits. Both are critical parts of scientific discourse and advancement.
There are two core troubles with the goal of making every publication reproducible. 1. It's impossible, and claiming it is possible is unscientific. 2. It amplifies the file drawer effect, where potentially valuable non-replications never enter scientific discourse.”
You make some really good points (most of which I agree with), but I think you misread my main point. That's mostly on me; I should've been clearer with my message. I'll clarify below.
I’m not saying it should be necessary to fully establish a finding before publishing it. I’m saying replication should be valued more than it is at present, and that it should be seen as necessary (not sufficient!) for fully establishing a finding. This would set fully establishing a finding as a higher bar than peer reviewed publication, leaving room for the sort of lack-of-replication debate you described (among other things). That said, placing more value on replication would presumably reduce the rate of novel results (given limited scientific resources), so the total number of peer reviewed publications might decrease a bit. I see this as a good thing – quality over quantity.
Regarding the Drug Monkey post you linked: it makes a separate point from my post but ends up in the same place: "Replication is the very lifeblood of science."
You wrote: "There are two core troubles with the goal of making every publication reproducible. 1. It's impossible, and claiming it is possible is unscientific. 2. It amplifies the file drawer effect, where potentially valuable non-replications never enter scientific discourse."
I wasn't saying we need to replicate every published finding (though I do think a finding should be replicated before we call it established knowledge). That said, why do you say it's impossible? (I can imagine it being impractical, but not impossible.) Are you saying that because of the Drug Monkey example? If so, that's an issue of the invariance of an effect rather than a lack of reproducibility per se (see http://www.joelvelasco.net/teaching/5330(fall2013)/Woodward00-explinvarspecialsciences.pdf). Drug Monkey makes the excellent point that the effort to replicate an effect can reveal its variant/invariant nature. I'm not focusing on that sort of thing here, though, since I'm emphasizing within-study replication. The idea is to reduce variation in experimental procedures by having the same lab collect the same data in the same way, then testing for replication within that large dataset. This helps combat overfitting to noise from excessive experimenter degrees of freedom in complex data analyses. It would be a separate (but important) exercise to determine the many factors that could influence the finding, revealing its variance/invariance.
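To make the within-study (split-half) replication idea concrete, here's a minimal sketch. Everything here is hypothetical – a one-sample design, an assumed effect size of 0.3, and n = 200 – but it shows the basic move: one lab collects one large dataset, randomly splits it, estimates the effect in one half, and checks whether it holds in the held-out half.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: one lab, one protocol, n subjects,
# with an assumed true effect size of 0.3 (in SD units).
n = 200
data = rng.normal(loc=0.3, scale=1.0, size=n)

# Split-half replication: randomly partition the sample into two halves.
idx = rng.permutation(n)
half_a, half_b = data[idx[: n // 2]], data[idx[n // 2:]]

def one_sample_t(x):
    """One-sample t statistic for testing mean(x) > 0."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

t_a, t_b = one_sample_t(half_a), one_sample_t(half_b)

# Call it a within-study replication only if the effect clears the
# (one-sided, ~.05) threshold in BOTH halves, not just the pooled data.
t_crit = 1.66  # approximate one-sided .05 cutoff for n = 100
replicated = bool(t_a > t_crit and t_b > t_crit)
print(f"t_a = {t_a:.2f}, t_b = {t_b:.2f}, replicated = {replicated}")
```

The point of holding out half the data is that any analysis choices tuned to one half (experimenter degrees of freedom) get an honest test in the other.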
With regard to your point #2: I am not saying we should put up a gate to keep non-replicated studies out of the literature, but rather that the higher bar of having an established finding should require replication. Thus, it should not amplify the file drawer effect; indeed, it should reduce the file drawer effect's negative impact. The biggest problem with the file drawer effect is that selective reporting makes published results irreproducible, so more attempts at reproducing results would counter this problem by revealing which results are unreliable. Certainly, another problem with the file drawer effect is a lack of public knowledge of a finding's variance/invariance, but I think this is much less of a problem than the possibility that a finding isn't real to begin with. In the Drug Monkey example, I'm saying it would be most important to ensure that the finding that rats intravenously self-administer cocaine is replicated so it can become established knowledge (he also puts this first when he writes: "if I had to pick one thing in substance abuse science that has been most replicated it is this"), while the many other failed attempts to replicate it would add context to the finding's variance/invariance.
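The selective-reporting mechanism behind the file drawer effect is easy to see in a toy simulation (all parameters here are assumed, not drawn from any real study): many labs test a true null effect, only the "significant" results get written up, and the published record then looks unanimous even though the effect is zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy file-drawer model (assumed parameters): 1000 labs each run a
# study of n = 30 on an effect whose TRUE size is zero.
labs, n = 1000, 30
t_crit = 1.699  # one-sided .05 cutoff for df = 29

t_stats = []
for _ in range(labs):
    x = rng.normal(loc=0.0, scale=1.0, size=n)  # null data
    t_stats.append(x.mean() / (x.std(ddof=1) / np.sqrt(n)))
t_stats = np.array(t_stats)

# Selective reporting: only "positive" results leave the file drawer.
published = t_stats[t_stats > t_crit]

# Every published paper shows a positive effect, yet the effect is zero.
print(f"published {len(published)} of {labs} studies, all with t > {t_crit}")
```

In this toy world, roughly the false-positive rate's worth of studies gets published, and every one of them reports an effect; only independent replication attempts (and the publication of their failures) would expose that the published record is an artifact of selection.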