
How to do bad Science (or On The Emptiness of Failed Replications)

cosmicaug

From http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm

On the emptiness of failed replications
Jason Mitchell
Harvard University
  • Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value.
  • Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.
  • Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.
  • Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
  • The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their "degrees of freedom," for example, by specifying designs in advance.
  • Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.

And it only gets worse from there, people, but I know the mods do not like excessive quoting. Basically, Daryl Bem and pharmaceutical companies have reason to love this essay.
 
Quoting his last end note:
[8] As a rule, studies that produce null results—including preregistered studies—should not be published. As argued throughout this piece, null findings cannot distinguish between whether an effect does not exist or an experiment was poorly executed, and therefore have no meaningful evidentiary value even when specified in advance. Replicators might consider a publicly-searchable repository of unpublished negative findings, but these should in no way be considered dispositive with regard to the effects of interest.
 
And positive findings cannot distinguish between whether an effect does exist or an experiment was poorly executed.

What an idiot.
 
Most plans are critically flawed by their own logic. A failure at any step will ruin everything after it. That's just basic cause and effect. It's easy for a good plan to fall apart.

Therefore, a plan that has no attachment to logic cannot be stopped. The success or failure of any given step will have no impact on the macro level.

Couldn't help it. The bullet points remind me too much of this comic.
 
Hey, you know what matters even less to science than failed replications? Rants about how science should be done posted to someone's personal blog.

Srsly, what study of his just got retracted due to no one being able to replicate it? He seems a mite butthurt about it.
 
Jason Mitchell said:
Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value.
This is nonsense. An unsuccessful experiment proves that that methodology doesn't work. Even if it has no value in terms of advancing knowledge on the topic at hand, it has TREMENDOUS value in terms of preventing people from going down blind alleys. I've heard numerous professors tell grad students "I know someone who did what you're talking about; it didn't work"; had those grad students not known someone who knew the person, they'd have wasted years of effort.

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.
The old "If you don't succeed you weren't trying hard enough" canard. :rolleyes:

Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.
All three are nonsense. Obviously knowing WHY a replication failed is going to "contribute to a cumulative understanding of scientific phenomena", for example (and it's wrong to differentiate scientific from normal phenomena).

Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
Translation: Anyone who tries my experiments for themselves for that reason has an axe to grind, and since they have a viewpoint they're working towards, we can ignore the quality of their research.

The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their "degrees of freedom," for example, by specifying designs in advance.
Culling false theories isn't an improvement, apparently. Guess we should just let Freudian psychology stand, and no one should have touched the Uniformity of Rate or Uniformity of State. :rolleyes:

Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.
NO KIDDING. Science isn't a happy shiny land where everyone poops rainbows and lollypops. It's a very rough-and-tumble world full of passionate people. If you put your work out there, someone will almost inevitably attack it. If you don't like that, either grow up or go away. We're not going to change just because we hurt your widdle feewings. Science is not something for the faint of heart; it requires a great deal of courage.

This is just rank nonsense, and serves as further proof that an Ivy League background is no guarantee of quality.
 
What I don't like is that this guy is conflating negative results with failures to replicate. While negative results can be interesting, there is some merit in downplaying them in many cases. Not all cases but many.

One of the cases where a negative result is VERY interesting, however, is when you strongly expect a positive result. In fact some of the most interesting experiments/predictions in the history of science have been interesting because they failed. They represent a hole in current understanding that offers room to build on.

Failed replication doesn't measure up to that standard of interesting, but it's still a case where you strongly expect one result and get another, and therefore it still points to a hole or a flaw in understanding the subject. In this case it points to a flaw in the idea being advanced by the original researcher.

It's not even just outright fraud that is a concern; people by nature are subject to confirmation bias and any number of other effects that can lead them down an incorrect path. Everyone's results need to be validated and vetted, and the more frequently the better, because the more certain you become of an experiment's results, the more interesting it is if you can find some twist that genuinely doesn't fit expectations.
 
  • Recent ~~hand-wringing~~ concern over failed replications in social psychology is largely ~~pointless~~ wise, because ~~unsuccessful~~ unreproducible experiments have no meaningful scientific value.
  • Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the ~~replicator~~ originator bungled something along the way. Unless direct replications ~~are conducted by flawless experimenters~~ can confirm the results, nothing interesting can be learned from ~~them~~ the original experiment.
  • Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily have to identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.
  • Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
  • The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to ~~restrict their "degrees of freedom,"~~ be more rigorous, for example, by specifying designs in advance.
  • Whether they mean to or not, authors and editors of ~~failed replications~~ unreproducible experiments are publicly impugning the scientific integrity of ~~their colleagues~~ themselves. Targets of failed replications are unjustifiably upset, particularly given the inadequate basis for ~~replicators’~~ their extraordinary claims.
 
I find serious problems with all the quoted bullets so I'm not going to bother with the full link.

The truth is that when I first read the bullets, I was expecting this to be satire.
 
  • Recent ~~hand-wringing~~ concern over failed replications in social psychology is largely ~~pointless~~ wise, because ~~unsuccessful~~ unreproducible experiments have no meaningful scientific value.
  • Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the ~~replicator~~ originator bungled something along the way. Unless direct replications ~~are conducted by flawless experimenters~~ can confirm the results, nothing interesting can be learned from ~~them~~ the original experiment.
  • Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily have to identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.
  • Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
  • The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to ~~restrict their "degrees of freedom,"~~ be more rigorous, for example, by specifying designs in advance.
  • Whether they mean to or not, authors and editors of ~~failed replications~~ unreproducible experiments are publicly impugning the scientific integrity of ~~their colleagues~~ themselves. Targets of failed replications are unjustifiably upset, particularly given the inadequate basis for ~~replicators’~~ their extraordinary claims.

That kind of sums it up for me. I pretty much disagreed with almost everything written in the piece.

You know who else would have liked this piece? Benveniste!
 
  • Recent ~~hand-wringing~~ concern over failed replications in social psychology is largely ~~pointless~~ wise, because ~~unsuccessful~~ unreproducible experiments have no meaningful scientific value.
  • Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the ~~replicator~~ originator bungled something along the way. Unless direct replications ~~are conducted by flawless experimenters~~ can confirm the results, nothing interesting can be learned from ~~them~~ the original experiment.
  • Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily have to identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.
  • Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
  • The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to ~~restrict their "degrees of freedom,"~~ be more rigorous, for example, by specifying designs in advance.
  • Whether they mean to or not, authors and editors of ~~failed replications~~ unreproducible experiments are publicly impugning the scientific integrity of ~~their colleagues~~ themselves. Targets of failed replications are unjustifiably upset, particularly given the inadequate basis for ~~replicators’~~ their extraordinary claims.
Well fixed and made meaningful. I quite agree with the "he/she/they picked on my paper and now I am butthurt" theory of the origin of the paper in the OP. [eta] I am quite sure further research will prove this hypothesis to be both valid and correct.
 
There has been a lot of buzz in the SciLit in recent years about the failure of journals to publish negative-result papers. It creates a sort of selection bias in publications.

Note that the blogger suggests that incompetent or biased replication experiments are rife. If true, that's a valid complaint against those specific experiments, but the blogger seems to believe that experiments can't be humanly replicated. It's certainly true that, particularly in the social sciences, it's very hard or impossible to control all variables; however, that same argument may invalidate the conditions and claims of the original finding. A good reason to replicate.

What I don't like is that this guy is conflating negative results with failures to replicate.

In a replication experiment, we test the same hypothesis, and a negative result is one that fails to support the hypothesis. There is no invalid conflation.

While negative results can be interesting, there is some merit in downplaying them in many cases. Not all cases but many.

Horrific nonsense! The idea that we can "downplay" any result is just an argument in favor of bias - totally wrong-headed. THIS PROBLEM that you suggest has merit is exactly why pre-registration of experiments is now required in some fields - to prevent downplaying or even ignoring experiments based on their results rather than their methods. If you examine the protocols for Cochrane Review meta-analyses (which I consider to be top-quality methodology) you'll see they can weight experimental results based on confidence intervals (the precision of the result) - but cannot weight experiments based on their results.
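For what it's worth, here is a minimal sketch (Python, made-up numbers, not the actual Cochrane tooling) of what weighting by precision rather than by result looks like in a standard fixed-effect, inverse-variance pooling:

# Fixed-effect meta-analysis: each study is weighted by the precision of its
# estimate (1 / SE^2), never by whether its result was "positive" or "negative".
studies = [
    # (effect estimate, standard error) -- made-up illustrative values
    (0.40, 0.10),   # precise study with a positive effect
    (-0.05, 0.30),  # imprecise study with a null result
    (0.10, 0.15),
]

weights = [1.0 / se ** 2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1.0 / sum(weights)) ** 0.5

print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")

The null study here gets less weight only because it is less precise; an equally precise null study would pull the pooled estimate down with full weight.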


It's not even just outright fraud that is a concern; people by nature are subject to confirmation bias and any number of other effects that can lead them down an incorrect path.

Yes, but the blog author is suggesting that there is a rejection bias - that some social science experimenters set out to invalidate another's result.

Somewhere closer to reality: all experimenters have expectations with regard to results, and that's fine so long as they don't "go Millikan" and select data or, as the blogger suggests, fail to fairly replicate the experimental conditions.
 
stevea said:
Yes, but the blog author is suggesting that there is a rejection bias - that some social science experimenters set out to invalidate another's result.
Even that wouldn't be a real concern, provided the protocols were sufficient to justify the results. A great deal of science has been done simply because two people really, really didn't like each other. As long as both sides are honest, attempts to disprove the other person can be very useful in advancing science. For one thing, it's a fantastic litmus test--if my worst enemy says "I cannot refute his findings", I'm pretty much right. For another, it encourages a great deal of creativity in terms of coming up with novel ways to explore a problem. As long as you keep the TNT out of each other's hands and you don't sacrifice rigor to achieve the results you want, rivalries can be a good thing.
 
From http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm



And it only gets worse from there, people, but I know the mods do not like excessive quoting. Basically, Daryl Bem and pharmaceutical companies have reason to love this essay.

This assumes, of course, that only the repeats had flaws that yielded this failure to repeat, and not that the originals had flaws that falsely generated the original results. Why on earth would anyone believe this, let alone put it in writing after having the chance to think about it?

I also learned very early in my own work that you have to describe your experiments adequately so anyone following that description will be able to repeat your results. If the necessary information was not fully described in your Materials and Methods, either you were a poor scientific writer or you didn't really understand what actually generated your results in the first place.
 
Horrific nonsense! The idea that we can "downplay" any result is just an argument in favor of bias - totally wrong-headed.



Which part do you think is "horrific nonsense"? Do you disagree with me that replication failures are very important, or are you suggesting that all negative results need to be treated as seriously?

Either way you are incorrect. The OP link wants to play on the fact that negative experiments are indeterminate and inconclusive in order to say the same about an inability to replicate. The reverse is also problematic: you get people who want to say a negative result proves the hypothesis false. Both of these are problems, which is why we need to avoid conflating the two concepts.
 
Even that wouldn't be a real concern, provided the protocols were sufficient to justify the results. A great deal of science has been done simply because two people really, really didn't like each other. As long as both sides are honest, attempts to disprove the other person can be very useful in advancing science. For one thing, it's a fantastic litmus test--if my worst enemy says "I cannot refute his findings", I'm pretty much right. For another, it encourages a great deal of creativity in terms of coming up with novel ways to explore a problem. As long as you keep the TNT out of each other's hands and you don't sacrifice rigor to achieve the results you want, rivalries can be a good thing.

I agree. The point I want to add is that this covers the case of similar experiments. What if he performs a different experiment and it fails to support your hypothesis? In this case he is not entitled to say he's refuted your findings, because a negative result doesn't disprove your hypothesis.

In most hypothesis testing a negative result is an inability to distinguish the hypothesis being tested from the null hypothesis. It does not mean the hypothesis being tested is wrong; it could simply be that the data are inadequate or the experiment too imprecise. It certainly does not mean the null hypothesis is correct, which is an all too common interpretation of a negative result. An inability to replicate your results would be a MUCH stronger finding.
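A minimal sketch of that point (my own illustrative numbers, assuming a real effect of 0.3 standard deviations and using SciPy's two-sample t-test):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect = 0.3  # a real effect of 0.3 standard deviations exists in the population

for n in (20, 2000):  # per-group sample sizes: underpowered vs. well powered
    treated = rng.normal(true_effect, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    t_stat, p_value = stats.ttest_ind(treated, control)
    verdict = "fails to reject the null" if p_value >= 0.05 else "rejects the null"
    print(f"n = {n:4d} per group: p = {p_value:.3f} ({verdict})")

# With n = 20 the test will usually come back "negative" even though the effect
# is real; that negative result says nothing about whether the null is true.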
 
lomiller said:
What if he performs a different experiment and it fails to support your hypothesis? In this case he is not entitled to say he's refuted your findings, because a negative result doesn't disprove your hypothesis.
I think there's some confusion regarding the concepts "proof" and "evidence" (the terms are used loosely enough even among scientists that this is understandable). Something can be evidence for an idea without being proof of it, or can be evidence against an idea without being proof the idea is wrong. In the case you describe, the negative results are evidence that my idea is wrong, but not necessarily proof.

For example, let's say I'm a paleontologist looking at forams in the Late Cretaceous. I sample them extremely precisely up to and through the K/Pg boundary. My results show that there's a slow, gradual transition in foram fauna. This is evidence against the Alvarez Hypothesis--such findings at least suggest that it is not true. However, they do not disprove the hypothesis--local factors can override global forcing mechanisms, and different taxa reacted to the impact differently.

In contrast, findings that multiple taxa can be found in their normal abundance right up to the iridium layer did disprove a strict interpretation of Uniformitarianism. The facts flatly contradicted key concepts in Uniformitarianism sensu stricto.

I've gotten into some trouble recently with my definition of proof, but in my opinion if some datum demonstrates an idea to be true to the point where withholding acceptance is irrational, the concept can be termed proven. A photo of the Earth from space proves that the Earth is round (not that it needed proving, but it illustrates the point). The fact that five bombs could sink a WWI battleship proved in the 1920s/1930s that air power would be vital to future warfare.

Most of the time we don't get to deal with proof or disproof. What normally happens is that we examine the preponderance of evidence, and determine which concepts are better supported. Proof and disproof are ideal, but not necessarily possible.
 
Let me try to explain things, as I understand them, in a slightly different way:

If a scientist publishes some experimental data, and describes how the experiments were performed, anyone who duplicates the experiments correctly should be able to get the same data. If they indeed do the experiments correctly as described, but do not get the same data as the original, they have refuted the first scientist, and it suggests that the first scientist was wrong (perhaps only in their description of how to do the work, or just dead wrong).

If the first scientist's experimental results are reproducible, but they also predict a further result that does not happen when tested, then the original theory is wrong, even if the original experiments were correct. Thus, doing an experiment different from the first may prove a theory wrong, but it cannot prove the original experiments wrong.

One example: I might "prove" that time is unaffected by location using a mechanical clock, and theorize this is always true. The simplest theory is that time is unaffected by location. I expect that anyone measuring time using the same type of mechanical clock would not see a variation due to location. Yet, if someone using an atomic clock sees that time is affected by velocity and by gravity, they have proven my theory wrong in detail, but not my experimental observations (a mechanical clock is not as accurate as an atomic clock).
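As a rough illustration of the sizes involved (my numbers, not Giordano's): for two clocks separated by a height $\Delta h$ near the Earth's surface, general relativity predicts a fractional rate difference of roughly

\[
\frac{\Delta f}{f} \approx \frac{g\,\Delta h}{c^{2}} \approx \frac{9.8 \times 1\,\mathrm{m}}{(3\times 10^{8}\,\mathrm{m/s})^{2}} \approx 1\times 10^{-16} \text{ per metre of height,}
\]

which modern atomic clocks can detect but which is many orders of magnitude below anything a mechanical clock can resolve. The mechanical-clock observations stay correct; only the simple theory built on them fails.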

As a scientist, I feel a very strong obligation for my experiments to be correct, and that I create the best theory possible based on my experimental result. My theory may prove incorrect when new, unanticipated results are obtained later, but that is part of science.
 
Well, data are never incorrect--they are what they are, and they more or less define "correct". So no experimental data can be disproven (outside of fraud, obviously). Only interpretations can.

The only problem I see with your explanation is a practical one: no one does direct replication of experiments without some reason to believe the experiments were flawed. So outside of rare cases, I'm not sure you're making a meaningful distinction. That said, this is pretty much the most minor objection imaginable. I'm in no way arguing against your explanation. :)

Giordano said:
My theory may prove incorrect when new, unanticipated results are obtained later, but that is part of science.
Someone, I forget who, once told me "If you're never wrong, you're not doing science." :D
 
I find it very hard to believe that someone with such an impressive CV as Dr. Mitchell could seemingly not understand even the basics of the scientific method. Much butt-hurt indeed!
 
This assumes, of course, that only the repeats had flaws that yielded this failure to repeat, and not that the originals had flaws that falsely generated the original results. Why on earth would anyone believe this, let alone put it in writing after having the chance to think about it?

I also learned very early in my own work that you have to describe your experiments adequately so anyone following that description will be able to repeat your results. If the necessary information was not fully described in your Materials and Methods, either you were a poor scientific writer or you didn't really understand what actually generated your results in the first place.

Or you were trying to cover something up. :)
 
Imagine the insanity of "Facilitated Communication" had that fiasco not been put to bed. And there are still true believers.
 
PLOS ONE: “Positive” Results Increase Down the Hierarchy of the Sciences by Daniele Fanelli

There is a well-known hierarchy of the sciences from "hard" to "soft":
  • Physical sciences
  • Biological sciences
  • Social sciences
From the paper,
... in some fields of research (which we will henceforth indicate as “harder”) data and theories speak more for themselves, whereas in other fields (the “softer”) sociological and psychological factors – for example, scientists' prestige within the community, their political beliefs, their aesthetic preferences, and all other non-cognitive factors – play a greater role in all decisions made in research, from which hypothesis should be tested to how data should be collected, analyzed, interpreted and compared to previous studies.
In one study, 222 scholars were asked to rate several academic disciplines by similarity. An analysis revealed three axes of variation:
  • Hard - soft
  • Pure - applied
  • Life - non-life
There's support for a hard - soft axis of variation from studies of lots of features, like number of colleagues acknowledged per paper, immediacy of references, and even the fraction of paper area dedicated to graphs. There are some other opinions, however:
  • The social sciences cannot be objective
  • The natural sciences and the social sciences work much alike
  • They are all socially-constructed intellectual fashions
An intermediate position would be to distinguish between a "core" and a "frontier" of a field. The frontiers of different fields may be much alike, while the cores may be very different. If the contents of advanced university textbooks are any guide, the cores are indeed different, with the physical sciences being much more structured and developed than the social sciences.

From the abstract, the results:
Controlling for observed differences between pure and applied disciplines, and between papers testing one or several hypotheses, the odds of reporting a positive result were around 5 times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business compared to Space Science, 2.3 times higher in the domain of social sciences compared to the physical sciences, and 3.4 times higher in studies applying behavioural and social methodologies on people compared to physical and chemical studies on non-biological material. In all comparisons, biological studies had intermediate values.
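To unpack what those odds ratios mean (my own illustrative numbers, not figures from the paper): if 70% of papers in the baseline field report positive results, the odds are

\[
\frac{0.70}{0.30} \approx 2.3, \qquad 5 \times 2.3 \approx 11.7, \qquad \frac{11.7}{1 + 11.7} \approx 0.92,
\]

so odds five times higher correspond to roughly 92% of papers reporting a positive result.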
In his discussion, DF considers an odd conundrum: results in the physical sciences are typically much stronger statistically than results in the biological and social ones. So why do they get more negative results?

It seems to me that part of the problem is what sort of spin one can place on negative results. Can such results be interpreted as upper limits or lower limits? The Particle Data Group has oodles of such limits.

There is also the problem of how far-reaching are the theories that the experiments test. The more far-reaching, the more important the negative result. From the paper,
Younger, less developed fields of research should tend to produce and test hypotheses about observable relationships between variables (“phenomenological” theories). The more a field develops and “matures”, the more it tends to develop and test hypotheses about non-observable phenomena underlying the observed relationships (“mechanistic” theories). These latter kinds of hypotheses reach deeper levels of reality, are logically stronger, less likely to be true, and are more conclusively testable.
The harder sciences also tend to have more rigorous analysis procedures than the softer ones, with greater ability to avoid experimenter effects.

A good illustration of the importance of a theory being far-reaching can be seen in the most famous negative result in the history of science: the Michelson-Morley experiment. It was motivated by a curious conundrum. Newtonian mechanics was a far-reaching theory that had been enormously successful, and Maxwellian electrodynamics was also a far-reaching theory that had been enormously successful. But the two theories did not coexist very well.

A common solution was the electromagnetic ether or aether. If one moves relative to it, one gets some correction terms to Maxwell's equations from that motion, and it ought to be possible to observe the effects of those correction terms.

Michelson and Morley tried to observe the effects of those terms, but the fringe shift they measured was far smaller than predicted, corresponding to at most about a quarter of the Earth's orbital velocity and consistent with zero within their experimental error. Was the Earth stationary? It was hard for that to happen without blatantly violating Newtonian mechanics. Does the Earth drag the ether? That solution could work, but it has other testable consequences, and no evidence of ether drag was found either. Does moving through the ether alter space and time? That solution does work, but Albert Einstein showed that the most successful version of it makes the ether physically meaningless. He also showed that one must revise one's expressions for momentum and kinetic energy, thus turning Newtonian mechanics into a low-velocity limit.
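As a rough back-of-the-envelope check (my numbers, approximate): for an effective arm length $L \approx 11\,\mathrm{m}$, light of wavelength $\lambda \approx 500\,\mathrm{nm}$, and the Earth's orbital speed $v \approx 30\,\mathrm{km/s}$ (so $v/c \approx 10^{-4}$), the fringe shift expected on rotating the apparatus is about

\[
\Delta N \approx \frac{2L}{\lambda}\,\frac{v^{2}}{c^{2}} \approx \frac{2 \times 11\,\mathrm{m}}{5 \times 10^{-7}\,\mathrm{m}} \times 10^{-8} \approx 0.4 \text{ fringes,}
\]

whereas Michelson and Morley saw at most a few hundredths of a fringe, which is why their upper limit on any ether drift came out so far below the orbital velocity.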
 
