Perhaps you should try to essentially prove your hypothesis using Bayesian statistics.
I realize this is a veiled reference to Jabba, but the damage done by the data-dredging techniques that PartSkeptic's pseudoscience has relied upon is, in some cases, mitigated by a Bayesian approach to the analysis.
Make no mistake, there are many things we can learn from mining existing datasets if we go back and look for patterns we previously didn't seek. However, we must be
very careful to ensure that such patterns are not accidents or artifacts. Time and again we see patterns emerge from large datasets with head-turning
p-values. Rather than saying, "Oh, wow, this must be a highly conclusive result," the prudent approach is to reason instead from a properly formed null hypothesis and a reasonable estimate of prior probability. Even greater prudence requires replication with a comparable, independent dataset.
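To make the prior-probability point concrete, here's a back-of-the-envelope sketch in Python (the prior, power, and threshold numbers are purely illustrative, not drawn from any real study):

```python
# Illustrative numbers only: how a low prior probability deflates a "significant" result.
prior_true = 0.01   # say 1% of hypotheses tested this way are actually true
power = 0.80        # chance a real effect clears p < 0.05
alpha = 0.05        # chance a null effect clears p < 0.05 anyway

true_positives = prior_true * power          # real effects that come up "significant"
false_positives = (1 - prior_true) * alpha   # flukes that come up "significant"

posterior = true_positives / (true_positives + false_positives)
print(f"P(effect is real | p < 0.05) = {posterior:.2f}")   # about 0.14
```

With a long-shot prior like that, roughly six out of seven "significant" results are flukes, which is exactly why a head-turning p-value on its own proves very little.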
Here's how science normally works. Anecdotes from the field suggest a previously unseen causation. You arrange to collect data to test first whether there's a correlation between indications of the cause and indications of the effect. If there is none, you dismiss the anecdotes. Let's say there is; but correlation is not causation. To test causation, you hypothesize various ways in which that causation might occur. Each of those hypothetical mechanisms produces side effects that might not have anything to do with the effect you care about, but are nevertheless observable. This is the deduction step of science. You deduce that if some particular mechanism is the operative one, you'll be able to tell by observing the presence or absence of the side effect. That's the differentiation step of experiment design. Now, you don't want to waste empirical effort, so you gather the whole set of possible observables from several different hypotheses and arrange to measure them all in the same experiment.
But wait, there's more. You're a good scientist, so you reason that some of these side effects will have multiple possible causes. You want to control for those. So you gather together all the other possible causes of those side effects and arrange to make sure they are evenly distributed between your control and experiment groups. And you realize that there may be untold causations, so you collect a whole bunch of other common data in hopes of ruling out confounds you didn't initially think of. For human subjects, there is a whole standard set of demographic, health, and other kinds of data that are common correlates of things we want to investigate about them. That's why human subjects in an experiment fill out an intake form with a whole bunch of seemingly irrelevant information. The goal is to divide the subject pool -- whether humans, rodents, or rutabagas -- into groups that differ
only by the purported causation(s).
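For what it's worth, here's a minimal sketch of that group-splitting step (the subject records and covariates are hypothetical): randomize the assignment, then verify the measured covariates came out roughly even between the groups.

```python
import random

# Hypothetical subject records carrying the covariates we intend to control for.
subjects = [{"id": i,
             "age": random.randint(20, 70),
             "smoker": random.random() < 0.3}
            for i in range(200)]

random.shuffle(subjects)                              # randomize...
control, treatment = subjects[:100], subjects[100:]   # ...then split in half

# Sanity check: the covariates should come out roughly balanced between groups.
for name, group in (("control", control), ("treatment", treatment)):
    mean_age = sum(s["age"] for s in group) / len(group)
    smokers = sum(s["smoker"] for s in group)
    print(f"{name}: mean age {mean_age:.1f}, smokers {smokers}")
```

Real trials typically use stratified or blocked randomization rather than a single shuffle, but the goal is the same: every covariate you can measure should be spread evenly, so the only systematic difference between the groups is the purported cause.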
The null hypothesis is that your hypothesized causation doesn't occur. That is, if the results appear to coincide with what you'd expect were the hypothesis true, what's the probability that's just random happenstance? By convention, science demands that this probability be below five percent -- the familiar p < 0.05 threshold -- before the result is called statistically significant.
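In code, that convention is just a threshold check on a computed p-value. Here's a minimal sketch with simulated measurements and SciPy's two-sample t-test (the group sizes and the made-up effect are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=50)     # group with no effect
treatment = rng.normal(loc=0.5, scale=1.0, size=50)   # group with a made-up real effect

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p = {p_value:.4f}")
print("statistically significant" if p_value < 0.05 else "not significant")
```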
Right, we all learned that somewhere in our education. But here's how pseudoscience works. You've collected all this data, all these rows in the database with all those columns corresponding to variables you measured for the subject, whether they belonged to any particular hypothesis or not. And there's the summer intern sitting there with nothing to do. You have her run variance analyses not just on the variables your deductions called for, but on all the variables you collected. Indeed, on all the
combinations of variables. Lo and behold, she finds a statistically significant correlation with brain tumors in the subgroup of gay, left-handed, model train enthusiasts.
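You can watch this happen with nothing but random numbers. In the sketch below, every column is simulated noise with no real effect anywhere, yet scanning enough arbitrary trait combinations reliably coughs up a handful of p-values under 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects = 500
outcome = rng.normal(size=n_subjects)   # e.g. a tumor marker: pure noise

# Twenty made-up yes/no traits (handedness, hobbies, ...), also pure noise.
traits = rng.integers(0, 2, size=(n_subjects, 20)).astype(bool)

hits, tests = 0, 0
# Compare the outcome inside vs. outside every pairwise combination of traits.
for i in range(20):
    for j in range(i + 1, 20):
        subgroup = traits[:, i] & traits[:, j]
        if subgroup.sum() < 10:          # skip vanishingly small subgroups
            continue
        _, p = stats.ttest_ind(outcome[subgroup], outcome[~subgroup])
        tests += 1
        hits += p < 0.05

print(f"{hits} 'significant' subgroups out of {tests} tests, from pure noise")
```

At a 0.05 threshold, roughly one in twenty of those meaningless subgroup tests will come up "significant" by chance alone, which is all the intern needed.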
Yikes! What about any of that could possibly cause cancer? That's the rub. That "conclusive" result didn't arise out of any sort of causal reasoning. Sexual orientation, handedness, and a particular choice of hobby have nothing reasonably to do with each other. None of them separately, and nothing about them together, has any health consequence that oncologists would deduce from our extensive knowledge of carcinogens. The error in judgment here is to presume that the statistical analysis cannot lie and must be taken at face value. The fallacy is, "Numbers don't lie, therefore there must be
some causation at play even if we can't imagine what it is."
Subgroup analysis is one of the new darlings of pseudoscience. We expect that everyone knows the basics of statistics. We express quantities in fundamental terms such as percentages and margins of error, and expect everyone to be able to reason from there. We expect that something like a "batting average" doesn't require a lot of explanation. Lots of concepts in statistics are reasonably intuitive. Comparatively few -- but enough -- know about such things as
p-values and statistical significance in scientific research. They know that if something doesn't rise to a certain level of significance, it should be discounted as chance, even if they aren't professional scientists. Conversely, if a study produces a result with a very small
p-value, and the study has been properly conducted, it reflects a reliable conclusion. Again, these are details that aren't common knowledge but can be easily learned. This is the new, sophisticated layman that subgroup hacking is intended to fool.
It's now possible to mine data to absurd lengths. So let's say I have a hypothesis A and a measurable outcome X. I also collect control information, B, C, and D, all of which are potential confounds to X. I want to see whether X varies only with A, and not with B, C, or D. Software to do that is essentially free. But since it's quarantine and we're all stuck indoors, I can play with the software. Does X correlate with the Boolean expression (B and C)? (A or B, but not C)? (not B, but C and D)? If you have a lot of control variables, you have a practically infinite set of algebraic possibilities you can test for "significance."
But are any of these actually significant? Does (not B, but C and D) correlate with anything that makes sense in terms of what those variables actually measure? I originally set out to test A versus X, not "some pile of gibberish" versus X. More importantly, what are the chances -- statistically speaking -- that
some contrived combination of variables will appear to predict X purely by accident? Subgroup hacking is predicated on the layman's assumption that this is a very rare event. And for any single test it may be, but that doesn't matter. We have the computational ability to test astounding numbers of algebraically definable subgroups with little effort. That slicing and dicing is quite likely to produce accidental correlations, whether or not that's what the experimenter set out to discover. And whether or not the correlation makes any logical sense.
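Some back-of-the-envelope arithmetic shows just how unfair that bet is (the test counts below are hypothetical budgets, not any real study's):

```python
# Rough arithmetic for "practically infinite": with k Boolean control variables,
# there are 2**(2**k) distinct Boolean functions you could use to carve out a subgroup.
for k in range(1, 6):
    print(f"{k} controls -> {2**(2**k):,} possible subgroup definitions")

# And at the conventional alpha = 0.05, the expected number of accidental
# "significant" hits grows in direct proportion to how many tests you run.
for tests in (20, 200, 2000):
    print(f"{tests} tests -> roughly {tests * 0.05:.0f} flukes expected")
```

Five Boolean controls already give over four billion possible subgroup definitions; test even a sliver of them and accidental "significance" is close to guaranteed.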
Fringe claimants love this, because the vast amount of computation required to stumble onto an accidental correlate can be spun to make the research seem dedicated and exhaustive, rather than the undirected groping in the dark it really is. If you're committed to keep going, slicing and dicing ever more absurdly until you get a positive result, you
will get a positive result. That doesn't make it valid science. It's just the scientific version of
ad hoc refinement, wrapped in a misleading statistical disguise.