The credibility element is unfortunately indispensable.
Utterly false. You've made it necessary by deciding to collect not just the data, but also extraneous information that only feeds your additional attempts to rig the experiment. If you just stopped trying to rig the experiment, you'd find you don't need any of that.
If I ask "did I write 1, 2, 3, or 4?", I won't necessarily find that more than 25% of participants gave the right answer.
Right, in which case there is no demonstrated effect to explain, and your hypothesis fails. There is no getting around this, no matter how many times you want to call us stupid. You're trying to rig the experiment. You say that even if you get only the results statistically predicted by chance, some of the hits will still be by telepathy and not by chance. You say that even if a guess is a miss, you can tell whether it was an intentional miss on the part of a snarky participant who did read your mind but decided to say something else just to toy with you.
So you've introduced a completely subjective and irreproducible variable that allows you to sift the "true" hits from the "false" hits, and the "true" misses from the "false" misses, all the while knowing the answers. As many have pointed out, at that point you might as well throw statistical analysis and p-values out the window because you're literally doing nothing but cherry-picking. No comparison to any expected distribution will have meaning.
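To make that concrete, here's a quick back-of-the-envelope simulation (Python, with made-up trial counts and an arbitrary "rescue" rule I invented purely for illustration) of what answer-aware sifting does to the numbers:

```python
import random
from math import comb

def p_value_at_least(hits, trials, chance=0.25):
    """One-sided binomial tail: P(X >= hits) if everyone is just guessing."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(hits, trials + 1))

random.seed(1)
trials = 400
targets = [random.randint(1, 4) for _ in range(trials)]
guesses = [random.randint(1, 4) for _ in range(trials)]   # no telepathy here

hits = sum(g == t for g, t in zip(guesses, targets))
print(hits / trials, p_value_at_least(hits, trials))      # ~0.25, unremarkable p

# Answer-aware "sifting": any miss within 1 of the target is declared an
# "intentional miss" and counted as a hit, a rule you can only apply
# because you already know the answers.
sifted = sum(g == t or abs(g - t) == 1 for g, t in zip(guesses, targets))
print(sifted / trials, p_value_at_least(sifted, trials))  # inflated rate, tiny p
```

Pure guessing sits right at 25% and the p-value is unremarkable; the moment you're allowed to reclassify misses while looking at the answers, the "hit rate" and the p-value say whatever you want them to say.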
These telepathy tests are extraordinary...
No, they really aren't. Either your subjects can guess what you're thinking at a rate better than chance, or they cannot. Once you show there is an effect that requires explanation, then you can construct an additional experiment to explore potential causes. Everything else is covered by the null hypothesis.
As jsfisher says, establishing credibility in the scientific community means paying attention to their expectations regarding methods, protocols, and statistical models. Proposing something that is indistinguishable from cherry-picking and claiming it to be a novel method for exploring a variable that you claim is preventing you from achieving a meaningful p-value otherwise is exactly the sort of nonsense this science has been inundated with for decades. Do you honestly think people haven't tried to rig the data in exactly the same sorts of ways before?
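For contrast, here's what the boring, pre-specified version looks like: fix the number of trials and the significance level up front, then compare the raw hit count to the 25% chance rate and nothing else. A minimal sketch with illustrative numbers (same binomial tail calculation as above):

```python
from math import comb

def p_value_at_least(hits, trials, chance=0.25):
    """One-sided binomial tail: P(X >= hits) under pure guessing."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(hits, trials + 1))

trials, alpha = 100, 0.05          # illustrative numbers, fixed in advance
# Smallest hit count that would reject "they're only guessing" at this alpha.
threshold = next(h for h in range(trials + 1)
                 if p_value_at_least(h, trials) <= alpha)
print(threshold)                    # around 33 hits out of 100
```

Clear that bar and you have an effect that needs explaining; fall short and you don't, and no amount of sifting afterwards changes that.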
There are probably some psychological barriers.
Yes, but not the ones you think.