New telepathy test, the sequel.

I know this case, which I guess follows from it being famous, and slightly absurd. And also because a surprising number of landmark tort cases seem to involve railroads. Isn't there also one where a guy gets hit by a train and his flying torso takes out some old lady? I mean, this thread is about the probability of silly events, right?


Read the statement of facts in the link I gave you, no cheating, and then tell me what you think happened. The facts are first and are just a paragraph.
 
Read the statement of facts in the link I gave you, no cheating, and then tell me what you think happened. The facts are first and are just a paragraph.

Sure, I'm game. Guy tried to catch a train while carrying an unmarked package containing fireworks. Two employees of the railroad variously tried to help him board the train. In the process, the package was knocked out of his hands and onto the rails. The fireworks in the package exploded, the shock causing a set of scales to fall on the plaintiff, who was waiting on the same platform for a different train, injuring her.

ETA: I may have mentioned that I own the closest private drinking establishment to our university's law school. Grades dropped for the 1Ls yesterday, so it's busier tonight than usual. Faithful to your instructions, I completed the assignment without mentioning it to anyone. Then after having committed my summary to the forum, I opened up the notion that the facts in the case were inaccurately recited in the opinion. Voice from the back of the room: "Palsgraf is ********!"
 
True, but all experimental science involves at least one human being - the experimenter, who needs to ensure he's not seeing patterns that aren't really there.

“Science is a way of trying not to fool yourself. The principle is that you must not fool yourself, and you are the easiest person to fool.” - Richard Feynman
 
Sure, I'm game. Guy tried to catch a train while carrying an unmarked package containing fireworks. Two employees of the railroad variously tried to help him board the train. In the process, the package was knocked out of his hands and onto the rails. The fireworks in the package exploded, the shock causing a set of scales to fall on the plaintiff, who was waiting on the same platform for a different train, injuring her.


Now erase your entire conception of the train platform. It was actually exceedingly busy - crowded with riders and merchants, like you might see in a silent movie of the era (1920). In reality, the guy running for the train was hoisted into the car by his ear. This caused his package to drop and explode. Hearing the loud bang, waiting passengers panicked. As a huge crowd, they began running in this direction and that. In the commotion, they knocked over a lot of merchant carts and other stuff. One of them was a large set of scales, the balancing arm of which wasn't attached to the base. This fell and clonked Mrs. Palsgraf.

So, did the railroad owe a duty to waiting passengers because it was foreseeable that a large, unmarked package could fall from some guy's hands and actually be fireworks and explode with enough force to knock over some scales? No, according to Cardozo.

However, was it foreseeable that packing their train stations full of people and unlicensed merchants would eventually result in some sort of event that would panic them, cause a stampede, and hurt some of them? Both the trial judge and two of three Appellate Circuit judges thought so, as well as two of five of the judges on Cardozo's court. He wrote, "minds cannot differ" regarding his ruling, though 5 of 9 judges who heard the case came to the opposite conclusion.

Bias in the writing of the opinion led to a concept of torts that, with some refinement, endures today.
 
Were you mistaken when you said, "the sample is too small" then?
No, I don't think so. One can achieve a high hit rate even when a small number of answers is given.

For example, assume just one answer is given in a "Did I write and circle 1, 2, 3 or 4 on my paper?" test, and this answer is correct (for example, I wrote "4", and one person said "I believe you wrote 4"). I could then (correctly) claim a 100% hit rate, which, at first glance, looks quite remarkable. However, this answer could also just be a lucky guess (it could have been chosen on a random number generator, like this one: https://www.random.org/integers/).

This is why it may be important to calculate a p-value, which gives (roughly) the probability that the success was just a lucky guess.
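(For anyone who wants to check the arithmetic, here is a minimal sketch in Python of that "lucky guess" probability for the one-answer example above; nothing beyond the 1-in-4 chance level is assumed.)

```python
from math import comb

# One answer given, one hit, four equally likely options (1, 2, 3 or 4).
n, hits, p = 1, 1, 0.25

# Exact binomial tail: probability of at least this many hits by pure guessing.
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(hits, n + 1))

print(p_value)  # 0.25 -- a "100% hit rate" that chance alone produces one time in four
```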

Note that any serious investigator of telepathy would also find a huge difference between the answer "I believe you wrote 4" and just "4" (as an answer). This difference seems to be rejected by many posters in this thread, probably because they actually like telepathy (or the idea that Michel H is telepathic) somewhat like Donald Trump likes Iran: only in words and through sanctions (in reality). They are exasperated by evidence, they claim it was sarcasm whenever they see it, they can't handle it in a rational, sound and honest way.
 
No, I don't think so. One can achieve a high hit rate even when a small number of answers is given.
...
This is why it may be important to calculate a p-value, which gives (roughly) the probability that the success was just a lucky guess.

No. That's why you also report the degrees of freedom behind that p-value, and why you obtain enough data points that there actually are degrees of freedom. You're trying to hang your hat on one number as if it fully described the behavior of your data. It may be as Loss Leader said -- you hastily Googled an online calculator for binomial probability and assumed that was all that the field of statistics needed in order to crown you a competent scientist. A p-value with df=1 or df=2 is largely useless. Your "roughly" qualification matters far more in real science than you're letting on. You don't get to just sweep it under the carpet and mislead these good people into believing you know what you're doing.

Note that any serious investigator of telepathy would also find a huge difference between the answer "I believe you wrote 4" and just "4" (as an answer). This difference seems to be rejected by many posters in this thread...

No. A serious investigator designs his experiment so that only the data are collected. Editorial comment or issues of sincerity in the participation are obviated by experimental protocols before the data are collected or known -- even before the subjects are accepted for participation. They are most certainly not latched onto by the researcher afterward as a way to "correct" the data. Instead you've designed a poor experiment. And to correct the effects of your poor experiment design, you simply add more error to it by blatantly dishonest post hoc practice.

What your critics object to is your unwillingness to change your experiment design to weed out the insincere participants before they get to provide data. You insist on collecting bad data. You freely admit that the "credibility rating" part of your experiment is its heart and soul. That's obviously true. And the reason it's the core of your experiment is because that's where you subjectively hand-pick the data to get the outcome you want.

...probably because...

And here you go speculating again.

...they actually like telepathy (or the idea that Michel H is telepathic) somewhat like Donald Trump likes Iran: only in words and through sanctions (in reality). They are exasperated by evidence, they claim it was sarcasm whenever they see it, they can't handle it in a rational, sound and honest way.

Your method is not rational, sound, or honest. It's accepting or rejecting data via ad hoc criteria after you see whether it's the answer you want or not. You're trying to do exactly what the whole scientific method was invented to keep people from doing, and you're offering us nothing but delusions of grandeur to say that it's okay in your case to do so. You're not some great scientific sage who has to endure uninformed criticism from the unwashed masses.

No. You're not being challenged because the people here don't want to believe in telepathy. We've already told you there's no bias here against telepathy itself, and we've explained why several times. The notion that skeptics don't want telepathy to exist in the world is a fantasy invented by claimants with poor cases, who really, really want to be believed anyway and want there to be some easily-dismissed ideological reason behind criticism against them. You're being challenged because you're a poor scientist, not because everyone hates you for claiming to be a telepath.

As I said, no one is exasperated because of the evidence. We're exasperated by your assiduous rejection of better ways to get more reliable evidence, and more of it. You bet the farm on your ability to detect serious answers from fake ones, an ability that disappears entirely when tested. You even have people telling you flat-out that they weren't serious, but because they gave you the number you wanted, you double down and insist that they must all now be lying. That's not sound, rational, or honest in the least. It's obviously a desperate rhetorical move to hand-craft a data set that tells you what you want to hear.

We don't care whether telepathy exists or not. We don't care whether you're a telepath or not. What we care about is whether something purported to be science actually is. What we care about are people misusing science for crass personal gain.
 
No, I don't think so. One can achieve a high hit rate even when a small number of answers is given.
But you said the sample size was too small to allow a conclusion to be drawn. Now you're saying that a small sample size can allow you to draw the conclusion you want.

For example, assume just one answer is given in a "Did I write and circle 1, 2, 3 or 4 on my paper?" test, and this answer is correct (for example, I wrote "4", and one person said "I believe you wrote 4"). I could then (correctly) claim a 100% hit rate, which, at first glance, looks quite remarkable. However, this answer could also just be a lucky guess (it could have been chosen on a random number generator, like this one: https://www.random.org/integers/).
That's the issue, isn't it? ALL of the answers are guesses and about 25% of them will be your number. That's what every one of your tests has shown.

This is why it may be important to calculate a p-value, which gives (roughly) the probability that the success was just a lucky guess.
You haven't a clue about that, do you.

Note that any serious investigator of telepathy would also find a huge difference between the answer "I believe you wrote 4" and just "4" (as an answer). This difference seems to be rejected by many posters in this thread, probably because they actually like telepathy (or the idea that Michel H is telepathic) somewhat like Donald Trump likes Iran: only in words and through sanctions (in reality). They are exasperated by evidence, they claim it was sarcasm whenever they see it, they can't handle it in a rational, sound and honest way.
Then we should start over with you engaging in a fair and blinded test such as LL has suggested. That will allow you to prove that Michel H has telepathy to all the mean skeptics.

You can assign all the "credibility" you want (before you know whether an answer is correct or not).

When would you like to start?
 
No, I don't think so. One can achieve a high hit rate even when a small number of answers is given.

For example, assume just one answer is given in a "Did I write and circle 1, 2, 3 or 4 on my paper?" test, and this answer is correct (for example, I wrote "4", and one person said "I believe you wrote 4"). I could then (correctly) claim a 100% hit rate, which, at first glance, looks quite remarkable. However, this answer could also just be a lucky guess (it could have been chosen on a random number generator, like this one: https://www.random.org/integers/).

This is why it may be important to calculate a p-value, which gives (roughly) the probability that the success was just a lucky guess.
...

I'll admit that I'm no scientist of any type, but it seems to me that p-value is properly applied to a data set testing a hypothesis, and you're trying to use it here to assess a data point. The idea that, the fewer the data points the more useful the set, is just ridiculous frosting on a whole cake of wrong.
 
I'll admit that I'm no scientist of any type, but it seems to me that p-value is properly applied to a data set testing a hypothesis, and you're trying to use it here to assess a data point. The idea that, the fewer the data points the more useful the set, is just ridiculous frosting on a whole cake of wrong.

Here we retread the methodology ground I covered in my lengthier post. If you receive, say, 8 responses and you cull 5 of them ostensibly by metadata criteria and are thus left with 3, and of those 2 are successes, then you have data that can ostensibly be fit to a binomial distribution with a size of 3 and a probability of 0.25. But these paltry three data points fit any distribution so coarsely as to be effectively useless. Michel admits that the binomial probability computable on this data is "rough," but he begs you to ignore the effect that coarse fit has on the solidity of his case. Since the p-value seems to be the only thing he knows how to compute, he wants to pretend it's the only thing that determines the credibility of his findings.
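For the curious, here is a rough sketch in Python (standard library only) of that exact computation for the two-hits-out-of-three case; the only inputs are the numbers from this thread.

```python
from math import comb

# Michel's culled data set: 3 retained answers, 2 hits, 25% chance of a hit by guessing.
n, hits, p = 3, 2, 0.25

# Exact binomial tail: probability of 2 or more hits out of 3 by pure guessing.
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(hits, n + 1))

print(round(p_value, 4))  # ~0.1563 -- roughly one chance in six, nowhere near significance
```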

Now if we had a binomial distribution with N=50 or so, for a p of 0.25, we can do a helpful thing: we can suggest that the data also fit a normal distribution, with certain parameters applied to correct for the asymmetry of the distribution. This is helpful because the normal distribution has properties that favor it over the binomial distribution, and for various experimental outcomes, we can actually achieve a higher degree of confidence with it using fewer subjects than would be needed in the binomial case. Or, conversely, if we have that many subjects we can achieve a higher degree of confidence in the result.
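As a hedged illustration -- assuming Python with scipy on hand, and a hit count of 19 chosen arbitrarily for the example -- the exact binomial tail and the continuity-corrected normal approximation can be compared like this:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 50, 0.25          # 50 trials at the 25% chance level
k = 19                   # an arbitrary hit count to test against chance

# Exact binomial tail probability: P(X >= k)
exact = binom.sf(k - 1, n, p)

# Normal approximation with a continuity correction for the binomial's asymmetry
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = norm.sf(k - 0.5, loc=mu, scale=sigma)

print(f"exact binomial P(X >= {k}) = {exact:.4f}")
print(f"normal approx  P(X >= {k}) = {approx:.4f}")
# At n = 50 the two land close together; at n = 3 they would not.
```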

But this is not in fact what these kinds of researchers do. Instead they do a run comprising a large number (e.g., N1=25) of Bernoulli trials, then use the hit rate from that one run (0.67 in the case we're considering) as a single data point in a larger scheme of experiments. A suitably large number (say N2=75) of hit rates from a series of runs, each with a suitable number of Bernoulli trials to give it stability, will produce a normal distribution. If the null hypothesis holds, this normal distribution will have a mean of 0.25 and suitable central tendency. If the null hypothesis is falsified, the normal distribution at mean=0.25 will have only a 5% probability of fitting the experimental data. And the experiment will have enough degrees of freedom to enshrine this as a stable statistic. The reason real researchers do it this way is because it almost completely obviates any instability arising from the coarseness or asymmetry of the binomial distribution, and instead produces a symmetric, well-behaved normal distribution. According to the proper methodology, Michel indeed has only one data point. Almost nothing can be concluded reliably from it.
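Here is a rough simulation sketch of that design under the null hypothesis, assuming Python with numpy and scipy; N1=25 trials per run and N2=75 runs are just the illustrative figures above.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
n_trials, n_runs, p_chance = 25, 75, 0.25   # N1 Bernoulli trials per run, N2 runs

# Each run yields one hit rate; under the null these cluster around 0.25,
# and across many runs their distribution is approximately normal.
hit_rates = rng.binomial(n_trials, p_chance, size=n_runs) / n_trials

# Test whether the mean hit rate departs from the chance level.
result = ttest_1samp(hit_rates, p_chance)

print(f"mean hit rate = {hit_rates.mean():.3f}, p-value vs. chance = {result.pvalue:.3f}")
# With 75 runs there are plenty of degrees of freedom; a single 2-of-3 run offers none of this.
```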
 
The idea that, the fewer the data points the more useful the set, is just ridiculous frosting on a whole cake of wrong.


Indeed, which is why political polling tends to need at least 800 participants to even have a chance at calling itself significant (and still may have a margin of error of up to 5%).

Then, pollsters can manipulate that data by, say, undercounting people of color (because they're less likely to vote) or overcounting people of color (because they're a larger % of the population than they are of the poll participants). All of that, however, is post hoc number-fitting to appeal to a particular client. Good demographic fitting needs to be done by individuals (or algorithms) who don't know (or care) what the raw data said.
 
Indeed, which is why political polling tends to need at least 800 participants to even have a chance at calling itself significant (and still may have a margin of error of up to 5%).

Indeed, the question you want to answer is whether the margin of error overlaps the margin of success. If you need 51% of the vote to win an election, and your poll shows 55% of the electorate in your favor with a ±7% margin of error, then you should still be nervous. Only when the margin of error excludes the margin of success to the agreed upon degree of confidence can you breathe easy. The margin of error in Michel's "two out of three" study is so wide as to render it almost completely impotent.
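A quick sketch of that overlap check, in Python with only the standard library; the 55% support and 51% threshold are the hypothetical figures above, and the sample sizes are arbitrary (n=200 happens to give roughly the ±7% of the example).

```python
from math import sqrt

support, threshold = 0.55, 0.51    # poll result and the margin of success
z = 1.96                           # ~95% confidence

for n in (200, 800, 3200):         # illustrative sample sizes
    moe = z * sqrt(support * (1 - support) / n)
    lower = support - moe
    verdict = "clear of the threshold" if lower > threshold else "still nervous"
    print(f"n={n:5d}  margin of error = +/-{moe:.1%}  lower bound = {lower:.1%}  {verdict}")
# At n=200 the interval swallows 51%; only with larger samples does it comfortably exclude it.
```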

All of that, however, is post hoc number-fitting to appeal to a particular client.

Which, obviously, is what Michel is trying to do. He makes it even worse by stating after the fact what the alleged criteria were for excluding some particular data.

In a real experiment you will always need to exclude some subjects along the way. Say the trials for a new drug require the subjects to abstain from alcohol because it is suspected that it will interfere with the test. If it becomes known later, via a routine urine test, that a subject had consumed alcohol during the test, the researchers must exclude that patient, even if they know the outcome of his data. The difference here is that the criteria were spelled out ahead of time, and that the fit to the criteria was made a matter of objective measurement, not subjective opinion. The world is messy, which is why doing science in it means obeying the method and subjecting the data to the statistics.

Here Michel wants to pretend that his "credibility" metric is just such a thing. But it isn't. It remains open-ended and subject to whatever whims he wants to apply along the way. "All responses must be sincere," is a reasonable limit to place on data. "I think your original correct answer was sincere, and that you're lying about it now for ideological reasons," is not an objective application of such a standard. And far better is an obviation altogether of that standard. You apply subject-selection protocols before you collect data from them.

Good demographic fitting needs to be done by individuals (or algorithms) who don't know (or care) what the raw data said.

Ideally before the data are even collected. For example, Philip Zimbardo in his notorious Stanford prison experiment required subjects of a certain psychological disposition. This was because his planned protocol would have been affected by variables arising from such things as predispositions to violence or to sympathy. Hence before he even divided up the group into "guards" and "prisoners" and ran his experiment, he ensured that the subjects were unlikely to exhibit behavior that would taint whatever results he obtained later. The subjects who did not lie in the center of the various psychometrics that indicated Zimbardo's requirements were dismissed from the study before it even began. It's a stellar example of how to plan an experiment involving human subjects from whom you want reliable results.

It's also a stellar example of what can happen when the researcher loses his objectivity, even if he does everything else right.
 
I've started to wonder whether all this is a misunderstanding over the use of words. Michel has been completely clear that he only wishes to try his experiment when he can read the answers before culling the data. I wonder if when Michel uses the term "telepath" he simply means what the rest of us mean by the word "read."

I do think that we can all agree Michel has produced evidence that he can read. Maybe rather than arguing with him, when he says he has evidence of being a "telepath" we should all just be agreeable and say "yes Michel, you can read."
 
Now erase your entire conception of the train platform.

The revised facts are enlightening. I agree they alter the pictures that I as a layman would paint of duty and causation as I understand them.

If I consider it from the perspective of the proximity of the cause of Mrs. Palsgraf's injury, it seems to interject another element. The railroad employee caused the man to drop the package, which caused it to explode, which caused the crowd to panic, which caused an unsecured apparatus to fall. A devil's advocate would argue that the railroad's duty can't possibly extend to such a parade of hypotheticals. If this is where Cardozo's reasoning proposed to go, why omit the fact of one more link in the causal chain, separating the railroad's foreknowledge even further from Mrs. Palsgraf's injury? You provide the answer:

However, was it foreseeable that packing their train stations full of people and unlicensed merchants would eventually result in some sort of event that would panic them, cause a stampede, and hurt some of them?

Exactly. I think a duty-based analysis may be more instructive to this case given the revised facts. A great many people packed into a small space with limited opportunities for egress and obstacles impeding ambulation should fall within the reasonable foresight of any proprietor. Should something cause a panic, the ensuing stampede is likely to cause the type of injury Mrs. Palsgraf suffered. Today we recognize that almost intuitively as we impose occupancy restrictions on spaces open to the public. Since we generally fail at precluding all the many, many causes of a panic, our duty devolves to mitigating the potential of a panic to injure. Cardozo seems to have absolved the railroad of its duty to foresee the effects of crowded, inherently unsafe conditions that were within its control to mitigate.

We draw the analogy to science. It is impossible to prevent all the ways nature can interfere with our attempts to collect good data. So we evolve methods of mitigating the effects. But where we can mitigate the cause, we have a duty to do so, because to eliminate the cause is almost always more effective and robust than to struggle with the result.

He wrote, "minds cannot differ" regarding his ruling, though 5 of 9 judges who heard the case came to the opposite conclusion.

And, as you say, he recited the facts of the case in a way that seemed to make his reasoning unassailable. That is its relevance here. Michel wants to paint the picture that numbers don't lie -- the math is irrefutable. Well, people lie using numbers. A statistic is not an inviolable imprimatur on the strength of the underlying science.

I note that the dissent in Palsgraf doesn't seem to mention the conditions on the platform. In unraveling the causal chain, Andrews writes, “The only intervening cause was that instead of blowing her to the ground the concussion smashed the weighing machine which in turn fell upon her.” Palsgraf v. Long Island R.R. Co., 248 N.Y. 339, 356 (N.Y. 1928). So was Cardozo's court uniformly misled? The minority seems content to agree with Cardozo's reconstruction of the accident. They seem to want to focus on his calculus of proximity, arguing that it should encompass more causes and effects.

As the erudition of the dissent shows, reasonable minds can differ even within the distorted picture of the facts. Add in the rest of the facts, and it becomes clear that reasonable minds should differ.

Bias in the writing of the opinion led to a concept of torts that, with some refinement, endures today.

And I understand that the doctrine of torts is something that really requires a deep dive to understand properly, which is why we have law schools. To that point, my understanding is that the case method in law school requires students simply to accept the facts presented and discuss the legal reasoning based on that representation. So I don't imagine that a lot of 1Ls are ever presented with the notion that the facts in Palsgraf are disserving. And I wonder how that affects the doctrinal understanding of torts that you refer to. It certainly explains the outburst I reported last night.
 
I'll admit that I'm no scientist of any type, but it seems to me that p-value is properly applied to a data set testing a hypothesis, and you're trying to use it here to assess a data point.
One can calculate a p-value too when the data set reduces to just one data point; it's just a special case.
The idea that, the fewer the data points the more useful the set, is just ridiculous frosting on a whole cake of wrong.
I never said that a data set is more useful when there are fewer data points. Usually, the opposite is true, of course.
 
One can calculate a p-value too when the data set reduces to just one data point; it's just a special case.

One can compute a value, but that doesn't mean the statistic represented by that number means anything. You seem hung up on the notion that because your online calculator gives you a value, it must represent a useful, rigorously-determined concept. I and others are testing you on your knowledge of what a probative statistical model of data entails. You simply don't know.

I never said that a data set is more useful when there are fewer data points. Usually, the opposite is true, of course.

Is there a minimum N for the dataset to be valuable at all? I'm not asking because I want to know. I already do. I'm asking to see whether you know.
 
Cardozo seems to have absolved the railroad of its duty to foresee the effects of crowded, inherently unsafe conditions that were within its control to mitigate.


Hey, guess which judge had most of his money invested in railroads? It's Cardozo.


So I don't imagine that a lot of 1Ls are ever presented with the notion that the facts in Palsgraf are disserving. And I wonder how that affects the doctrinal understanding of torts that you refer to. It certainly explains the outburst I reported last night.


We were. "Facts matter," is, in hindsight, a better lesson than "proximity."
 
I've started to wonder whether all this is a misunderstanding over the use of words. Michel has been completely clear that he only wishes to try his experiment when he can read the answers before culling the data. I wonder if when Michel uses the term "telepath" he simply means what the rest of us mean by the word "read."

I do think that we can all agree Michel has produced evidence that he can read. Maybe rather than arguing with him, when he says he has evidence of being a "telepath" we should all just be agreeable and say "yes Michel, you can read."
One of the things I can read is, for example, this post:
... I do indeed have ESP, and know for a fact that he wrote 2!
The answer given (2) was correct. This is an example of a serious-sounding (or credible) post, which gave the correct answer. Everybody can see that credible answers tend to be correct.
 
One of the things I can read is, for example, this post:

The answer given (2) was correct. This is an example of a serious-sounding (or credible) post, which gave the correct answer. Everybody can see that credible answers tend to be correct.

Everybody can see that your dishonest bias makes you assign high credibility to correct answers after you know what each answer is.

Wouldn't you be happier showing that you don't have to be dishonest?
 
Everybody can see that credible answers tend to be correct.

We can see that you contrived that to be the case. That's the problem. You want it believed that it's okay to engage in such things and still call it science. In fact what you're trying to do is what science was invented to prevent.
 
