• Quick note - the problem with YouTube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you still see problems, let me know.

Statistical significance

In particular, if I understand the example right, the outcome probability space is different for different experiments.

It sounds like you understand the example right.

For example, the probability of the couple having a single boy and a single girl is 0.25 for experiment A, the case where the parents just want one of each. The corresponding probability is zero for experiment B, where they just want eight kids, irrespective of sex. Similarly, the probability of three boys and five girls is zero for experiment A, non-zero (I'm too lazy to figure it exactly) for experiment B.

The odds you were too lazy to figure out are 7/32.
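(For the record: there are C(8,3) = 56 orderings with exactly three boys among eight births, each with probability (1/2)^8 = 1/256, so the answer is 56/256 = 7/32.)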

Given that the underlying probability mass is different, the fact that the probability mass in the rejection region defined by the same words differs can hardly be considered to be a fault of the statistics.

The statistics calculates exactly what it said it would calculate. I did not mean to imply fault in the calculation of the statistics.

The problem lies in how people interpret and act on those statistics. We decide whether or not to reject a null hypothesis. We will make decisions and carry out actions differently after these two experiments. Is it reasonable to do so?

Well let's take the most reasonable of all possible procedures for drawing an inference. And that is to use Bayes' Theorem. Suppose, for instance, that the experimenter starts with the following prior expectations:
  1. 50% chance that the couple's odds of boys vs girls are 50-50.
  2. 20% chance that the odds are 55-45.
  3. 20% chance of 45-55.
  4. 5% chance of 100-0.
  5. 5% chance of 0-100.
What should the expectations be after observing either experiment A or B? Well, it turns out to be the same; you just apply Bayes' Theorem. Under option 1 the probability of the observed outcome is 0.00390625, so the joint probability of option 1 and the outcome is 0.001953125. Option 2 gives a probability of 0.0068509585546875, for a joint probability of 0.0013701917109375. Option 3 gives 0.0020551819921875, for a joint probability of 0.0004110363984375. Options 4 and 5 say the result was impossible.

So let's crank it into Bayes' formula. According to the experimenter's expectations, the probability of the outcome was 0.003734353109375, so our revised expectations are 52.3% for option 1, 36.7% for option 2, 11% for option 3 and 0% for options 4 and 5.
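If you want to check that arithmetic yourself, here is a quick sketch (Python is just my choice here; any language would do):

  # Check of the Bayes calculation above.
  priors = {0.50: 0.50, 0.55: 0.20, 0.45: 0.20, 1.00: 0.05, 0.00: 0.05}
  # The data: 7 boys and 1 girl. Whether we score the exact birth order
  # (experiment A) or "7 of 8 in any order" (experiment B) only multiplies
  # every term by the same constant, which cancels when we normalize.
  joint = {p: w * p**7 * (1 - p) for p, w in priors.items()}
  total = sum(joint.values())  # 0.003734353109375
  posterior = {p: j / total for p, j in joint.items()}
  print(posterior)  # ~{0.5: 0.523, 0.55: 0.367, 0.45: 0.110, 1.0: 0.0, 0.0: 0.0}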

This change in expectations is true whether the experiment that was run is version A or B.

In short, by the most reasonable method we can find for adjusting our expectations in the light of further evidence, the differences in experimental design are absolutely and completely irrelevant. In fact it isn't hard to prove that, no matter what set of prior expectations the experimenter had, the design difference will be irrelevant.

So when we take the step of using the results of hypothesis testing to draw an inference and make a decision, we are making our decisions in a way that is not consistent with any set of possible prior expectations. And we are doing so because (as 69dodge pointed out) we are explicitly taking into account in our decision the likelihood of things that didn't happen. (Note that Bayes' formula completely ignores the might have beens that didn't happen - they can't matter to it.)

Cheers,
Ben

PS Note that I am not arguing for throwing out hypothesis testing. As I said before, it gives simple-to-interpret results in situations where the alternatives either produce nothing or give very complex answers. While acting according to what hypothesis testing tells you can lead to some absurd choices, most of the time it leads to fairly reasonable decisions.
 
One thing I have wondered: if a 95% significance level is good enough, for example, does that mean that 1 in 20 tests will actually be wrong?

The short answer is, "No."

The medium answer is, "That statement shows the confusion that most people have about hypothesis testing."

The long answer is, "1 in 20 times when the null hypothesis is true, we will incorrectly reject it. However we have no idea how often the null hypothesis is true, and without knowing the correct hypothesis we have no way to determine how often we do not correctly decide to reject the null hypothesis."

The nutshell is that hypothesis testing only concerns itself with limiting the odds of making one type of error (incorrectly rejecting the null hypothesis) and says absolutely nothing useful about the true odds of any hypothesis. It is very often misinterpreted as doing so, and that is always a mistake in someone's understanding.
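To make the nutshell concrete with some numbers I am making up purely for illustration: suppose that out of 1000 tests run at the 5% level, the null hypothesis happens to be true in 900 of them and false in 100, and suppose the tests have 80% power. Then we expect to reject 45 true nulls (5% of 900) and 80 false nulls, so 45/125 = 36% of our rejections are wrong - nowhere near 1 in 20. Change the made-up 900/100 split or the power and that fraction changes with it, which is exactly the point: the significance level alone can't tell you how often you're wrong.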

Cheers,
Ben
 
Huh? That makes no sense to me whatsoever.

Why is it absurd? It is a fact that, no matter what beliefs you have about the world, possibilities that didn't happen shouldn't affect how you change your beliefs. (Your beliefs about the odds of that not happening might matter, but the things that didn't happen don't.)

Mathematically it can't. Arguing that it should is like arguing that your bank account is going down because there is dust blowing on Mars. (Actually it is worse than that because there is a logical possibility that you will lose money from a bet about whether dust is blowing on Mars.) You are caring about what is irrelevant.

"I lit the fuse, but the firecracker didn't explode. Therefore it must have been a dud."

"That's absurd!"

"What do you mean, that's absurd?"

"Well, how do you know that a leprechaun didn't come out and pee on the fuse while your back was turned?"

".... um,.... what?"

I don't see how you think this analogy relates to the discussion. Unless you are trying to support my point. (Which is that possibilities that didn't happen, like the leprechaun peeing on the fuse, are totally irrelevant.)

Cheers,
Ben
 
The problem lies in how people interpret and act on those statistics. We decide whether or not to reject a null hypothesis. We will make decisions and carry out actions differently after these two experiments. Is it reasonable to do so?

Er --- yes, it is? Different questions and backgrounds provoke different experimental designs, which in turn generate different actions. There's an implicit "duh" in there somewhere, I think.


Well let's take the most reasonable of all possible procedures for drawing an inference. And that is to use Bayes' Theorem. Suppose, for instance, that the experimenter starts with the following prior expectations:
  1. 50% chance that the couple's odds of boys vs girls are 50-50.
  2. 20% chance that the odds are 55-45.
  3. 20% chance of 45-55.
  4. 5% chance of 100-0.
  5. 5% chance of 0-100.
What should the expectations be after observing either experiment A or B? Well, it turns out to be the same; you just apply Bayes' Theorem.

But why on Earth should the experimenter start out with that particular set of prior expectations?

The problem with Bayesian analysis is that it just pushes the assumptions back one more level, and furthermore, it specifically ignores information (such as the stated intentions of the couple).

This change in expectations is true whether the experiment that was run is version A or B.

In short, by the most reasonable method we can find for adjusting our expectations in the light of further evidence, the differences in experimental design are absolutely and completely irrelevant.

I think I still fail to see why this is a Good Thing.
 
Why is it absurd? It is a fact that, no matter what beliefs you have about the world, possibilities that didn't happen shouldn't affect how you change your beliefs. (Your beliefs about the odds of that not happening might matter, but the things that didn't happen don't.)

This is gibberish. You say that something not happening shouldn't affect how I change my beliefs -- but my beliefs about the odds of something not happening are, by definition, part of my beliefs, and will be changed as a result of something not happening.

If I think such-and-such is a dead cert, and it doesn't happen, I'm certainly changing my belief!
 
OK, what about an example. Say I am going for the JREF Million and my claim is that if a person is in an isolated room looking at a series of randomly selected pictures on a computer screen and I am in another isolated room looking at four pictures, one of which is the image being viewed by the sender, then I will click on the correct image 35% of the time.

What is the design of the experiment, what is the value of p and the size of the sample that would get me the million?
 
And what is the reason for this requirement?
So that a proper statistical statement can be made. Otherwise, you're just data mining.

There's no way to look at the results of an experiment directly, and see what they tell us about a hypothesis?
There's no way to make a proper statistical statement about the result without setting up a statistical test beforehand. One can draw all sorts of conclusions, such as "That looks very convincing" or "I don't think I'll be eating plutonium after seeing what it did to that guy", etc. One just can't make a statistical statement.

What an experiment tells us about a hypothesis depends not only on the actual results of the experiment, but also on some arbitrary decision we made beforehand about rejection regions?
What statistical decisions we make depend on the decisions beforehand. And the decisions are arbitrary regardless of when we make them; making them before collecting the data ensures that we aren't creating them ad hoc. Think of an archer hitting a target: we make decisions about how good the archer is based not only on where the arrow goes, but on arbitrary decisions made beforehand, such as where to put the target. If we put a target on an elm tree, and an archer hits the elm tree, and then we move the target to an oak tree, and another archer hits the elm tree again, we conclude that the first archer is better than the second, even though their arrows went to the exact same place. If we don't have any targets, and we try to engage in reasoning like "well, the elm tree is smaller, so the first archer is better", that's rather fallacious.

Before we can decide whether a method is reasonable or not, we need to decide what goal we want it to accomplish. Then we can say that it's reasonable if it accomplishes that goal, and unreasonable if it doesn't.
Well, I think that "efficient" is better than "reasonable" to express the concept you're talking about.

I think it's to help us decide whether a hypothesis is true or not. The decision to "reject the null hypothesis" should depend on, and only on, how much evidence there is that it is false.
I disagree. Also, you need to define "how much evidence".

So if two different experiments give us the same amount of evidence against the truth of a hypothesis, it makes no sense to reject the hypothesis in one case but not in the other.
Sure it does. If you think that some procedure is inefficient, you should reject it before the experiment. You can't look at the procedure after the experiment, declare that it's inefficient, and say that you're therefore going to use some other.

Do you think that the results of Ben Tilly's experiments A and B give different amounts of evidence against the hypothesis of equal boy/girl probabilities?
Yes.

How could they? They're the same results!
No, they're not.

Consider this: there's a game show called "Deal or No Deal". There are 26 briefcases. A contestant chooses a briefcase, then opens the rest, one by one, until they either accept a deal to sell their briefcase, or there are only two briefcases left (at which point they are offered the option to switch). One of the briefcases contains a million dollars. If all of the contestants hold out until there are two briefcases, then 1/13 of them will still have the million dollar briefcase in play, and 1/26 will be holding it themselves. So there is no benefit to switching.

Now suppose there were some variant of the game where, instead of the contestant choosing which briefcases to open, the host opens briefcases, and always opens briefcases that don't have the million dollars. Now, in 100% of the cases, the million dollars will still be in play, but the contestant will have the million dollars only 1/26 of the time. So now the contestant should switch.

So if one player plays the first game, and ends up with the million dollars in play, and another plays the second game, and also ends up with the million dollars in play, are those the same result? Do they both give the same amount of evidence against the null hypothesis of "I don't have the million dollars"? Should both players come to the same conclusion?
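If the arithmetic doesn't convince you, it's easy to simulate. Here's a rough sketch (Python, with the trial count and names being my own choices):

  import random

  def play(host_knows, trials=100_000):
      # Of the games where the million is still in play at the end,
      # return the fraction where the contestant's own briefcase holds it.
      in_play = holds = 0
      for _ in range(trials):
          cases = [1] + [0] * 25        # exactly one briefcase has the million
          random.shuffle(cases)
          mine, rest = cases[0], cases[1:]
          if host_knows:
              survivor = 1 if 1 in rest else 0   # host never opens the million
          else:
              survivor = random.choice(rest)     # contestant opens 24 at random
          if mine or survivor:
              in_play += 1
              holds += mine
      return holds / in_play

  print(play(host_knows=False))  # ~0.5: no benefit to switching
  print(play(host_knows=True))   # ~1/26: switching is a big win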

The problem with significance tests based on p-values is that they take into account all sorts of experimental results that didn't happen (namely, all those in the predetermined rejection region). Where's the sense in that?
Well, of course they do. How could they not? If you only look at what happened, then whatever happened happened, so every experiment will have the exact same result (whatever happened, happened).

For an experiment to be worth anything, there must be at least two sets of results, with different conclusions for each. So every conclusion must be based on the fact that it's not in the other set.

As Sir Harold Jeffreys wrote in Theory of Probability (third edition, pp. 384--385, emphasis in original):
But why should these be stated in terms of P? The latter gives the probability of departures, measured in a particular way, equal to or greater than the observed set, and the contribution from the actual value is nearly always negligible. What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred.
Well, that's based on the conception of the test as having the calculations done after the data is collected, and gives more support to the rule that they should be done before data is collected. With the latter conception, there is a rejection region determined beforehand, and all one needs to do is check whether the result is in that region; p-values are not needed. Now, it is possible to set up an experiment such that one can use p-values, but I don't recommend it. Addressing Jeffreys' argument further, he focuses on the things that didn't happen and are "greater departures". But there are also the things that didn't happen and are "lesser departures". The issue is not just how large the former is, but what the ratio of the former to the latter is. Both are "things that didn't happen", but the latter is stuff that "might have happened, just as likely", and therefore isn't included in the p-value.

An experiment has three possible outcomes: A, B, and C. On hypothesis H0, their probabilities are 0.02, 0.02, 0.96. On hypothesis Ha, their probabilities are 0.01, 0.04, 0.95.

I choose a rejection region of {A, B}, whose probability on H0 is 0.04, which is less than 0.05, its probability on Ha.

I run the experiment and the outcome is A, which is in the rejection region. Does this result therefore constitute evidence against H0 and in favor of Ha?
Yes. Your pulling A out of the rejection region is just data mining. You're constructing a new set around data that you already have.

I can play this game, too. Suppose that we split A into two further subsets, A1 and A2. Under H0, the probabilities are .019 and .001; under Ha, they are .001 and .009. And suppose the result is in A2. Now, if we view it as in A2, it supports Ha. If we consider it as being in A, then it supports H0. If we see it as being in {A,B}, then it supports Ha again. You can't just pick and choose which set you want to consider it a member of.
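To put numbers on the pick-and-choose problem (a quick sketch in Python; the labels are mine):

  # Likelihood ratios P(set | Ha) / P(set | H0) for the nested sets above.
  pH0 = {"A1": 0.019, "A2": 0.001, "B": 0.02, "C": 0.96}
  pHa = {"A1": 0.001, "A2": 0.009, "B": 0.04, "C": 0.95}
  for s in [("A2",), ("A1", "A2"), ("A1", "A2", "B")]:
      ratio = sum(pHa[x] for x in s) / sum(pH0[x] for x in s)
      print(s, round(ratio, 2))
  # ('A2',) 9.0             -> favors Ha
  # ('A1', 'A2') 0.5        -> favors H0
  # ('A1', 'A2', 'B') 1.25  -> favors Ha again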

The opposite, obviously.
Why?

Why should I care about the probability of possible outcomes that happen to be in the rejection region, if they didn't actually occur?
Because points don't have measure, sets do. To have a probability, you need a set. And you need to decide what set you're using beforehand; otherwise you're engaging in data mining. So they aren't "possible outcomes that... didn't actually occur"; the rejection region must be considered as a whole.
 
Er --- yes, it is? Different questions and backgrounds provoke different experimental designs, which in turn generate different actions. There's an implicit "duh" in there somewhere, I think.

Different experimental designs testing different questions should generate different reactions.

However here we are talking about identical data produced by an identical physical process which will be interpreted differently by the same person. It isn't obvious that there should be a difference in that case. (And indeed the point of the argument is that there should not be one.)

But why on Earth should the experimenter start out with that particular set of prior expectations?

I was using that as an example.

While there is no reason that an experimenter would start with that particular set of expectations, the experimenter probably starts with some set of expectations. And no matter what set of expectations that experimenter has, experiments A and B should not lead to a different posterior conclusion.

The problem with Bayesian analysis is that it just pushes the assumptions back one more level, and furthermore, it specifically ignores information (such as the stated intentions of the couple).

Not guilty in either regard.

The complexity with Bayesian analysis is that it explicitly avoids making an assumption. It even avoids making it possible to accidentally make an unwarranted assumption by misunderstanding a critical term (like "confidence"). As for "ignored" information, that information is not ignored. It is merely provably irrelevant.

I think I still fail to see why this is a Good Thing.

I assume that you have beliefs about the world. I assume that as you encounter more information, you update your beliefs about the world. Well if you're able to quantify your beliefs about the world, then Bayes' formula describes how you absolutely, logically should update your beliefs in light of further information. If you update your beliefs in any other way then you are being illogical.

If you don't understand why this is the case, then review the explanation of Bayes' Theorem either online or in the probability book of your choice.

Of course in practice people don't do this because it is too complex for us to do on the fly. We are, after all, illogical creatures. However Bayes' Theorem is the ideal method of drawing inference. As we've known for over 2 centuries, there is no other method that has nearly as strong a logical foundation as this one.

So if you're using a method of inference that gives results that are nonsense according to Bayes' Theorem, then that is a fact that should make you sit up and pay attention.

Ben
 
This is gibberish. You say that something not happening shouldn't affect how I change my beliefs -- but my beliefs about the odds of something not happening are, by definition, part of my beliefs, and will be changed as a result of something not happening.

If I think such-and-such is a dead cert, and it doesn't happen, I'm certainly changing my belief!

I admit it was poorly phrased.

What I meant is that your prior beliefs about things happening or not happening factor into your updated beliefs after you observe what you observe. But your beliefs should change solely based on what you did observe, and not based on what you didn't observe.

In other words if you see a cat I shouldn't be able to change your beliefs about what you are seeing there by saying, "That isn't a dog." That simply isn't relevant, it is a cat and that is that.

If this still doesn't make sense, then we should drop this subthread and accept that this was a confusing phrasing on my part which didn't convey anything useful.

Cheers,
Ben
 
I'm sorry, I'm perhaps not following this properly. But it seems that your experiment as proposed offers next door to no information at all -- and to the extent that it offers information, offers information in favor of H0. So, basically, you ran the wrong experiment. Why should your poor choice of experiments be an argument for or against a statistical theory?
It's definitely not a great experiment. The most probable result is C, which is about as likely on either hypothesis, and so would tell us very little. But a good statistical theory should let us extract as much information as possible from whatever result happens in whatever experiment we do. If the result of my poorly designed experiment should happen to be A, as unlikely as that result was a priori, there's no reason to ignore what it can tell us. Since A was twice as likely on one hypothesis as on the other, its occurrence provides evidence in favor of the one and against the other. It doesn't matter how likely B or C were on either hypothesis, because, although they were more likely a priori, it turns out that they didn't happen. What happened was A, so its likelihood on the two hypotheses is all that matters.
 
Let's get the ad hominems out of the way first, shall we?
Huh? Ad hominems? What do you mean? You're the one trying to make an argument from authority. I just responded by disputing the alleged authority. That's hardly "ad hominem".

I have discussed it since with a number of people, including several statisticians who were tenured professors at different universities.
And none of them took issue with your step three?

I have no idea what your bona fides are to back up your self-identification as a statistician, but if you claim that anyone who disagrees is not a statistician, then you've made a claim that is very much on the outrageous side.
It's a rather basic principle of statistics. Without it, you're not doing statistics.

In experiment A the evidence that is as strong or stronger against the null hypothesis than the observed outcome will occur if the couple has 7 sons then a girl, 7 girls then a son, 8 sons then a girl, 8 girls then a son, 9 sons then a girl, and so on.
This is a bit of an abuse of the word "experiment", as it is more of an observational study than an experiment.

According to Bayes' Theorem, under no prior set of beliefs should the difference in design of the experiments make any difference in your conclusions.
Theorems do not speak of "should". I'm rather suspicious of your repeated, and unsupported, claims of what BT "says".

If one takes the view that reasonable people start with a set of prior beliefs which they then continuously modify in the light of experience, then no reasonable person can ever draw the distinction between these two cases that hypothesis testing does.
Sure they can.

Of course if you do not believe that reasonable people should have beliefs and modify those beliefs in the face of experience in a logical fashion, then you may not think that the results of hypothesis testing are unreasonable.
I don't see what they have to do with each other.

However it seems absurd that your conclusion about what is true is based on what didn't happen.
Not to me.

Bayes' Theorem allows us to quantify the reason why our intuition says that this is absurd.
How?

Therefore hypothesis testing leads to absurd distinctions being made.
Just because you don't understand the reason doesn't mean they are absurd.

Now the p-value drops to 1/8192!
You should put a space between 8192 and !. 1/8192! is less than 10^-25000.
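(If anyone doubts that figure, a two-line check in Python, using the fact that lgamma(n+1) is the log of n!:

  import math
  print(math.lgamma(8193) / math.log(10))  # ~28504, i.e. 8192! is about 10^28504

so 1/8192! is indeed far below 10^-25000.)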

This is a drastic change in the strength of our conclusion, yet the extra coin flips gave us absolutely no information about the likelihood of sons versus daughters!
It is not, on average, an increase in the "strength" of our conclusion. It's an increase in the "strength" when we get the result, but we will get the result less often, resulting in no average increase.

Things that should be irrelevant matter greatly in hypothesis testing.
More precisely, one can design an experiment in which issues that aren't part of what's being tested will have an effect.

And we are doing so because (as 69dodge pointed out) we are explicitly taking into account in our decision the likelihood of things that didn't happen. (Note that Bayes' formula completely ignores the might have beens that didn't happen - they can't matter to it.)
That's not true.

The nutshell is that hypothesis testing only concerns itself with limiting the odds of making one type of error (incorrectly rejecting the null hypothesis) and says absolutely nothing useful about the true odds of any hypothesis.
More precisely, it deals with quantifying that error. How large one wants it to be is up to the experimenter. And there is usually an effort made to avoid the other type of error; it's just that it can't be quantified.

Why is it absurd? It is a fact that, no matter what beliefs you have about the world, possibilities that didn't happen shouldn't affect how you change your beliefs.
If possibilities that don't happen don't affect your beliefs, then possibilities that do happen shouldn't matter, either.

(Your beliefs about the odds of that not happening might matter, but the things that didn't happen don't.)
If my beliefs matter, then how can the things not matter?

Mathematically it can't.
Why not?
 
But a good statistical theory should let us extract as much information as possible from whatever result happens in whatever experiment we do.
You can choose the most efficient statistic. Or you can choose a particular alpha. You just can't, in general, do both.

Since A was twice as likely on one hypothesis as on the other, its occurrence provides evidence in favor of the one and against the other.
Then you should have thought about that before you started the experiment. Here's something to try: go into a casino, go to a blackjack table, and hit on everything less than 21. Now, suppose you hit on 20, bust, and then see that the dealer ends up with 19. You should tell the dealer that, with 20, it was more likely that you'd win than that you'd lose, so you should get your money back, and it's silly to ignore that information. See how well that plays out.

But your beliefs should change solely based on what you did observe, and not based on what you didn't observe.
But those are logically indistinguishable. Observing X is the same thing as not observing not X.

In other words if you see a cat I shouldn't be able to change your beliefs about what you are seeing there by saying, "That isn't a dog."
The knowledge that it's not a dog absolutely may change my beliefs.

However here we are talking about identical data produced by an identical physical process which will be interpreted differently by the same person.
No, it's a different process.

While there is no reason that an experimenter would start with that particular set of expectations, the experimenter probably starts with some set of expectations. And no matter what set of expectations that experimenter has, experiments A and B should not lead to a different posterior conclusion.
They don't.

It is merely provably irrelevant.
Yet you haven't presented the proof.

If you update your beliefs in any other way then you are being illogical.
Nope.

If you don't understand why this is the case, then review the explanation of Bayes' Theorem either online or in the probability book of your choice.
That's both a fallacious and arrogant thing to say.

So if you're using a method of inference that gives results that are nonsense according to Bayes' Theorem, then that is a fact that should make you sit up and pay attention.
"Nonsense" is not a mathematical term.

OK, what about an example. Say I am going for the JREF Million and my claim is that if a person is in an isolated room looking at a series of randomly selected pictures on a computer screen and I am in another isolated room looking at four pictures, one of which is the image being viewed by the sender, then I will click on the correct image 35% of the time.

What is the design of the experiment, what is the value of p and the size of the sample that would get me the million?
P depends on the data from the experiment; perhaps you mean to ask what alpha is? I don't know what JREF uses, but I would imagine it would be one in a million, or stronger. To get that, you would have to answer correctly 10 times in a row (and you would, assuming that your claim is correct, have a probability of 27 in a million of doing so). If you have 100 pictures, then you would have to identify 45 correctly (giving you a 1.5% chance). With 1000, you would need to get 310 correct, giving you a 99.6% chance.
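Figures like these are easy to check with exact binomial tail sums rather than normal approximations. A minimal sketch (Python; the function name is my own):

  from math import comb

  def tail(n, k, p):
      # P(X >= k) when X ~ Binomial(n, p)
      return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

  print(tail(10, 10, 0.25))     # ~9.5e-7: ten straight hits beats one-in-a-million
  print(tail(10, 10, 0.35))     # ~2.8e-5: the "27 in a million" above
  print(tail(1000, 310, 0.35))  # ~0.996: the 99.6% figure above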
 
Let's get the ad hominems out of the way first, shall we?
Huh? Ad hominems? What do you mean? You're the one trying to make an argument from authority. I just responded by disputing the alleged authority. That's hardly "ad hominem".

Ad hominem means "of the man", and an ad hominem argument is one that appeals for or against the person making the argument rather than to the quality of the argument. So you claimed in essence, "I am an authority and anyone who is will agree with me." I respond by saying, "So and so is an authority and disagrees with you." This is an ad hominem argument on both of our sides.

I was acknowledging that was happening before moving on to the actual discussion of substance. Which you'll note I have not been conducting through "argument from authority", but rather by presenting detailed examples and calculations.

I have discussed it since with a number of people, including several statisticians who were tenured professors at different universities.
And none of them took issue with your step three?

Not only did they not, but it was one of them who first led me through the calculations. Several of them pointed to places in the literature where I could find further debate on whether Bayesian statistics should be used more.

I have no idea what your bona fides are to back up your self-identification as a statistician, but if you claim that anyone who disagrees is not a statistician, then you've made a claim that is very much on the outrageous side.
It's a rather basic principle of statistics. Without it, you're not doing statistics.

Who is arguing from authority now?

Would you mind explaining why it is a basic principle of statistics? If it is, you should be able to provide a cogent reason why it should be so. While you're at it, I would appreciate an explanation of how you would analyze the results of both experiments A and B.

(As an aside, I hate the fact that the interface here loses nested quotes. Because as things stand I have no idea what the exact phrasing of the claim is unless I want to go back and track it down by hand. And I'm getting really tired of cutting and pasting in the previous discussion to add context for what I am saying...)

In experiment A the evidence that is as strong or stronger against the null hypothesis than the observed outcome will occur if the couple has 7 sons then a girl, 7 girls then a son, 8 sons then a girl, 8 girls then a son, 9 sons then a girl, and so on.
This is a bit of an abuse of the word "experiment", as it is more of an observational study than an experiment.

I don't care what term you wish to use to describe the situation as long as the situation described is clear. However I'll note that the hypothetical mother who was described had to do a lot more than just observe.

According to Bayes' Theorem, under no prior set of beliefs should the difference in design of the experiments make any difference in your conclusions.
Theorems do not speak of "should". I'm rather suspicious of your repeated, and unsupported, claims of what BT "says".

My claims are a matter of easily established fact. If you wish to convince me that I am wrong, all that you need to do is produce a set of prior beliefs which would lead to a different set of posterior beliefs after observing case A and B. I am quite confident that you will fail.

The cause of my confidence is that Bayes' Theorem says that P(X given Y) is P(X and Y)/P(Y). (With appropriate amendments for probability density functions if you wish to go from discrete to continuous distributions.) This concrete formula provides a clear way to factor in one's prior expectations and the observed results. It provides no way for the difference in experimental design to matter.
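To spell the cancellation out: in both experiments the probability of the data has the form P(data given p) = c * p^7 * (1-p), where c depends only on the design (c = 1 for the exact birth order of experiment A, c = 8 for "7 boys out of 8 in any order" in experiment B) and not on p. Plug that into the formula above and c shows up in both the numerator and the denominator, so it cancels: the posterior is prior(p) * p^7 * (1-p) divided by the sum of prior(q) * q^7 * (1-q) over every q you assigned prior weight to, and that is identical for A and B.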

However, surprise me. Please.

If one takes the view that reasonable people start with a set of prior beliefs which they then continuously modify in the light of experience, then no reasonable person can ever draw the distinction between these two cases that hypothesis testing does.
Sure they can.

Example. Please.

By that I mean give me a detailed set of prior beliefs which, when modified according to Bayes' Theorem in the light of these two experiments, leads to different conclusions. If you succeed I will be both astonished and fascinated to see how it happened.

Of course if you do not believe that reasonable people should have beliefs and modify those beliefs in the face of experience in a logical fashion, then you may not think that the results of hypothesis testing are unreasonable.
I don't see what they have to do with each other.

I strongly suspect that if you try and fail to provide me with the requested example, the connection will become much clearer to you. Of course if you try and succeed in producing the requested example, then I am clearly wrong and there is no connection.

However it seems absurd that your conclusion about what is true is based on what didn't happen.
Not to me.

Bayes' Theorem allows us to quantify the reason why our intuition says that this is absurd.
How?

Because it shows that any set of prior expectations will lead to the exact same posterior beliefs after observing either experiment A or B. Therefore the details of what might have happened but didn't should be irrelevant to the inferences we draw. All that should matter is that there were 8 children and 7 of them were boys.

Therefore hypothesis testing leads to absurd distinctions being made.
Just because you don't understand the reason doesn't mean they are absurd.

Is there any reason for this not-so-subtle putdown? Particularly when I've fairly conclusively demonstrated that I understand why hypothesis testing leads to a distinction being drawn in this case?

Now the p-value drops to 1/8192!
You should put a space between 8192 and !. 1/8192! is less than 10^-25000.

My meaning was clear and I doubt that anyone was confused by my typography.

I'll note, though, that a grammar nazi might disagree with you on the grounds that there is no actual ambiguity because you would not have a correctly formed sentence unless the ! was taken to be a punctuation mark. Therefore it should not be interpreted as factorial.

This is a drastic change in the strength of our conclusion, yet the extra coin flips gave us absolutely no information about the likelihood of sons versus daughters!
It is not, on average, an increase in the "strength" of our conclusion. It's an increase in the "strength" when we get the result, but we will get the result less often, resulting in no average increase.

But it is an increase in the strength of our conclusion in this particular case. Which is exactly what I was saying.

My point stands.

Things that should be irrelevant matter greatly in hypothesis testing.
More precisely, one can design an experiment in which issues that aren't part of what's being tested will have an effect.

You're right that I've merely shown that it is possible, not that it is a common occurrence. I have no desire to argue how common it is, so I'll let this point drop.

And we are doing so because (as 69dodge pointed out) we are explicitly taking into account in our decision the likelihood of things that didn't happen. (Note that Bayes' formula completely ignores the might have beens that didn't happen - they can't matter to it.)
That's not true.

I look forward to your elucidation of this point. Preferably with a calculated, worked out, example.

[the rest skipped because I'm tired of repeating myself.]

Regards,
Ben
 
OK, what about an example. Say I am going for the JREF Million and my claim is that if a person is in an isolated room looking at a series of randomly selected pictures on a computer screen and I am in another isolated room looking at four pictures, one of which is the image being viewed by the sender, then I will click on the correct image 35% of the time.

What is the design of the experiment, what is the value of p and the size of the sample that would get me the million?

The experiment that would be insisted on is something like this.

The two of you go to your isolated rooms. The other person is shown a picture, you make a selection. Then the other person is shown another picture and we continue. This will continue for a set number of trials. The question is what that set number of trials should be.

Randi will insist on using 25% odds as the null hypothesis, and will insist on a criterion that gives you only a 1/1000 chance of succeeding by chance. To win the million dollars you will have to succeed twice. So the question is what you want your odds of winning to be. For instance if you set up the experiment so there is a 90% chance that you succeed, then your odds of succeeding twice are going to be 90% of 90%, or 81%.

So how do we work out those odds? Well for a first pass we don't use exact calculations, instead we do approximations.

A single trial is (under the null hypothesis) a random event with 25% odds of having 1 success, and 75% odds of none. This is a random distribution with average 0.25 and variance 0.1875. (Var(X) = E((X-E(X))^2). With 2 cases, you can just calculate that by hand.) Therefore the sum of N of these is approximately a normal distribution with mean 0.25N and variance 0.1875N. Glancing at http://www.math.unb.ca/~knight/utility/NormTble.htm we see that we'll get to Randi's desired p-value at 3.08 standard deviations out. 1 standard deviation is the square root of the variance, so Randi will accept as evidence if you get more than 0.25N+3.08*sqrt(0.1875N).

Analyzing things under your theory is similar, except the mean is 0.35 and the variance is 0.2275. Suppose you are going for the aforementioned 90% chance of success (and therefore an 81% chance of making the million). Then you want to be aiming to have Randi's target at least 1.28 standard deviations below what you think your average will be.

So under your theory you're going to want 0.25N+3.08*sqrt(0.1875N) <= 0.35N-1.28*sqrt(0.2275N). Solving that algebra problem (divide through by sqrt(N), then square) we see that the critical condition is N >= 100*(3.08*sqrt(0.1875) + 1.28*sqrt(0.2275))^2, which is about 378.

Now if you wanted to be precise you'd have to work out your exact odds using the binomial distribution to see what the actual cutoff that satisfies both you and Randi is, but it will be about 378 trials.
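If you'd like to double-check that arithmetic, here's a quick sketch (Python, again just my choice of tool):

  import math

  z_alpha, z_power = 3.08, 1.28   # Randi's 1/1000 criterion; your 90% power
  p0, p1 = 0.25, 0.35             # null hit rate; your claimed hit rate
  s0 = math.sqrt(p0 * (1 - p0))   # sqrt(0.1875)
  s1 = math.sqrt(p1 * (1 - p1))   # sqrt(0.2275)
  N = (10 * (z_alpha * s0 + z_power * s1)) ** 2
  print(N)                        # ~378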

Cheers,
Ben
 
The cause of my confidence is that Bayes' Theorem says that P(X given Y) is P(X and Y)/P(Y). (With appropriate amendments for probability density functions if you wish to go from discrete to continuous distributions.) This concrete formula provides a clear way to factor in ones prior expectations and the observed results. It provides no way for the difference in experimental design to matter.
Ben, I have a question. When you say, P(X and Y), doesn't that mean (P(X) x P(Y))? In other words, does one not multiply the probability of X by the probability of Y to get P(X and Y)?

Thanks. I've never seriously looked into Bayesian statistics, and would like to understand the subject a bit better, and you seem facile with it.
 
So you claimed in essence, "I am an authority and anyone who is will agree with me." I respond by saying, "So and so is an authority and disagrees with you." This is an ad hominem argument on both of our sides.
You have it backwards. You stated that authorities agree with you, and then I disagreed with you.

Which you'll note I have not been conducting through "argument from authority", but rather by presenting detailed examples and calculations.
You most certainly have been conducting an argument from authority. The authority is what you claim Bayes' Theorem says.

Not only did they not, but it was one of them who first led me through the calculations.
I don't believe you. How did it come up?

Who is arguing from authority now?
Not me.

Would you mind explaining why it is a basic principle of statistics?
Because probabilities apply to random variables, not data that have already been collected.

My claims are a matter of easily established fact.
Argument by assertion. You seem to be doing that a lot.

If you wish to convince me that I am wrong, all that you need to do is produce a set of prior beliefs which would lead to a different set of posterior beliefs after observing case A and B. I am quite confident that you will fail.
You are the one with the burden of proof. Furthermore, you keep equivocating between Bayes' Theorem and statements that are beyond the theorems. You haven't addressed any of my points:
Theorems don't speak of "should".
Theorems only speak of mathematical concepts, therefore any declaration that Bayes' Theorem makes a statement about a nonmathematical concept is clearly wrong.
You are claiming that hypothesis testing is unreasonable because it's not allowed by Bayes' Theorem, and you say that it's not allowed by Bayes' Theorem because it's unreasonable.

Now, you said "According to Bayes' Theorem, under no prior set of beliefs should the difference in design of the experiments make any difference in your conclusions."

Then, you said "If you wish to convince me that I am wrong, all that you need to do is produce a set of prior beliefs which would lead to a different set of posterior beliefs after observing case A and B."

Well, that's just silly. To show that your statement is wrong, I just have to show that your statement is wrong, not show that some completely different statement is wrong. Now, hypothesis testing is a case where design of experiments can make a difference in the conclusion. Therefore, you are wrong. Or else Bayes' Theorem is.

It provides no way for the difference in experimental design to matter.
Do you really not see how fallacious that is? You are denying the antecedent.

Example. Please.
I am a reasonable person, and I draw a distinction. Therefore, you are wrong.

By that I mean give me a detailed set of prior beliefs which, when modified according to Bayes' Theorem in the light of these two experiments, leads to different conclusions.
Then you aren't speaking English. You see, among English speakers, when someone says "Give me an example of X", that means "Give me an example of X", not "Give me an example of Y".

Here's your argument: "I only accept arguments based on Bayes' Theorem. There is no way to establish a difference based on Bayes' Theorem. Therefore, I dismiss that there could possibly be any difference."

I strongly suspect that if you try and fail to provide me with the requested example, the connection will become much clearer to you.
I strongly suspect that you have a huge mental blind spot. "If you would only think about it, you'd agree with me".
:rolleyes:

Because it shows that any set of prior expectations will lead to the exact same posterior beliefs after observing either experiment A or B.

Therefore the details of what might have happened but didn't should be irrelevant to the inferences we draw. All that should matter is that there were 8 children and 7 of them were boys.

The second doesn't follow from the first, and neither of them answer my question.

Is there any reason for this not-so-subtle putdown?
I don't think that it's reasonable to call it a "putdown" to point out the flaw in your logic.

Particularly when I've fairly conclusively demonstrated that I understand why hypothesis testing leads to a distinction being drawn in this case?
You're clearly begging the question, as this statement assumes that the understanding of hypothesis testing that you have demonstrated is, in fact, the correct understanding.

But it is an increase in the strength of our conclusion in this particular case. Which is exactly what I was saying.

My point stands.
Then your point depends on cherry-picking.

I look forward to your elucidation of this point. Preferably with a calculated, worked out, example.
Suppose that we have three dice, a red die, a blue die, and a green die. The red die has the numbers 1-6. The blue die has the even numbers, each twice, and the green has the odd numbers, each twice.

Let's say that we start out with the belief that each die is equally likely. We roll a die, and note that it's a three, but don't note the color.

Red die: one side has a three, each side has probability 1/6, likelihood=1/6
Blue die: no threes, likelihood=0
Green die: two sides have threes, each side has probability 1/6, likelihood=1/3

So now we have confidences of: red, 1/3; blue, zero; green, 2/3.
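For concreteness, here is that same update written out as a sketch in Python, using the equal 1/3 priors from above:

  likelihood = {"red": 1/6, "blue": 0.0, "green": 2/6}  # P(roll a three | die)
  prior = {die: 1/3 for die in likelihood}
  joint = {die: prior[die] * likelihood[die] for die in likelihood}
  total = sum(joint.values())                           # = 1/6
  posterior = {die: j / total for die, j in joint.items()}
  print(posterior)  # red: 1/3, blue: 0, green: 2/3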

But our calculations regarding the green die included the probability of both threes, even though we only saw one. In fact, if it is indeed the green die, then, when we considered the probability of the red three, we were also including the probability of something that never happened. No matter what, two thirds of the possibilities that we considered in our analysis are possibilities that never happened.

Furthermore, the reason that we think that the red three has a probability of 1/6 is because there are six sides, and we assume that they are equally likely. But we didn't see any of those other sides! Why are we including those other possibilities that didn't happen when calculating the probability of what did happen?
 
But art, how do the probabilities affect your conclusion that you saw a 3 after the experiment?

Not at all, says I. The probability after the experiment that you saw a 3 is 1; the probability that you saw anything else is 0. And that's the point you're missing. If I understand this correctly.
 
That's why I said "likelihood", as there is a distinction. The question is what the probability would have been before the experiment. From the probability of a three, given each die, one tries to figure out what the "probability" of each die is. This is a basic principle in Bayesian analysis, so I very much doubt that any of Ben Tilly's disagreements stem from that.

I don't understand what you're saying about my conclusion that I saw a three. That's not in doubt. There is no analysis that is needed to come to that.
 
The average of the population is a parameter. It makes no sense to speak of the "average value" of a parameter; a parameter has only one value.

http://dict.die.net/parameter/

In my world, we would describe the blood pressure taken in one individual as a "parameter", which is consistent with the second definition in that link. Your sense is the third definition. So, I think we are both right.
 
That's why I said "likelihood", as there is a distinction. The question is what the probability would have been before the experiment. From the probability of a three, given each die, one tries to figure out what the "probability" of each die is. This is a basic principle in Bayesian analysis, so I very much doubt that any of Ben Tilly's disagreements stem from that.

I don't understand what you're saying about my conclusion that I saw a three. That's not in doubt. There is no analysis that is needed to come to that.
Ummm, well, it looked to me like what Ben said that you objected to is that the probability of what might have happened before the experiment has no effect on the observed outcome after the experiment. So I beg to differ: I believe this is the crux of the argument.

Putting it another way, the probability of you seeing a three before you threw the single die has no effect on the probability of you having seen a three after you already did see a three; that probability is now one, period, end of conversation. And I'm sitting here watching you tell Ben "no" when he says that, and wondering whether you got it or not, and thinking not.
 
