Someone who is better at stats than I am could give you the math, but I'm quite certain that 21 wins out of 54 at roughly 1:2 odds is not significantly different from chance and would not call for rejecting your null hypothesis.
Thanks. Given a probability of 50/50, I know how to compute the odds. Using Excel's binomial function, I get BINOMDIST(21, 54, .5, 1) = 0.0668. This isn't perfect, but it's fairly close. I think we can say his luck on these hands falls between 5 and 10%. That's in the territory for rejecting the null or at least continuing to study the situation.
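For anyone without Excel handy, here is a minimal Python sketch of the same calculation. It assumes a fair 50/50 chance per race and that BINOMDIST with the cumulative flag set to 1 means "21 wins or fewer". The function name binom_cdf is just a label I made up for illustration.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# One-sided p-value: chance of winning 21 or fewer of 54 coin-flip races
p_value = binom_cdf(21, 54)
print(f"P(X <= 21 | n = 54, p = 0.5) = {p_value:.4f}")
```

This should land on roughly the same figure as the Excel result quoted above.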
..... In the long run, if you're making the right DECISIONS, the luck will even out for you and for all the players.
Yes, that's basically the assumption I'm trying to test. Will his 'luck' even out over time?
ETA: As was intimated earlier on - simply tracking the frequency with which you are dealt certain starting hands can be used to 'test luck'. You 'should' get dealt pocket aces about once every 221 hands (6 of the 1,326 possible two-card combinations). Indeed, you can easily determine the probability of receiving ANY of the Hold'em starting hands. After testing a squillion hands being dealt, you could see how 'lucky' you were on your hand distribution. Your starting cards are 'pure luck' (assuming it's a straight-up game).
Well, whether or not the distribution of hands deviates from the expected distribution can be tested, but it wouldn't test the hypothesis in a game situation - i.e. winning the hand. So while it's one measure of luck, it isn't the point we want to test. Further, while I know how to compute the expected probabilities, I'm not sure how to compute the probability of getting the hands he actually received (or worse). I'm open to looking into it if you have any suggestions on how that should be computed. How should the two hole cards be ranked from best to worst?
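For reference, the expected starting-hand frequencies can be sketched in a few lines of Python. This only covers the "expected" side, not the open question of how to rank hands from best to worst. It assumes a standard 52-card deck and a fair deal, and the variable names are my own.

```python
from fractions import Fraction
from math import comb

# Number of distinct two-card starting hands from a 52-card deck
TOTAL_COMBOS = comb(52, 2)

# A specific pocket pair (e.g. aces): choose 2 of the 4 suits
pocket_pair = Fraction(comb(4, 2), TOTAL_COMBOS)

# A specific suited hand (e.g. ace-king suited): one combo per suit
suited = Fraction(4, TOTAL_COMBOS)

# A specific offsuit hand (e.g. ace-king offsuit): 4 * 3 suit pairings
offsuit = Fraction(12, TOTAL_COMBOS)

# Any pocket pair at all: 13 ranks
any_pair = 13 * pocket_pair

for label, prob in [("specific pocket pair", pocket_pair),
                    ("specific suited hand", suited),
                    ("specific offsuit hand", offsuit),
                    ("any pocket pair", any_pair)]:
    print(f"{label}: {prob} (about 1 in {float(1 / prob):.1f})")
```

An observed count of any one of these hands over n deals could then be compared with its expected frequency using the same kind of binomial calculation as for the races.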
There are software 'add-on' tools out there that you can use that will track these kinds of situations for you automatically. The focus is more about whether or not your moves are making you money (as sometimes you are semi-bluffing or stone-cold bluffing when you move all-in post-flop - or you should be, if you expect to ever get called when you actually have a monster and want action). However, playing with one of these add-ons would help you gather data quickly.
I've suggested this, but until I have a better idea on how to evaluate the data collected, I'll not be successful in talking him into doing that.
Also, getting one of those simulator programs I mentioned would help you as well, although this may lack the sensation of testing 'your' luck. You really would be testing whether or not the math is 'correct'.
Yes, it's not a bad idea in regards to testing my computations for what's expected. But you've already concurred that we have the correct odds for the data we've been collecting.
I'm not sure I follow.
If we're only considering situations where the odds are 1:2, and we run 54 trials, we wouldn't find a final score of 21-33 to be significantly different from chance. So the hypothesis that the guy's results are due to "bad luck" (as opposed to random chance) is not supported--that is, we can't reject the null hypothesis that the outcome is due to chance.
It depends on your alpha value - the probability at which you decide to reject the null. As I said above, the p-value for 21 or fewer heads in 54 flips is 0.0668. If we set the alpha level at 0.10, we could reject the null. At alpha = 0.05, we could have rejected it at some previous points in the data collection process. When he was at 5 wins out of 18 races, p = 0.0481; for 6 wins out of 21 races, p = 0.0392; for 14 wins out of 40 hands, p = 0.0402; for 15 wins out of 44 hands, p = 0.0244 - this was the lowest cumulative probability for the 54 races. He's continuing to collect data.
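If anyone wants to check the checkpoint figures, here is a rough Python sketch that recomputes the one-sided p-value at each point listed above. It assumes every race is an independent 50/50 trial, and the names (binom_cdf, checkpoints, results) are only illustrative.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# (wins, races) at each point in the data collection mentioned above
checkpoints = [(5, 18), (6, 21), (14, 40), (15, 44), (21, 54)]

results = {(wins, races): binom_cdf(wins, races) for wins, races in checkpoints}

for (wins, races), p_value in results.items():
    if p_value < 0.05:
        verdict = "below alpha = 0.05"
    elif p_value < 0.10:
        verdict = "below alpha = 0.10 only"
    else:
        verdict = "not below alpha = 0.10"
    print(f"{wins:2d} wins in {races:2d} races: p = {p_value:.4f} ({verdict})")
```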