• Quick note - the problem with Youtube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you do still see problems let me know.

Help needed with Excel.

What little mathematics I've done I was much better at than any of my fellow pupils at school. Indeed in my 4th year exam when I was 14 I got the highest mark ever in the history of the school. But I never did any more mathematics after 16, and I've never done any statistics whatsoever (and certainly have never done any calculus and the like).

Don't feel bad, Ian. Math gets a LOT stranger about that time.

I got straight A's in math (the highest of grades in the US) constantly until my first run-in with probability, calculus, and trigonometry. Even algebra and analytical geometry were a cakewalk.... but those three were like knowing French and suddenly being spoken to in Portuguese.
 
Ian, one way to do the 5 minute interval thing is to write a simple VBA function running on a loop or timer control. If you post that question on the computers forum I bet someone would post some sample code. If I get a moment tomorrow I might have a quick look at it myself.
 
[latex]e^{\pi i} = -1[/latex]

Another way to write this is:

[latex]e^{i \pi} + 1 = 0[/latex]

What's interesting about this is that 5 of the most important numbers in mathematics appear in this rather simple equation, but in doing so, imply a lot of really odd things.

Note I wrote [latex]i\pi[/latex]. Just seems more natural to me that way. :)
 
Ian, one way to do the 5 minute interval thing is to write a simple VBA function running on a loop or timer control. If you post that question on the computers forum I bet someone would post some sample code. If I get a moment tomorrow I might have a quick look at it myself.

Which computers forum? Yes I could ask somewhere and hopefully someone will do it for me.

Or I might attempt to do it myself. Need to learn some elementary visual basic first though.
 
BTW I'm just wondering how you would calculate the probabilities for the half-time/full-time market?

Let's suppose we have the probabilities for half-time i.e we have the probability for team A to be in the lead at half-time, or for there to be a draw at half-time, or for team B to be leading at half-time.

And we also have these probabilities for full-time.

And we are also armed with the information that, on average, 44% of goals are scored in the first half and 56% in the second half (at first I kinda assumed that this wouldn't be relevant, but on reflection it obviously is. Thus if there were a third of a chance of the teams drawing at half-time, it wouldn't just be a third multiplied by a third to get draw/draw).

Anyway this is all much more complex than I originally thought and I simply do not possess the mathematical knowledge to work it out.

Any advice? :)
 
You're still going to have to make some assumptions. Are you going to trust that Poisson is a valid approximation? Are you going to assume that while the probabilities change from one half to another, they stay the same within a half?

If you make both of those assumptions, you can then find then define f(a,b,x,y) as being the porbability of having a score of a to b, given Poisson distributions with lambdas of x and y. Then g(x,y)=sum of f(a,b,x,y) where a>b. There's quite possibly a closed-form version of this, but I don't know it. You can then look at where g(x,y) has the expected value, and use that to figure out the other probabilities.
 
You're still going to have to make some assumptions. Are you going to trust that Poisson is a valid approximation? Are you going to assume that while the probabilities change from one half to another, they stay the same within a half?

Poisson is indeed a valid approximation, but the lamba value does not decrease linearly as the match progresses. But I have the figures. However I think that the only relevant figures here are the 44% and 56% figures for the 1st and 2nd half respectively.
If you make both of those assumptions, you can then find then define f(a,b,x,y) as being the porbability of having a score of a to b, given Poisson distributions with lambdas of x and y. Then g(x,y)=sum of f(a,b,x,y) where a>b. There's quite possibly a closed-form version of this, but I don't know it. You can then look at where g(x,y) has the expected value, and use that to figure out the other probabilities.

Sorry this is just completely over my head. You're talking to a person who has only ever done CSE maths here.

Damn, I suspected it would be very complex. I might be able to figure it out by myself eventually.
 
If you make both of those assumptions, you can then find then define f(a,b,x,y) as being the porbability of having a score of a to b,

What does "having a score of a to b" mean? The score changing from a specific value at half time to another value at full time? What is x and y? What is a function? What does (a,b,x,y) actually mean?

given Poisson distributions with lambdas of x and y.

So are x and y the pre-match goal expectations for each team respectively?

Then g(x,y)=sum of f(a,b,x,y) where a>b.

You're not speaking in English. How is a person with no background in mathematics supposed to understand this?? Speak in plain English please.

Thank you.
 
I think I've worked out how to do it, but it is horrendously complex and convoluted.

Oh well, unless there's a shortcut . . .
 
However I think that the only relevant figures here are the 44% and 56% figures for the 1st and 2nd half respectively.
If you want to interpolate to five minute intervals, you'll need to know how lambda changes.

Of course, there's a lot of information you're not taking into account, such as yellowcards, field position, etc.

Interesting Ian said:
What does "having a score of a to b" mean?
The first team has a goals, the second team has b goals. The score is a to b.

What is a function?
A function is a relation between one set and another.

What does (a,b,x,y) actually mean?
It means that the input for the function f is a,b,x,y.

So are x and y the pre-match goal expectations for each team respectively?
Yes.

You're not speaking in English. How is a person with no background in mathematics supposed to understand this?? Speak in plain English please.
If these concepts could be easily expressed in plain English, the mathematical terrms never would have been invented.

If you assume that x and y have particular values, and calculate, based on that assumption, the probability of getting each particular score, and then sum the resultant probabilities for all scores which result in the first team winning, and then compare the result to the known probability of the first team winning, they should be the same. If they are not, you know your assumption was wrong. By adjusting x and y (not too hard to do, with Excel), you can determine which values are consistent with the known data.
 
Originally Posted by Interesting Ian :
However I think that the only relevant figures here are the 44% and 56% figures for the 1st and 2nd half respectively.

Art Vandelay
If you want to interpolate to five minute intervals, you'll need to know how lambda changes.

I have stated what I want to do and it has absolutely nothing whatsoever to do with considering how the expectation of goals decreases as the match progresses. If I'm interested in how many goals will be scored in a 45 minute period of time then it doesn't matter whatsoever about each 5 minute interval of time. It's only the average for the 45 minutes which can possibly be relevant.

Art Vandelay
Of course, there's a lot of information you're not taking into account, such as yellowcards, field position, etc.

II
I don't need to take them into account since they are implicit in the determination of the probabilities I am using eg the 1x 2 percentages for full time and half time, and the pre-match total goals expectation and superiority.

Originally Posted by Interesting Ian :
What does "having a score of a to b" mean?

Art Vandelay
The first team has a goals, the second team has b goals. The score is a to b.

Eh? The score is a-b, not a to b.

II
What is a function?

Art Vandelay

A function is a relation between one set and another.

This conveys no meaning to me. What is a set? What is meant by a "relation" in this context?

II
You're not speaking in English. How is a person with no background in mathematics supposed to understand this?? Speak in plain English please.

Art Vandelay
If these concepts could be easily expressed in plain English, the mathematical terrms never would have been invented.

Right. So out of the countless billions of human beings that have ever lived and who have reasoned about mathematics and have built upon other peoples' insights, I'm the one poor sod who has to work everything out for himself from first principles because it is difficult to communicate mathematical reasoning in English.

That's just great . . .

If you assume that x and y have particular values, and calculate, based on that assumption, the probability of getting each particular score, and then sum the resultant probabilities for all scores which result in the first team winning, and then compare the result to the known probability of the first team winning, they should be the same. If they are not, you know your assumption was wrong. By adjusting x and y (not too hard to do, with Excel), you can determine which values are consistent with the known data..

I done all this months ago in drawing up a poisson distribution for correct scores. And obviously that was one thing I checked out, although I had to decide at what point the scores became sufficiently high, and thereby sufficiently unlikely, that they could be safely ignored.

OK, I had to do everything from first principles. Here's how I see it. And note my ability to communicate in plain English!

Let's consider the case where we're calculating the probability that team A will be leading both at half-time and full-time (and similiar reasoning will apply to the other possibilities).

It seems we have to sum the following probabilities up:

The prob that Team A wins 1st half * the prob that Team A wins 2nd half
+
The prob that Team A wins 1st half * the Prob the 2nd half is drawed
+
The prob that Team A wins 1st half * the Prob Team B wins second half by less than Team A's first half lead.

Obviously the only difficulty here is the calculation of Team B wins second half by less than Team A's first half lead.

But I think I've worked out how to do it.

Obtain the total goals pre-match expectation and also the supremacy rating of team A over team B from spreadfair (the supremacy rating is the expectation in numbers of goals of how much team A will defeat team B. Obviously this could be negative).

Now, armed with this information one can then generate the average expected score for the entire match eg 1.5-1.1

If we assume the likelihood of a goal is unaffected by how many goals have been scored so far (which I've read is apparently a pretty reasonable assumption), and taking into account the fact that 44% of all goals will be scored in the 1st half and 56% in the 2nd half, then the number of additional goals by each team in the 2nd half will be the poisson distribution of 0.56*average expected score for team A for whole match and 0.56*average expected score for team B for whole match.

We can then generate probabilities from the beginning of the 2nd half for team A scoring a specifiable number of gaols and team B scoring a specifiable number of goals. In otehr words we can generate probabiliites for all possible correct scores for the 2nd half

And of course, using similar reasoning we can do this for the first half too. So what in effect I've done here is to split the whole game into 2 games, one played in each half with the probabilities for the correct scores in each of these halves.

Now to remind ourselves of what we have to do:

To find the probability of Team B winning the second half by less than Team A's first half lead.


If team B is to win by less in the second half, it must therefore follow that team A necessarily has to win the first half by a goal differential of at least 2 goals.

So team A could win the first half by 2-0 which means that B would have to win the 2nd half by 1-0.

And 3-0/1-0, 2-0, 2-1, 3-2 etc etc

And 3-1/1-0, 2-1, 3-2 etc etc

And etc etc.

Obviously the further I go the less probability the scenarios will be so when the scores become sufficiently big I can safely ignore them.

Right!

So the probability that Team B wins the second half by less than Team A's first half lead will be (the prob that Team A wins 2-0 in first half * the prob that Team B wins 1-0 in 2nd half) + (the prob that Team A wins 3-0 in first half * the prob that Team B wins 1-0 in 2nd half) + (the prob that Team A wins 3-0 in first half * the prob that Team B wins 2-1 in 2nd half) etc etc etc

So anyway that's how I was thinking of doing it. Not entirely sure if it's all correct; will need to think more about it tomorrow.

Night.
 
I have stated what I want to do and it has absolutely nothing whatsoever to do with considering how the expectation of goals decreases as the match progresses.
Uh.. no, you didn't. You earlier said that you are starting with the probabilitiees of the team being in the lead at half-time/ end of game, and didn't really explain what you were tying to figure out. Now you're saying that you're starting with the expected number of goals, and yet again not really saying what you're trying to figure out. If you want help, you really should make it clear what you want help with.

If I'm interested in how many goals will be scored in a 45 minute period of time then it doesn't matter whatsoever about each 5 minute interval of time.
They're all linked. And if you look at 5 minute intervals, you'll eventually get to 45 minutes.

I don't need to take them into account
Sure, you don't need to. Just pointing out that there's a possible avenue of further advantage.

Eh? The score is a-b, not a to b.
You don't say things like "The score is five to three"? It's quite common in America.

This conveys no meaning to me. What is a set? What is meant by a "relation" in this context?
Not knowing about Poisson is one thing, but it's really quite remarkable that someone could graduate from high school without learning about functions. As much I don't like simplistic, nonrigorous explanations, you’ve already expressed an aversion to trying to understand real definitions, so how about this: stuff goes in. Stuff comes out. Whenever the same thing goes in, the same thing comes out. That’s a function.

Right. So out of the countless billions of human beings that have ever lived and who have reasoned about mathematics and have built upon other peoples' insights, I'm the one poor sod who has to work everything out for himself from first principles because it is difficult to communicate mathematical reasoning in English.
Nothing of the sort. Rather, if you refuse to learn basic math, and you make snarky remarks at people who use actual math, you may very have to work everything out for yourself. It is somehow fashionable to flaunt one’s ignorance of math, as if it is somehow the other person’s fault that one does not understand math, and that using mathematical terms to discuss mathematical subjects is somehow inappropriate. I can put in one line what it would take several paragraphs to say in “plain English”. It's kinda rude to complain that someone who is trying to help you isn't doing it to your liking.

I done all this months ago
Did.

The prob that Team A wins 1st half * the Prob the 2nd half is drawed
Drawn.

Obtain the total goals pre-match expectation and also the supremacy rating of team A over team B from spreadfair (the supremacy rating is the expectation in numbers of goals of how much team A will defeat team B. Obviously this could be negative).
How is the supremacy rating different from the expected number of goals by A minus the expected number of goals by B?

Obviously the further I go the less probability
Probable.
 
It is somehow fashionable to flaunt one’s ignorance of math, as if it is somehow the other person’s fault that one does not understand math, and that using mathematical terms to discuss mathematical subjects is somehow inappropriate.

No it's not. It's fashionable to say that they were hopeless at maths at school.

It's quite the opposite for me. I was vastly better than maths than anybody else in my class and I was vastly better at maths than in any other subject (well, maybe apart from physics). Maths and physics were the only exams I passed at school. I failed all other exams including English Language. As for English Literature I wasn't even allowed to take the CSE course in it never mind the "O level"! Maths and physics were the only subjects I had the remotest interest in. They were the only subjects I considered to be important. I was a complete idiot in thinking this. I now understand that philosophy is more important than anything else, and that other subjects are just as important as maths and physics.

But anyway, come to think of it I am familiar with the word function. It's just that I've forgot. It's over 20 years ago since I did any maths.

BTW stringing together words in a meaningless way do not constitute help.

I thought I'd made it sufficiently clear what I was trying to do.

You have 2 teams.

For instance Barcelona are playing AC Milan tonight.

Now in the half-time/full-time market there are a total of 9 possibilities.

1. Barcelona are in the lead at half time, and are also in the lead at full time i.e they win the game. (current odds of 3.4)

2. Barcelona are in the lead at half time but in the 2nd half AC Milan manage to score sufficiently so that at full-time it is a draw. (current odds of 17.5)

3. Barcelona are in the lead at half time but in the 2nd half AC Milan manage to score sufficiently so that at full-time they are in the lead i.e AC Milan win. (current odds of 50)

4. It is a draw at half-time, but subsequently Barcelona go on to win at full-time. (current odds of 5.9)

5. It is a draw at half-time, and it is also a draw at full-time. (current odds of 5.2)

6. It is a draw at half-time but then AC Milan subsequently go on to win the match at full-time. (current odds of 11.5)

7. AC Milan are in the lead at half time, but in the 2nd half Barcelona manage to score sufficiently so that at full-time they are in the lead i.e Barcelona win. (current odds of 30)

8. AC Milan are in the lead at half time, but in the 2nd half Barcelona manage to score sufficiently so that at full-time it is a draw (current odds of 17.5)

9. AC Milan are in the lead at half time, and are also in the lead at full time i.e win the game. (current odds of 9.4)

The odds quoted are decimal odds. Thus in the last scenario with odds of 9.4, this means that if you put £10 on to back that outcome, and ignoring commission, you would win £84 and also get your £10 back. But you have to pay 5% commission on the £84 so you would only really win £79.80 (+ the £10 original stake back).

Now what I want to figure out are the true odds using more basic data such as the average expected score in the match.

Looking at spreadfair we see that the total average number of goals expected is 2.33, and Barcelona have a superiority of 0.63.

This means that the average expected score is 1.48 goals to Barcelona and 0.85 goals to AC Milan. This also implies that the ratio of goals will be 63.5% to 36.5%. That is to say of all the goals scored, on average Barcelona will score 63.5% of them, and AC Milan will score 36.5% of them. Of course it's unlikely that these figures are the real true averages, but they're the best estimates that are available since the total goals and superiority are going to be fairly efficient markets.

Note also that these figures take into account the possible weather and whether certain players will play and a whole host of other factors. As new information comes in you can expect these figures to move eg if it starts pissing down where the match is taking place or whatever.

Now there are certain assumptions we are making. We are assume a poisson distribution fairly accurately describes the goals distribution. We are also assuming that the number of goals scored so far will not alter the 63.5:36.5 ratio. We are also assuming that, on average, 44% of all goals are scored in the first half, and 56% in the 2nd half (however I also have the figures for each 5 min interval in the game. I maintain though that this is not relevant). So we should be armed with enough information to try and ascertain what are the "value odds" for the 9 outcomes. That is to say what the odds should be in order to win in the long term.

I hope all this makes it sufficiently clear. I appreciate your help, but you really need to leave out all mathematical terminology and speak in plain English, otherwise we're both wasting our time.

I think I've worked out how to do it anyway as I explained last night. I just need to think about it to see if I've got it right.
 
The gotcha for this is that the by far safest way to predict anything - be it the result of a tootball match or an election – is to look up the odds at the bookies. At the last Australian election I had a bet with my mates about the result – they were all reading the papers and the opinion polls and saying the results may not be known for days (the headline was “Too close to call”). I said Labor will concede at 8 pm on the night of the election (the polls closed at 6 pm) and I was just about right. I ignored the polls, the pundits and the papers and simply looked at the odds.

Which seems to suggest that the odds really do contain more information than you are ever likely to gather on your own.

However most of the best enterprises get the advice “it can’t be done” before they start so good luck in the project.

Actually the best advice I can give you is – learn programming. (I am assuming you don’t program since you are new to Excel) If you were good at maths in school you will probably pick it up quickly and you can even start with just Excel. Just find a good basic text on Excel Visual Basic.

This will allow you to crunch the data on a much more ambitious scale. Also some sort of screen scraping software to allow you to mine the stats in the background – cutting, pasting and formatting gets old really quickly.
 
The gotcha for this is that the by far safest way to predict anything - be it the result of a tootball match or an election – is to look up the odds at the bookies. At the last Australian election I had a bet with my mates about the result – they were all reading the papers and the opinion polls and saying the results may not be known for days (the headline was “Too close to call”). I said Labor will concede at 8 pm on the night of the election (the polls closed at 6 pm) and I was just about right. I ignored the polls, the pundits and the papers and simply looked at the odds.

Which seems to suggest that the odds really do contain more information than you are ever likely to gather on your own.

However most of the best enterprises get the advice “it can’t be done” before they start so good luck in the project.

Actually the best advice I can give you is – learn programming. (I am assuming you don’t program since you are new to Excel) If you were good at maths in school you will probably pick it up quickly and you can even start with just Excel. Just find a good basic text on Excel Visual Basic.

This will allow you to crunch the data on a much more ambitious scale. Also some sort of screen scraping software to allow you to mine the stats in the background – cutting, pasting and formatting gets old really quickly.

Taking the average odds that bookies quote will be fairly accurate for the match outcome I would imagine. But not say for something like correct scores. The odds they quote for the less probable scorelines eg 3-0 to the weaker team, are nowhere near as long as the true odds.

I've been using spreadsheets for 6 months now. I've certainly been able to pick it up quickly yes. The 3 books I've had out of my library were very simple indeed. Can't remember them referring to any programming though.

If anyone could recommend an excel book for me to buy which goes into the programming aspect, but which does not assume any prior knowledge of programming, I would be grateful.
 
Hi, does anyone know what this poisson equation is?

Let's suppose x represents the number of goals by a team and y is the pre-match expectation of how many goals that this team will score. What would be the equation for the probabity of x just for this one team?
 
Sure, the Poisson Theorem is a way to estimate the probability of an event happening k times within n tries. It is written as:

equation1.gif

Eric W. Weisstein. "Poisson Theorem." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PoissonTheorem.html

The first equation you may recognize as the equation for finding the probability of an event happening k times within a binomial distribution. The theorem states that the second equation is about equal to the first, and can be used for estimating the probability of random events.

So, if you know the average number of times an event happens over a given period of time, you can use this theorem to create a function to estimate the probability of the event happening k times, or a function for the probability that X is some value k.

P(X=k) = (μ^k * e^- μ) / k! Where μ = average number of times an event happens over a given period of time.

For example, let’s say an intersection averages about 10 car accidents a year. You, however, want to know the probability of there being 2 accidents at this intersection this week.

First you’d find the average number of car accidents within a week in this case its 10/52 or 0.1923.

P(X=k) = (0.1923^k * e^-0.1923) / k!

Now we make k = 2 since were interested in knowing the probability that it will happen 2 times this week.

P(X=2) = (0.1923^2 * e^-0.1923) / 2!

Then just solve the equation.

P(X=2) = (0.03697929* 0.825059314) / 2
note x! = 1 * 2 * 3 * 4… * x, so 2! = 1 * 2 = 2

P(X=2) = 0.0305101076 / 2

P(X=2) = 0.0152550538

I just copy and pasted towards the end there, you could probably round the number a little. :) So the probability that there would be 2 car accidents at this intersection this week is about 1.53%

Moving way from that long winded explanation and completely off topic example, I believe you could make μ = to the average number of goals scored with a certain amount of time (5 mins, 10 mins, 20 mins, half the game, etc). Then input it into the equation to calculate the probability of scoring k goals within that time frame. However, I’m not sure if this is correct. It has been quite a few years since I’ve worked with probabilities in high school.
 
Last edited:
One needs to be careful about the modeling distribution one assumes, particularly for very unlikely events. As an example, the widely-used Black-Scholes option pricing model assumes that the log of security price changes can be represented by a normal distribution. For small price changes, this is not a bad assumption.

The standard deviation of the log of daily changes of the Dow Jones Industrial Average is on the order of 1%, a value that is independent of the assumed distribution. If we then assume a log normal distribution, the probability of a change of 23 standard deviations in one day can be shown to be about one in a googol (10^100) days. Since the age of the universe is only about 5 x 10^12 days, one would expect that a price change of 23% in one day will never happen in our lifetime or even in the lifetime of the universe, unless the universe lasts 10^87 times longer than it has been in existence to date. Pretty good odds, eh?

However, on October 19, 1987, the Dow Jones Index did indeed lose 23% of its value, and a lot of option traders went belly up, in particular those who had bet against large negative price changes (i.e., those who had sold puts).

So from that experience, we can conclude that perhaps the stock market is not well represented by a normal distribution. Still, to this day many option traders still rely on the Black-Scholes model. Nothing better is readily available for option pricing, as far as I know. Reminds us of old saying, “it may be crooked, but it’s the only game in town.”

So how does one determine how well a Poisson distribution represents the betfair market? One probably needs a lot of historical data and a pretty good statistics background to figure it out. Anyone have any ideas?
 
Actually, wouldn't a drop of 23% be 26 standard deviations?
.99^26=77%

Will you please stop trying to show off??

Anyone with a modicum of intelligence won't be impressed.

Why don't you try engaging in a philosophical argument? You know, just so I can take the p*ss out of you.
 
Last edited:

Back
Top Bottom