(This is as good a place as any for this I guess...)
We're all gamblers here but how many of us understand and appreciate the maths behind what we're doing? I'm not talking about being able to work out the profits on a £5 e/w yankee in the blink of an eye, I'm thinking more about the maths behind systems and so on.
This post may get a bit daunting in places but I will try and keep it as simple as I can.
Introduction
Suppose there is a football match this weekend between Win2Win Wanderers and RacingPost Rovers. You know that Wanderers score an average of 3 goals in a game and Rovers score 1 goal in an average game. So you just run to the bookie and lump on 3-1 since that's the obvious winning scoreline, yeah? No. At least not necessarily. It may be that the bookies have priced it up all wrong and the odds for a Wanderers 3-1 win offer great value. But do you know what the odds should be?
We can use mathematical and statistical modelling here to help us...
The Normal Distribution
So far we know that on average Wanderers score 3 goals a game and Rovers score 1. But they are only averages. It would be highly unlikely that Wanderers scored 3 times in every game they played. If they did I would certainly suspect something dodgy is happening. Similarly Rovers are extremely unlikely to find the net exactly once a game.
We need to consider how the number of goals scored by each side in previous games is distributed about this average. As I said before, Wanderers will not have scored exactly 3 goals in each and every game. There will be games when they have failed to score, or just got 1 or 2 goals to their name. There will be games when they have scored exactly 3 goals though. There will also be games when they have scored 4, 5 or 6 times, perhaps even more. The further away the scoreline from the average, the less often we expect it to occur. We can expect 3 goals most often, then 2 or 4 goals with roughly equal frequencies. 1 goal or 5 goals will be seen even less frequently, with 0 or 6 occurring rarely. We expect the number of goals scored to follow what is called the Normal Distribution.
It is called the Normal Distribution because it is just that - the standard distribution that a great many things follow, at least approximately. It is usually described as a bell-shaped curve. It peaks at the average and then tapers off to the extremes away from that average. It is symmetrical around the average.
Have a look at the attached picture...
Note that for the example of Wanderers' goals the values are discrete. This means that the number of goals scored in any game is an integer value and we can't have 2.82 goals in a game. Wanderers score goals in whole number increments. So while the average may not be an integer value, the number of goals scored in a game will be. Why am I telling you this? It means that were we to plot out the number of goals scored by Wanderers in all their games so far the distribution would look lumpier than the Normal Distribution I have posted. In other words, when we model the goals with the Normal Distribution we are using an approximation rather than an exact representation.
Standard Deviation
When using the Normal Distribution we can look at how spread out data is from the average. This is measured using the standard deviation.
Take the example of Win2Win Wanderers averaging 3 goals a game. I said earlier that it is extremely unlikely that each game in the sample of games we are looking at would have exactly 3 goals, but let's suppose for a few seconds that suddenly that is the case. Here the standard deviation would be 0 as there is no deviation from the average; each value in our sample is exactly on the average.
Now suppose we had 60 games in our sample and that 17 had 2 goals, 26 had 3 goals and 17 had 4 goals giving us a very crude Normal Distribution. Note that the average here is still 3 as (17*2)+(26*3)+(17*4) = 180 and 180/60 = 3.
We now have a distribution of values around the average. 26 of the games finished with the average number of goals but 17 games were 1 goal under and 17 games were 1 goal over. If we were to compute the standard deviation here we'd see it comes out as 0.76.
Let us take an even more extreme case: 60 games again but 30 games where Wanderers fail to score and 30 games where they score 6. Again the average is 3 as (30*0)+(30*6) = 180 and 180/60 = 3. But now the standard deviation is 3.03.
So you can see - the more the data is spread out around the average the greater the standard deviation is.
Exactly how you calculate the standard deviation is outside the scope of this post (but may be the subject of another future post if the demand is there for it). All you really need to know is that it is a measure of how spread out around the average your data is. And that Excel will calculate it for you in seconds :)
NB For this measure to be meaningful you should have a sample size of at least 30. The bigger the sample the more accurate and meaningful the standard deviation is.
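If you'd rather not open Excel, here is a rough Python sketch that reproduces the two standard deviations quoted above. It assumes the figures are sample standard deviations (dividing by n - 1, which is what Python's statistics.stdev and Excel's STDEV do); the variable names are just made up for illustration.

```python
from statistics import mean, stdev

# The two 60-game samples described above, written out game by game.
crude_normal = [2] * 17 + [3] * 26 + [4] * 17
all_or_nothing = [0] * 30 + [6] * 30

for name, goals in [("crude normal", crude_normal), ("all or nothing", all_or_nothing)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{name}: average {mean(goals):.2f}, standard deviation {stdev(goals):.2f}")
```

Both samples average 3 goals, but the spread comes out at 0.76 for the first and 3.03 for the second.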
Confidence levels
OK, so we've computed the standard deviation - what use is it? It can be used in conjunction with the Normal Distribution to work out what are known as confidence levels. Eh, what's that all about then?
Go back and look at the bell-shaped curve of the Normal Distribution. It's peaked around the average value. In fact about 68% of the area under that curve is within +/- 1 standard deviation of the average. That means if we had an average of 3 and a standard deviation of 1 then about 68% of our data would lie in the range 2 to 4, i.e. one standard deviation either side of the average.
Obviously this means roughly a third of our data lies outside this range. Look again at that Normal Distribution curve and see how low it is at the extremes. In fact 95% of our data is within 2 standard deviations of the average, i.e. 1 to 5, and 99.7% of our data is within 3 standard deviations of the average, i.e. 0 to 6, using an average of 3 and a standard deviation of 1.
OK, so what does this mean and what are these confidence levels? For a known average and standard deviation we can be about 68% confident that the value will lie within one standard deviation of the average. Suppose we go back to Win2Win Wanderers and their 3 goals a game. Suppose also that we have a standard deviation of 0.5. For Wanderers' next game we can then be about 68% sure that they will score between 2.5 and 3.5 goals. Similarly we can be 95% confident they will score 2 to 4 goals and 99.7% confident they will score 1.5 to 4.5 times.
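For the curious, the percentages don't have to be taken on trust. The sketch below works them out from the Normal curve using the error function in Python's math module, then applies them to the Wanderers example (average 3, standard deviation 0.5). The function name coverage is just something picked for illustration.

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a Normal distribution lying within k standard deviations of the average."""
    return erf(k / sqrt(2))

average, sd = 3.0, 0.5  # the Wanderers figures used above

for k in (1, 2, 3):
    low, high = average - k * sd, average + k * sd
    print(f"within {k} sd: {low} to {high} goals, {coverage(k):.1%} of the time")
```

The one standard deviation figure comes out at roughly 68%, with about 95% and 99.7% for two and three standard deviations.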
How does this maths apply to our gambling?
There are limitations to using the normal distribution when it comes to gambling, unfortunately. Take another look at that Normal Distribution picture. It's symmetrical about the average so for every value above the average there is a corresponding value equally below the average. This is not true of gambling.
When you place a 1pt bet it either wins or it loses. If it wins you make a profit that depends on the odds. But when it loses you lose the 1pt stake. Your losses on each bet are fixed but the profit is variable. If we were to plot the frequency with which we recorded each possible profit or loss on our bets we would not get a curve that looked like the Normal Distribution curve. That means we can't use standard deviations, doesn't it? Not necessarily...
You can compute a standard deviation for distributions that aren't Normal, but the percentages quoted above then become a poor approximation. What we can do instead is change our measure. Instead of assessing each bet individually we group them into batches of, say, 10. So taking each set of 10 bets at a time we work out our total profit/loss on those 10 bets. Were we to plot the frequency of these figures we would get a much better approximation to the Normal Distribution.
So what? If you have over 300 bets and group them into 10s you have at least 30 values in your new data sample of profit/loss per 10 bets. If you then compute the standard deviation for these values you can work out confidence levels for your profit per 10 bets. You'll then know that roughly 68% of the time your profit over 10 bets will lie within 1 standard deviation of the average, 95% of the time it is within 2 standard deviations and 99.7% of the time it is within 3 standard deviations of the average.
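The batching idea can be sketched in a few lines of Python. Everything here is illustrative: batch_profits and confidence_ranges are hypothetical helper names, and the list of results is a made-up repeating pattern (+2 pts for a winner, -1 pt for a loser) rather than a real betting record. Swap in your own profit/loss figures per bet.

```python
from statistics import mean, stdev

def batch_profits(results, size=10):
    """Add up profit/loss over consecutive batches of `size` bets.

    Any leftover bets that don't fill a whole batch are ignored.
    """
    full = len(results) - len(results) % size
    return [sum(results[i:i + size]) for i in range(0, full, size)]

def confidence_ranges(batches):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations either side of the average."""
    avg, sd = mean(batches), stdev(batches)
    return {k: (avg - k * sd, avg + k * sd) for k in (1, 2, 3)}

# Hypothetical record, purely to show the mechanics (340 bets in total).
results = ([2.0, -1.0, -1.0] * 4 + [-1.0, 2.0, 2.0, -1.0, -1.0]) * 20

batches = batch_profits(results)
print(len(batches), "batches of 10")
for k, (low, high) in confidence_ranges(batches).items():
    print(f"within {k} sd: {low:.2f} to {high:.2f} pts per 10 bets")
```

With 340 bets this gives 34 batch totals, comfortably over the 30 values suggested as a minimum.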
Why would I want to do any of this?
You may never want to, it depends on what level you take your gambling to. Suppose someone told you they had analysed their system and for every 100 bets at 1pt they expected a profit of 15 points with a standard deviation of 6 points. You ignore the standard deviation bit because you don't understand what it means but think 15 points profit on 100pts invested is OK and you'll follow the system. 100 bets later you are a couple of points down but remain convinced to try the same system for another 100 bets. At the end of that you are now 5 points down overall and getting all grumpy. You are 35 points away from where you think you should be. You expected to be about 30 points up but are in fact 5 points down. You complain to the guy who runs the system and get all mad at him. Are you right to be angry?
Let's go back and look at that standard deviation again, just in case it was important. After 100 bets you were 2 points down when you should have been 15 points up. Or should you? 99.7% of the time your profit over this period will be between -3 points and 33 points, i.e. +/- 3 standard deviations from the average. OK, in this example you were maybe slightly unlucky to be near the worst case. Surely it'll get better from here though. Another 100 bets goes by and you lose a further 3 points, again an extreme case perhaps. But it happens, and the system is still performing as well as it ever has done. The system isn't broken; what you are experiencing is statistical fluctuation within the acceptable range for the data.
So you see, that standard deviation could be important. Someone else comes along with a system that profits to the tune of 10 points per 100 points invested and has a standard deviation of 2 points. This system is a third less profitable than the previous one, isn't it? This makes 10 points every 100 bets and the other made 15 points. Ahh, but here you can expect to make 10 points +/- 6 points every hundred bets and remain within the realms of acceptable statistical fluctuations. That's 4 points to 16 points every hundred bets. This is a lower risk system as you should make at least a small profit every 100 bets. With the other system you could make a loss.
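To put the two systems side by side, here is a tiny sketch of the 3 standard deviation range for each, using the averages and standard deviations quoted above. The helper name three_sd_range is made up, and the 99.7% reading only holds to the extent the profits per 100 bets really are Normally distributed.

```python
def three_sd_range(average, sd):
    """Range covering roughly 99.7% of outcomes if profits are Normally distributed."""
    return average - 3 * sd, average + 3 * sd

# (average profit, standard deviation) per 100 one-point bets, as quoted above
systems = {"first system": (15, 6), "second system": (10, 2)}

for name, (average, sd) in systems.items():
    low, high = three_sd_range(average, sd)
    print(f"{name}: {low} to {high} points per 100 bets")
```

That gives -3 to 33 points for the first system and 4 to 16 points for the second.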
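And don't be fooled into thinking that a bad run can't carry on for hundreds of bets. With the first system a loss of 3 points in any given block of 100 bets is within the 3 standard deviation range, so in the very worst case you could be 30 points in the hole after 1000 bets. Followers of the second system would, on the same worst-case basis, have at least 40 points over the same period. OK, going to the other extreme you could be looking at 330 points over 1000 bets on the first system and only 160 on the second system. Bear in mind these are the outer limits: hitting the extreme in every single block of 100 is far less likely than hitting it once, and over 1000 bets the totals will tend to bunch closer to the averages of 150 and 100 points. But would you face that risk?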
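The -30 and 330 figures above come from every block of 100 bets landing on the same extreme. As a rough cross-check, suppose instead that the blocks are independent of one another (that's an assumption, not something known about either system). Then the averages add up but the standard deviation of the total only grows with the square root of the number of blocks. The function name range_over_blocks is hypothetical.

```python
from math import sqrt

def range_over_blocks(average, sd, blocks):
    """3 standard deviation range for total profit over `blocks` independent blocks of bets.

    Averages add up, but the standard deviation of a sum of independent
    blocks grows with the square root of the number of blocks.
    """
    total_average = average * blocks
    total_sd = sd * sqrt(blocks)
    return total_average - 3 * total_sd, total_average + 3 * total_sd

for name, average, sd in [("first system", 15, 6), ("second system", 10, 2)]:
    low, high = range_over_blocks(average, sd, blocks=10)
    print(f"{name}: {low:.0f} to {high:.0f} points over 1000 bets")
```

Under that assumption the ranges come out at about 93 to 207 points for the first system and 81 to 119 for the second. So the gap in risk narrows over a long run, though the second system is still the steadier of the two.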
Conclusion
There is a good chance that many of you will never need to worry about Normal Distributions, standard deviations or any of that sort of stuff. But it does you no harm to know about it and hopefully understand some of it. There is a lot of maths at work and many tricks and tools that can be used to help you analyse your gambling and systems. Anyone turning pro would probably benefit from understanding this sort of stuff to balance steady growth systems (low standard deviation) with more high risk systems (high standard deviation).
By the way, I reckon the odds on a 3-1 scoreline in the mythical game between Win2Win Wanderers and RacingPost Rovers should be a touch over 11/1 (in fact anything over 12.13 in decimal odds would be value) given the goal averages stated. Why this is so, along with a discussion on the Poisson Distribution plus dependent and independent events, is the subject of a follow-up post if there is enough interest.
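As a taster, the 12.13 figure can be reproduced under one set of assumptions: that each team's goals follow a Poisson distribution with the averages given (3 and 1), and that the two teams' goal counts are independent of each other. This is only a sketch of that calculation and the function name poisson is made up for illustration.

```python
from math import exp, factorial

def poisson(k, average):
    """Probability of exactly k goals when goals follow a Poisson distribution."""
    return average ** k * exp(-average) / factorial(k)

# Assumes the two teams' goal counts are independent of each other.
p_3_1 = poisson(3, 3.0) * poisson(1, 1.0)
fair_decimal_odds = 1 / p_3_1

print(f"probability of 3-1: {p_3_1:.4f}")
print(f"fair decimal odds: {fair_decimal_odds:.2f}")
```

The probability comes out at about 0.0824, which corresponds to fair decimal odds of 12.13.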