## Sunday, 15 November 2015

### Introducing the Normal Distribution (Part 1)

I can’t really say I enjoyed stats lessons in school; I found S2 particularly confusing and difficult, kind of like going bare-handed fishing whilst blindfolded. It wasn’t just me: when we attempted to solve a problem, nobody in my class, least of all my teacher, seemed to understand what we were doing or why we were doing it. It felt like we were memorising a book of recipes, and to answer a question we had to work out which particular dish it was asking for before cooking it. We didn't accept any variations on dishes - if it wasn't on the menu you couldn't have it! Stats is not an easy subject to teach and I hate to point my finger at our teacher, but they certainly had a very procedural teaching style and most things were presented in a pretty abstract context. I think we were seriously malnourished in terms of conceptual understanding.

Somehow I managed to muddle through, but I never really came round to enjoying stats until I started applying it during my PhD; it was then that things really started to click into place. Now, as a teacher of S2, I am determined to avoid focusing too heavily on teaching procedural knowledge in abstract contexts, so I have set myself the following goals:

1. When introducing a new statistical technique or distribution, do so by framing it in a practical context.

2. Wherever possible, collect some real data to use in examples.

3. Whilst I don't think it's necessary for students to follow the derivation of every formula they use from first principles, I want to give them at least a feel for how and why formulas work and where they come from.

To put these ideas into context, I'd like to share some highlights from a few lessons that I have recently taught on the normal distribution. In this first post I will focus on introducing the normal distribution; then, in a second post, I'll discuss using the normal distribution as an approximation to the binomial distribution.

I have a particularly keen pianist in my S2 group, so I thought I'd draw on him for inspiration here and take a look at hand-spans. Hand-span is one of my go-to bits of continuous data to collect from a class. Unlike height, hand-span takes seconds to measure, but students still find it interesting to see who has the biggest! With a small sample size (around 20) and a mix of male and female students, I wasn't really sure what to expect, but I figured that whatever happened it would lead to some interesting discussion points.

As students settled down, I defined what I meant by hand-span, asked students to measure each other's to the nearest millimetre and asked them to just shout the results out; as they did so, I punched the numbers into the spreadsheet column (X) of this sketch.

So what's this chart you see appearing? A histogram? Well, kind of, but not as you're used to seeing one at GCSE: you can see that the areas don't add up to the total frequency. What do you think I've done?

Well, I normalised the bar heights to give a total area of one so that it's comparable to a continuous probability density function; my students were familiar with this concept from an earlier lesson on pdfs. If you want to have a play around with the data set I collected from my class, I have uploaded it to GeoGebraTube for you here.
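If you want to reproduce the "total area of one" histogram outside GeoGebra, here is a minimal Python sketch of the normalisation. It assumes bins given as a list of edges (half-open, with the last bin closed on the right) and uses a handful of made-up hand-spans, not my class data, purely to show the arithmetic: each bar height is count / (n × bin width), so height × width is the proportion of the data in that bin.

```python
def density_histogram(values, edges):
    """Return bar heights for a histogram whose total area is 1.

    Each height is count / (n * bin width), so height * width is the
    proportion of the data falling in that bin.
    """
    n = len(values)
    last = len(edges) - 2  # index of the final bin
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        # Bins are [left, right); the final bin also includes its right edge.
        count = sum(1 for v in values if left <= v < right or (i == last and v == right))
        heights.append(count / (n * (right - left)))
    return heights


spans = [195, 202, 208, 211, 214, 219, 226, 233]  # hypothetical hand-spans in mm
edges = [190, 200, 210, 220, 230, 240]
heights = density_histogram(spans, edges)
total_area = sum(h * (r - l) for h, l, r in zip(heights, edges[:-1], edges[1:]))
print(heights)
print(total_area)  # 1, up to floating-point rounding
```

Compared with a GCSE frequency-density histogram, the only change is dividing by $n$ as well as by the bin width, which is exactly what makes the bars comparable with a pdf.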

As you can see from my data, it worked out kind of OK; fairly normal. We had a discussion about whether this was what they'd have expected from the data. For comparison, I had some secondary data with a larger sample size up my sleeve; it came in pretty handy (sorry, I couldn't resist). This is from the website smallpianokeyboards.org, which questions the one-size-fits-all approach to key size.

Some nice points here are that the female distribution is pretty normal, whereas the male distribution is not so much. Get your students to scrutinise the table below (or even just the frequency curve) for reasons why this may be so.

An obvious difference is sample size. The other point to make here is that if you combined the male and female data into one distribution (with equal sample sizes, or after normalising the frequencies), the result might be bimodal or possibly a bit flat-topped! My group was mixed gender, but the distribution looked OK, if a little shaky round the edges, so we proceeded. I did, however, explain that there are formal tests which can be used to determine whether the normal distribution is an appropriate model to use.
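To see why combining two groups can give a bimodal or flat-topped shape, here is a rough Python sketch that counts the peaks of a 50/50 mix of two normal curves with the same standard deviation. The means and standard deviation are made-up illustrative values, not the smallpianokeyboards.org figures. A standard result is that such an equal mix has two peaks only when the means are more than $2σ$ apart; close to that boundary the combined curve looks flat-topped.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def count_peaks(mu1: float, mu2: float, sigma: float, steps: int = 2000) -> int:
    """Count local maxima of an equal mix of N(mu1, sigma^2) and N(mu2, sigma^2).

    The mixture is evaluated on a grid; a grid point counts as a peak if it is
    higher than both of its neighbours.
    """
    lo = min(mu1, mu2) - 4 * sigma
    hi = max(mu1, mu2) + 4 * sigma
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [0.5 * normal_pdf(x, mu1, sigma) + 0.5 * normal_pdf(x, mu2, sigma) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical group means in mm, with a common sigma of 10 mm
print(count_peaks(205, 215, 10))  # means 1 sigma apart: 1 peak
print(count_peaks(200, 230, 10))  # means 3 sigma apart: 2 peaks
```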

Going back to GeoGebra, if you tick the 'Show $f(x|μ,σ)$' check-box, the normal distribution curve and its equation will pop up.

Now, this equation is either hideous or beautiful depending on your point of view, but you may wish to discuss some of its features, or at least brace yourself for one student asking what $\pi$ and $e$ are doing there. I was pleasantly relieved when someone asked, because if they hadn't, I would have questioned whether I had done a decent enough job of conveying the wonder of these constants in core lessons.

Anyhow, without multivariable integration we can't really derive the normal density function formally, but we can convey an intuitive sense of why $e$ and $\pi$ should appear. First, let's take a look at $\pi$.

Start with a dartboard: if you aimed at the centre, you would expect darts to cluster around the centre, with fewer landing as you move outwards. This scenario can be formalised to act as the starting point for the derivation of the normal distribution function. Anyway, skirting round any formality, if you imagine plotting dart density as a function of horizontal and vertical position (x and y) in a 3D plot, you'd get something like this*.

For this to be a pdf, you need the "area" under it to be equal to one, but in this 3D plot that is a volume. We can't integrate a multivariable function yet, but your students will recall:

$\text{Volume of a cone} = \frac{1}{3} \pi r^2 h$

$\text{Volume of a sphere} = \frac{4}{3} \pi r^3$

It therefore shouldn't come as a surprise that, given the circular symmetry of the plot, $\pi$ comes into play here; you can see the circles in the contour plots. The solid can be sliced horizontally into thin disks, each with area $\pi r^2$, and integration sums the volumes of these disks.

You can plot these live in WolframAlpha - just type in $e^{-(x^2+y^2)}$.
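Here is a rough numerical version of the disk-slicing argument in Python, assuming we slice the surface $z = e^{-(x^2+y^2)}$ horizontally. At height $h$ (between 0 and 1) the slice is a disk whose radius satisfies $e^{-r^2} = h$, so its area is $\pi r^2 = -\pi \ln h$; adding up thin disks with the midpoint rule gives a volume close to $\pi$. The function name and the number of slices are my own choices for illustration.

```python
import math


def volume_by_disks(n_slices: int = 100_000) -> float:
    """Approximate the volume under z = exp(-(x^2 + y^2)) by stacking thin disks.

    At height h (0 < h <= 1) the surface is a circle with r^2 = -ln(h),
    so each horizontal slice is a disk of area pi * r^2 = pi * (-ln h).
    """
    dh = 1.0 / n_slices
    volume = 0.0
    for i in range(n_slices):
        h = (i + 0.5) * dh             # midpoint of this slice (avoids h = 0)
        radius_squared = -math.log(h)  # from exp(-r^2) = h
        volume += math.pi * radius_squared * dh
    return volume


v = volume_by_disks()
print(v)            # close to pi
print(v / math.pi)  # close to 1
```

The point for students is that the $\pi$ comes straight from the disk areas: the remaining integral, $\int_0^1 -\ln h \, dh$, is exactly 1.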

Now your students may be wondering why you plotted $e^{-(x^2+y^2)}$ to represent the dart density. Well, you could choose any other number $a > 1$ instead of $e$ for the base and get a similar bell-shaped distribution, but as the derivation of the normal function continues you'd find another scaling factor, $\ln a$, cropping up. If we choose $e$, the magical property $\int e^x \, dx = e^x + c$ means that things run a bit more smoothly and no awkward scaling factor is required.
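A quick numerical check of the $\ln a$ claim, sketched in Python: rewriting $a^{-x^2}$ as $e^{-(\ln a)x^2}$ shows that the area under the one-dimensional bell curve $a^{-x^2}$ is $\sqrt{\pi / \ln a}$, which only simplifies to $\sqrt{\pi}$ when $a = e$. The code assumes $a > 1$ and truncates the integral to $[-12, 12]$, where the tails are negligible for the bases tried; the function name is hypothetical.

```python
import math


def area_under_bell(base: float, half_width: float = 12.0, n: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of base**(-x**2) over [-half_width, half_width]."""
    dx = 2 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        total += base ** (-x * x) * dx
    return total


for base in (2.0, math.e, 10.0):
    numeric = area_under_bell(base)
    predicted = math.sqrt(math.pi / math.log(base))  # the ln(a) factor shows up here
    print(f"a = {base:.4f}: numeric {numeric:.6f}, sqrt(pi / ln a) {predicted:.6f}")
```

Only the $a = e$ row matches $\sqrt{\pi} \approx 1.772454$; every other base drags a $\ln a$ along with it.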

Getting back on track, students need to know that the normal distribution function depends on two parameters, $μ$ and $σ$. Play around with these using the sliders to give students a feel for how they change the shape of the distribution. In this case, you want to set them to the values for your data by typing in the pre-calculated values of $μ$ and $σ$, and see how well the normal curve models your distribution. Ask your students for some comments. A key thing to notice is that if you cut off the bits of the histogram that stick out above the curve, you should be able to fit them into the gaps between the curve and the histogram, because the total area under each should be one.
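If you'd like to check the fit outside GeoGebra, here is a minimal Python sketch that fits a normal model to a list of hand-spans and compares, bin by bin, the observed proportion (histogram area) with the proportion the fitted curve predicts (area under the curve). It uses the standard library's `statistics.NormalDist.from_samples`, which takes $σ$ to be the sample standard deviation (dividing by $n-1$); your pre-calculated $σ$ may use $n$ instead, so treat that as an assumption. The data are made-up values, not my class data.

```python
from statistics import NormalDist


def observed_vs_model(values, edges):
    """Fit N(mu, sigma^2) to values, then compare observed and predicted bin proportions.

    Returns the fitted NormalDist and a list of
    (left edge, right edge, observed proportion, model proportion) rows.
    """
    model = NormalDist.from_samples(values)  # mean and sample standard deviation (n - 1)
    n = len(values)
    last = len(edges) - 2
    rows = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        # Bins are [left, right); the final bin also includes its right edge.
        count = sum(1 for v in values if left <= v < right or (i == last and v == right))
        rows.append((left, right, count / n, model.cdf(right) - model.cdf(left)))
    return model, rows


spans = [195, 202, 208, 211, 214, 219, 226, 233]  # hypothetical hand-spans in mm
model, rows = observed_vs_model(spans, [190, 200, 210, 220, 230, 240])
print(f"mu = {model.mean:.2f}, sigma = {model.stdev:.2f}")
for left, right, observed, predicted in rows:
    print(f"[{left}, {right}): observed {observed:.3f}, normal model {predicted:.3f}")
```

The observed proportions always add up to one; the model's add up to slightly less, because the tails of the curve extend beyond the outermost bins.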

Maybe you could now modify the sliders to give a distribution that is distinctly different from the data you've collected, and ask the question:

Can you describe in everyday terms how a class whose hand-spans could be modelled by this distribution differs from our class?

Moving on:

Suppose we use our class data as a sample to reflect the hand-spans of all year 13 students in the school. What is the probability that a randomly chosen person in year 13 will have a hand-span of less than 190 mm?

Let $X$ denote the random variable hand-span and assume $X$ is normally distributed, $X∼N(μ,σ^2)$. In other words,

$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

From our sample we estimate $μ=213.88$ and $σ=18.72$ (both in mm).

$f(x)=\frac{1}{18.72\sqrt{2\pi}}e^{-\frac{(x-213.88)^2}{2\times 18.72^2}}$
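The density $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ can be written as a small Python function, which is handy for checking a value or two against the GeoGebra curve. This is just a direct transcription of the formula with the class estimates $μ = 213.88$ and $σ = 18.72$ (mm); the function name is my own.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1 / (sigma * sqrt(2 pi)) * exp(-(x - mu)^2 / (2 sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return coefficient * math.exp(exponent)


MU, SIGMA = 213.88, 18.72  # class estimates, in mm

for x in (175, 190, 213.88, 235):
    print(x, round(normal_pdf(x, MU, SIGMA), 5))
```

Note that these are densities (per mm), not probabilities; the tallest value, at $x = μ$, is $1/(σ\sqrt{2\pi})$.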

Great, now we have a continuous probability density function, so, thinking back to the work we have done on pdfs, how can we evaluate $P(X \leq 190)$? Anyone particularly good at integration?

Nah, unfortunately this function has no antiderivative that can be written in terms of elementary functions, so we can't integrate it analytically, but thankfully there is another way round.

At this stage I introduced my students to the cumulative normal distribution tables found in the formula booklet.

If you take a look at these, some kind soul has evaluated the integral numerically for you and tabulated the results. Unfortunately, they have only done it for one very specific distribution:

$Z∼N(0,1)$
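To give students a feel for what "evaluated the integral numerically" means, here is a Python sketch of how rows of a cumulative normal table could be produced: $\Phi(z) = 0.5 + \int_0^z \phi(t)\,dt$, with the integral approximated by Simpson's rule. I don't know which method the original table compilers used; Simpson's rule is just one standard choice. As a cross-check it also uses Python's `math.erf`, a special function that is itself defined by an integral of this kind, so it doesn't contradict the "no elementary antiderivative" point.

```python
import math


def phi(t: float) -> float:
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def cdf_simpson(z: float, n: int = 1000) -> float:
    """Phi(z) = 0.5 + integral of phi from 0 to z, using Simpson's rule (n must be even)."""
    if z == 0:
        return 0.5
    h = z / n
    s = phi(0) + phi(z)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * phi(i * h)  # Simpson weights 4, 2, 4, ..., 4
    return 0.5 + s * h / 3


def cdf_erf(z: float) -> float:
    """Phi(z) via the error function, used only as a cross-check."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# A small slice of a table like the one in the formula booklet
for z in (0.0, 0.5, 1.0, 1.5, 1.96, 2.0):
    print(f"{z:.2f}  {cdf_simpson(z):.4f}  {cdf_erf(z):.4f}")
```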

At this point we discussed standardising the distribution by first subtracting $μ$ from $X$, giving a mean of zero, and then dividing by $σ$ to give a variance of one. I think the first step is pretty obvious, but the second is more conceptually challenging. I used this sketch, which I discussed in more detail in a previous post on coded data; I have added an option to change the transformation so that it is comparable to a normal standardisation. The key points to emphasise here are:

1 - If a (positive) scale factor is applied to a distribution, then its standard deviation is multiplied by the same scale factor.

2 - If the original data set has a mean of zero, or is first transformed so that it has a mean of zero, then applying a scale factor will not affect its mean.

Thus, first subtracting $μ$ and then dividing by $σ$ (the square root of the variance) will transform the distribution to $Z∼N(0,1)$.

You can show this in the 'Normal Distribution' sketch by checking 'Standardise Data' and then holding Shift and using the cursor arrows to zoom in on the standardised distribution. Finally, check 'Show $\phi(z)$' to show the standardised curve and its equation.
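The two key points about shifting and scaling can also be checked numerically with a short Python sketch: subtract the mean, then divide by the standard deviation, and look at the mean and standard deviation after each step. The hand-spans are made-up values, and I've used the population standard deviation (`statistics.pstdev`) throughout; using `statistics.stdev` consistently would work just as well.

```python
from statistics import fmean, pstdev

spans = [195, 202, 208, 211, 214, 219, 226, 233]  # hypothetical hand-spans in mm
mu, sigma = fmean(spans), pstdev(spans)

# Step 1: subtract the mean. The mean becomes 0; the spread is unchanged.
shifted = [x - mu for x in spans]

# Step 2: divide by sigma. The standard deviation is divided by sigma, so it becomes 1,
# and because the mean is already 0, scaling leaves it at 0.
standardised = [x / sigma for x in shifted]

for label, data in (("original", spans), ("shifted", shifted), ("standardised", standardised)):
    print(f"{label:>12}: mean {fmean(data):8.4f}, sd {pstdev(data):8.4f}")
```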

At this point we calculated a couple of fairly arbitrary probabilities using the standardised tables, just to get a bit of practice using them. We then went back to our distribution and, amongst other questions, tackled the one I posed earlier:

"What is the probability that a randomly chosen person in year 13 will have a hand-span of less than 190 mm?"

One last thing that I should mention is the GeoGebra Probability Calculator. You may well be getting your students to use graphical calculators to calculate probabilities rather than looking them up in tables. GeoGebra also has this functionality conveniently packaged in its Probability Calculator, which is very easy to use and handy for checking answers. You can access it by pressing Ctrl+Shift+P or by going here:

I really like the clear visual displays, which come in handy when talking about complementary probabilities and the like. For some strange reason, the calculator screen is sometimes cut off at the bottom when I open it; if this happens to you, just click on its bottom edge and drag it down.
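If you'd rather check answers in code, here is a Python sketch of the hand-span question using the standard library's `statistics.NormalDist`, with the class estimates $μ = 213.88$ mm and $σ = 18.72$ mm. It standardises first, as we did in class, and also shows a table-style version where $z$ is rounded to two decimal places and the symmetry $\Phi(-a) = 1 - \Phi(a)$ is used, as you would when the tables only list non-negative $z$. The complementary probability is included as well.

```python
from statistics import NormalDist

MU, SIGMA = 213.88, 18.72  # class estimates, in mm
x = 190

# Standardise, then use Phi for the standard normal Z ~ N(0, 1)
z = (x - MU) / SIGMA
p_less = NormalDist().cdf(z)  # P(X < 190) = P(Z < z)

# Table-style: round z to 2 d.p. and use symmetry, Phi(-a) = 1 - Phi(a)
z_rounded = round(z, 2)  # -1.28
p_table = 1 - NormalDist().cdf(-z_rounded)

p_more = 1 - p_less  # complementary probability P(X >= 190)

print(f"z = {z:.4f}")
print(f"P(X < {x}) = {p_less:.4f}  (z unrounded)")
print(f"P(X < {x}) = {p_table:.4f}  (z rounded, via 1 - Phi(1.28))")
print(f"P(X >= {x}) = {p_more:.4f}")
```

Both versions come out at roughly 0.10; the small difference between them is just the effect of rounding $z$ before using the table.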

I think I've covered more than enough ground for one lesson, so I'll leave it here for now and talk about approximating a binomial distribution using a normal distribution in a second post on this topic. Any questions or comments are, as always, very welcome.

*Thanks to vonjd, a Mathematics Stack Exchange contributor, for the idea of using WolframAlpha to show these plots.