Tuesday, May 26, 2009

THE SHAPE OF NUMBERS

If you are like me, perhaps you've found the whole field of statistics intimidating. All you know is that there is some magic percentage you can measure from a sample and somehow get a picture of the larger reality. One might infer that the larger the pool of numbers, the smaller the percentage of it you need to sample. There are no doubt mathematical formulas for this, but I want to narrow the topic a little. I want to talk about the heralded "bell curve". Once I wrote a program in BASIC that had the computer spit out random numbers, and in my model I used a "pile of sand" or conical shape rather than a bell curve. In my model, a score's distance from the mean was the square of a random number between zero and six. What this meant was that the midpoint, the distance within which half of the numbers fall, was nine above the mean and nine below it: half of the random numbers fall below three, and three squared is nine. In other words, the middle eighteen points (in this case of randomly generated IQ scores) would contain exactly half the scores.
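
The BASIC experiment is easy to repeat. Here is a minimal Python sketch of how such a model might look today. It assumes a mean of 100 (an IQ-style scale), a random number drawn evenly between zero and six, and an equal chance of landing above or below the mean; the function name and those choices are illustrative, not taken from the original program. It then counts how many scores land within nine points of the mean.

```python
import random

def squared_deviation_scores(n, mean=100, seed=1):
    """Hypothetical re-creation of the BASIC model: each score sits
    (random number between 0 and 6) squared points above or below the mean."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        distance = rng.uniform(0, 6) ** 2   # anywhere from 0 to 36 points away
        sign = rng.choice((-1, 1))          # above or below, assumed equally likely
        scores.append(mean + sign * distance)
    return scores

scores = squared_deviation_scores(100_000)
within_nine = sum(1 for s in scores if abs(s - 100) <= 9) / len(scores)
print(f"share within 9 points of the mean: {within_nine:.3f}")  # close to 0.5
```

With a hundred thousand scores the share comes out very near one half, as the squaring argument predicts.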

However, statisticians don't usually quote the "middle half and two end quarters" figure. Instead they draw two lines, one on each side of the mean, placed so that roughly two thirds of the numbers fall between them (about 68 percent for a true bell curve), with about one sixth to the far right and one sixth to the far left. In this model Sean Hannity and Rush Limbaugh would be to the far left, if you catch my drift. The distance from the mean out to either of these lines is the "standard deviation". It is best to speak of a standard deviation line, or set of lines, rather than the oft-heard "standard deviation curve".
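
Python's standard library can show where the two-thirds figure comes from. This sketch uses `statistics.NormalDist` with an assumed mean of 100 and standard deviation of 15 (IQ-style values chosen only for illustration) to measure how much of a true bell curve lies between the two standard deviation lines, how much lies beyond each one, and, for contrast, how far out the "middle half" lines sit.

```python
from statistics import NormalDist

# Illustrative IQ-style bell curve; the mean and standard deviation are assumptions.
bell = NormalDist(mu=100, sigma=15)

inside = bell.cdf(115) - bell.cdf(85)   # share between the two standard deviation lines
each_tail = bell.cdf(85)                # share beyond the left line (the right tail matches)
half_width = bell.inv_cdf(0.75) - 100   # distance from the mean that encloses the middle half

print(f"inside: {inside:.3f}, each tail: {each_tail:.3f}")  # inside: 0.683, each tail: 0.159
print(f"middle half lies within {half_width:.1f} points of the mean")  # 10.1
```

So the standard deviation lines sit farther out than the middle-half lines: one standard deviation (15 points here) versus about 10 points.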

Now that you understand what a standard deviation is, I can go on to say that it needn't always be the same number. Obviously it will vary with what you are sampling. For some things it will delineate a wider range and for others a narrower one. For something like blood pH I would expect a very narrow range, and for human body temperature a narrow range as well. But there are a few other aspects of the curve you need to know. First there is the right or left "skew" of the numbers, if any. A "normal" distribution is not supposed to have any skew. For certain things, like personal income, a rightward skew would be assumed. One way to measure skew is by how far the mean lies to the right of the median. As I may have said before, when you use the median you are talking about human beings, the domain of the liberal. When you are talking about things like worker efficiency or energy usage or production, you would use the mean. These are the sort of things conservatives are more interested in. Is this concept now clear to you?
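
One standard way to put a number on "how far the mean lies to the right of the median" is Pearson's second skewness coefficient: 3 × (mean - median) ÷ standard deviation. The Python sketch below applies it to a made-up list of incomes (illustrative only, not real data) and to a symmetric list for comparison; the function name is an assumption for this example.

```python
from statistics import mean, median, stdev

def pearson_median_skew(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation.
    Positive when the mean sits to the right of the median."""
    return 3 * (mean(values) - median(values)) / stdev(values)

incomes = [22, 28, 31, 35, 38, 41, 47, 55, 70, 250]   # hypothetical incomes, in $1,000s
symmetric = [1, 2, 3, 4, 5, 6, 7]

print(mean(incomes), median(incomes))          # 61.7 39.5: mean well right of the median
print(round(pearson_median_skew(incomes), 2))  # positive, so a rightward skew
print(pearson_median_skew(symmetric))          # 0.0: no skew
```

A positive result means a rightward skew; zero means the mean and median coincide.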

There are two more dynamics I want to familiarize you with first. The first is what I call the amplitude. This is merely "how high is your bell curve?" You can think of it as sine waves. If you turn up the gain on an oscilloscope, the sine waves become more vertically elongated. But this is a dynamic of the "projection" of the image, if you will, and says nothing about the nature of the wave itself. Likewise, a large sample of anything will generate more numbers than a small sample, so on the same graph scale, or "calibration", the bell curve will look taller the more sampling you do. These things are taken into consideration when you draft the chart.

There is another "projection" you should be aware of. You often hear of a "logarithmic chart". Its scale can run horizontally or vertically. The key thing to remember with these charts is that they are proportional: equal distances stand for equal percentage changes. Also, you never reach zero, no matter how low the readings go. There are places where logarithmic charts are and are not appropriate. I think I went into this recently, but let me give you a simple illustration right now. If you bought a hundred-dollar stock and it goes down to ten, you are all but wiped out, for you have lost ninety. So a logarithmic chart isn't relevant to you. If, however, you wait until the stock is at ten to buy it, now the logarithmic chart is relevant, because in your eyes it can continue to plummet sharply. People like to use straight linear charts to show a stock's rise. But if you bought a stock just as it was about to reach its peak, a logarithmic chart would be more relevant to you. The first investor may have bought the stock at ten, so if the stock goes up ten, he has doubled his money. But if you paid a hundred and it goes up ten, you've only made ten percent on your money. Capish?
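
The two stock buyers' arithmetic, and the "proportional" nature of a logarithmic chart, can be checked in a few lines of Python. The helper names below are made up for illustration; the prices are the ones from the example above.

```python
import math

def percent_change(bought_at, now):
    """Gain or loss as a percentage of what you paid."""
    return 100 * (now - bought_at) / bought_at

def log_chart_distance(a, b):
    """Vertical distance between two prices on a base-10 logarithmic chart."""
    return math.log10(b) - math.log10(a)

# The same $10 rise means very different things to the two investors.
print(percent_change(10, 20))    # 100.0 (bought at ten: doubled his money)
print(percent_change(100, 110))  # 10.0  (paid a hundred: ten percent)

# On a log chart, 100 -> 10 and 10 -> 1 are the same length: each is a 90% drop.
print(log_chart_distance(100, 10), log_chart_distance(10, 1))  # -1.0 -1.0

# A log scale has no place for zero: math.log10(0) raises ValueError.
```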

Now we come to the most difficult concept, at least of those I am going to discuss here. There is a quality called kurtosis. I became familiar with this word on wolframalpha.com. What it is, to put it in simple terms, is the "pointiness" of a bell curve. A curve with high kurtosis will look pointy, whereas one with low kurtosis will look more blob-like. What I also need to stress here is that the standard deviation needn't change for the kurtosis to change. In fact, kurtosis is best illustrated where the standard deviation, or "point spread" as I call it, does not change. Let me next tell you about something I did in Excel lately. I took four quarters of a circle. Think of a black-line circle drawn on white cardboard and cut into pieces. Now take the pieces and rearrange them so that they form the shape of a bell, with the high point in the middle. This is what I refer to as a bell curve with "trigonometric logic". In this curve, the height at the peak was 1.409 times the height at the standard deviation lines. That number is close enough to 1.414, the square root of two, that there may be some connection, since I was not that exact in my construction of the model. In this model it was as though there were a gymnasium with a bell-shaped roof, and you had people file in and stand on the line on the floor that represents their IQ score, or height, or whatever I am measuring. From there it was simply a matter of "counting noses": seeing how many stood on the peak line of the bell and how many stood on the two-thirds lines, and taking the ratio of those numbers. I say all of this because what I'd like to do is recalibrate and simplify the whole kurtosis computation. First you have to know some things about statistics. Reality isn't what you'd logically think. If you have a classroom and Shaquille O'Neal walks into the room, his presence throws the standard deviation for height way off. Computing the standard deviation makes no allowance for flukes or isolated numbers.
The statisticians' motto is "where there is one, there will be more." If the tallest person in the room is six foot one, statistical sampling will just tack on half an inch, assuming there are people you didn't count. But it gets a bit freakier than that. I wondered what a "flat" kurtosis number was. One might suppose it would be a number like one, or perhaps zero. But it isn't. It's the square root of three. But here's another freaky reality: not all straight lines are equal. You've heard there are certain words, such as "pure" or "unique", that you should never modify with "very". To use such modifiers is to diminish the primary word. So, are all straight lines created equal? No. Longer straight lines are more "pointy" and short ones are more "blob-like". So a flat series of numbers counting at some regular interval, like a series filled down with Excel's copy command, will give different kurtosis figures depending on the length of the line. I discovered that a line of numbers two pi long, or 6.28, will give a kurtosis of the square root of three. My version of kurtosis would call this ONE. The way I would compute kurtosis is to divide the number they give you by the square root of three and then SQUARE the result. So the "normal" kurtosis of three would remain unchanged. This three would mean (I think) that the peak of the curve would be 1.732 times as high as the curve at the "two-thirds" hash marks, as we'll call them. In my model, as you know, the peak-to-hash-mark ratio was only 1.409. So what accounts for the difference? Here it is: the difference is the tightness of the curve of the line at the top of the bell versus the curvature of the line at the ends of the bell. If this ratio or difference is higher, then your kurtosis will go up. For a scenario where the curvature is equal throughout, the "trigonometric" model, I'm thinking the kurtosis would be 1.96, or roughly two.
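
The kurtosis figures in this paragraph can be checked with a few lines of Python. The sketch below assumes the number being reported is the plain (not "excess") population kurtosis, the version in which a normal curve scores 3, which matches the "normal kurtosis of three" above. It computes that figure for evenly spaced runs of different lengths, applies the rescaling proposed above (divide by the square root of three, then square), and, for comparison, the ratio of peak height to height at one standard deviation on a true normal curve. Two points give 1.0, three give 1.5, six give about 1.731 (very near the square root of three), and long runs level off toward 1.8, the kurtosis of a perfectly flat continuous distribution. The normal-curve ratio comes out at about 1.649 (e raised to the one-half power). The function names are illustrative.

```python
import math
from statistics import NormalDist

def kurtosis(values):
    """Plain (non-excess) population kurtosis: the fourth central moment
    divided by the square of the second. A normal curve scores 3."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2

def rescaled(k):
    """The rescaling proposed above: divide by sqrt(3), then square."""
    return (k / math.sqrt(3)) ** 2

# Evenly spaced "flat lines" of different lengths, like an Excel fill-down series.
for length in (2, 3, 6, 10, 100, 1000):
    k = kurtosis(list(range(length)))
    print(f"{length:>5} points: kurtosis {k:.3f}, rescaled {rescaled(k):.3f}")

# For comparison: on a true normal curve, peak height divided by height at one SD.
bell = NormalDist()
print(f"normal peak / one-SD height: {bell.pdf(0) / bell.pdf(1):.3f}")  # 1.649
```
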
In case you are wondering, I have never seen a kurtosis number lower than one; these may not exist, but don't hold me to that. Even a chart with a "hole" or "U" formation of scores will still have a kurtosis of 1.2 or so. OK, I think you're up to speed now.
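
For the plain kurtosis used in this post, 1 is in fact a hard floor: no set of numbers can score lower, and only numbers split evenly between two values hit it exactly. The short Python sketch below checks a two-pile chart, a U-shaped chart with a hole in the middle, and a flat run for comparison. The data are hypothetical and the function name is illustrative.

```python
def kurtosis(values):
    """Plain (non-excess) population kurtosis; a normal curve scores 3."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2

two_piles = [0] * 50 + [10] * 50                        # every score at one of two extremes
u_shape = [0] * 40 + [1] * 10 + [9] * 10 + [10] * 40    # hypothetical "hole in the middle"
flat = list(range(11))                                  # evenly spaced, for comparison

for name, data in (("two piles", two_piles), ("U shape", u_shape), ("flat", flat)):
    print(name, round(kurtosis(data), 3))
```

The two-pile chart sits right at 1, the U shape lands just above it, and the flat run comes in near 1.8.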
