!@#%#$ Math
Oct. 3rd, 2002 12:27 am

Okay, I can't figure out the math on this problem. Can anyone help me?
I have a bunch'a numbers. Response times from a web server. I round off these response times into nice numbers. Closest 100 milliseconds or sumpthin. For every 100 millisecond interval, I count the number of occurrences, giving me a pretty graph, like this.
Now I want to do standard deviation stuff. I calculate the standard deviation of the response times. Turns out, in my case, to be about 300 milliseconds. Big spread. Clustered in the [0s-0.5s] range.
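(For concreteness, the bucketing and standard deviation step looks roughly like this — just a sketch in Python with made-up response times, not my actual script:)

```python
# Sketch of the bucketing + standard deviation step described above.
# The response times below are made up for illustration.
import math
from collections import Counter

response_times = [0.12, 0.31, 0.08, 0.45, 0.27, 0.19, 0.52, 0.33, 0.24, 0.41]  # seconds

BUCKET = 0.1  # 100 milliseconds

# Count occurrences per 100 ms bucket (bucket index = nearest multiple of 0.1 s)
buckets = Counter(int(round(t / BUCKET)) for t in response_times)

# Mean and (population) standard deviation of the raw times
mean = sum(response_times) / len(response_times)
sigma = math.sqrt(sum((t - mean) ** 2 for t in response_times) / len(response_times))

for b in sorted(buckets):
    print(f"{b * BUCKET:.1f}s: {buckets[b]} occurrence(s)")
print(f"mean = {mean:.3f}s, standard deviation = {sigma:.3f}s")
```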
Now, standard deviations are ideal for drawing the dreaded bell curves:
And I know that for each sigma, we're covering a larger percentage of the occurrences. In the following picture, for example:
the red area is supposed to cover something like 68% of the data, if the arrow marks out one sigma.
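(If you want to sanity-check that 68% figure: for a true normal distribution, the fraction within one sigma of the mean works out to erf(1/√2), roughly 0.6827 — e.g.:)

```python
# Fraction of a normal distribution lying within one standard deviation of the mean
import math
print(math.erf(1 / math.sqrt(2)))  # ~0.6827, i.e. the "something like 68%"
```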
What I ultimately want to do is to render a graph with the standard deviation curve superimposed over the bar chart:
My question: how the hell do I calculate meaningful Y values for the standard deviation curve? I figure that the mid-point should be 50% of my total number of occurrences. How do I get the other points?
(Did I mention that I didn't do all that well in statistics? Or Fourier Analysis, but that's a whole 'nuther story).
(no subject)
Date: 2002-10-03 03:16 am (UTC)

So: I think the thing you're describing as a "standard deviation curve" (ie the bell curve thingie) is in fact a frequency distribution. That's what I've heard the term "bell curve" generally applied to, anyway. And if that's so, then the little bar graph you started with is pretty much the same thing. If you mark a point at the appropriate coordinate rather than drawing a bar for each frequency count, and then join the dots... voila, you have a frequency distribution curve.
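In code terms, that "join the dots" step might look something like this (a sketch only — it assumes matplotlib, and the bucket counts are made up):

```python
# Sketch of "join the dots": plot each bucket's count as a point and connect
# them, rather than drawing bars. Assumes matplotlib; the counts are made up.
import matplotlib.pyplot as plt

buckets = {0.1: 3, 0.2: 7, 0.3: 12, 0.4: 6, 0.5: 2}  # bucket midpoint (s) -> count

xs = sorted(buckets)
ys = [buckets[x] for x in xs]

plt.bar(xs, ys, width=0.08, alpha=0.4, label="occurrences per 100 ms")
plt.plot(xs, ys, marker="o", color="red", label="frequency polygon")
plt.xlabel("response time (s)")
plt.ylabel("occurrences")
plt.legend()
plt.show()
```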
Now, that won't give you a nice neat bell-curve, because those bell curve type graphs (a.k.a. "the normal distribution") generally show up when looking at data from very large samples, or for populations as a whole. And that's assuming your population does actually take the normal distribution form with respect to the variable that you're measuring - not all populations do.
The smaller your sample is compared to the total overall population it's drawn from (in this case, that would be the population of all responses that webserver has ever given or will ever give), the more it's going to deviate from that lovely neat bell curve, simply due to randomness and sample selection factors. So with a relatively small sample of a few dozen observations, your frequency distribution will be all over the place. If you took several hundred observations and graphed them, the graph would probably be a fair bit closer to a normal distribution (assuming that web server responses do actually fit that pattern). The greater the number of observations you took, the smoother and more even and neat your graph would probably be, and the more closely it would resemble the form of the overall population distribution of that variable.
And that's why we have inferential statistical tests, incidentally - they are tools for comparing data from a sample with data from another sample or with known population data, and attempting to determine whether or not both samples were drawn from the same original population.
Does that make sense?
(no subject)
Date: 2002-10-03 05:43 am (UTC)

Now, that won't give you a nice neat bell-curve, because those bell curve type graphs (a.k.a. "the normal distribution") generally show up when looking at data from very large samples, or for populations as a whole. And that's assuming your population does actually take the normal distribution form with respect to the variable that you're measuring - not all populations do.
Yeah, but I guess what I want to do is graph what the normal distribution curve would have been if the data had a normal distribution. I would'a thought that there'd be some way to graph the bell-curve given a certain average value and a certain standard deviation, but damned if I can remember how.
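For what it's worth, the usual recipe is: take the normal probability density for your mean and standard deviation, then scale it by (total number of observations × bucket width) so it's in the same units as the bar heights. A sketch, with made-up numbers:

```python
# Y values for a normal ("bell") curve scaled to overlay a histogram of counts.
# Assumes `mean`/`sigma` come from the raw response times, `n` is the total
# number of observations, and `width` is the bucket width (0.1 s for 100 ms).
# All the numbers below are made up for illustration.
import math

mean, sigma = 0.28, 0.30  # seconds
n, width = 200, 0.1       # total observations, bucket width in seconds

def normal_pdf(x, mu, sd):
    """Probability density of the normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def expected_count(x, mu, sd, total, bucket_width):
    """Expected occurrences in the bucket centred at x, if the data were normal."""
    return total * bucket_width * normal_pdf(x, mu, sd)

for i in range(11):
    x = i * width
    print(f"{x:.1f}s -> {expected_count(x, mean, sigma, n, width):.1f}")
```

One consequence of that scaling: the peak of the curve works out to n × width / (sigma × √(2π)), which is generally not half the total number of occurrences.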