!@#%#$ Math
Oct. 3rd, 2002 12:27 am
Okay, I can't figure out the math on this problem. Can anyone help me?
I have a bunch'a numbers. Response times from a web server. I round off these response times into nice numbers. Closest 100 milliseconds or sumpthin. For every 100 millisecond interval, I count the number of occurrences, giving me a pretty graph, like this.
Now I want to do standard deviation stuff. I calculate the standard deviation of the response times. Turns out, in my case, to be about 300 milliseconds. Big spread. Clustered in the [0s-0.5s] range.
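The binning and standard deviation steps described above can be sketched roughly as follows. This is only a minimal sketch: the sample values are made up for illustration (they are not the real response times), and `bin_times` is a hypothetical helper name.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def bin_times(times_ms, width_ms=100):
    """Round each time to the nearest multiple of width_ms and count occurrences."""
    # floor(x + 0.5) rounds halves up; Python's round() would round halves to even
    return Counter(math.floor(t / width_ms + 0.5) * width_ms for t in times_ms)

# Hypothetical response times in milliseconds (illustration only)
sample_ms = [120, 180, 230, 260, 310, 340, 420, 480, 950, 1400]

counts = bin_times(sample_ms)
for bucket in sorted(counts):
    # One '#' per occurrence gives a crude text version of the bar chart
    print(f"{bucket:5d} ms  {'#' * counts[bucket]}")

print("mean:", mean(sample_ms), "ms")
print("std dev:", round(pstdev(sample_ms), 1), "ms")
```

`pstdev` treats the numbers as the whole population; `statistics.stdev` would be the choice if they are considered a sample of a larger one.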
Now, standard deviations are ideal for drawing the dreaded bell curves:
And I know that for each sigma, we're covering a larger percentage of the occurrences. In the following picture, for example:
the red area is supposed to cover something like 68% of the data, if the arrow marks out one sigma.
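The "something like 68%" figure can be checked with the error function. A small sketch, assuming the data really is normally distributed (`fraction_within` is a hypothetical helper name):

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k sigmas of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.1%}")
```

This gives roughly 68%, 95% and 99.7% for one, two and three sigmas.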
What I ultimately want to do is to render a graph with the standard deviation curve superimposed over the bar chart:
My question: how the hell do I calculate meaningful Y values for the standard deviation curve? I figure that the mid-point should be 50% of my total number of occurrences. How do I get the other points?
(Did I mention that I didn't do all that well in statistics? Or Fourier Analysis, but that's a whole 'nuther story).
(no subject)
Date: 2002-10-03 12:39 am (UTC)
(no subject)
Date: 2002-10-03 05:44 am (UTC)
Cheeky.
Oh and: Nice image, Lisa.
(no subject)
Date: 2002-10-03 03:16 am (UTC)
So: I think the thing you're describing as a "standard deviation curve" (ie the bell curve thingie) is in fact a frequency distribution. That's what I've heard the term "bell curve" generally applied to, anyway. And if that's so, then the little bar graph you started with is pretty much the same thing. If you mark a point at the appropriate coordinate rather than drawing a bar for each frequency count, and then join the dots... voila, you have a frequency distribution curve.
Now, that won't give you a nice neat bell-curve, because those bell curve type graphs (a.k.a. "the normal distribution") generally show up when looking at data from very large samples, or for populations as a whole. And that's assuming your population does actually take the normal distribution form with respect to the variable that you're measuring - not all populations do.
The smaller your sample is compared to the total overall population it's drawn from (in this case, that would be the population of all responses that webserver has ever given or will ever give), the more it's going to deviate from that lovely neat bell curve, simply due to randomness and sample selection factors. So with a relatively small sample of a few dozen observations, your frequency distribution will be all over the place. If you took several hundred observations and graphed them, the graph would probably be a fair bit closer to a normal distribution (assuming that web server responses do actually fit that pattern). The greater the number of observations you took, the smoother and more even and neat your graph would probably be, and the more closely it would resemble the form of the overall population distribution of that variable.
And that's why we have inferential statistical tests, incidentally - they are tools for comparing data from a sample with data from another sample or with known population data, and attempting to determine whether or not both samples were drawn from the same original population.
Does that make sense?
(no subject)
Date: 2002-10-03 05:43 am (UTC)
"Now, that won't give you a nice neat bell-curve, because those bell curve type graphs (a.k.a. "the normal distribution") generally show up when looking at data from very large samples, or for populations as a whole. And that's assuming your population does actually take the normal distribution form with respect to the variable that you're measuring - not all populations do."
Yeah, but I guess what I want to do is graph what the normal distribution curve would have been if the data had normal distribution. I would'a thought that there'd be some way to graph the bell-curve given a certain average value and a certain standard deviation, but damned if I can remember how.
What I remember from stats
Date: 2002-10-03 04:54 am (UTC)
Here's a useful site: http://www.robertniles.com/stats/
More: http://www.math.hmc.edu/~gu/math142/mellon/curves_and_surfaces/curves/bell.html
I'm not awake yet, so I might be confusing what you're asking for. Are you saying that the distribution you have is not normal, and you want to find a way to make it normal?
What you have does not look like a normal distribution, since you have a higher density on the right than on the left.
I have a whole department full of statisticians here, so let me know if you still need help.
Re: What I remember from stats
Date: 2002-10-03 05:32 am (UTC)
Oooo, this formula looks promising:
But what's capital-E in the equation?
What I'm trying to do is show the normal distribution compared to the actual frequency, so that people can get an immediate visual sense of the fact that the data isn't normalized (for whatever reason). If the data is reasonably normalized, I'd expect the bars to line up neatly under the curve. Make sense?
Re: What I remember from stats
Date: 2002-10-03 06:54 am (UTC)
2.718282
Re: What I remember from stats
Date: 2002-10-03 07:08 am (UTC)
That's capital-E? I thought that was little-e.
Re: What I remember from stats
Date: 2002-10-03 07:13 am (UTC)
http://www.ruf.rice.edu/~lane/hyperstat/A25726.html
Gives the same thing with a small e.
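Assuming the formula in question is the usual normal density, with μ for the mean and σ for the standard deviation (both symbol names are assumptions here), it would read:

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
```

Here e is the base of the natural logarithm, about 2.718282, however it is capitalised.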
(no subject)
Date: 2002-10-03 09:43 am (UTC)
"She's already got her answer in the final provided formula I believe.
She has the standard deviation from her data and knows what the median point is. What she wants to do is create a normal distribution curve from that information and then superimpose that over the data she has. This will show readers how her data differs from a normal distribution with the same deviation and median. I believe the formula she has will do that for her.
Just looked it up and yes, that is the correct formula. Understanding and Using Statistics: Basic Concepts (2nd edition) by Schmidt pg 124"
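A rough sketch of how the Y values for the superimposed curve might be computed, assuming the formula is the standard normal density. The density on its own has area 1, so to put it on the same scale as the bar chart it gets multiplied by the total number of observations and the bin width. The function names and the numbers below (mean 400 ms, 1000 observations) are made up for illustration; only the 300 ms deviation and 100 ms bins come from the original post.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density at x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def expected_count(x, mu, sigma, n, bin_width):
    """Count a bin centred at x would hold if n observations were normal."""
    return n * bin_width * normal_pdf(x, mu, sigma)

# Hypothetical inputs: mean and n are assumptions, sigma and bin width follow the post
mu, sigma = 400.0, 300.0
n, bin_width = 1000, 100

for x in range(0, 1300, bin_width):
    # Each pair (x, y) is one point on the curve to draw over the bars
    print(x, round(expected_count(x, mu, sigma, n, bin_width), 1))
```

With these numbers the peak works out to n * bin_width / (sigma * sqrt(2 * pi)), about 13% of the total number of occurrences rather than 50%; the height of the mid-point depends on the bin width and the standard deviation.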
(no subject)
Date: 2002-10-03 09:01 pm (UTC)
Tell Pooch I said "Thanks!"
"Math is hard"
Date: 2002-10-03 02:07 pm (UTC)
"Let's go shopping!"
(This completely gratuitous post brought to you by pain, frustration and the absolute sense of bogglement that anybody can understand this stuff. Wow.)
(no subject)
Date: 2002-10-03 05:22 pm (UTC)
Hurray! Thanks for your help, everyone. Here's a sample of the type of graph that I'm trying to produce (you need .svg support in your browser to see this).
(no subject)
Date: 2002-10-03 05:29 pm (UTC)
Ooops. My web server doesn't know the MIME type for .svg. That's annoying.
(no subject)
Date: 2002-10-03 05:55 pm (UTC)
Try this, then:
MIME type
Date: 2002-11-19 03:36 am (UTC)
My, what a nice graph. What does it tell us about how we should behave with regard to our choices?
Manny001