!@#%#$ Math

Oct. 3rd, 2002 12:27 am
[personal profile] bcholmes

Okay, I can't figure out the math on this problem. Can anyone help me?

I have a bunch'a numbers. Response times from a web server. I round off these response times into nice numbers. Closest 100 milliseconds or sumpthin. For every 100 millisecond interval, I count the number of occurrences, giving me a pretty graph, like this.
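The rounding-and-counting step described above might look something like this minimal Python sketch. The function name, the 100 ms bucket constant, and the sample data are all made up for illustration; nothing here comes from the original post's code.

```python
from collections import Counter

BIN_WIDTH_MS = 100  # assumed bucket size, per "closest 100 milliseconds"


def bin_response_times(times_ms, bin_width=BIN_WIDTH_MS):
    """Round each time to the nearest multiple of bin_width and count occurrences."""
    counts = Counter()
    for t in times_ms:
        # int(x + 0.5) rounds halves up for non-negative x, which avoids
        # the round-half-to-even behaviour of Python's built-in round().
        bucket = int(t / bin_width + 0.5) * bin_width
        counts[bucket] += 1
    return dict(sorted(counts.items()))


# Invented sample data, purely for illustration.
sample_times_ms = [120, 180, 210, 240, 260, 310, 330, 480, 950, 1420]
histogram = bin_response_times(sample_times_ms)

# Crude text version of the bar chart.
for bucket, count in histogram.items():
    print(f"{bucket:5d} ms  {'#' * count}")
```

Each key of `histogram` is a bucket centre in milliseconds and each value is the height of that bar.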

Now I want to do standard deviation stuff. I calculate the standard deviation of the response times. Turns out, in my case, to be about 300 milliseconds. Big spread. Clustered in the [0s-0.5s] range.
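As a rough sketch of the calculation mentioned here, Python's standard `statistics` module can produce the mean and standard deviation directly from the raw (unrounded) times. The sample data is invented, and the choice of the population form (`pstdev`) over the sample form (`stdev`) is an assumption; the post doesn't say which was used.

```python
import statistics

# Invented sample data, purely for illustration (milliseconds).
sample_times_ms = [120, 180, 210, 240, 260, 310, 330, 480, 950, 1420]

mean_ms = statistics.fmean(sample_times_ms)

# pstdev treats the list as the whole population; statistics.stdev would
# apply the n-1 correction for a sample instead.
sigma_ms = statistics.pstdev(sample_times_ms)

print(f"mean  = {mean_ms:.1f} ms")
print(f"sigma = {sigma_ms:.1f} ms")
```

With a long right-hand tail like this, a couple of slow responses are enough to drag sigma well above the width of the main cluster.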

Now, standard deviations are ideal for drawing the dreaded bell curves:

And I know that for each sigma, we're covering a larger percentage of the occurrences. In the following picture, for example:

the red area is supposed to cover something like 68% of the data, if the arrow marks out one sigma.
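The 68% figure can be checked numerically. For a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2); the helper name below is made up for this sketch.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.1%}")
```

This only holds if the data really is normally distributed, which is exactly what the superimposed curve is meant to test.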

What I ultimately want to do is to render a graph with the standard deviation curve superimposed over the bar chart:

My question: how the hell do I calculate meaningful Y values for the standard deviation curve? I figure that the mid-point should be 50% of my total number of occurrences. How do I get the other points?

(Did I mention that I didn't do all that well in statistics? Or Fourier Analysis, but that's a whole 'nuther story).

(no subject)

Date: 2002-10-03 12:39 am (UTC)
From: [identity profile] futabachan.livejournal.com
There's a very simple solution: just paste in some random graph that you've used elsewhere, and claim that it's really the graph of your response times. Hey, it worked for Bell Labs....

(no subject)

Date: 2002-10-03 05:44 am (UTC)
From: [identity profile] bcholmes.livejournal.com

Cheeky.

Oh and: Nice image, Lisa.

(no subject)

Date: 2002-10-03 03:16 am (UTC)
From: [identity profile] submarine-bells.livejournal.com
It's possible I'm misunderstanding what you're asking, or haven't figured out what you're trying to achieve here. But I'll give it a go anyway, and if I've misunderstood... well, you're no worse off than you were before I tossed my ten cents worth in, hey? :-)

So: I think the thing you're describing as a "standard deviation curve" (ie the bell curve thingie) is in fact a frequency distribution. That's what I've heard the term "bell curve" generally applied to, anyway. And if that's so, then the little bar graph you started with is pretty much the same thing. If you mark a point at the appropriate coordinate rather than drawing a bar for each frequency count, and then join the dots... voila, you have a frequency distribution curve.

Now, that won't give you a nice neat bell-curve, because those bell curve type graphs (a.k.a. "the normal distribution") generally show up when looking at data from very large samples, or for populations as a whole. And that's assuming your population does actually take the normal distribution form with respect to the variable that you're measuring - not all populations do.

The smaller your sample is compared to the total overall population it's drawn from (in this case, that would be the population of all responses that webserver has ever given or will ever give), the more it's going to deviate from that lovely neat bell curve, simply due to randomness and sample selection factors. So with a relatively small sample of a few dozen observations, your frequency distribution will be all over the place. If you took several hundred observations and graphed them, the graph would probably be a fair bit closer to a normal distribution (assuming that web server responses do actually fit that pattern). The greater the number of observations you took, the smoother and more even and neat your graph would probably be, and the more closely it would resemble the form of the overall population distribution of that variable.
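A small simulation can illustrate the point about sample size. This sketch assumes the underlying population really is normal (mean 0, sigma 1), draws a small and a large sample from it, and measures how far the observed bin proportions stray from the ideal ones. The function name, bin edges, sample sizes, and seed are arbitrary choices for illustration.

```python
import math
import random


def max_bin_error(n, rng):
    """Largest gap between observed and ideal bin proportions for n normal draws."""
    edges = [-3, -2, -1, 0, 1, 2, 3]  # bins one sigma wide
    draws = [rng.gauss(0, 1) for _ in range(n)]
    worst = 0.0
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(lo <= d < hi for d in draws) / n
        # Ideal proportion from the normal cumulative distribution.
        ideal = 0.5 * (math.erf(hi / math.sqrt(2)) - math.erf(lo / math.sqrt(2)))
        worst = max(worst, abs(observed - ideal))
    return worst


rng = random.Random(2002)  # fixed seed so the run is repeatable
small = max_bin_error(30, rng)
large = max_bin_error(100_000, rng)
print(f"worst bin error, n=30:      {small:.4f}")
print(f"worst bin error, n=100000:  {large:.4f}")
```

The few-dozen sample lands visibly off the ideal bell shape, while the large one hugs it closely.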

And that's why we have inferential statistical tests, incidentally - they are tools for comparing data from a sample with data from another sample or with known population data, and attempting to determine whether or not both samples were drawn from the same original population.

Does that make sense?

(no subject)

Date: 2002-10-03 05:43 am (UTC)
From: [identity profile] bcholmes.livejournal.com

Now, that won't give you a nice neat bell-curve, because those bell curve type graphs (a.k.a. "the normal distribution") generally show up when looking at data from very large samples, or for populations as a whole. And that's assuming your population does actually take the normal distribution form with respect to the variable that you're measuring - not all populations do.

Yeah, but I guess what I want to do is graph what the normal distribution curve would have been if the data had normal distribution. I would'a thought that there'd be some way to graph the bell-curve given a certain average value and a certain standard deviation, but damned if I can remember how.

What I remember from stats

Date: 2002-10-03 04:54 am (UTC)
From: [identity profile] sara-wolfe.livejournal.com
I *think* you already have a "curve that's slanted to the right". If I remember my terminology correctly (only been three months since I took stats, and already forgot!), but I think that's what it is. It is, as one of the people on here noted, a way to draw up frequencies.

Here's a useful site: http://www.robertniles.com/stats/

More: http://www.math.hmc.edu/~gu/math142/mellon/curves_and_surfaces/curves/bell.html

I'm not awake yet, so I might be confusing what you're asking for. Are you saying that the distribution you have is not normal, and you want to find a way to make it normal?

What you have does not look like a normal distribution, since you have a higher density on the right than on the left.

I have a whole department full of statisticians here, so let me know if you still need help.

Re: What I remember from stats

Date: 2002-10-03 05:32 am (UTC)
From: [identity profile] bcholmes.livejournal.com

Oooo, this formula looks promising:



But what's capital-E in the equation?

What I'm trying to do is show the normal distribution compared to the actual frequency, so that people can get an immediate visual sense of the fact that the data isn't normalized (for whatever reason). If the data is reasonably normalized, I'd expect the bars to line up neatly under the curve. Make sense?

Re: What I remember from stats

Date: 2002-10-03 06:54 am (UTC)
From: [identity profile] the-fury.livejournal.com
Natural base of logarithms...

2.718282

Re: What I remember from stats

Date: 2002-10-03 07:08 am (UTC)
From: [identity profile] bcholmes.livejournal.com

That's capital-E? I thought that was little-e.

Re: What I remember from stats

Date: 2002-10-03 07:13 am (UTC)
From: [identity profile] the-fury.livejournal.com
Heh. Yeah, it's supposed to be. Probably a typo on their part.

http://www.ruf.rice.edu/~lane/hyperstat/A25726.html

Gives the same thing with a small e.
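For reference, this is the normal density in its usual textbook form, written with the lowercase e. The symbols are the conventional ones (μ for the mean, σ for the standard deviation) and may not match the notation on the linked pages.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```

Here e ≈ 2.718282 is the base of natural logarithms, and f(x) is a density, so the area under the whole curve is 1.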

(no subject)

Date: 2002-10-03 09:43 am (UTC)
From: [identity profile] starstraf.livejournal.com
Pooch says...
"She's already got her answer in the final provided formula I believe.

She has the standard deviation from her data and knows what the median point is. What she wants to do is create a normal distribution curve from that information and then superimpose that over the data she has. This will show readers how her data differs from a normal distribution with the same deviation and median. I believe the formula she has will do that for her.


Just looked it up and yes, that is the correct formula. Understanding and Using Statistics: Basic Concepts (2nd edition) by Schmidt pg 124 "
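One way to turn that formula into Y values that line up with the bars, sketched in Python: the density has to be multiplied by the total number of occurrences and by the bin width, because each bar counts everything falling within its 100 ms interval. The function names and all the numbers below are hypothetical; only the formula itself is standard.

```python
import math


def normal_pdf(x, mean, sigma):
    """Normal density at x for the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def expected_counts(bin_centres, total, bin_width, mean, sigma):
    """Curve height at each bin centre, in the same units as the bar heights."""
    return [total * bin_width * normal_pdf(x, mean, sigma) for x in bin_centres]


# Hypothetical inputs: 1000 responses, 100 ms bins, mean 400 ms, sigma 300 ms.
total, bin_width, mean, sigma = 1000, 100, 400, 300

# Centres from mean - 4 sigma to mean + 4 sigma. A normal curve extends below
# zero even though a response time cannot, which is one visible mismatch.
centres = list(range(-800, 1700, bin_width))
curve = expected_counts(centres, total, bin_width, mean, sigma)

for x, y in zip(centres, curve):
    print(f"{x:6d} ms  {y:7.2f}")
```

Under these assumptions the peak sits at the mean with a height of total × bin_width / (σ√(2π)), so it depends on the bin width and sigma rather than being a fixed 50% of the occurrences. Summing the curve values across all bins gives back roughly the total count, which is a handy sanity check.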

(no subject)

Date: 2002-10-03 09:01 pm (UTC)
From: [identity profile] bcholmes.livejournal.com

Tell Pooch I said "Thanks!"

"Math is hard"

Date: 2002-10-03 02:07 pm (UTC)
From: [identity profile] the-siobhan.livejournal.com

"Let's go shopping!"


(This completely gratuitous post brought to you by pain, frustration and the absolute sense of bogglement that anybody can understand this stuff. Wow.)

(no subject)

Date: 2002-10-03 05:22 pm (UTC)
From: [identity profile] bcholmes.livejournal.com

Hurray! Thanks for your help, everyone. Here's a sample of the type of graph that I'm trying to produce (you need .svg support in your browser to see this).

(no subject)

Date: 2002-10-03 05:29 pm (UTC)
From: [identity profile] bcholmes.livejournal.com

Ooops. My web server doesn't know the MIME type for .svg. That's annoying.
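The registered media type for SVG is `image/svg+xml`. How to configure it depends on the web server, which isn't named here, so the following is only a Python illustration of the extension-to-type mapping a server needs, using the standard `mimetypes` module. The file name is made up.

```python
import mimetypes

# Register the mapping explicitly in case the local type table lacks it.
mimetypes.add_type("image/svg+xml", ".svg")

# "graph.svg" is a hypothetical file name.
content_type, _encoding = mimetypes.guess_type("graph.svg")
print(content_type)
```

A server that sends this value in the Content-Type header lets SVG-capable browsers render the file instead of offering it as a download.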

(no subject)

Date: 2002-10-03 05:55 pm (UTC)
From: [identity profile] bcholmes.livejournal.com

Try this, then:

MIME type

Date: 2002-11-19 03:36 am (UTC)
From: (Anonymous)
It opened beautifully for me in Adobe.

My, what a nice graph. What does it tell us about how we should behave with regard to our choices?

Manny001
