bcholmes: (Default)
[personal profile] bcholmes

I'm looking for database wisdom.

Database modellers often refer to "third normal form". Does this imply a first and second normal form. How are these levels defined?

(no subject)

Date: 2003-05-29 01:52 pm (UTC)
From: [identity profile] kalikanzara.livejournal.com
Ah, that brings back memories...

I don't remember well enough to describe it, but I did find a tutorial:

http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=95
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=104

Yep

Date: 2003-05-29 02:06 pm (UTC)
From: [identity profile] the-fury.livejournal.com
All normal forms are cumulative.

1st. duplicate columns removed and new tables created based on any recurring data based on primary key. e.g. remove dependant_1, ... , dependant_4 and put them in their own dependants table.

2nd. Common groupings are removed and placed into a different table based on foreign keys. e.g. instead of placing city, province, and postal code you can prefill a table with postal code, province, and city and refer to it through the unique postal code.

3rd. All data is dependant on primary key. Everything else is removed.

(no subject)

Date: 2003-05-29 02:08 pm (UTC)
ext_26535: Taken by Roya (Default)
From: [identity profile] starstraf.livejournal.com
http://www.devshed.com/Server_Side/MySQL/Normal/Normal1/page4.html

1st - seperate out to tables
2nd - eliminate redundant data
3rd - make all realate to primary key

http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=95
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=104

(no subject)

Date: 2003-05-29 02:29 pm (UTC)
From: [identity profile] purplesofa.livejournal.com
This is database knowledge. For database wisdom, you would have to ask "What data modelling rules are suitable for my situation?"

Third normal form is usually thought "sufficient" to optimize an operational database for transactions. (Fourth and fifth normal forms are mostly academic exercises.) Normalization can be done intuitively by eliminating redundancy.

Analytical databases should be optimized for query performance (fewer joins) so 3rd normal form does not apply. (Kimball's "dimensional" model is my favourite strategy for analytical databases, but opinions vary.)

In reality, database designs are often denormalized a bit for physical implementation. Controlled redundancy may have advantages.

I could go on and on...

(no subject)

Date: 2003-05-29 03:53 pm (UTC)
From: [identity profile] hellsop.livejournal.com
concur, regarding a little bit of denormalization.

An example of this kind of denormalization comes out really well with regard to province/state codes.

To be technically correct for the form, an address would not be allowed to put a province abbreviation directly into the database. Instead, there would be a code for the province, that combined with the country, would be keys into another table that contained the province name and its abbreviation. Similarly, the country wouldn't be stored as a country name or abbreviation in an address record, it would instead be a foreign key to a country table that would have the abbreviation, country name, etc. in it. outputing an address record to a label or display would require fetching the address record, then fetching the country and province records to get the abbreviations for the display.

Rarely does that level of strictness become handy for a lot of elements, and addresses will store the abbreviations directly, so that only one table fetch is required instead of three.

Several years ago, a coworker had the joy of aiding a group that was firmly convinced that rigorous normal form was desirable. They build the databs thusly and started importing data from the system that this one was to replace.

The first thousand records got loaded in about two seconds. The second thousand took about 10 seconds. The thrid thousand took about 45 seconds, and the fourth thousand about three minutes. They gave up trying to load data after approximately 10,000 records and it was possible to read every record completely during the time that it took to load it. They got 0.1% done.

(no subject)

Date: 2003-05-30 08:55 pm (UTC)
ext_28663: (Default)
From: [identity profile] bcholmes.livejournal.com

This is database knowledge. For database wisdom, you would have to ask "What data modelling rules are suitable for my situation?"

You are a sofa of great wisdom.

Re:

Date: 2003-05-30 09:14 pm (UTC)
From: [identity profile] purplesofa.livejournal.com
okkkkk...... (I don't really identify as the sofa, more as the owner of the sofa.)

I hope you weren't relying on a little LJ conversation to design a major database. A data architect's experience is called for in these matters.

(no subject)

Date: 2003-05-31 10:00 pm (UTC)
ext_28663: (Default)
From: [identity profile] bcholmes.livejournal.com

I hope you weren't relying on a little LJ conversation to design a major database.

No. Although I've had more than my fair share of designing databases. Like most computer knowledge I have, I learned databases through School of Hard Knocks. Often I'm missing some basic definitions, and this was one of them.

Re:

Date: 2003-06-01 06:51 pm (UTC)
From: [identity profile] purplesofa.livejournal.com
To be honest, I also learned information architecture by osmosis and experience. I have never written a test that required me to memorize the 5 normal forms. Which is a good thing, because every time I try, I get a brain-ache. (Odd, given that I found algebra easy in university, and normalization is just applied set theory.)

What I have learned, by listening to and reading a lot of more experienced data geeks, is that normalization is a crock. There are various examples of normalized-but-ridiculous design. There are more circumstances to denormalize than to normalize. The normalization rules say nothing about subtyping/inheritance, which is used to remove redundancy in many circumstances (but cannot be implemented in RDBMSes). And scientific normalization can only be done correctly after the semantics of the data are artfully defined.

Third normal form is mainly useful when cleaning up "really messy" or "flat" or "legacy" databases, and it should be treated as a general goal not as a nirvana state.

Now if only I could convince the Evil Project Manager of all the things I just said. He thinks normalization is nirvana, and that I am an incompetent loony for modelling "intuitively". I wish I had solid mathematical arguments for my position!

Profile

bcholmes: (Default)
BC Holmes

February 2025

S M T W T F S
      1
2345678
9101112131415
16171819202122
2324252627 28 

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Powered by Dreamwidth Studios