Database geekery
May. 29th, 2003 04:22 pmI'm looking for database wisdom.
Database modellers often refer to "third normal form". Does this imply a first and second normal form. How are these levels defined?
I'm looking for database wisdom.
Database modellers often refer to "third normal form". Does this imply a first and second normal form. How are these levels defined?
(no subject)
Date: 2003-05-29 01:52 pm (UTC)I don't remember well enough to describe it, but I did find a tutorial:
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=95
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=104
Yep
Date: 2003-05-29 02:06 pm (UTC)1st. duplicate columns removed and new tables created based on any recurring data based on primary key. e.g. remove dependant_1, ... , dependant_4 and put them in their own dependants table.
2nd. Common groupings are removed and placed into a different table based on foreign keys. e.g. instead of placing city, province, and postal code you can prefill a table with postal code, province, and city and refer to it through the unique postal code.
3rd. All data is dependant on primary key. Everything else is removed.
(no subject)
Date: 2003-05-29 02:08 pm (UTC)1st - seperate out to tables
2nd - eliminate redundant data
3rd - make all realate to primary key
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=95
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=104
(no subject)
Date: 2003-05-29 02:29 pm (UTC)Third normal form is usually thought "sufficient" to optimize an operational database for transactions. (Fourth and fifth normal forms are mostly academic exercises.) Normalization can be done intuitively by eliminating redundancy.
Analytical databases should be optimized for query performance (fewer joins) so 3rd normal form does not apply. (Kimball's "dimensional" model is my favourite strategy for analytical databases, but opinions vary.)
In reality, database designs are often denormalized a bit for physical implementation. Controlled redundancy may have advantages.
I could go on and on...
(no subject)
Date: 2003-05-29 03:53 pm (UTC)An example of this kind of denormalization comes out really well with regard to province/state codes.
To be technically correct for the form, an address would not be allowed to put a province abbreviation directly into the database. Instead, there would be a code for the province, that combined with the country, would be keys into another table that contained the province name and its abbreviation. Similarly, the country wouldn't be stored as a country name or abbreviation in an address record, it would instead be a foreign key to a country table that would have the abbreviation, country name, etc. in it. outputing an address record to a label or display would require fetching the address record, then fetching the country and province records to get the abbreviations for the display.
Rarely does that level of strictness become handy for a lot of elements, and addresses will store the abbreviations directly, so that only one table fetch is required instead of three.
Several years ago, a coworker had the joy of aiding a group that was firmly convinced that rigorous normal form was desirable. They build the databs thusly and started importing data from the system that this one was to replace.
The first thousand records got loaded in about two seconds. The second thousand took about 10 seconds. The thrid thousand took about 45 seconds, and the fourth thousand about three minutes. They gave up trying to load data after approximately 10,000 records and it was possible to read every record completely during the time that it took to load it. They got 0.1% done.
(no subject)
Date: 2003-05-30 08:55 pm (UTC)This is database knowledge. For database wisdom, you would have to ask "What data modelling rules are suitable for my situation?"
You are a sofa of great wisdom.
Re:
Date: 2003-05-30 09:14 pm (UTC)I hope you weren't relying on a little LJ conversation to design a major database. A data architect's experience is called for in these matters.
(no subject)
Date: 2003-05-31 10:00 pm (UTC)I hope you weren't relying on a little LJ conversation to design a major database.
No. Although I've had more than my fair share of designing databases. Like most computer knowledge I have, I learned databases through School of Hard Knocks. Often I'm missing some basic definitions, and this was one of them.
Re:
Date: 2003-06-01 06:51 pm (UTC)What I have learned, by listening to and reading a lot of more experienced data geeks, is that normalization is a crock. There are various examples of normalized-but-ridiculous design. There are more circumstances to denormalize than to normalize. The normalization rules say nothing about subtyping/inheritance, which is used to remove redundancy in many circumstances (but cannot be implemented in RDBMSes). And scientific normalization can only be done correctly after the semantics of the data are artfully defined.
Third normal form is mainly useful when cleaning up "really messy" or "flat" or "legacy" databases, and it should be treated as a general goal not as a nirvana state.
Now if only I could convince the Evil Project Manager of all the things I just said. He thinks normalization is nirvana, and that I am an incompetent loony for modelling "intuitively". I wish I had solid mathematical arguments for my position!