Advanced Normalization
After 3NF (Third Normal Form), all
normalization problems involve only tables
which have three or more columns and all the
columns are keys. Many practitioners argue
that placing entities in 3NF is generally
sufficient because it is rare that
entities that are in 3NF are not also in
4NF and 5NF. They further argue that the
benefits gained from transforming
entities into 4NF and 5NF are so slight
that it is not worth the effort.
However, advanced normal forms are
presented because there are cases where
they are required.
Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) is a
more rigorous version of the 3NF deal
with relational tables that had (a)
multiple candidate keys, (b) composite
candidate keys, and (c) candidate keys
that overlapped .
BCNF is based on the concept of
determinants. A determinant column is
one on which some of the columns are
fully functionally dependent.
A relational table is in BCNF if and
only if every determinant is a candidate
key.
Fourth Normal Form
A relational table is in the
fourth normal form (4NF) if it is in
BCNF and all multivalued dependencies
are also functional dependencies.
Fourth normal form (4NF) is based on
the concept of multivalued
dependencies (MVD). A Multivalued
dependency occurs when in a relational
table containing at least three columns,
one column has multiple rows whose
values match a value of a single row of
one of the other columns. A more formal
definition given by Date is:
given a relational table R with
columns A, B, and C then
R.A —>> R.B (column A
multidetermines column B)
is true if and only if the set of
B-values matching a given pair of
A-values and C-values in R depends
only on the A-value and is
independent of the C-value.
MVD always occur in pairs. That
is R.A —>> R.B holds if and only if
R.A —>> R.C also holds.
Suppose that employees can be
assigned to multiple projects. Also
suppose that employees can have multiple
job skills. If we record this
information in a single table, all three
attributes must be used as the key since
no single attribute can uniquely
identify an instance.
The relationship between emp# and
prj# is a multivalued dependency because
for each pair of emp#/skill values in
the table, the associated set of prj#
values is determined only by emp# and is
independent of skill. The relationship
between emp# and skill is also a
multivalued dependency, since the set of
Skill values for an emp#/prj# pair is
always dependent upon emp# only.
To transform a table with multivalued
dependencies into the 4NF move each MVD
pair to a new table. The result is shown
in Figure1.
Figure 1: Tables in
4NF
Fifth Normal Form
A table is in the fifth normal
form (5NF) if it cannot have a
lossless decomposition into any number
of smaller tables.
While the first four normal forms are
based on the concept of functional
dependence, the fifth normal form is
based on the concept of join dependence.
Join dependency means that an table,
after it has been decomposed into three
or more smaller tables, must be capable
of being joined again on common keys to
form the original table. Stated another
way, 5NF indicates when an entity cannot
be further decomposed. 5NF is complex
and not intuitive. Most experts agree
that tables that are in the 4NF are also
in 5NF except for "pathological"
cases. Teorey suggests that true
many-to-many-to-many ternary relations
are one such case.
Adding an instance to an table that
is not in 5NF creates spurious results
when the tables are decomposed and then
rejoined. For example, let's suppose
that we have an employee who uses design
skills on one project and programming
skills on another. This information is
shown below.
|
emp#
|
prj#
|
skill
|
|
|
1211
|
11
|
Design
|
|
1211
|
28
|
Program
|
Next we add an
employee (1544) who uses programming
skills on Project 11.
|
emp#
|
prj#
|
skill
|
|
|
1211
|
11
|
Design
|
|
1211
|
28
|
Program
|
|
1544
|
11
|
Program
|
Next, we project
this information into three tables
as we did above. However, when we
rejoin the tables, the recombined
table contains spurious results.
|
emp#
|
prj#
|
skill
| |
|
|
1211
|
11
|
Design
| |
|
1211
|
11
|
Program
| <<—spurious data |
|
1211
|
28
|
Program
| |
|
1544
|
11
|
Design
| <<—spurious data |
|
1544
|
11
|
Program
| |
By adding one new
instance to a table not in 5NF, two
false assertions were stated:
Assertion 1
- Employee 1211 has been assigned
to Project 11.
- Project 11 requires programming
skills.
- Therefore, Employee 1211 must
use programming skills while
assigned to Project 11.
Assertion 2
- Employee 1544 has been assigned
to project 11.
- Project 11 needs Design skills.
- Therefore, Employee 1544 must
use Design skills in Project 11.
|