Database Relationships Explained

From Dreamwidth Notes
Jump to: navigation, search

Database Relationships

Databases only store things in tables. However, most data as stored and handled in applications that use them (for example, Dreamwidth) comes in more complicated forms. These can often be broken down into simple tables with simple relationships mimicking the data structures used in code.

This guide lists some of the relationships between databases. Because this guide doesn't assume any knowledge of databases, I'm including a short description here as I use these terms further down.

One to One

(Also: HasOne)

Table A has a relationship to table B in that each entry in A has one and only one corresponding entry in B.

You can tell a one to one relationship because either table A has a field which references the primary key of table B, or table B's primary key is the same as table A (e.g. Database Table: userbio).

One to Many

(Also: BelongsToMany)

Table A has a relationship to table B such that many entries in B can refer to one entry in A.

You can tell a one to many relationship because table B has a field which references the primary key in table A and isn't B's primary key.

Many to Many or Link Tables

(Also: HasAndBelongsToMany)

Table A has a relationship to table B such that many entries to A can map to many entries in B.

Unlike the other relationships, a many to many relationship requires a third table, C, to define the relationship. C is a link table and has a field containing a reference to a primary key in table A and a reference to a primary key in B.

Things are further complicated by the fact that link tables can also contain data about the relationship. For example, user_schools is a link table defining the many to many relationship between schools and users. It also has two extra fields, year started and year ended.

Tree

Trees, threaded tables or Dynamic Acyclic Graphs (DAGs) are when an entry in table A has a relationship to another entry in table A which is considered its parent. Many entries can have the same parent and the topmost parent, the root of the tree, has no parent. Entries with parents are said to be their children, and two entries with a common parent are said to be siblings.

You can tell a table is a tree because it will have a field called some variant of parent_id which stores the primary key of the parent entry. Tree tables may also have further fields which store the order of the siblings (if it is important).

Comments in Dreamwidth are stored as trees.

Graph

A graph refers to the mathematical concept of a graph, and not to the kind of graph that is a chart, which is what most people commonly think of when they hear the word graph.

A graph has vertices and edges, and is the perfect concept for storing networks. An example of a network would be a railway network, where the train stations are the vertices and the train services operating between the stations are the edges.

The friends system in Dreamwidth is perfectly represented by a graph in which users are the vertices and the subscriptions/trusted access relations between users are the edges. In this case, the table users holds the vertices while wt_edges holds the edges.

A graph (in databases) is a special case of a many to many mapping where instead of a table C mapping together an entry in table A and an entry in table B, C maps an entry in table A onto another entry in table A.

Property Lists

Property lists are a form of dynamic tables.

Modern databases are required (by the first normal form) to have the same number of fields and the same field names for all entries in a table. This is good for storing data where you can know the size and structure of the table beforehand. However, if you can't know the structure of the table beforehand (because of the nature of the data you will be storing), then this requirement is a sincere pain. This is particularly true of properties.

For example, we have a user called user1. We want to add the ability to store twitter accounts into Dreamwidth. So the logical course of action would be to add a twitter field into the user table. If we have a lot of users, this adds a lot of bloat into our database. We can lessen that a bit by creating a new one to one mapping table called usertwitter. However, imagine we have to manipulate the user tables every time we wanted to add a property. This is very invasive and costly to do for large databases. And what would happen if we needed to delete a property? We could solve that problem by having changing fields per table entry, by the first normal form forbids that and all modern databases use the first normal form. What do we do?

Enter property lists. These consist of two tables, a property list, where we define the property name, it's default value and a primary key to identify it. We can also store more data (e.g. Dreamwidth typically stores long descriptions in its property lists). The second table maps a property list key to a value.

The arrangement above is a one to one mapping (each property list entry has one value). This is good for global properties (which Dreamwidth doesn't use). However, the relationship becomes many to many because Dreamwidth does use them defined per user or per journal entry. So, the user property table maps a user to a property, and each user can have many properties set, and each property type has a different value for different users.

All property lists in Dreamwidth follow a broad convention. The table *proplist contains the property list while *prop contains the property mapping and value. E.g. journal entries are stored in logprop and logproplist.