Database Table Overview

From Dreamwidth Notes
Revision as of 20:19, 20 August 2009 by Aphenine (Talk | contribs)

Jump to: navigation, search

Note: This page has been started by a complete newbie, who is trying to make sense of the database table structure as she goes and is writing the information down. As a result, the details may be inaccurate and so on. Hopefully, it's still better than nothing to the equally clueless! ;-) Aphenine 22:36, 28 July 2009 (UTC)

Introduction

All the information that is entered into Dreamwidth (like with most web apps), gets stored in a database. So understanding the table structure can help you understand how Dreamwidth is storing things. That's the theory, anyway. It also helps in debugging, extending Dreamwidth and so on. So it's good to know, even if you're not planning on altering the tables or the interfacing code directly.

This section provides an overview on the tables, what they do and how they map to what you see. Enough information so that you get up and running, anyway. That's the plan.

Database Relationships

Databases only store things in tables. However, most data as stored and handled (for example, as Dreamwidth does in Perl) comes in more complicated forms. These can often be broken down into simple tables with simple relationships mimicking the data structures used in code.

This guide lists some of the relationships between databases. Because this guide doesn't assume any knowledge of databases, I'm including a short description here as I use these terms further down.

One to One

(Also: HasOne)

Table A has a relationship to table B in that each entry in A has one and only one corresponding entry in B.

You can tell a one to one relationship because either table A has a field which references the primary key of table B, or table B's primary key is the same as table A (e.g. Database Table: userbio).

One to Many

(Also: BelongsToMany)

Table A has a relationship to table B such that many entries in B can refer to one entry in A.

You can tell a one to many relationship because table B has a field which references the primary key in table A and isn't B's primary key.

Many to Many or Link Tables

(Also: HasAndBelongsToMany)

Table A has a relationship to table B such that many entries to A can map to many entries in B.

Unlike the other relationships, a many to many relationship requires a third table, C, to define the relationship. C is a link table and has a field containing a reference to a primary key in table A and a reference to a primary key in B.

Things are further complicated by the fact that link tables can also contain data about the relationship. For example, user_schools is a link table defining the many to many relationship between schools and users. It also has two extra fields, year started and year ended.

Tree

Trees, threaded tables or Dynamic Acyclic Graphs (DAGs) are when an entry in table A has a relationship to another entry in table A which is considered its parent. Many entries can have the same parent and the topmost parent, the root of the tree, has no parent. Entries with parents are said to be their children, and two entries with a common parent are said to be siblings.

You can tell a table is a tree because it will have a field called some variant of parent_id which stores the primary key of the parent entry. Tree tables may also have further fields which store the order of the siblings (if it is important).

Comments in Dreamwidth are stored as trees.

Graph

A graph refers to the mathematical concept of a graph, and not to the kind of graph that is a chart, which is what most people commonly think of when they hear the word graph.

A graph has vertices and edges, and is the perfect concept for storing networks. An example of a network would be a railway network, where the train stations are the vertices and the train services operating between the stations are the edges.

The friends system in Dreamwidth is perfectly represented by a graph in which users are the vertices and the subscriptions/trusted access relations between users are the edges. In this case, the table users holds the vertices while wt_edges holds the edges.

A graph (in databases) is a special case of a many to many mapping where instead of a table C mapping together an entry in table A and an entry in table B, C maps an entry in table A onto another entry in table A.

Property Lists

Property lists are a form of dynamic tables.

Modern databases are required (by the first normal form) to have the same number of fields and the same field names for all entries in a table. This is good for storing data where you can know the size and structure of the table beforehand. However, if you can't know the structure of the table beforehand (because of the nature of the data you will be storing), then this requirement is a sincere pain. This is particularly true of properties.

For example, we have a user called user1. We want to add the ability to store twitter accounts into Dreamwidth. So the logical course of action would be to add a twitter field into the user table. If we have a lot of users, this adds a lot of bloat into our database. We can lessen that a bit by creating a new one to one mapping table called usertwitter. However, imagine we have to manipulate the user tables every time we wanted to add a property. This is very invasive and costly to do for large databases. And what would happen if we needed to delete a property? We could solve that problem by having changing fields per table entry, by the first normal form forbids that and all modern databases use the first normal form. What do we do?

Enter property lists. These consist of two tables, a property list, where we define the property name, it's default value and a primary key to identify it. We can also store more data (e.g. Dreamwidth typically stores long descriptions in its property lists). The second table maps a property list key to a value.

The arrangement above is a one to one mapping (each property list entry has one value). This is good for global properties (which Dreamwidth doesn't use). However, the relationship becomes many to many because Dreamwidth does use them defined per user or per journal entry. So, the user property table maps a user to a property, and each user can have many properties set, and each property type has a different value for different users.

All property lists in Dreamwidth follow a broad convention. The table *proplist contains the property list while *prop contains the property mapping and value. E.g. journal entries are stored in logprop and logproplist.

Tables by Concept

This section attempts to map the concepts you know (for example, from having used the site) with the database tables that represent them, to make finding things easy.

Users

Details about a user are held in the table Database Table: user and this forms the centre of how user information is stored about users.

There are many other tables which help in storing information about users, many (but not all) of them prefixed by the phrase user. We divide these other tables into three groups for ease of description:

  • One to one and one to many tables, which basically hold extra information about each user.
  • Link tables which link users to other concepts (like schools in the schools table, for example).
  • User Property Lists (see #Property Lists above).

Also, there is a table dealing with linking users to users (in a graph structure) called wt_edges. This is described further under the friends concept.

Extra Information

The tables holding extra information on a user are:

  • Database Table: birthdays: One to one mapping of the user's next birthday. Original birthday information is held in the user table.
  • Database Table: userbio: One to one mapping of a user biography onto a user.
  • Database Table: userlog: One to many mapping of user events (e.g. account create) onto a user, showing the time the event happened. This table logs user activity.

Link Tables to Other Concepts

These tables link the concept of a user with other concepts.

User Proplists

Database Table: userproplist and Database Table: userproplite2 form a proplist pair for defining user properties (such as the name of a twitter account).

Posts

Thanks to the history of how Dreamwidth developed, the name for a journal (or journal posts) is in fact a log inside the databases (see #BradLog for further details). Therefore, the tables starting with log hold the journal posts, and any other tables with log in the refer to journals, unless they refer to the process of logging data that DW generates (e.g. the table userlog). As you can see, there's a little room for confusion.

Posts are split across two tables: log2 and logtext2. The former defines the post and all its details (e.g. security settings), while the second one actually has the text that makes up a journal post.

There is also a proplist for individual journal entries in the pair of tables logproplist and logprop. These allow the setting of properties for individual journal entries.

Comments

Thanks to the history of how Dreamwidth developed, the name for comments is in fact talk inside the databases (and also in the name of the comment bml pages, see #BradLog for further details).

Comments are held in two tables, Database Table: talk2 and Database Table: talktext2. The actual text is held in talktext2, and talk2 holds the structure of the comment, along with who posted it.

Tags

Thanks to the history of how Dreamwidth developed, inside the databases tags are called keywords (see #BradLog for further details).

The table keyword itself is legacy and does not appear to be used anymore. Tags are instead held in userkeywords and usertags.

The table usertags appears to be a tree but holds no tag names, while userkeywords holds tag names but no structure.

Tags are mapped to individual posts in the logtags table.

Appendix

BradLog