Explanations

From Dreamwidth Notes
Revision as of 11:29, 25 January 2015 by Rahaeli (Talk | contribs)

Jump to: navigation, search


Why do we do it that way?

There's a lot of stuff in the code that isn't always obvious at first glance, and the way of doing things can be confusing or bizarre. There are also a number of code-design choices that aren't always intuitive, but really do have very good reasons for doing it that way -- reasons people generally only learn the first time they have a pull request bounced back to them with a request for revision. This is an attempt to get some of them out of our heads and onto the screen. Some of them are issues that will come up in pull requests; some of them are just a "this is why we do things this way" documentation of design decisions or common conventions.

Why entry IDs are not in sequential order

When an entry ID is assigned -- the "1239582.html" part of an entry URL -- it isn't sequential from the last entry posted to the account. That is, entries in your account aren't numbered "1.html", "2.html", etc: they're assigned a number that's the journal entry number (the 'jitemid') + a random number (the 'anum') between 1 and 256.

This is done for two reasons: to slow down bots, spiders, and spammers from going through all entries in an account one-by-one, and to prevent someone from being able to quickly tell that they can't see an entry in a journal. (It prevents people from being able to tell at a glance that you posted an entry that they're filtered out of reading.)

Why you can't just change text in a translation string in a patch

If you want to change some text that appears on a page, and the page has already been English-stripped, you can't just change the text in the translation file. That is, if you have code that's referencing the translation string "example.foo.string", and you want to change the text in example.foo.string, you can't just edit en.dat so the version of the string in en.dat is different. Instead, you have to change the call to the string by removing the old string and referencing a new one (in this case, convention would be to change the code so it's referencing "example.foo.string2" and to put "example.foo.string" in deadphrases.dat) and put your new text in the new string.

This is because in many cases, the version of the text that's in the code is not always the version of the text that's on the live site -- site administrators can edit the text "on the fly" by using the site's translation system to change the text that's shown to users of dreamwidth.org. That doesn't change the text string in the code, though. To avoid overwriting all those changes that have been made over time, the site admins don't allow texttool.pl, the script that loads and manages translation strings, to overwrite the version of the text that's displayed on the live site: we assume that if the version in Github and the version on the live site are different, the version on the live site is the preferred version. So, if you only change the translation string in the associated textfile, it will never be loaded onto the live site.

Newly-created strings, however, are loaded without any problem -- so all text changes have to be done that way.

Why we rate-limit failed logins

If you input a wrong password more than 3 times, the login page starts to implement a "backoff" -- it won't let you try again until five minutes has passed. (And on subsequent login failures, the rate limit increases.) This is to prevent automated bots from trying to break into accounts by trying every password in a password file or dictionary. (Sometimes people ask "but who would want to break into my account on a journal site" -- the answer is, spammers. Accounts that already exist, that have been frequently linked to, and that have a history of content have more "search engine juice": breaking into those and using them to post spam gives the spammers a wider reach.)

Why the base directory is $LJHOME instead of $DWHOME

...and why there are a few more references to LJ in the code or in variable names: basically, those things were so deeply baked into the code that if we changed things, it had the possibility of introducing a ton of bugs for no good reason. Since end-users never see those things -- they're development-only -- we felt that the risk of introducting bugs (many of which had the potential for being very subtle and hard to diagnose) was too high to muck about with things that had been working for a decade.

Why you can only scroll back so far on the reading page or the Recent Entries page

Performance reasons. Older entries have a much lower chance of being cached, so the further back you scroll, the more you have to hit the database directly instead of pulling an entry out of the cache. Disabling pages that call larger number of entries, such as the reading page or the Recent Entries page once you get past a certain skip value, reduces the server load.

User-level action logging: when to use infohistory vs userlog

There are two systems for logging "this user did this thing at this time" type events: statushistory (/admin/statushistory, uses prop 'historyview') and userlog (/admin/userlog, uses prop canview:userlog). There isn't really a great rubric for picking one over the other. Roughly speaking, statushistory is for system-level events that were instituted by the application itself or by a site admin (suspension/unsuspension, payments, priv addition/deletion, many console commands, etc) and userlog is for account-level changes that were instituted by the user (entry deletion, icon deletion, community maintainer changes), but there are a few exceptions. (Renames go in statushistory, for instance.)

If you can't figure out which logging system to use, go ahead and ask somebody. Factors to keep in mind: with statushistory, you can search by account ("show me all the things that have been done to the account [info]denise"), by event type ("show me all recent suspends"), or by admin who took the action ("show me all actions [info]denise has taken, on any account"), while userlog can only search by target account ("show me all the things that have been done to the account [info]denise"). Userlog shows all things that have been done to that account, back to account creation, while statushistory is limited to the last 1000 actions. Statushistory only logs the action (and any notes generated by the code/any comments included in the console command), while userlog also logs the IP address and uniq cookie that the request came from. These, and various other factors, can influence which logging option makes the most sense.

dversion: old database revisions

You may occasionally see reference to 'dversion' in the code. This stands for 'data version', and was used on LJ for times when the site's data structure changed in such a way as to be incompatable with the old way of doing things and the change was done slowly over a period of time to avoid slamming the DB too hard with one mass update. Incrementing the dversion means that you can check whether a particular user was converted to the new way of doing things or not, so you know what style of data you're getting back for a particular account.

This is mostly legacy: Dreamwidth has only made one dversion change, from 8 to 9. There are still some lingering remnants in the code, however, although we have tried to clean that up a bit. For historical purposes, the old dversion changes were:

  • 0: the first version of LJ, in which there were no database clusters.
  • 1: the first pass at database clustering, in which user journals were clustered but icons weren't.
  • 2: the first version with all user data fully clustered.
  • 3: conversion to 'weekuserusage', an old (now obsolete) way of tracking user activity.
  • 4: clustering of the userproplite2 and userproplist tables, and allowing userprops to be stored on both the global cluster and the user's cluster ('multihomed' userprops)
  • 5: clustering of the S1 styles and S1 style overrides
  • 6: clustering of memories and memory keywords, plus friend groups
  • 7: clustering of all icons with keywords and support for icon comments
  • 8: clustering of polls

We then incremented to dversion 9, changing how icons were stored and accessed to allow for icon renaming.


Outdated terminology that we can't shake

In some cases, you may see people who have worked on the code for ages use terminology that's really outdated. Some of these include:

lastn

Occasionally used instead of 'Recent' or 'Recent Entries'. Under S1, the original customization system, 'lastn' stood for "last N entries" -- ie, the page that showed the N most recent entries, based on what the journal owner had chosen for the number of entries to show.

"friend"-related terms

When we broke LJ's concept of 'friend' into the component parts of "I want to read you" (subscription) vs "I want to authorize you to read my locked stuff" (access), we changed a lot of stuff. You may still sometimes hear:

  • friends page: the Reading page
  • friendsfriends: the Network
  • friends list or flist: the Circle
  • friend group: access (or, sometimes but rarely, subscription) filter
  • checkfriends: the API function to query whether or not new entries had been posted to the friends page

userpic

The user-facing text and all the documentation (should) refer to them as 'icons', which is what we standardized on, but on LJ they were called 'userpics' and 'icons' interchangeably for so long that some of us can't shake calling them 'userpics'. You should use 'icon', though.