Difference between revisions of "English-stripping"

From Dreamwidth Notes
(Unknown data: Add summary of plural handling)
(Unknown data: s/sophie/mark/ where appropriate)
 
(15 intermediate revisions by 6 users not shown)
Line 1: Line 1:
[[Category:Development]]
+
{{Update|text=BML::ml is deprecated and you should use LJ::Lang::ml instead. LJ::Lang::get_text isn't explained.}}
 +
 
 
English-stripping a page refers to the process of taking out hardcoded English text from the BML pages, giving them an ID you can use to refer to the string, and then putting the original English text in a lookup file. In this way, you're stripping the BML files of any English text, hence the name. This is useful because by doing this, it's easy to support multiple languages; the text for different languages is held in the database and can be looked up by the aforementioned ID.
 
English-stripping a page refers to the process of taking out hardcoded English text from the BML pages, giving them an ID you can use to refer to the string, and then putting the original English text in a lookup file. In this way, you're stripping the BML files of any English text, hence the name. This is useful because by doing this, it's easy to support multiple languages; the text for different languages is held in the database and can be looked up by the aforementioned ID.
  
Although Dreamwidth Studios itself won't be supporting any language other than English, it's still important to learn how to English-strip pages as it means our Site Copy team can change text as necessary on the site without having to go through the code, and also because we want other users of the code to be able to implement other languages if they want to with the minimum of hassle. (for both of these reasons, we're also going to be replacing the current translation system with something better - although to be perfectly honest, that's not going to be ''too'' hard.)
+
Developers with experience on the LiveJournal code may refer to this as "translation" or "the translation system". Although Dreamwidth Studios itself won't be supporting any language other than English, it's still important to learn how to English-strip pages as it means our Site Copy team can change text as necessary on the site without having to go through the code, and also because we want other users of the code to be able to implement other languages if they want to with the minimum of hassle.
  
== Glossary ==
+
= Glossary =
  
 
First, a bit of explanation about some of the terms we're going to use:
 
First, a bit of explanation about some of the terms we're going to use:
Line 10: Line 11:
 
* '''String''': this refers to a piece of text. For example, this sentence can be considered a string. We'll normally use this when referring to the text that a multi-language ID refers to (defined below).
 
* '''String''': this refers to a piece of text. For example, this sentence can be considered a string. We'll normally use this when referring to the text that a multi-language ID refers to (defined below).
  
== The Anatomy of a Multi-Language ID ==
+
* '''Multi-Language IDs''': what replaces English text in a file. There are two types of ID - global IDs (which can be used by any BML page) and page-specific IDs (which are only valid on one page). When you English-strip a file, you will almost always be using page-specific IDs, but it's helpful to know about global IDs anyway.
 
+
The IDs that replace English text in a BML page are referred to as Multi-Language IDs. There are two types of ID - global IDs (which can be used by any BML page) and page-specific IDs (which are only valid on one page). When you English-strip a file, you will almost always be using page-specific IDs, but it's helpful to know about global IDs anyway.
+
  
=== Global IDs ===
+
== Global IDs ==
  
 
Global IDs are, for the most part, defined in <code>bin/upgrading/en.dat</code>. However, for features specific to Dreamwidth Studios (and unusable by any other site using our code), any corresponding global IDs will be defined in <code>bin/upgrading/en_DW.dat</code> instead. For example, the Tropospherical sitescheme strings are stored in <code>en_DW.dat</code> since Tropospherical is specific to DWS, and because the strings appear in every page, page-specific IDs can't be used.
 
Global IDs are, for the most part, defined in <code>bin/upgrading/en.dat</code>. However, for features specific to Dreamwidth Studios (and unusable by any other site using our code), any corresponding global IDs will be defined in <code>bin/upgrading/en_DW.dat</code> instead. For example, the Tropospherical sitescheme strings are stored in <code>en_DW.dat</code> since Tropospherical is specific to DWS, and because the strings appear in every page, page-specific IDs can't be used.
Line 26: Line 25:
 
Each section name should be lower-case and use only letters, digits, and the underscore and hyphen characters. (There aren't actually any set rules for the characters you can use in IDs in the code, but this is how it's been done so far.) The number of sections in an ID is arbitrary, as are the section names themselves. However, you should always have at least two sections in an ID for ease of use.
 
Each section name should be lower-case and use only letters, digits, and the underscore and hyphen characters. (There aren't actually any set rules for the characters you can use in IDs in the code, but this is how it's been done so far.) The number of sections in an ID is arbitrary, as are the section names themselves. However, you should always have at least two sections in an ID for ease of use.
  
=== Page-specific IDs ===
+
== Page-specific IDs ==
  
 
Page-specific IDs are defined in a file with the same name as the page they apply to, plus the additional extension <code>.text</code>. For example, for a page '''htdocs/login.bml''', the corresponding page-specific ID file will be '''htdocs/login.bml.text'''.
 
Page-specific IDs are defined in a file with the same name as the page they apply to, plus the additional extension <code>.text</code>. For example, for a page '''htdocs/login.bml''', the corresponding page-specific ID file will be '''htdocs/login.bml.text'''.
Line 37: Line 36:
  
 
Again, the actual names and number of sections are arbitrary, but you should always have your IDs follow the structural flow of the content of the page for ease of use.
 
Again, the actual names and number of sections are arbitrary, but you should always have your IDs follow the structural flow of the content of the page for ease of use.
 
== The Anatomy of a .text File ==
 
 
There isn't too much to learn about how a '''.text''' file works - it's pretty straightforward. For each ID referenced in the page, you put the name of the ID, an equals sign (=), then the English text stripped from the file. (We'll talk about how precisely to do that in the next few sections.) Ideally, you should have one string correspond to one unbroken line of English. (This doesn't mean just one ''sentence'' - it's perfectly valid to have whole paragraphs under one ID.) Just make sure you don't have any HTML in a string, unless it's part of a sentence (i.e., don't include wrapping <code>&lt;p&gt;</code> tags, etc.).
 
 
For example, the <code>.createaccount.header</code> page-specific ID referred to in the last section is defined in the '''.text''' file like so:
 
 
<nowiki>.createaccount.header=Not a <?sitename?> member?</nowiki>
 
 
(the <code>&lt;?sitename?&gt;</code> part of this is a [[BML]] tag; for more information on these, see the linked page.)
 
 
It's possible to have a multi-line string in a '''.text''' file. You should never need to do this in a page-specific ID, but if you do, you simply replace the equals sign with two less-than signs (&lt;&lt;), and end the string with a dot on its own line. For example, here's the definition for the global ID <code>email.invitecoderequest.accept.body</code>:
 
 
<nowiki>email.invitecoderequest.accept.body<<</nowiki>
 
Your request for invites has been granted. You can view all your invite codes here:
 
 
<nowiki>  [[invitesurl]]</nowiki>
 
.
 
  
 
All IDs should be listed in alphabetical order, if possible.
 
All IDs should be listed in alphabetical order, if possible.
  
== English-stripping ==
+
= How to English-strip =
 
+
You might think, after learning the above, that English-stripping a page is fairly easy - and in theory, it is. In practice, however, you need to know at least something about how both Perl and HTML work.
+
 
+
=== How to English-strip: The Theory ===
+
 
+
In theory, English-stripping a page in BML is easy. BML has specific tags for English-stripping, which means that in a ''normal'' BML page you would normally be able to follow a simple set of steps:
+
 
+
* Identify the text to be stripped. For example, you may have a line in your BML page that says:
+
<nowiki><p>Enter your invite code below:</p></nowiki>
+
: In this part, the English text to be stripped is "Enter your invite code below:".
+
 
+
* Cut and paste this out, and replace it with an <code>&lt;?_ml ... _ml?&gt;</code> tag, where "..." represents a multi-language ID, as described above. Remember, page-specific IDs always begin with a dot.
+
 
+
: For example, this one might be <code>.createaccount.enter_invite_code</code>, in which case your replaced line will be:
+
<nowiki><p><?_ml .createaccount.enter_invite_code _ml?></p></nowiki>
+
 
+
* Put the line into the corresponding '''.text''' file, as described above:
+
 
+
.createaccount.enter_invite_code=Enter your invite code below:
+
 
+
* You're done. Rinse and repeat.
+
 
+
=== How to English-strip: The Reality ===
+
 
+
Unfortunately, English-stripping is rarely as easy as the theory goes, for several reasons:
+
 
+
* Most of the Dreamwidth BML files are actually glorified Perl scripts and have virtually nothing in them that isn't Perl code of some description, and the above may not work.
+
* Even when this isn't the case, the text will sometimes include values that you can't know while you're doing the stripping - for example, the username of the currently logged-in user. This isn't possible using the <code>&lt;?_ml ... _ml?&gt;</code> tag, and needs to be done with Perl.
+
* Some things that look like English text that should be stripped are actually signals to the code to take a certain action. For example, you can safely assume that any value within a hidden INPUT tag is probably not one that should be stripped, regardless of how English it looks. (You can always ask one of our resident code gurus in [[IRC]] if you're not sure, though.)
+
 
+
So it's pretty much assured that in order to be able to English-strip, you need to know a little about how Perl works. (Not too much!) I'll go over these basics here.
+
  
==== A basic example ====
+
== A basic example ==
  
In Perl, literal text strings (that is, text which is mostly left unchanged) are represented by surrounding quote marks. For example:
+
In Perl, literal text strings are represented by surrounding quote marks. For example:
  
 
<source lang="perl">
 
<source lang="perl">
Line 99: Line 49:
 
</source>
 
</source>
  
The "\n" in this example is called a 'newline', and signals to Perl that it should start a new line when it encounters it. (This won't appear as a new line in a browser unless a tag like '''&lt;p&gt;''' or '''&lt;br&gt;''' is used, but can be helpful to keep the source tidy.)
+
The "\n" in this example is called a 'newline', and signals to Perl that it should start a new line when it encounters it.
  
 
The string itself may be surrounded on the same line by other Perl code, such as:
 
The string itself may be surrounded on the same line by other Perl code, such as:
Line 107: Line 57:
 
</source>
 
</source>
  
In these examples, the string is highlighted in red. Your aim here is to get this string English-stripped.
+
In these examples, the string is highlighted in red. Your aim here is to get this string English-stripped using the <code>LJ::Lang::ml</code> Perl function. The function call isn't part of a literal Perl string, so it doesn't need quotes around it. However, the newline "\n" character ''does'' need to be in a literal string with quotes around it, which means you need to combine the two. This is done using a dot - <code>.</code> - which is Perl's operator for joining strings together. This is how it would end up:
 
+
Now, in cases like this, where the entire string is English text and is an unbroken line, you can normally actually do this according to the theory:
+
  
 
<source lang="perl">
 
<source lang="perl">
$ret .= "<?_ml .createaccount.enter_invite_code _ml?>\n";
+
$ret .= LJ::Lang::ml( ".createaccount.enter_invite_code" ) . "\n";
 
</source>
 
</source>
  
<!-- FIXME: Add information about $ML{ ... } here, I completely forgot about it on the first draft of this. -->
+
Then put the line into the corresponding '''.text''' file:
  
Be sure to keep the quote marks and newline around it; these are important, and you shouldn't add any of these to your string in the '''.text''' file. Then, you can add the line to the '''.text''' file as described in the theory example, and you're done.
+
  .createaccount.enter_invite_code=Enter your invite code below:
  
(note that while this is generally okay, there are some cases where it may not work. In those cases, you should use the Perl function described in the next section instead.)
+
Always make sure you don't have any HTML in a string, unless it's part of a sentence (i.e., don't include wrapping <code>&lt;p&gt;</code> tags, etc.).
  
==== The BML::ml Perl function ====
+
== Updating strings ==
  
If you try the above on a piece of code and it doesn't work, you may need to use the <code>BML::ml</code> Perl function instead. This performs much the same task as the <code>&lt;?_ml ... _ml?&gt;</code> tag, but in Perl.
+
If working on a bug that requires the content of a string to be updated, you must create an entirely new string. [https://github.com/fhocutt/dw-free/commit/dc4f49ffda345967e1afa7ee7733716566f3f17d#commitcomment-5891643 The details are a bit arcane], but the upshot is that this is necessary to ensure the database gets updated.
  
The way you'd use the function for the above string is as follows:
+
Say you have <code>example.string = This is an example</code>. To update it, don't simply edit the '''.text''' file to read <code>example.string = This is a different example</code>. Instead, change the ID as well, to (e.g.) <code>example.string2 = This is a different example</code>, and change the corresponding instance of <code>example.string</code> in the parent file to match.
<source lang="perl">
+
BML::ml( ".createaccount.enter_invite_code" )
+
</source>
+
  
The function itself isn't used in a literal Perl string, so it doesn't need quotes around it. However, the newline "\n" character *does* need to be in a literal string with quotes around it, which means you need to combine the two. This is done using a dot - <code>.</code> - which is how you tell Perl to combine a literal string and something else. This is how it would end up:
 
  
<source lang="perl">
+
== Multi-line strings ==
$ret .= BML::ml( ".createaccount.enter_invite_code" ) . "\n";
+
</source>
+
  
==== Unknown data ====
+
It's possible to have a multi-line string in a '''.text''' file. Simply replace the equals sign with two less-than signs (&lt;&lt;), and end the string with a dot on its own line:
 +
 
 +
<nowiki>email.invitecoderequest.accept.body<<
 +
Your request for invites has been granted. You can view all your invite codes here:
 +
 +
  [[invitesurl]]
 +
 
 +
.</nowiki>
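
As a rough illustration of the two forms described above (single-line <code>ID=text</code> and multi-line <code>ID&lt;&lt;</code> ending with a dot on its own line), here is a minimal Perl sketch of a reader for them. The function name <code>parse_text_sketch</code> is hypothetical and is not part of the Dreamwidth code; the real loader may well treat comments, escaping and whitespace differently.

```perl
use strict;
use warnings;

# Hypothetical parser sketch for the .text format described above.
# This is NOT the Dreamwidth loader; it only models the two forms shown.
sub parse_text_sketch {
    my ($content) = @_;
    my %strings;
    my @lines = split /\n/, $content;

    while (@lines) {
        my $line = shift @lines;

        if ( $line =~ /\A(\S+)<<\s*\z/ ) {
            # Multi-line form: collect lines until a dot on its own line.
            my $id = $1;
            my @body;
            while (@lines) {
                my $next = shift @lines;
                last if $next eq ".";
                push @body, $next;
            }
            $strings{$id} = join "\n", @body;
        }
        elsif ( $line =~ /\A([^=\s]+)=(.*)\z/ ) {
            # Single-line form: ID, equals sign, then the text.
            $strings{$1} = $2;
        }
    }
    return \%strings;
}

# Sample input built from the two examples used on this page.
my $sample = join "\n",
    ".createaccount.enter_invite_code=Enter your invite code below:",
    "email.invitecoderequest.accept.body<<",
    "Your request for invites has been granted. You can view all your invite codes here:",
    "",
    "  [[invitesurl]]",
    "",
    ".",
    "";

my $parsed = parse_text_sketch($sample);
print $parsed->{"email.invitecoderequest.accept.body"}, "\n";
```

The sketch keeps blank lines inside a multi-line string, since the example above relies on them for its layout.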
 +
 
 +
== Unknown data ==
  
 
Sometimes, there will be parts of a string which contain information that you can't specifically know when you're English-stripping, such as the username of the logged-in user. For example, the code might say something like:
 
Sometimes, there will be parts of a string which contain information that you can't specifically know when you're English-stripping, such as the username of the logged-in user. For example, the code might say something like:
  
 
<source lang="perl">
 
<source lang="perl">
$ret .= "$u->{user}, enter your $LJ::SITENAMESHORT invite code:\n";
+
$ret .= "$u->{user}, send your $LJ::SITENAMESHORT invite code:\n";
 
</source>
 
</source>
  
 
In the actual HTML output, this might look something like:
 
In the actual HTML output, this might look something like:
  
  sophie, enter your Dreamwidth invite code:
+
  mark, send your Dreamwidth invite code:
  
 
Even though the "$u->{user}" and "$LJ::SITENAMESHORT" parts in this example are highlighted in red, you can tell they're pieces of data by the dollar sign; anything in a string that starts with a dollar sign is data that needs to be kept somehow. You don't ''need'' to understand what the names mean in order to English-strip them, just the way they're used. (Of course, if you do understand them, it'll be easier to give them meaningful labels.)
 
Even though the "$u->{user}" and "$LJ::SITENAMESHORT" parts in this example are highlighted in red, you can tell they're pieces of data by the dollar sign; anything in a string that starts with a dollar sign is data that needs to be kept somehow. You don't ''need'' to understand what the names mean in order to English-strip them, just the way they're used. (Of course, if you do understand them, it'll be easier to give them meaningful labels.)
  
(yes, the above example is rather contrived, since if you're logged in you wouldn't be entering an invite code anyway. Just bear with me on this one, it's only an example.)
+
In order to use a piece of data in a multi-language string, you need to assign it a label. For example, for the first piece, let's call it "username". Then, you take the data part exactly as written (including the dollar sign), and combine the two with <code>=&gt;</code>. If you have multiple pieces of data, as above, you can use commas to separate them:
 
+
In order to use a piece of data in a multi-language string, you need to assign it a label. For example, for the first piece, let's call it "username". Then, you take the data part exactly as written (including the dollar sign), and combine the two with <code>=&gt;</code>:
+
 
+
<source lang="perl">
+
username => $u->{user}
+
</source>
+
 
+
The above means that the label 'username' should have the value of whatever $u->{user} comes out to be.
+
 
+
If you have multiple pieces of data, as above, you can use commas to separate them; simply copy the above format and separate them with a comma. For example, let's assign the site name a label of "sitename":
+
  
 
<source lang="perl">
 
<source lang="perl">
Line 172: Line 113:
 
</source>
 
</source>
  
You then need to use the <code>BML::ml</code> function, described above, and in addition to giving it the multi-language ID, you need to also give it the data itself:
+
You then need to use the <code>LJ::Lang::ml</code> function and give it both the multi-language ID and the data itself:
  
 
<source lang="perl">
 
<source lang="perl">
$ret .= BML::ml( ".createaccount.enter_invite_code",
+
$ret .= LJ::Lang::ml( ".invites.send_invite_code",
 
                   { username => $u->{user}, sitename => $LJ::SITENAMESHORT }
 
                   { username => $u->{user}, sitename => $LJ::SITENAMESHORT }
 
               ) . "\n";
 
               ) . "\n";
 
</source>
 
</source>
  
Note that in this example, I've split the line into three lines in the middle of the line. Perl is perfectly happy with this as long as you do it in the right place - for example, not in the middle of a literal string. You could write the above as a single line if you wanted:
+
Note that in this example, I've split the statement across three lines. Perl is perfectly happy with this as long as you break the line in the right place - for example, not in the middle of a literal string. You could write the above as a single line if you wanted, but it's not very easy to read, and shorter lines are generally preferred.
  
<source lang="perl">
+
The key thing to note here is that after the multi-language ID, I've put a comma, then pasted the notation we constructed above after it, while still inside the parentheses of the LJ::Lang::ml() function. After that, we continue on as normal - the closing parenthesis, and the newline character.
$ret .= BML::ml( ".createaccount.enter_invite_code", { username => $u->{user}, sitename => $LJ::SITENAMESHORT } ) . "\n";
+
</source>
+
 
+
...but it's not really very easy to read, so for this example I've split it.
+
 
+
The key thing to note here is that after the multi-language ID, I've put a comma, then pasted the notation we constructed above after it, while still inside the parentheses of the BML::ml() function. After that, we continue on as normal - the closing parenthesis, and the newline character.
+
  
 
We're now done for the Perl part of it, and we now need to add the text to the '''.text''' file. Fortunately, this is a lot easier; when you need to put data in a string, simply refer to its label surrounded by two square brackets. The above string would be represented in the '''.text''' file as follows:
 
We're now done for the Perl part of it, and we now need to add the text to the '''.text''' file. Fortunately, this is a lot easier; when you need to put data in a string, simply refer to its label surrounded by two square brackets. The above string would be represented in the '''.text''' file as follows:
  
  <nowiki>.createaccount.enter_invite_code=[[username]], enter your [[sitename]] invite code:</nowiki>
+
  <nowiki>.invites.send_invite_code=[[username]], send your [[sitename]] invite code:</nowiki>
  
 
We're done. Yay!
 
We're done. Yay!
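
To make the relationship between the labels in the Perl call and the <nowiki>[[label]]</nowiki> markers in the '''.text''' file more concrete, here is a minimal sketch of the substitution. The function <code>ml_sketch</code> and the <code>%strings</code> hash are hypothetical stand-ins written for this page; they are not the real <code>LJ::Lang::ml</code>, which reads its strings from the database, and the "missing string" fallback is simply this sketch's own behaviour.

```perl
use strict;
use warnings;

# Illustrative data standing in for the contents of a .text file.
my %strings = (
    ".invites.send_invite_code" =>
        "[[username]], send your [[sitename]] invite code:",
);

# Hypothetical stand-in for LJ::Lang::ml: looks up an ID and fills in
# each [[label]] from the hash reference passed as the second argument.
sub ml_sketch {
    my ( $id, $vars ) = @_;
    $vars ||= {};

    my $text = $strings{$id};
    return "[missing string $id]" unless defined $text;

    # Labels with no matching value are left untouched.
    $text =~ s/\[\[(\w+)\]\]/exists $vars->{$1} ? $vars->{$1} : "[[$1]]"/ge;
    return $text;
}

my $ret = "";
$ret .= ml_sketch( ".invites.send_invite_code",
                   { username => "mark", sitename => "Dreamwidth" }
               ) . "\n";
print $ret;
```

The values "mark" and "Dreamwidth" are hard-coded here only so the sketch runs on its own; in a real page they would come from <code>$u->{user}</code> and <code>$LJ::SITENAMESHORT</code> as shown above.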
  
===== Plurals and numbers =====
+
== Plurals and numbers ==
  
These are described in excruciating detail in [Embedding plural forms into translations], but here's a quick example:
+
These are described in excruciating detail in [[Embedding plural forms into translations]], but here's a quick example:
  
 
<source lang="perl">
 
<source lang="perl">
Line 207: Line 142:
  
 
<source lang="perl">
 
<source lang="perl">
$ret .= BML::ml( ".inbox.num_msgs",
+
$ret .= LJ::Lang::ml( ".inbox.num_msgs",
 
                   { num => $num, plural => ( $num != 1 ) ? 's' : '' }
 
                   { num => $num, plural => ( $num != 1 ) ? 's' : '' }
 
               );
 
               );
Line 217: Line 152:
  
 
<source lang="perl">
 
<source lang="perl">
$ret .= BML::ml( ".inbox.num_msgs", { num => $num } );
+
$ret .= LJ::Lang::ml( ".inbox.num_msgs", { num => $num } );
 
</source>
 
</source>
  
Line 224: Line 159:
 
This takes care of applying the rules for English plurals for you, and lets translators (with help from some magic in LiveJournal and Dreamwidth source code) handle it appropriately by just specifying a text string for their language, without having to muck around in the Dreamwidth source code - which is, after all, the goal of the translation system.
 
This takes care of applying the rules for English plurals for you, and lets translators (with help from some magic in LiveJournal and Dreamwidth source code) handle it appropriately by just specifying a text string for their language, without having to muck around in the Dreamwidth source code - which is, after all, the goal of the translation system.
  
==== Split strings ====
+
== Don't split sentences ==
  
If you've been paying attention, you might notice that the original string above:
+
For example, you '''should not''' do this in your '''.text''' file:
  
<source lang="perl">
+
  .createaccount.enter_invite_code.part1=, enter your  
$ret .= "$u->{user}, enter your $LJ::SITENAMESHORT invite code:\n";
+
</source>
+
 
+
...could just as easily have been written like this:
+
 
+
<source lang="perl">
+
$ret .= $u->{user} . ", enter your " . $LJ::SITENAMESHORT . " invite code:\n";
+
</source>
+
 
+
...that is, the actual data is no longer part of the literal text string, but joined. Perl will accept either way of doing it (with a few exceptions that aren't important here), and as such it's important to note that sometimes you'll come across examples of the latter sort. These are still all parts of the same string, and should be handled in exactly the same way as above. In particular, you should try to avoid having parts of sentences in a multi-language string. For example, you '''should not''' do this in your '''.text''' file:
+
 
+
  .createaccount.enter_invite_code.part1=, enter your
+
 
   
 
   
 
  .createaccount.enter_invite_code.part2= invite code:
 
  .createaccount.enter_invite_code.part2= invite code:
  
This is not only because it looks ugly, but also because some (human) languages don't work that way; for example, in some languages the name of the person you're addressing may need to come at the end, after the site name. By doing it this way, that becomes impossible. We should be as flexible as possible when it comes to where different terms may come in a sentence.
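
A small sketch may help show why a whole sentence with labels is more flexible than sentence fragments. The helper <code>fill_sketch</code> is hypothetical, and the second string is an invented word order used only to stand in for another language's version; neither comes from the Dreamwidth code.

```perl
use strict;
use warnings;

# Hypothetical helper: fills [[label]] markers from a hash reference.
sub fill_sketch {
    my ( $text, $vars ) = @_;
    $text =~ s/\[\[(\w+)\]\]/exists $vars->{$1} ? $vars->{$1} : "[[$1]]"/ge;
    return $text;
}

my %vars = ( username => "mark", sitename => "Dreamwidth" );

# The English string, kept whole under a single ID.
my $english = "[[username]], enter your [[sitename]] invite code:";

# An invented word order, standing in for a language that puts the name last.
my $reordered = "Enter your [[sitename]] invite code, [[username]]:";

print fill_sketch( $english,   \%vars ), "\n";
print fill_sketch( $reordered, \%vars ), "\n";
```

Because the Perl side only supplies labelled values, the second ordering needs no code change at all; with the two-part version, the name could never move to the end.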
+
== Heredocs ==
 
+
==== Heredocs ====
+
  
 
Sometimes, you'll come across Perl constructs that look something like this:
 
Sometimes, you'll come across Perl constructs that look something like this:
Line 254: Line 175:
 
</source>
 
</source>
  
...followed by a block of text that isn't Perl code, followed by an "HTML" on its own line (or whatever was after the &lt;&lt; in the original line). This format of text is known in Perl parlance as a "heredoc". If you can replace any text in there with <code>&lt;?_ml ... _ml?&gt;</code> tags, and it works, you should do so. However, if any need the use of the <code>BML::ml</code> function, it's probably best to have someone look at it who codes Perl, since these require care in order to get right. (Of course, if you know Perl and know how to fix it, go ahead; otherwise, just make a note of it and move on.)
+
...followed by a block of text that isn't Perl code, followed by an "HTML" on its own line (or whatever was after the &lt;&lt; in the original line). This format of text is known in Perl parlance as a "heredoc". If you can replace any text in there with <code>&lt;?_ml ... _ml?&gt;</code> tags, and it works, you should do so. However, if any need the use of the <code>LJ::Lang::ml</code> function, it's probably best to have someone look at it who codes Perl, since these require care in order to get right. (Of course, if you know Perl and know how to fix it, go ahead; otherwise, just make a note of it and move on.)
  
==== HTML forms ====
+
== HTML forms ==
  
 
Sometimes, you'll come across text that needs to be stripped in HTML forms. For example, in the following example the contents of the <code>&lt;p&gt;</code> tag and the value of the submit button need to be English-stripped:
 
Sometimes, you'll come across text that needs to be stripped in HTML forms. For example, in the following example the contents of the <code>&lt;p&gt;</code> tag and the value of the submit button need to be English-stripped:
Line 287: Line 208:
 
If you do not find two submit buttons with the same 'name' attribute, but there is nonetheless still a 'name' attribute on at least one, make a note for the reviewer that this is the case, but go ahead and English-strip it as normal. (The reviewer will check to see whether the code is actually checking for this value.)
 
If you do not find two submit buttons with the same 'name' attribute, but there is nonetheless still a 'name' attribute on at least one, make a note for the reviewer that this is the case, but go ahead and English-strip it as normal. (The reviewer will check to see whether the code is actually checking for this value.)
  
== Fin ==
+
= Fin =
  
 
That's basically the guide for how to English-strip a page. Don't be too afraid of messing things up; a reviewer will tell you if you have anything wrong.
 
That's basically the guide for how to English-strip a page. Don't be too afraid of messing things up; a reviewer will tell you if you have anything wrong.
 +
 +
[[Category:Translation]]
 +
[[Category:Development]]

Latest revision as of 02:31, 11 December 2018

Needs Update: BML::ml is deprecated and you should use LJ::Lang::ml instead. LJ::Lang::get_text isn't explained.

English-stripping a page refers to the process of taking out hardcoded English text from the BML pages, giving them an ID you can use to refer to the string, and then putting the original English text in a lookup file. In this way, you're stripping the BML files of any English text, hence the name. This is useful because by doing this, it's easy to support multiple languages; the text for different languages is held in the database and can be looked up by the aforementioned ID.

Developers with experience on the LiveJournal code may refer to this as "translation" or "the translation system". Although Dreamwidth Studios itself won't be supporting any language other than English, it's still important to learn how to English-strip pages as it means our Site Copy team can change text as necessary on the site without having to go through the code, and also because we want other users of the code to be able to implement other languages if they want to with the minimum of hassle.

Glossary

First, a bit of explanation about some of the terms we're going to use:

  • String: this refers to a piece of text. For example, this sentence can be considered a string. We'll normally use this when referring to the text that a multi-language ID refers to (defined below).
  • Multi-Language IDs: what replaces English text in a file. There are two types of ID - global IDs (which can be used by any BML page) and page-specific IDs (which are only valid on one page). When you English-strip a file, you will almost always be using page-specific IDs, but it's helpful to know about global IDs anyway.

Global IDs

Global IDs are, for the most part, defined in bin/upgrading/en.dat. However, for features specific to Dreamwidth Studios (and unusable by any other site using our code), any corresponding global IDs will be defined in bin/upgrading/en_DW.dat instead. For example, the Tropospherical sitescheme strings are stored in en_DW.dat since Tropospherical is specific to DWS, and because the strings appear in every page, page-specific IDs can't be used.

A global ID looks something like this:

date.month.december.short

You'll notice this ID is split into several parts with dots. This helps to know precisely how the string is being used; ideally, each separate part should be a subset of the part before it in some way. In this example, month is part of date; december is a month, and short means the short version of how to say this month. (in this example, it's "Dec" in the English text; there's a corresponding long version too, which is simply "December").

Each section name should be lower-case and use only letters, digits, and the underscore and hyphen characters. (There aren't actually any set rules for the characters you can use in IDs in the code, but this is how it's been done so far.) The number of sections in an ID is arbitrary, as are the section names themselves. However, you should always have at least two sections in an ID for ease of use.

Page-specific IDs

Page-specific IDs are defined in a file with the same name as the page they apply to, plus the additional extension .text. For example, for a page htdocs/login.bml, the corresponding page-specific ID file will be htdocs/login.bml.text.

Page-specific IDs begin with a dot, and thereafter follow the same rules as global IDs. For example, one of the strings in the htdocs/login.bml.text file in dw-free has this ID:

.createaccount.header

Generally, in page-specific IDs, the names you'll use for your sections will correspond to the sections of the page in question. So this ID, for example, refers to the header of the section that invites the user to create an account if they don't already have one.

Again, the actual names and number of sections are arbitrary, but you should always have your IDs follow the structural flow of the content of the page for ease of use.

All IDs should be listed in alphabetical order, if possible.
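
The two housekeeping rules above (where the .text file lives, and keeping IDs in order) are easy to check mechanically. Here's a minimal sketch; text_file_for and ids_are_sorted are hypothetical helpers, and "alphabetical" is taken to mean Perl's plain string comparison, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: the .text file that belongs to a page.
sub text_file_for {
    my ($page) = @_;
    return "$page.text";
}

# Hypothetical helper: true if the IDs are already in order, using
# Perl's default (plain string) sort.
sub ids_are_sorted {
    my @ids    = @_;
    my @sorted = sort @ids;
    return "@ids" eq "@sorted" ? 1 : 0;
}

print text_file_for("htdocs/login.bml"), "\n";    # htdocs/login.bml.text
print ids_are_sorted( ".createaccount.enter_invite_code", ".createaccount.header" )
    ? "sorted\n"
    : "not sorted\n";
```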

How to English-strip

A basic example

In Perl, literal text strings are represented by surrounding quote marks. For example:

"Enter your invite code below:\n"

The "\n" in this example is called a 'newline', and signals to Perl that it should start a new line when it encounters it.

The string itself may be surrounded on the same line by other Perl code, such as:

$ret .= "Enter your invite code below:\n";

In these examples, the string is the text between the quote marks. Your aim here is to get this string English-stripped using the LJ::Lang::ml Perl function. The function call isn't part of a literal Perl string, so it doesn't need quotes around it. However, the newline "\n" character *does* need to be in a literal string with quotes around it, which means you need to combine the two. This is done using a dot - . - which is how you tell Perl to join a literal string to something else. This is how it would end up:

$ret .= LJ::Lang::ml( ".createaccount.enter_invite_code" ) . "\n";

Then put the line into the corresponding .text file:

.createaccount.enter_invite_code=Enter your invite code below:

Always make sure you don't have any HTML in a string, unless it's part of a sentence. (i.e. don't include wrapping <p> tags and the like.)

Updating strings

If working on a bug that requires the content of a string to be updated, you must create an entirely new string. The details are a bit arcane, but the upshot is that this is necessary to ensure the database gets updated.

Say you have example.string = This is an example. To update it, instead of simply editing the line in the .text file to example.string = This is a different example, you must rename it to (e.g.) example.string2 = This is a different example, and change every use of example.string in the corresponding page to the new ID.


Multi-line strings

It's possible to have a multi-line string in a .text file. Simply replace the equals sign with two less-than signs (<<), and end the string with a dot on its own line:

email.invitecoderequest.accept.body<<
 Your request for invites has been granted. You can view all your invite codes here:
 
   [[invitesurl]]

 .
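
To make the two forms concrete, here's a rough sketch of how such a file could be read. It's a simplified reader written for this guide, not the loader the site actually uses; parse_text_lines is a hypothetical name, and ignoring whitespace around the equals sign and around the closing dot is an assumption.

```perl
use strict;
use warnings;

# Simplified reader for the single-line and multi-line forms shown above.
# Illustrative only: the real loader may treat whitespace differently.
sub parse_text_lines {
    my @lines = @_;
    my %strings;
    my $key;     # set while we are inside a multi-line string
    my @body;    # lines collected for the current multi-line string

    for my $line (@lines) {
        if ( defined $key ) {
            if ( $line =~ /^\s*\.\s*$/ ) {
                # A dot on its own line ends the multi-line string.
                $strings{$key} = join "\n", @body;
                undef $key;
                @body = ();
            }
            else {
                push @body, $line;
            }
        }
        elsif ( $line =~ /^([^\s=<]+)<<\s*$/ ) {
            # "id<<" starts a multi-line string.
            $key = $1;
        }
        elsif ( $line =~ /^([^\s=<]+)\s*=\s*(.*)$/ ) {
            # "id=text" is a single-line string.
            $strings{$1} = $2;
        }
    }
    return %strings;
}

my %strings = parse_text_lines(
    ".createaccount.enter_invite_code=Enter your invite code below:",
    "email.invitecoderequest.accept.body<<",
    "Your request for invites has been granted.",
    "",
    "[[invitesurl]]",
    ".",
);
print $strings{"email.invitecoderequest.accept.body"}, "\n";
```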

Unknown data

Sometimes, there will be parts of a string which contain information that you can't specifically know when you're English-stripping, such as the username of the logged-in user. For example, the code might say something like:

$ret .= "$u->{user}, send your $LJ::SITENAMESHORT invite code:\n";

In the actual HTML output, this might look something like:

mark, send your Dreamwidth invite code:

Even though the "$u->{user}" and "$LJ::SITENAMESHORT" parts in this example sit inside the quoted string, you can tell they're pieces of data by the dollar sign; anything in a string that starts with a dollar sign is data that needs to be kept somehow. You don't need to understand what the names mean in order to English-strip them, just the way they're used. (Of course, if you do understand them, it'll be easier to give them meaningful labels.)

In order to use a piece of data in a multi-language string, you need to assign it a label. For example, for the first piece, let's call it "username". Then, you take the data part exactly as written (including the dollar sign), and combine the two with =>. If you have multiple pieces of data, as above, you can use commas to separate them:

username => $u->{user}, sitename => $LJ::SITENAMESHORT

You then surround the whole thing with braces:

{ username => $u->{user}, sitename => $LJ::SITENAMESHORT }

You then need to use the LJ::Lang::ml function and give it both the multi-language ID and the data itself:

$ret .= LJ::Lang::ml( ".invites.send_invite_code",
                   { username => $u->{user}, sitename => $LJ::SITENAMESHORT }
               ) . "\n";

Note that in this example, I've split the statement across three lines. Perl is perfectly happy with this as long as you break in the right place - for example, not in the middle of a literal string. You could write the above on a single line if you wanted, but it's not very easy to read, and shorter lines are generally preferred.

The key thing to note here is that after the multi-language ID, I've put a comma, then pasted the notation we constructed above after it, while still inside the parentheses of the LJ::Lang::ml() function. After that, we continue on as normal - the closing parenthesis, and the newline character.

We're now done for the Perl part of it, and we now need to add the text to the .text file. Fortunately, this is a lot easier; when you need to put data in a string, simply refer to its label surrounded by two square brackets. The above string would be represented in the .text file as follows:

.invites.send_invite_code=[[username]], send your [[sitename]] invite code:

We're done. Yay!
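
If you're curious what happens to the labels, the sketch below imitates the lookup. ml_sketch is a hypothetical stand-in written for this guide, not the real LJ::Lang::ml (which reads from the database); the %text hash plays the part of the .text file, and the "[missing string ...]" fallback is made up.

```perl
use strict;
use warnings;

# Stand-in for the .text file: ID => English text.
my %text = (
    ".invites.send_invite_code" =>
        "[[username]], send your [[sitename]] invite code:",
);

# Hypothetical stand-in for LJ::Lang::ml: look up the ID, then fill in
# each [[label]] from the data that was passed in.
sub ml_sketch {
    my ( $id, $vars ) = @_;
    $vars ||= {};

    my $string = $text{$id};
    return "[missing string $id]" unless defined $string;

    # Replace each [[label]] with its value; leave unknown labels untouched.
    $string =~ s{\[\[(\w+)\]\]}{ exists $vars->{$1} ? $vars->{$1} : "[[" . $1 . "]]" }ge;
    return $string;
}

my $ret = ml_sketch( ".invites.send_invite_code",
    { username => "mark", sitename => "Dreamwidth" } ) . "\n";
print $ret;    # mark, send your Dreamwidth invite code:
```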

Plurals and numbers

These are described in excruciating detail in Embedding plural forms into translations, but here's a quick example:

$ret .= "You have $num message" . ( ( $num != 1 ) ? 's' : '' ) . " in your inbox.";

You could use a variable for the plural, like this:

$ret .= LJ::Lang::ml( ".inbox.num_msgs",
                   { num => $num, plural => ( $num != 1 ) ? 's' : '' }
               );

.inbox.num_msgs=You have [[num]] message[[plural]] in your inbox.

However, this would still be baking English into the source code - not actual English text in this case, but English grammar in the form of singular and plural inflections. Instead, you can use:

$ret .= LJ::Lang::ml( ".inbox.num_msgs", { num => $num } );

.inbox.num_msgs=You have [[num]] [[?num|message|messages]] in your inbox.

This takes care of applying the rules for English plurals for you, and lets translators (with help from some magic in LiveJournal and Dreamwidth source code) handle it appropriately by just specifying a text string for their language, without having to muck around in the Dreamwidth source code - which is, after all, the goal of the translation system.
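
Here's a rough idea of what that magic does for English, assuming the marker has the form [[?num|singular|plural]]. Both functions are hypothetical and only know the English rule (one form for exactly 1, the other for everything else); the real code picks the rule according to the language.

```perl
use strict;
use warnings;

# English-only sketch: form 0 for exactly one, form 1 otherwise.
sub plural_index_en {
    my ($count) = @_;
    return $count == 1 ? 0 : 1;
}

# Hypothetical expansion of [[?label|form|form]] and plain [[label]].
sub expand_sketch {
    my ( $string, $vars ) = @_;

    # Plural markers first: pick the form that matches the count.
    $string =~ s{\[\[\?(\w+)\|([^\]]*)\]\]}{
        my ( $label, $forms ) = ( $1, $2 );
        my @forms = split /\|/, $forms;
        $forms[ plural_index_en( $vars->{$label} ) ];
    }ge;

    # Then plain [[label]] placeholders.
    $string =~ s{\[\[(\w+)\]\]}{$vars->{$1}}g;
    return $string;
}

my $template = "You have [[num]] [[?num|message|messages]] in your inbox.";
print expand_sketch( $template, { num => 1 } ), "\n";    # You have 1 message in your inbox.
print expand_sketch( $template, { num => 3 } ), "\n";    # You have 3 messages in your inbox.
```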

Don't split sentences

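A translator can only rearrange words within a single string, so a sentence split across several IDs can't be translated properly into a language with a different word order. Keep each sentence whole and pass the variable parts in as data instead. For example, you should not do this in your .text file: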

.createaccount.enter_invite_code.part1=, enter your 

.createaccount.enter_invite_code.part2= invite code:

Heredocs

Sometimes, you'll come across Perl constructs that look something like this:

$ret .= <<HTML;

...followed by a block of text that isn't Perl code, followed by an "HTML" on its own line (or whatever was after the << in the original line). This format of text is known in Perl parlance as a "heredoc". If you can replace any text in there with <?_ml ... _ml?> tags, and it works, you should do so. However, if any of the text needs the LJ::Lang::ml function, it's probably best to have someone who codes Perl look at it, since these require care to get right. (Of course, if you know Perl and know how to fix it, go ahead; otherwise, just make a note of it and move on.)

HTML forms

Sometimes, you'll come across text that needs to be stripped in HTML forms. For example, in the following form the contents of the <p> tag and the value of the submit button need to be English-stripped:

<form method="post" action="create.bml">
    <input type="hidden" name="mode" value="codesubmit">
    <p>Enter your invite code:</p>
    <input type="text" name="invite" size="20" maxlength="20">
    <input type="submit" value="Create Account">
</form>

Notice that we do *not* want to English-strip:

  • name="invite": The word "invite", although English, is being used here as a field name, and is not shown to the user. The code will be looking out for a field called "invite" no matter what language the user is using, so this must not be changed.
  • value="codesubmit": Same as above - this value is used by the code.

However, the value of a submit button (in this case, "Create Account") *is* shown to the user (as this is what's shown on the button itself), and thus needs to be English-stripped, despite being in a value attribute.

There is one exception to this. If you find a submit button with a "name" attribute, check to see if there are any more. If any two submit buttons have the same 'name' attribute, such as:

<input type="submit" name="action" value="Rename">
<input type="submit" name="action" value="Delete">

...then do not English-strip it, and make a note for whoever reviews your patch that this will need to be fixed. (This is because the code will be checking for the value of whatever button is clicked on, and that prohibits English-stripping from taking place; if you were to English-strip the value here, it would no longer work for non-English users.)

If you do not find two submit buttons with the same 'name' attribute, but there is nonetheless still a 'name' attribute on at least one, make a note for the reviewer that this is the case, but go ahead and English-strip it as normal. (The reviewer will check to see whether the code is actually checking for this value.)
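
Put together, the create-account form from earlier might end up looking something like this when built in Perl. Treat it as a sketch: ml_sketch is a hypothetical stand-in for LJ::Lang::ml, the %text hash stands in for the page's .text file, and both IDs are invented for the example.

```perl
use strict;
use warnings;

# Stand-in for the page's .text file; both IDs are made up for this example.
my %text = (
    ".form.invite.label"  => "Enter your invite code:",
    ".form.invite.submit" => "Create Account",
);

# Hypothetical stand-in for LJ::Lang::ml.
sub ml_sketch {
    my ($id) = @_;
    return $text{$id} // "[missing string $id]";
}

my $ret = '';
$ret .= qq{<form method="post" action="create.bml">\n};

# Field names and hidden values are read by the code: leave them alone.
$ret .= qq{    <input type="hidden" name="mode" value="codesubmit">\n};

# Text shown to the user goes through the lookup.
$ret .= "    <p>" . ml_sketch(".form.invite.label") . "</p>\n";
$ret .= qq{    <input type="text" name="invite" size="20" maxlength="20">\n};

# The submit button's value is shown on the button, so it is stripped too.
$ret .= qq{    <input type="submit" value="} . ml_sketch(".form.invite.submit") . qq{">\n};
$ret .= "</form>\n";

print $ret;
```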

Fin

That's basically the guide for how to English-strip a page. Don't be too afraid of messing things up; a reviewer will tell you if you have anything wrong.