Search/Sphinx

From Dreamwidth Notes
Revision as of 10:24, 5 February 2013 by Afuna


This page documents the process you will need to go through to set up the Sphinx search system that Dreamwidth uses. This is not an easy process, and the documentation will probably need a few iterations to reach a really useful state.

I only suggest setting up the search system if you have a good amount of time to mess around with things. If you need some help, feel free to grab me anytime and I'll help out.

Software Installation

First, you will need to set up the Sphinx software. Before anything else, make sure you have these packages installed (package names are for Ubuntu Intrepid):

apt-get install libpath-class-perl libmysqlclient15-dev g++
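If you are not sure which of these are already installed, a small check can list the missing ones before you run apt-get. This is a sketch that assumes a Debian/Ubuntu system, where `dpkg-query` reports the status "install ok installed" for installed packages; `missing_packages` is just a name made up for this example.

```shell
#!/usr/bin/env bash
# Print each named package that dpkg does not report as fully installed.
# A package dpkg has never seen (or a system without dpkg-query at all)
# yields an empty status, so it is listed as missing too.
missing_packages() {
    local pkg status
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
        if [ "$status" != "install ok installed" ]; then
            echo "$pkg"
        fi
    done
}

# Example (hypothetical usage, run as root):
# apt-get install $(missing_packages libpath-class-perl libmysqlclient15-dev g++)
```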

The instructions are different depending on your version of Ubuntu, so choose the appropriate version:

Jaunty/9.04 and older

Installing File::SearchPath and Sphinx::Search

There are two Perl packages that you will have to download:

http://search.cpan.org/CPAN/authors/id/T/TJ/TJENNESS/File-SearchPath-0.05.tar.gz
http://search.cpan.org/CPAN/authors/id/J/JJ/JJSCHUTZ/Sphinx-Search-0.12.tar.gz

Now you need to build them. They are standard CPAN distributions, which you can turn into Debian packages with dh-make-perl. Build and install File::SearchPath first, then build and install Sphinx::Search.
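The build-and-install cycle above can be sketched as a pair of shell functions. This assumes dh-make-perl and the Debian build tools are installed, that you run it as root from the directory holding the tarballs, and that dh-make-perl follows its usual convention of naming packages `lib<name>-perl` and writing the .deb into the parent of the unpacked source. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Map a CPAN tarball name to the usual dh-make-perl package name,
# e.g. File-SearchPath-0.05.tar.gz -> libfile-searchpath-perl.
deb_package_name() {
    local base=${1##*/}     # drop any leading directory
    base=${base%.tar.gz}    # File-SearchPath-0.05
    base=${base%-*}         # File-SearchPath (version removed)
    echo "lib$(echo "$base" | tr '[:upper:]' '[:lower:]')-perl"
}

# Unpack, build and install one distribution (run as root, from the
# directory containing the tarball). Assumes the .deb lands in this directory.
build_and_install() {
    local tarball=$1 dir pkg
    dir=${tarball%.tar.gz}
    pkg=$(deb_package_name "$tarball")
    tar -zxf "$tarball" &&
        ( cd "$dir" && dh-make-perl --build ) &&
        dpkg -i "$pkg"_*.deb
}

# Order matters: File::SearchPath first, then Sphinx::Search.
# build_and_install File-SearchPath-0.05.tar.gz
# build_and_install Sphinx-Search-0.12.tar.gz
```

The derived names match the packaged versions mentioned for newer Ubuntu releases (libfile-searchpath-perl, libsphinx-search-perl), which is a handy cross-check.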

Installing the Sphinx package

You will need to download the Sphinx package:

http://sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz

The Sphinx package itself is a standard configure/make project. Setup and installation look something like this (run the final make install as root):

tar -zxvf sphinx-0.9.8.1.tar.gz
cd sphinx-0.9.8.1/
./configure
make
make install
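The Karmic section below uses exactly the same steps with a different version number, so the download and build can be wrapped once and parameterized by version. This is a minimal sketch assuming wget is installed and that the old sphinxsearch.com download URLs still resolve (they may not any more); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Build the download URL for a Sphinx source tarball, following the
# URL pattern used on this page.
sphinx_tarball_url() {
    echo "http://sphinxsearch.com/downloads/sphinx-$1.tar.gz"
}

# Download, unpack, build and install one Sphinx version.
# The make install step writes under /usr/local, so run this as root.
build_sphinx() {
    local version=$1
    wget "$(sphinx_tarball_url "$version")" &&
        tar -zxf "sphinx-$version.tar.gz" &&
        ( cd "sphinx-$version" && ./configure && make && make install )
}

# build_sphinx 0.9.8.1   # Jaunty/9.04 and older
# build_sphinx 0.9.9     # Karmic/9.10 and newer
```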

Karmic/9.10 and newer

Installing File::SearchPath and Sphinx::Search

From Ubuntu 9.10 and up, these Perl packages are available in the packaging system. You can install them in one step:

apt-get install libfile-searchpath-perl libsphinx-search-perl
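To confirm that Perl can actually load both modules after installing them, you can ask perl to load each one and exit. This is a small sketch; `perl_module_ok` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Succeed if perl can load the named module (perl -MModule -e 1).
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# for m in File::SearchPath Sphinx::Search; do
#     if perl_module_ok "$m"; then echo "$m: ok"; else echo "$m: MISSING"; fi
# done
```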

Installing the Sphinx package

You will need to download the Sphinx package:

http://sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz

The Sphinx package itself is a standard configure/make project. Setup and installation look something like this (run the final make install as root):

tar -zxvf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9/
./configure
make
make install


It is important that you match up the versions of the Perl packages and the Sphinx package; otherwise, your searches will silently fail due to incompatibilities in the API.
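Because a mismatch fails silently, it can help to print both versions side by side. This is a sketch under two assumptions: that searchd prints a banner line starting with `Sphinx <version>` when run with `--help`, and that Sphinx::Search exposes its version in `$Sphinx::Search::VERSION` (the standard Perl convention). The helper names are hypothetical; compare the output against the pairings on this page (0.9.8.1 with the CPAN 0.12 tarball, 0.9.9 with the packaged module).

```shell
#!/usr/bin/env bash
# Read a Sphinx banner from stdin, e.g. "Sphinx 0.9.9-release (rNNNN)",
# and print just the version number ("0.9.9").
sphinx_version() {
    sed -n 's/^Sphinx \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the searchd version and the Sphinx::Search version together.
show_versions() {
    echo "searchd:        $(searchd --help 2>&1 | sphinx_version)"
    echo "Sphinx::Search: $(perl -MSphinx::Search -e 'print $Sphinx::Search::VERSION' 2>/dev/null)"
}

# show_versions
```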


Assuming that all works, you should have everything installed that you need to set up the search system. Moving on!

Configuration

There are several points to configure. Let's start with the configuration of your database.

Database

You will need to create a new database. Something like this process will work:

CREATE DATABASE dw_sphinx;
GRANT ALL ON dw_sphinx.* TO dw@'localhost' IDENTIFIED BY 'dw';

USE dw_sphinx;
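If you want a different database name, user or password, the statements above can be generated and piped straight into the mysql client. This is a sketch; `sphinx_db_sql` is a hypothetical helper, and it assumes you connect as a MySQL user that is allowed to create databases and grant privileges.

```shell
#!/usr/bin/env bash
# Print the setup SQL from this page for a given database, user and password.
sphinx_db_sql() {
    local db=$1 user=$2 pass=$3
    cat <<EOF
CREATE DATABASE $db;
GRANT ALL ON $db.* TO $user@'localhost' IDENTIFIED BY '$pass';
EOF
}

# sphinx_db_sql dw_sphinx dw dw | mysql -u root -p
```

Note that MySQL 8.0 and later no longer accept IDENTIFIED BY in GRANT; on those versions you would create the user with CREATE USER first and then grant without a password clause.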

Now, you have to make a table. Use these CREATE TABLE statements.

It's a pretty straightforward table: it just stores the posts, who wrote them, where they were posted, and some basic security information. Note that it holds the full, uncompressed subject and text of each entry, so it can get rather large.

Site

Configuring your site is next. This involves adding a new section to your %DBINFO hash, like this:

sphinx => {
    host => '127.0.0.1',
    port => 3306,
    user => 'dw',
    pass => 'dw',
    dbname => 'dw_sphinx',
    role => {
        sphinx_search => 1,
    },
},
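Before relying on the site to use this entry, you can check that the same connection details actually work from the command line. This sketch uses the standard mysql client options; the defaults mirror the example values above, and `check_sphinx_db` is a made-up name.

```shell
#!/usr/bin/env bash
# Try to connect with the sphinx %DBINFO settings and run a trivial query.
# Arguments: host port user pass dbname (defaults match the example entry).
check_sphinx_db() {
    local host=${1:-127.0.0.1} port=${2:-3306} user=${3:-dw} pass=${4:-dw} db=${5:-dw_sphinx}
    mysql --connect-timeout=5 -h "$host" -P "$port" -u "$user" -p"$pass" "$db" \
        -e 'SELECT 1' >/dev/null 2>&1
}

# if check_sphinx_db; then echo "dw_sphinx reachable"; else echo "cannot connect"; fi
```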

You also need to add a setting elsewhere in the file that holds %DBINFO, telling your site where the search daemon will be listening. Port 3312 is the default:

# sphinx search daemon
@SPHINX_SEARCHD = ( '127.0.0.1', 3312 );

That's it for site configuration. Once both of those settings are in place, your site will do all the right things to make search active. Of course, we still have to configure Sphinx itself...

Sphinx

I left this for last because it's probably the trickiest part. Assuming you're going to run Sphinx as root (I do), the first step is to make the directory it needs:

mkdir /var/data

Now we need to set up the configuration file. By default, Sphinx looks for it at /usr/local/etc/sphinx.conf. To confirm that this is the right directory on your system, check that /usr/local/etc/sphinx.conf.dist exists; this is the sample configuration file that ships with the distribution, and your own configuration file belongs in the same directory. If you still cannot find the right location, try running "indexer --quiet"; it will fail if it doesn't find a config file, but will helpfully tell you where it looked.
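The sphinx.conf.dist check can be scripted as follows. This is a sketch that assumes the default /usr/local install prefix unless you pass another one; `sphinx_conf_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the etc/ directory under an install prefix (default /usr/local)
# if it contains the sample sphinx.conf.dist; fail otherwise.
sphinx_conf_dir() {
    local etc="${1:-/usr/local}/etc"
    if [ -f "$etc/sphinx.conf.dist" ]; then
        echo "$etc"
    else
        return 1
    fi
}

# if dir=$(sphinx_conf_dir); then echo "put sphinx.conf in $dir"; fi
```

If you would rather keep the file somewhere else, both indexer and searchd accept a `--config <file>` option pointing at it explicitly.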

Use Dreamwidth's sphinx configuration.

That's right. It's long. But it's actually almost identical to the configuration file that comes with Sphinx. I had to do a lot of tweaking to figure out the right combination of values for UTF-8 support and the like, but the rest is pretty straightforward.

Testing

To make sure that your test setup is working once all of the configuration is done, run the indexer (as root):

indexer --all

You should see it spit out some stuff saying it's collecting documents, and if all goes well, you should see files in /var/data. You can also test search from the command line:

search -q some words
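To check the "files in /var/data" part without eyeballing a directory listing, a small test like this can follow the indexer run. It is a sketch that only checks that at least one regular file exists in the index directory (default /var/data, as created earlier); `index_files_present` is a made-up name.

```shell
#!/usr/bin/env bash
# Succeed if the index directory contains at least one regular file.
index_files_present() {
    local dir=${1:-/var/data}
    [ -d "$dir" ] && [ -n "$(find "$dir" -maxdepth 1 -type f | head -n 1)" ]
}

# indexer --all && index_files_present && echo "index files written"
```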

That's the gist of it. If all of that seems to be working, then you should be able to skip down to the usage section.

Usage

The search system requires a few components to be up and running for it to actually work. In a nutshell, you should make sure the following are running at all times:

bin/worker/sphinx-copier -v
bin/worker/sphinx-search-gm -v

The former is a TheSchwartz worker (so you need TheSchwartz configured) and the latter is a Gearman worker (so you need Gearman configured too). Once those are running, you have to make sure the search daemon is running. Again, as root:

searchd --console

All of these command lines are "foreground debug mode" versions. If you drop the arguments, the workers/searchd will spawn themselves off into the background and disappear from sight.
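A middle ground between the foreground debug commands and fully silent daemons is to run the workers in verbose mode in the background with their output in log files, then confirm searchd is accepting connections on the port from @SPHINX_SEARCHD. This is a sketch under stated assumptions: the workers live under bin/worker in your Dreamwidth checkout (for example $LJHOME), the log directory is your choice, and the port check relies on bash's /dev/tcp support. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Start a worker with -v in the background, append its output to
# <logdir>/<name>.log, and print its PID.
# Arguments: Dreamwidth checkout, worker name, log directory.
start_worker() {
    local home=$1 name=$2 logdir=$3
    mkdir -p "$logdir" || return 1
    nohup "$home/bin/worker/$name" -v >>"$logdir/$name.log" 2>&1 </dev/null &
    echo "$!"
}

# Succeed if something accepts TCP connections on host:port (gives up after 5s).
port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" >/dev/null 2>&1
}

# start_worker "$LJHOME" sphinx-copier /var/log/dw
# start_worker "$LJHOME" sphinx-search-gm /var/log/dw
# port_open 127.0.0.1 3312 || echo "searchd is not listening"
```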

Now, once you have those components up and running, you can do one of two things to get searchable data into the system: use the manual copier, or go post on one of your paid accounts. The manual copier is probably easiest if you don't have a zillion accounts on the system:

bin/schedule-copier-jobs

Just run that and it will get your sphinx-copier busy copying data into the dw_sphinx database you made earlier. You can tell whether it works by watching the sphinx-copier output; it should say something about inserting posts. You can then select from posts_raw in the dw_sphinx database to confirm the data is actually there.
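Running the count below a few times while the copier works is a quick way to see posts_raw filling up. This sketch uses the standard mysql client options (-N skips the column header, -B gives plain batch output); the connection defaults mirror the example %DBINFO entry, and `count_posts_raw` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the number of rows currently in posts_raw.
# Arguments: host port user pass dbname (defaults match the example entry).
count_posts_raw() {
    local host=${1:-127.0.0.1} port=${2:-3306} user=${3:-dw} pass=${4:-dw} db=${5:-dw_sphinx}
    mysql --connect-timeout=5 -N -B -h "$host" -P "$port" -u "$user" -p"$pass" "$db" \
        -e 'SELECT COUNT(*) FROM posts_raw'
}

# count_posts_raw
```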

Once you have data in the system, you have to index it. This is easy; just run the indexer again (as root):

indexer --all

Once that's done, you can restart searchd, and you should be able to search for things from the command line or from your site.
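As an alternative to restarting searchd, Sphinx's indexer has a `--rotate` option that builds the new index files alongside the live ones and signals the running searchd to switch over. The sketch below picks the arguments depending on whether searchd appears to be running. The PID file path is a hypothetical default; use whatever pid_file your sphinx.conf sets. Run it as root so `kill -0` can see the root-owned searchd process.

```shell
#!/usr/bin/env bash
# Print indexer arguments: "--rotate --all" if the PID file names a live
# process (searchd running), otherwise plain "--all".
indexer_args() {
    local pidfile=${1:-/var/data/searchd.pid} pid
    if [ -r "$pidfile" ] && pid=$(cat "$pidfile") && kill -0 "$pid" 2>/dev/null; then
        echo "--rotate --all"
    else
        echo "--all"
    fi
}

# Left unquoted on purpose so the two words become separate arguments:
# indexer $(indexer_args)
```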