Reference/Search

From Dreamwidth Notes
(merged into Sphinx/Search, but makes more sense to redir to base Search page)
 
There are two forms of search on Dreamwidth installations: both are optional and require further setup.
#REDIRECT [[Search]]
 
User search finds users matching certain characteristics; text search looks through the contents of entries and comments. This page focuses on text search.

= User Search =

User search is not heavily documented; setup instructions can be found at [[Set_up_UserSearch]].

= Text Search =

Dreamwidth uses Sphinx, an open-source search package, to implement text search. Search is available on the [http://dreamwidth.org/search search page]. There are two modes of search: site search and per-journal search.

Site search only shows public content. Journal search can include locked content, following the usual rules for whether you can see it: if you can see an entry on the journal, you can find it with search; if you can't, it won't appear in your search results. There is also an option to search comments, but for technical reasons (site load), only comments made in paid users' journals are indexed.

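The visibility rules above amount to a filter on which results are eligible. The following is a minimal Python sketch, not Dreamwidth's implementation: `Item`, `is_indexed`, `site_search` and `journal_search` are hypothetical names, and `can_see` stands in for whatever access check the journal itself already applies. It only models eligibility, not how the query text is matched.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified view of a searchable entry or comment."""
    item_id: int
    journal: str
    is_public: bool
    is_comment: bool = False
    journal_is_paid: bool = False


def is_indexed(item: Item) -> bool:
    """Entries are always indexed; comments only in paid users' journals."""
    return not item.is_comment or item.journal_is_paid


def site_search(items: Iterable[Item]) -> List[int]:
    """Site search: public content only, regardless of who is searching."""
    return [i.item_id for i in items if is_indexed(i) and i.is_public]


def journal_search(items: Iterable[Item], journal: str,
                   can_see: Callable[[Item], bool]) -> List[int]:
    """Journal search: anything in the journal the viewer could already see there."""
    return [i.item_id for i in items
            if is_indexed(i) and i.journal == journal and can_see(i)]
```

Passing the viewer's normal access check in as `can_see` is what keeps journal search consistent with the journal itself: search never reveals anything the viewer couldn't already read there.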
 
Text search is resource-intensive and runs as a system separate from the main Dreamwidth site, so on a production site it can run on a different machine from the webservers. On a development server, where you are basically the only user, you don't need to worry about this much. Still, it's a good idea to turn on the search workers only when you're testing something specific.

== Installation ==

You'll need to install the Sphinx package and a couple of Perl modules that make Sphinx easy for us to use.

=== Installing the Sphinx package ===

First, download the Sphinx source package:

  wget http://sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz

Then build and install it:

  tar -zxvf sphinx-0.9.9.tar.gz
  cd sphinx-0.9.9/
  ./configure
  make
  make install    # as root
 
=== Installing File::SearchPath and Sphinx::Search ===

Both are available through Ubuntu's package system:

  apt-get install libfile-searchpath-perl libsphinx-search-perl    # as root

It is important that you [http://search.cpan.org/~jjschutz/Sphinx-Search-0.22/lib/Sphinx/Search.pm#VERSION match up the versions] of the Perl packages and the Sphinx package; otherwise, your searches will silently fail due to incompatibilities in the API. Assuming that all works, you should have everything you need to set up the search system. Moving on!

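One way to check the pairing before going further is to ask each tool for its version. The Python sketch below assumes that `indexer`, run with no arguments, prints a banner beginning `Sphinx <version>`, and it reads the module's `$Sphinx::Search::VERSION` through `perl`. The `EXPECTED_MODULE_VERSION` table is an assumption drawn only from this page (it installs Sphinx 0.9.9 and links the Sphinx::Search 0.22 documentation); the module's VERSION section is the real compatibility reference. All function names here are hypothetical.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# ASSUMPTION: pairing inferred only from this page (Sphinx 0.9.9 install plus a
# link to the Sphinx::Search 0.22 docs). Confirm it against the module's
# VERSION section before relying on it or adding entries.
EXPECTED_MODULE_VERSION = {"0.9.9": "0.22"}


def parse_sphinx_version(banner: str) -> Optional[str]:
    """Extract '0.9.9' from a banner line such as 'Sphinx 0.9.9-release'."""
    match = re.search(r"Sphinx (\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


def _run(cmd: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run a command, returning None instead of raising if it isn't installed."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return None


def installed_versions() -> Tuple[Optional[str], Optional[str]]:
    """Return (Sphinx version, Sphinx::Search version); None for anything missing."""
    # ASSUMPTION: indexer prints its version banner when run without arguments.
    indexer = _run(["indexer"])
    sphinx = None
    if indexer is not None:
        sphinx = parse_sphinx_version(indexer.stdout + indexer.stderr)
    # perl exits non-zero (with an error on stderr) if the module can't be loaded.
    perl = _run(["perl", "-MSphinx::Search", "-e", "print $Sphinx::Search::VERSION"])
    module = None
    if perl is not None and perl.returncode == 0:
        module = perl.stdout.strip() or None
    return sphinx, module


def versions_match(sphinx: Optional[str], module: Optional[str]) -> bool:
    """True only for pairings listed in EXPECTED_MODULE_VERSION."""
    return sphinx is not None and EXPECTED_MODULE_VERSION.get(sphinx) == module
```

For example, `versions_match(*installed_versions())` returns `False` when either tool is missing or the pairing isn't in the table; in both cases, compare the versions by hand against the module documentation.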
 
== Setup ==

=== Database ===

You will need to create a new database. Something like this will work, but make sure to adjust the username and password:

  CREATE DATABASE dw_sphinx;
  GRANT ALL ON dw_sphinx.* TO dw@'localhost' IDENTIFIED BY '__YOURPASSWORD__';
  USE dw_sphinx;

Next, create the tables:

<gist>ba988dfd02e49822246f</gist>

The table is pretty straightforward: it just stores the posts, who wrote them, where they were posted, and some basic security information. Note that this table holds the full (compressed) subject and text of the entries, so it can get rather large.
 
=== Site Configuration ===

Configuring your site is next. This involves adding a new section to your %DBINFO hash, like this:

  sphinx => {
      host => '127.0.0.1',
      port => 3306,
      user => 'dw',
      pass => '__YOURPASSWORD__',
      dbname => 'dw_sphinx',
      role => {
          sphinx_search => 1,
      },
  },
 
You also need to add a setting elsewhere in the same file that tells your system where the search daemon is listening. This example uses port 3312; whichever port you use must match the one the search daemon is configured to listen on in the Sphinx configuration:

  # sphinx search daemon
  @SPHINX_SEARCHD = ( '127.0.0.1', 3312 );

That's it for site configuration. Once both of the settings above are in place, your site will do everything needed to make search active. Of course, we still have to configure Sphinx itself...

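When the search daemon is running, a quick way to confirm that the address in `@SPHINX_SEARCHD` is reachable is a plain TCP connection attempt. This is a minimal Python sketch: `searchd_reachable` is a hypothetical helper, and it only proves that something is listening on that port, not that it speaks the Sphinx protocol.

```python
import socket

# Mirrors @SPHINX_SEARCHD = ( '127.0.0.1', 3312 ) from the site config above.
SPHINX_SEARCHD = ("127.0.0.1", 3312)


def searchd_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port.

    This does not speak the Sphinx protocol, so a different service that
    happens to listen on the same port would also pass.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

If `searchd_reachable(*SPHINX_SEARCHD)` returns `False`, the daemon is probably not running, or it is listening on a different port than the one configured here.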
 
=== Sphinx ===

The first step, assuming you're going to run Sphinx as root, is to create the directory it needs:

  mkdir /var/data
 
Now we need to set up the configuration file. By default, Sphinx looks for it at `/usr/local/etc/sphinx.conf`. To confirm, run this:

  indexer --quiet

It will fail if it doesn't find a config file, but it will helpfully tell you where it tried to look. Once you know where Sphinx expects the config file, put this configuration there:
 
<gist>f50f0604a064db0464ad</gist>

That's right, it's long. But it's actually almost identical to the configuration file that comes with Sphinx. Most of the changes are tweaks to get the right combination of values for UTF-8 support and the like; the rest is pretty straightforward.

Make sure to customize `sql_user` and `sql_pass` in the configuration file to match what you used earlier.

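Because a leftover placeholder in `sql_user` or `sql_pass` is an easy mistake to miss, it can help to check the file mechanically. The Python sketch below scans `key = value` lines for those two settings and compares them with the credentials from the `GRANT` statement earlier. `check_sql_credentials` is a hypothetical helper, and it assumes each setting sits on its own line without a trailing comment.

```python
import re
from typing import List

# Matches lines like "    sql_user = dw" inside a sphinx.conf source block;
# commented-out lines (starting with #) are ignored.
SETTING_RE = re.compile(
    r"^[ \t]*(sql_user|sql_pass)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def check_sql_credentials(conf_text: str, user: str, password: str) -> List[str]:
    """List problems with sql_user/sql_pass; an empty list means both match."""
    expected = {"sql_user": user, "sql_pass": password}
    problems = []
    seen = set()
    for key, value in SETTING_RE.findall(conf_text):
        seen.add(key)
        if value != expected[key]:
            # Don't echo the values themselves: the password would end up in logs.
            problems.append(f"{key} does not match")
    for key in expected:
        if key not in seen:
            problems.append(f"{key} is not set")
    return problems
```

For example, `check_sql_credentials(open("/usr/local/etc/sphinx.conf").read(), "dw", "your-password")` checks the default config location found with `indexer --quiet`.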
 
Once all of the configuration is done, make sure your test setup works by running the indexer (as root):

  indexer --all

You should see it spit out some messages about collecting documents, and if all goes well, you should see files in /var/data. You won't be able to search yet because you haven't put any data in your search database, but you'll at least have confirmed that Sphinx is configured properly.

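To check the result of `indexer --all` beyond eyeballing the directory, you can group the files in /var/data by index name. This Python sketch assumes the on-disk layout of plain indexes in Sphinx 0.9.x, where each index writes a header (`.sph`), word list (`.spi`), document lists (`.spd`) and hit lists (`.spp`) alongside other optional files; the index names themselves come from your sphinx.conf. `built_indexes` and `incomplete_indexes` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List, Set

# ASSUMPTION: core files of a plain index in Sphinx 0.9.x. Other extensions
# (attributes, lock files, etc.) may also appear and are simply recorded.
CORE_EXTENSIONS = {".sph", ".spi", ".spd", ".spp"}


def built_indexes(data_dir: str = "/var/data") -> Dict[str, Set[str]]:
    """Map each index name found in data_dir to the file extensions present for it."""
    found: Dict[str, Set[str]] = {}
    for path in Path(data_dir).glob("*.sp?"):
        found.setdefault(path.stem, set()).add(path.suffix)
    return found


def incomplete_indexes(data_dir: str = "/var/data") -> List[str]:
    """Names of indexes missing at least one core file, e.g. after an interrupted run."""
    return sorted(
        name
        for name, extensions in built_indexes(data_dir).items()
        if not CORE_EXTENSIONS <= extensions
    )
```

An index reported by `incomplete_indexes()` most likely means the indexer stopped partway; rerunning `indexer --all` and reading its output is the natural next step.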
 
== The Search Process ==

Getting content into search requires a few things:

* a separate database containing entry/comment text: we keep the text of the entries and comments that we want to be searchable in its own database. Each entry or comment is copied from the main database into the search database when it's posted or edited.
* a Sphinx index: searching raw text is painfully slow, so Sphinx processes the contents of the search database further, creating an index of the words. Processing the text this way also makes it possible for a search for "test" to turn up "tests", "testing", etc.

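To see why the index matters, here is a toy version of the idea in Python: an inverted index mapping each word to the documents that contain it, with a crude suffix-stripping stemmer so that "test", "tests" and "testing" share one entry. This only illustrates the technique; `ToyIndex` and `stem` are hypothetical, and Sphinx's own English stemming (enabled through its `morphology` setting, e.g. `stem_en`) is far more thorough.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy stemmer: strips a few common English suffixes, longest first.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three letters so short words aren't mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> List[str]:
    return [stem(w) for w in re.findall(r"[a-z0-9']+", text.lower())]


class ToyIndex:
    """Inverted index: stemmed word -> set of document ids containing it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing every (stemmed) query word."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        results = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            results &= self.postings.get(token, set())
        return results
```

Answering a query becomes a few dictionary lookups and set intersections instead of a scan over every stored entry, which is where the speedup over searching raw text comes from.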
 
Getting search results involves a couple more:

* ??
* a search worker

[[Category: Development]][[Category: Dreamwidth Installation]][[Category: Reference]]

Latest revision as of 03:05, 28 June 2013