Difference between revisions of "Importer Testing"

From Dreamwidth Notes

Latest revision as of 07:20, 27 January 2010

Warning: The following information is obsolete, and may quite possibly be incorrect. Information posted to official communities should be assumed to be accurate. If you have fresh information, please update this article (and remove this textbox!)

This box was added on November 24, 2024.

braindump = on

Testing the importer is straightforward. First, make sure your database is up to date and all of your code is checked out, etc. (See TheSchwartz Setup to learn about setting up the TheSchwartz database so that this will work.)

Also make sure that you have enabled beta tools on your server: $ENABLE_BETA_TOOLS = 1;

Step 1: Now go to http://your.site.com/misc/import.bml and fill out the form to schedule an import.

You can also schedule an import from the command line:

# schedule an import
bin/test/schedule-import -u xb95 -p SOMEPASSWORD -t xb95_on_dw -s livejournal.com

That schedules an import task for user xb95 on site livejournal.com with the given password. It will import the data to the target xb95_on_dw. Huzzah.
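
If you schedule test imports often, the command above can be wrapped in a small shell function. This is only a sketch: the name schedule_import_cmd is made up for illustration and is not part of the Dreamwidth tree. It just prints the command, using the same -u, -p, -t and -s options shown above, so you can check it before running it from your checkout.

```shell
# Hypothetical helper (not part of the Dreamwidth tree): print the
# schedule-import command for a given source account and local target.
schedule_import_cmd() {
    # $1 = remote username, $2 = remote password,
    # $3 = local target account, $4 = remote site
    printf 'bin/test/schedule-import -u %s -p %s -t %s -s %s\n' \
        "$1" "$2" "$3" "$4"
}

# Print the command from the example above.
schedule_import_cmd xb95 SOMEPASSWORD xb95_on_dw livejournal.com
```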


Step 2: Now fire up the scheduler, and keep it running:

# keep scheduler running foreground, and watch the log
bin/worker/import-scheduler --foreground &
tail -f logs/import-scheduler.log

Note that if you are importing entries, you will need to run this twice: the first time schedules all the jobs that entries depend on, and the second time schedules the entry imports. If you are importing comments, you will need to run it three times: the first time to import tags/friends/etc., the second time to import entries, and the third time to import comments.
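
The pass counts in the note above can be written down as a tiny lookup. This is a sketch only: passes_needed is a made-up name, and the single pass for anything other than entries or comments is an assumption read out of the note, not something the importer reports.

```shell
# Hypothetical helper: how many scheduler passes a given kind of import
# needs, according to the note above.
passes_needed() {
    case "$1" in
        comments) echo 3 ;;  # tags/friends/etc., then entries, then comments
        entries)  echo 2 ;;  # dependencies first, then the entries
        *)        echo 1 ;;  # assumption: everything else goes in the first pass
    esac
}

passes_needed comments
```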


Step 3: With that running in one window you should see some noise that says it's scheduling jobs. Woot! That's good. (If you instead see an error that says "worker died: can't insert job (probably duplicate job)", you probably haven't set up your TheSchwartz database.) But now you need to run the worker that actually gets stuff done.

# start TheSchwartz worker manually
bin/worker/content-importer --verbose
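
The duplicate-job error mentioned in Step 3 can also be checked for without watching the log by eye. A minimal sketch, assuming the message ends up in logs/import-scheduler.log (it may only go to the console on your setup); has_duplicate_job_error is a made-up name.

```shell
# Hypothetical helper: succeed if the given log file contains the
# "duplicate job" error quoted in Step 3.
has_duplicate_job_error() {
    # -F matches the text literally; -q prints nothing and only sets
    # the exit status.
    grep -qF "worker died: can't insert job (probably duplicate job)" "$1"
}

# Assumption: the scheduler writes the error to this log file.
log=logs/import-scheduler.log
if [ -f "$log" ] && has_duplicate_job_error "$log"; then
    echo "TheSchwartz database is probably not set up yet"
fi
```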

You will see any noise that happens. Warnings/STDERR output will be dumped to the console. (Which generally means we need better debugging output for the importer; sending to STDERR in what is supposed to be a daemon is not really going to be that useful...)
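
Until the importer has better debugging output, one workaround is to keep a copy of whatever goes to STDERR. This is a sketch using plain shell redirection, not an importer feature; run_with_stderr_log and the file name content-importer.err are made up for illustration.

```shell
# Hypothetical helper: run a command with its STDERR appended to a file
# instead of being dumped to the console.
run_with_stderr_log() {
    errlog=$1
    shift
    "$@" 2>>"$errlog"
}

# From your checkout you could then run, for example:
#   run_with_stderr_log content-importer.err bin/worker/content-importer --verbose
```

Note that this takes the warnings off the console, so tail the file in another window if you still want to watch them live.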

braindump = off