Workers

From Dreamwidth Notes

Revision as of 20:42, 15 June 2013

Workers are programs that don't run under control of Apache. They run asynchronously, either because they use a lot of resources (e.g. search) or because they contact external sites (e.g. importer).

Some workers process TheSchwartz or Gearman requests. Those normally run under control of worker-manager (see below). Some workers are manual and, depending on their purpose, can run under control of worker-manager, be started by cron, or even be started just once when (re)starting the server. (Manual workers are a mixed lot, and there's no "one size fits all" option for them.)

Starting and stopping workers

Individually

When developing, you can start and debug individual workers from the shell prompt, by typing:

$LJHOME/bin/worker/(worker-name) --verbose

The --verbose or -v flag makes sure that the worker stays in the foreground and provides extra debugging text. To stop the worker when you're done, use ^C.

If you run the worker without the --verbose flag, it will continue running in the background even after you're done and have logged out. You probably don't want to do this! If you want to run multiple workers at the same time, or have them keep running, use the worker-manager described below.

Most of the time, if you are developing a worker, you will have a separate shell where you run the worker and can see its error messages as you work. Remember that every time you edit the code run by the worker (whether it is directly in the worker or just the libraries the worker uses) you need to kill and restart the worker process itself, or the worker will not see your changes.
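
The run-in-the-foreground step above can be wrapped in a small shell function, so that restarting after each code change is a single command. This is only a sketch: run_worker is a hypothetical helper name, not something shipped with the code, and it assumes $LJHOME is set in your shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of the codebase): run one worker in the
# foreground with debugging output.
# Usage: run_worker <worker-name>
run_worker() {
    rw_path="${LJHOME:?LJHOME must be set}/bin/worker/$1"
    if [ ! -x "$rw_path" ]; then
        echo "no executable worker at $rw_path" >&2
        return 1
    fi
    # --verbose keeps the worker attached to this terminal; stop it with ^C.
    "$rw_path" --verbose
}
```

For example, run_worker send-email starts that worker; after editing the worker or a library it uses, press ^C and run the same command again so the new code is loaded.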


Using worker-manager and workers.conf

bin/worker-manager takes care of starting, stopping, and restarting workers to maintain the numbers of workers set in etc/workers.conf for each host it runs on. The etc/workers.conf syntax is pretty straightforward and described in the file itself. To start bin/worker-manager for normal use (daemonized), use:

$LJHOME/bin/worker-manager

Or, if you want it to stay in the foreground and display progress/debug messages, use:

$LJHOME/bin/worker-manager --debug

In both cases, killing it will also kill the workers it started.

If you find that you do not want to use some workers (for testing, temporarily, or long-term), you may turn them off in the configuration. In etc/workers.conf, either setting a worker's count to 0 or commenting out the line on which that worker appears (with a #) prevents any instances of that worker from being started. These are best for temporary changes. For a permanent change, you may also remove the worker's line entirely.
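
As a rough illustration of the commenting-out approach, the function below prints a copy of a configuration file with one worker's line prefixed by #. It is a sketch under assumptions: disable_worker is a hypothetical helper, and it assumes each worker sits on its own line that begins (after optional indentation) with the worker name followed by whitespace, ":" or "=". The real syntax is described in etc/workers.conf itself, so check there first.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a workers.conf-style file with one
# worker's line commented out. The input file itself is not modified.
# ASSUMPTION: the worker name starts its line (after optional indentation)
# and is followed by whitespace, ":" or "=".
# Usage: disable_worker <conf-file> <worker-name>
disable_worker() {
    # The trailing character class stops "send-email" from also matching
    # "send-email-mass".
    sed "s/^\([[:space:]]*\)\($2\)\([[:space:]:=]\)/\1#\2\3/" "$1"
}
```

Because the result goes to standard output, you can review it (for instance with diff) before replacing the real file and restarting worker-manager.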

Using cron or at server boot time

Please refer to your OS documentation. The crontab(1) and init(8) manpages are probably most relevant.

List of workers

Below is a list of all the workers in the $LJHOME/bin/worker directory.

Note: I probably got some of that wrong. Any corrections welcome, especially where I say "not sure".

Workers in alphabetical order of filename
| Filename | Type | Category | Function and notes |
| --- | --- | --- | --- |
| birthday-notify | Manual | ESN | Fire LJ::Event::Birthday ESN events for users whose birthdays are soon |
| content-importer | TheSchwartz | Importer | Import all content |
| content-importer-lite | TheSchwartz | Importer | Import some content (everything except entries and comments) |
| content-importer-verify | TheSchwartz | Importer | Verify import password |
| directory-meta | Manual | Directory/user search | Update data used in searching for users |
| distribute-invites | TheSchwartz | Invite codes | Email distributed invite codes |
| embeds | TheSchwartz | Embed codes | Grab titles from embed APIs |
| esn-cluster-subs | TheSchwartz | ESN | Notification delivery stage 2 |
| esn-cluster-subs-mass | TheSchwartz | ESN | Notification delivery stage 2 (mass) |
| esn-filter-subs | TheSchwartz | ESN | Notification delivery stage 3 |
| esn-filter-subs-mass | TheSchwartz | ESN | Notification delivery stage 3 (mass) |
| esn-fired-event | TheSchwartz | ESN | Notification delivery stage 1 |
| esn-fired-event-mass | TheSchwartz | ESN | Notification delivery stage 1 (mass) |
| esn-process-sub | TheSchwartz | ESN | Notification delivery stage 4 |
| esn-process-sub-mass | TheSchwartz | ESN | Notification delivery stage 4 (mass) |
| expunge-users | Manual | Account deletion | Expunge (permanently) accounts deleted more than 60 days ago |
| import-scheduler | Manual | Importer | Queue importing jobs in the right order - oneshot as of 20110925, but see bug 1491 |
| incoming-email | TheSchwartz | Incoming email | Handles email posting, email support requests, email support follow-up |
| latest-feed | TheSchwartz | Latest things | Heavy lifting for the Latest Things page |
| lazy-cleanup | TheSchwartz | Journal management | Postpone some of the work of entry deletion |
| load-friends-gm | Gearman | Old (LJ) friends system | Not used anymore, see bug 3971 |
| paidstatus | Manual | Payments | Process paid carts, expired paid accounts, and paid accounts expiring soon |
| ping-hubbub | TheSchwartz | Outbound syndication | Notify PubSubHubbub of DW journal updates |
| process-esn | TheSchwartz | ESN | Notification delivery stages 1-4 combined |
| process-esn-mass | TheSchwartz | ESN | Notification delivery stage 1-4 combined (mass) |
| process-eventlogrecord | TheSchwartz | Disabled | Not used currently, see bug 3963 |
| process-privacy | TheSchwartz | Entry privacy | Process mass entry privacy changes |
| resolve-extacct | Gearman | External accounts | Determine the account type (personal/community/syndicated) |
| schedule-synsuck | Manual | Syndication | Queue TheSchwartz jobs for updating syndicated accounts |
| search-constraints | Gearman | Directory/user search | Something to do with partial search results? Not sure |
| search-lookup | Gearman | Directory/user search | Initiate search? Not sure |
| search-updater | Manual | Directory/user search | Not sure what it does, still used but has some bitrot. See bug 3968 |
| send-email | TheSchwartz | Sending email | Send email to users |
| send-email-mass | TheSchwartz | Sending email | Send email to users (mass) |
| shop-creditcard-charge | Gearman | Payments | Charge a user's credit card |
| sphinx-copier | TheSchwartz | Full-text search | Keep the full-text search database in sync with actual entries (see bug 3966) |
| sphinx-search-gm | Gearman | Full-text search | Interface to the core sphinx search |
| stats-collection | Manual | Site stats | Collect/compile statistics |
| subscribe-hubbub | Manual | Syndication | Subscribe to hubbub for syndicated accounts so DW is notified of updates to feed sources |
| support-notify | TheSchwartz | Support board | Email users about support answers they received or support actions they subscribed to |
| synsuck | TheSchwartz | Syndication | Updates syndicated account from feed source, started by bin/worker/schedule-synsuck or through the hubbub system |
| sysban-gm | Gearman | Site administration | Retrieve sysban entries by type |
| t-memlimit | Manual | Testing | Memory stress test |
| taglib-gm | Gearman | Journal content | Retrieve user's tags |
| talklib-gm | Gearman | Journal content | Update entry comment count, needed... not sure |
| userpic-resize-gm | Gearman | Userpics | Resize userpics larger than maximum height or width |
| xpost | TheSchwartz | Crossposting | Crossposts new or edited entries |
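
Since this list may be out of date, it can help to compare it with what is actually installed. The sketch below prints the names of the executable files in $LJHOME/bin/worker. list_workers is a hypothetical helper name, and it assumes every executable file in that directory is a worker, which may not hold on your checkout.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the executable files in
# $LJHOME/bin/worker, one per line, for comparison with the list above.
# ASSUMPTION: every executable file in that directory is a worker.
list_workers() {
    for lw_file in "${LJHOME:?LJHOME must be set}"/bin/worker/*; do
        if [ -f "$lw_file" ] && [ -x "$lw_file" ]; then
            basename "$lw_file"
        fi
    done
}
```

Any name that list_workers prints but the list lacks (or the reverse) is a good candidate for a correction here.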