Difference between revisions of "Importing"

Revision as of 22:01, 10 January 2009

Start

Basic download script base:

http://codex.journal-press.com/Ljdump

I don't know anything about scripting, so I can't say whether either of the tools below would be better or worse for our purposes, but:

ljmigrate is based on the ljdump code, captures usericons as well, and has the ability to import to lj-based systems.
ljsm is a python script which captures usericons and posted images, in addition to entries and comments.

Examples

Good to look at Schwartz workers for reference. Possibilities:

Worker for privacy conversion -- a long-running task, limited so that it does not kill the DB. See bin/worker/process-privacy and LJ::MassPrivacy
bin/worker/paidstatus - the Schwartz 'shell' that declares the job. It imports the module DW::Worker::PaidStatus
cgi-bin/DW/Worker/PaidStatus.pm - the actual worker. Jobs can succeed, fail permanently, or fail temporarily. Note the helper subs at the bottom of that module, which define the Schwartz parameters for retries, etc.
cgi-bin/DW/Worker/Payment.pm - search for PaidStatus (line 90) to see the calling convention for creating a new Schwartz job and passing arguments
bin/test-pay - a generic script for testing the payment system
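The three job outcomes mentioned above (succeed, fail permanently, fail temporarily) can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the PaidStatus code: the real worker is a Perl module, and the names Outcome, TemporaryError, MAX_RETRIES and run_job are all hypothetical, as is the retry limit of 5.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    RETRY = "retry"          # temporary failure, try again later
    PERMANENT = "permanent"  # give up, do not retry


class TemporaryError(Exception):
    """Hypothetical: a failure worth retrying, e.g. the remote site is down."""


# Assumed value; plays the role of the retry helper subs in the worker module.
MAX_RETRIES = 5


def run_job(work, arg, failures_so_far):
    """Run one attempt of work(arg) and classify the result."""
    try:
        work(arg)
    except TemporaryError:
        # This attempt counts as one more failure; stop once the limit is passed.
        if failures_so_far + 1 > MAX_RETRIES:
            return Outcome.PERMANENT
        return Outcome.RETRY
    except Exception:
        # Anything unexpected is treated as not worth retrying.
        return Outcome.PERMANENT
    return Outcome.COMPLETED
```

Treating unknown exceptions as permanent is a design choice for the sketch; a real worker might prefer to retry them.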

Rough Implementation Guide

Create a new DW::Worker::ContentDownloader module (or something similar; the name doesn't matter much).
Create a simple bin/worker/contentdownloader that wraps the module. This is almost a carbon copy of the existing bin/worker/paidstatus, etc.
Create a test script that lets you insert a job. It can be really simple, something like bin/test-content-downloader --insert-job xb95. The script would then need only about four lines of code, one of which is line 90 from Payment.pm referenced above, i.e. create the job and give it the arguments.
Then test that your pipeline works by running the worker in debug mode: bin/worker/contentdownloader --debug, I think. You probably have to run perl -I/home/dw/cgi-bin bin/worker/contentdownloader --debug to get the include path set up right.
Once all that works (your job is running, your module is getting called, etc.), you can pretty easily drop in a call out to the download script: system("/path/to/python", "/path/to/script", "--username", "xb95"); then worry about error checking, timeout checking, and all the million failure cases.
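As a rough idea of what the "error checking, timeout checking" around the download call might involve, here is a minimal sketch in Python using the standard subprocess module. The actual worker would make this call from Perl; the function name run_downloader and the reason strings are hypothetical, and the command is passed in so the placeholder paths above stay placeholders.

```python
import subprocess


def run_downloader(cmd, timeout_seconds):
    """Run the download command; return (ok, reason)."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    except OSError as exc:
        # e.g. the interpreter or script path does not exist
        return False, f"could not start: {exc}"
    if result.returncode != 0:
        return False, f"exit status {result.returncode}"
    return True, "ok"
```

A call would look like run_downloader(["/path/to/python", "/path/to/script", "--username", "xb95"], 3600). A timeout probably maps to a temporary failure and a missing script to a permanent one, but that mapping is a guess.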

Ideal Process

run the content downloader,
copy data to mogile
mark the user as 'downloaded'
run content importer
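The four steps above could be chained so that a failure stops the process before the user is marked as downloaded. A minimal sketch in Python, assuming each step is a callable that takes the username and returns True on success; the function and step names are hypothetical and nothing here talks to MogileFS or the database.

```python
def run_import_pipeline(username, download, store, mark_downloaded, run_importer):
    """Run the four steps in order; stop at the first one that fails.

    Returns the names of the steps that completed.
    """
    steps = [
        ("download", download),
        ("store", store),                    # e.g. copy the data to mogile
        ("mark_downloaded", mark_downloaded),
        ("import", run_importer),
    ]
    completed = []
    for name, step in steps:
        if not step(username):
            break
        completed.append(name)
    return completed
```

Returning the completed step names makes it easy to see where a run stopped, which helps when deciding whether a retry should start from the beginning or resume.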

Other Ideas

exor674: I think one of the things we'd need to add for the import feature is an external-logid logprop, so that if somebody tries to import twice they won't get duplicate entries, AND the system can know "oh, DW entry 1 was originally livejournal.com-exor674-5827377212221211"
"Imported from" tag and link
Options: "import tags, tag prefix, tag suffix"; "import comments"?; "import friend groups, tag prefix, tag suffix"; "import all as *security*"
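To illustrate the external-logid idea, here is a minimal Python sketch of skipping entries that were already imported. The key format follows the example in the note (site, username, remote id joined by hyphens). The function names, the dictionary fields and the in-memory set are all assumptions for the sketch; the real thing would store the key as a logprop on the entry.

```python
def external_logid(site, username, remote_id):
    """Build a key such as livejournal.com-exor674-5827377212221211."""
    return f"{site}-{username}-{remote_id}"


def import_entries(entries, already_imported):
    """Return the entries not seen before, recording their keys as imported."""
    fresh = []
    for entry in entries:
        key = external_logid(entry["site"], entry["user"], entry["id"])
        if key in already_imported:
            continue  # a second import of the same entry is skipped
        already_imported.add(key)
        fresh.append(entry)
    return fresh
```

Running the same batch through import_entries twice with the same set yields the entries the first time and an empty list the second time.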

Importing Comments

Copyright issues. Idea from azurelunatic:

Suppose comments belonging to others were imported and posted privately, while comments belonging to the user were imported and posted according to their current screening setting, in their proper threads alongside the private comments. This would allow the maximum amount of the user's own content to show up.

Suppose also that the comment screening mechanism were modified to handle privately posted imported comments, following the screening settings on the original post: if a comment was unscreened on the original, someone OpenID-authenticated as the original comment poster could elect to take ownership of their comments and unprivatescreen them.

After this point, the journal owner could screen/unscreen the comment at will.

To make this work better, imported comments should be listed somewhere that the OpenID owner of the comments can find them, and possibly mass-unprivatescreen them or take ownership.
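One way to read the proposal above is as two small rules: what state an imported comment starts in, and who may later claim it. The following Python sketch is only an interpretation. The state names, the dictionary fields and both function names are hypothetical, and the proposal does not say what happens to comments that were screened on the original, so the sketch simply does not let those be claimed.

```python
def initial_state(comment_author, journal_owner, owner_screening_state):
    """State an imported comment starts in.

    owner_screening_state is whatever the journal's current screening
    setting would give the owner's own comment, e.g. "visible" or "screened".
    """
    if comment_author == journal_owner:
        return owner_screening_state
    return "private"  # comments belonging to others are posted privately


def can_claim(comment, openid_identity):
    """True if this OpenID identity may take ownership and unprivatescreen."""
    return (
        comment["state"] == "private"
        and not comment["screened_on_original"]  # assumption: screened ones stay private
        and openid_identity == comment["original_author"]
    )
```

Once a comment has been claimed, the journal owner could screen or unscreen it at will, as with any other comment.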

@@ Line 4: / Line 4: @@
 * http://codex.journal-press.com/Ljdump
+I don't know anything about scripting, so I don't know if either of the below would be better or worse for our purposes. But
+* [http://www.kelpheavyweaponry.com/trac/ljmigrate/wiki/WikiStart ljmigrate] is based on the ljdump code, captures usericons as well, and has the ability to import to lj-based systems.
+* [http://ljsm.feechki.org/index_en.html ljsm] is a python script which captures usericons and posted images, in addition to entries and comments.
 == Examples ==
