Wednesday, December 08, 2010

Tools for combining BibTeX, PDFs, and e-Readers

Quick links:
UPDATE: If you are looking for a simpler solution that updates PDF's in your BibTeX database with the corresponding BibTeX data, check out updatepdf.pl. Be careful though; the old PDF::API2 module only supports Adobe PDF 1.4 (i.e., compatibility-mode PDF's).
UPDATE: Apparently, you can also manage your Kindle collections efficiently using emacs, using Calibre, or manually so long as you're okay with possibly having to reboot your Kindle after every change. So it would be possible for me to add Kindle collections.json support to my fix-pdf-tags script; I just don't anticipate having the time to do that in the near future (plus, I don't own a Kindle anymore, and so I wouldn't be able to test it).
In trying to migrate to e-Readers, I've been experimenting with both the Amazon Kindle 3 and the Sony Daily Reader PRS-950. They both have nice features... I don't have time to go into a review, but I'll give a teaser...
  • Kindle PRO's: It has a nice web browser (that works on 3G too!), and makes it super easy to get new content onto the device. In fact, you can even download PDF's from Dropbox via the nifty browser (although e-mailing to your Amazon Kindle e-mail address is convenient too). Plus, Amazon makes for a nice e-Book store -- lots of the books I would want.
  • Kindle CON's: However, PDF's really have to be in compatibility mode (Acrobat 1.4). Otherwise, the Kindle will miss all of the metadata. More importantly, it is basically impossibledifficult to manage collections through the USB. So if you have hundreds of PDF's, you'll spend days tagging them via the clunky Kindle keyboard.
    • UPDATE: Apparently, the Kindle uses a simple SHA-1 hash of the file's full path as a key in the collections.json file that is accessible via USB. Consequently, you can manage your collections data more efficiently. You can do so with an emacs script or with a calibre plugin or manually. However, you may have to reboot your Kindle every time you make a change. At least with the older Kindles, the collections.json file was only read on boot. It's possible that the newer Kindles are smart enough to refresh collections data every time the USB is unplugged (like the Sony does), but I honestly don't know. I have a feeling that Hannes, the author of bibtex-kindle, knows though.
  • Sony Daily PRO's: It has the optical touch screen. It doesn't require compatibility-mode PDF's. It has a large screen. It has terrific page viewing options. In theory, the PDF note options are very nice, but e-Reader notes just seem tedious to me in general regardless of interface. More importantly, it is essentially an "open" platform so long as you are OK with a little bit of reverse engineering. It is easy to write a few scripts to manage your XML files, and so keeping your PDF's organized is easy for your average script kiddy.
  • Sony Daily CON's: The optical touch screen means the screen is sunk down so far that it the chassis casts a small shadow around the edge of the screen. The Sony case-with-light isn't as nice as the Kindle's case-with-light. The Sony Bookstore doesn't have as many books (or at least the books I care about). The zoom modes leave much to be desired. In PDF's that work fine on the Kindle, trying to click a word for dictionary lookup often leads to selecting a phrase (and there's nothing you can do about it).
And there's plenty more to talk about, but those are the quick things off the top of my head. So it looks like I'm probably going to keep both... so I can have a diversity of e-Books available to me. Plus, the e-Book experience is a little nicer on the Kindle, but the PDF experience might be a little nicer on the Sony. It's hard to tell.

But what this post is really about is a utility I've put together that automatically manages my research PDF collection on either the Kindle or the Sony Reader. In particular,
  • It updates PDF's with metatags to match author/editor/title information from a central BibTeX database.
  • If you invoke it with a "kindle" argument, it converts PDF's to 1.4 so the Kindle can read the metatags.
  • If you invoke it with a "reader" argument, it also automatically generates categories based on file hierarchy (i.e., the folders in which your PDF files live). In fact, symbolic links indicate that multiple tags should be applied to the same file (i.e., the target of the symbolic link).
So maybe that will be helpful to someone (at least as an example to generate some ideas). The project started out as something customized for me, but I've tried to make the documentation clear (see the chunk at the top to start). Plus, most of the important custom information (paths, preferences, etc.) are at the top of the script.

Check out the most recent version of my fix_pdf_tags script; it resides in my bibtex_to_pdf Mercurial repository where you can view its change history.

P.S. I know that Calibre is an existing software package that has very similar aims and a nice graphical environment. However, it really is a poor choice for managing PDF research. Plus, the Calibre folks have basically written off Kindle users as poor schmucks with hobbled readers. More importantly for me, I'm much happier with scripted solutions that can be fired off quickly.

3 comments:

Anonymous said...

You are right, Calibre is indeed a poor choice for technical paper management, it is important for me to update hundreds of PDF meta-data in place and Calibre wants me to save the file to disk before this can happen.

Even so it does not seem to be working I had to resort to Jabref.

Anonymous said...

I developed an emacs add-on to a) enrich a pdf with title and author metadata and b) generate collections.json out of a bibtex file. https://github.com/hannesm/bibtex-kindle

Cheers,

Hannes

Ted said...

Hannes -- I think the correct github URL is:

https://github.com/hannesm/bibtex-kindle

The one you gave had a typo that was causing a 404 error.

I didn't realize the Kindle used JSON objects for its collections information. Is that a new feature? Can you only find that on the new Kindles?