Phase Portrait: Perl


Wednesday, February 16, 2011

More Quick Reference Cards

Remember that time when I was quick reference card happy [1, 2, 3, 4, 5]? Today, I accidentally found another good quickref card blog post by someone else (Refcards by Michael Goerz; see the original post for the source code and other versions of these cards).

He also includes a few quick reference cards not written by him (Subversion (SVN), GDB, GNU Emacs, MySQL).

Wednesday, December 08, 2010

Tools for combining BibTeX, PDFs, and e-Readers

Quick links:
fix_pdf_tags BibTeX+PDF+e-Reader management script

bibtex_to_pdf Mercurial repository

UPDATE: If you are looking for a simpler solution that updates PDF's in your BibTeX database with the corresponding BibTeX data, check out updatepdf.pl. Be careful though; the old PDF::API2 module only supports Adobe PDF 1.4 (i.e., compatibility-mode PDF's).

UPDATE: Apparently, you can also manage your Kindle collections efficiently using emacs, using Calibre, or manually so long as you're okay with possibly having to reboot your Kindle after every change. So it would be possible for me to add Kindle collections.json support to my fix-pdf-tags script; I just don't anticipate having the time to do that in the near future (plus, I don't own a Kindle anymore, and so I wouldn't be able to test it).

In trying to migrate to e-Readers, I've been experimenting with both the Amazon Kindle 3 and the Sony Daily Reader PRS-950. They both have nice features... I don't have time to go into a review, but I'll give a teaser...

Kindle PRO's: It has a nice web browser (that works on 3G too!), and makes it super easy to get new content onto the device. In fact, you can even download PDF's from Dropbox via the nifty browser (although e-mailing to your Amazon Kindle e-mail address is convenient too). Plus, Amazon makes for a nice e-Book store -- lots of the books I would want.
Kindle CON's: However, PDF's really have to be in compatibility mode (Acrobat 1.4). Otherwise, the Kindle will miss all of the metadata. More importantly, it is ~~basically impossible~~ difficult to manage collections through the USB. So if you have hundreds of PDF's, you'll spend days tagging them via the clunky Kindle keyboard.
- UPDATE: Apparently, the Kindle uses a simple SHA-1 hash of the file's full path as a key in the collections.json file that is accessible via USB. Consequently, you can manage your collections data more efficiently. You can do so with an emacs script or with a calibre plugin or manually. However, you may have to reboot your Kindle every time you make a change. At least with the older Kindles, the collections.json file was only read on boot. It's possible that the newer Kindles are smart enough to refresh collections data every time the USB is unplugged (like the Sony does), but I honestly don't know. I have a feeling that Hannes, the author of bibtex-kindle, knows though.
Sony Daily PRO's: It has the optical touch screen. It doesn't require compatibility-mode PDF's. It has a large screen. It has terrific page viewing options. In theory, the PDF note options are very nice, but e-Reader notes just seem tedious to me in general regardless of interface. More importantly, it is essentially an "open" platform so long as you are OK with a little bit of reverse engineering. It is easy to write a few scripts to manage your XML files, and so keeping your PDF's organized is easy for your average script kiddy.
Sony Daily CON's: The optical touch screen means the screen is sunk down so far that the chassis casts a small shadow around the edge of the screen. The Sony case-with-light isn't as nice as the Kindle's case-with-light. The Sony Bookstore doesn't have as many books (or at least the books I care about). The zoom modes leave much to be desired. In PDF's that work fine on the Kindle, trying to click a word for dictionary lookup often leads to selecting a phrase (and there's nothing you can do about it).
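
For the curious, the SHA-1 keying mentioned in the Kindle UPDATE above can be sketched in a few lines of Python. This is only a rough illustration, not the emacs script or the Calibre plugin. The post only says that the key is a SHA-1 hash of the file's full path; the device-side prefix /mnt/us/documents, the leading "*" on each key, the "@en-US" suffix on collection names, and the "items"/"lastAccess" field names are all assumptions on my part, so compare against a collections.json from your own device before trusting any of it.

```python
import hashlib
import json

# Assumption: files copied to documents/ over USB live under this path
# on the device itself. The post only says "the file's full path".
KINDLE_ROOT = "/mnt/us/documents"


def collection_key(relative_path):
    """Return a collections.json-style key for a file under documents/."""
    device_path = f"{KINDLE_ROOT}/{relative_path}"
    digest = hashlib.sha1(device_path.encode("utf-8")).hexdigest()
    # Assumption: keys for side-loaded files carry a leading "*".
    return "*" + digest


def build_collections(collections):
    """Serialize {collection name: [relative paths]} to JSON text."""
    payload = {
        # Assumption: collection names carry a locale suffix like "@en-US",
        # and each entry has "items" and "lastAccess" fields.
        f"{name}@en-US": {
            "items": [collection_key(path) for path in paths],
            "lastAccess": 0,
        }
        for name, paths in collections.items()
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(build_collections({"Control Theory": ["papers/smith2009.pdf"]}))
```

As noted above, an older Kindle may only pick up a rewritten collections.json after a reboot.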

And there's plenty more to talk about, but those are the quick things off the top of my head. So it looks like I'm probably going to keep both... so I can have a diversity of e-Books available to me. Plus, the e-Book experience is a little nicer on the Kindle, but the PDF experience might be a little nicer on the Sony. It's hard to tell.

But what this post is really about is a utility I've put together that automatically manages my research PDF collection on either the Kindle or the Sony Reader. In particular,

It updates PDF's with metatags to match author/editor/title information from a central BibTeX database.
If you invoke it with a "kindle" argument, it converts PDF's to 1.4 so the Kindle can read the metatags.
If you invoke it with a "reader" argument, it also automatically generates categories based on file hierarchy (i.e., the folders in which your PDF files live). In fact, symbolic links indicate that multiple tags should be applied to the same file (i.e., the target of the symbolic link).
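
The folder-to-category idea in the last item can be sketched as follows. This is a minimal Python illustration of the technique, not the fix_pdf_tags code itself (which lives in the repository). The function name tags_from_tree and the choice to use the folder's relative path as the tag are my own assumptions for the example.

```python
import os
from collections import defaultdict


def tags_from_tree(root):
    """Map each real PDF path to the set of folder-derived tags that reach it.

    tags_from_tree is a hypothetical name used only for this sketch.
    """
    tags = defaultdict(set)
    root = os.path.realpath(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        # Use the folder's path relative to the library root as the tag.
        tag = os.path.relpath(dirpath, root)
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            # realpath follows symbolic links, so a link and its target
            # collapse onto one key and their folder tags accumulate.
            target = os.path.realpath(os.path.join(dirpath, name))
            entry = tags[target]
            if tag != ".":
                entry.add(tag)  # top-level files stay untagged
    return dict(tags)
```

With a real file at control/a.pdf and a symbolic link to it at biology/a.pdf, the single underlying file ends up with both the "control" and "biology" tags.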

So maybe that will be helpful to someone (at least as an example to generate some ideas). The project started out as something customized for me, but I've tried to make the documentation clear (see the chunk at the top to start). Plus, most of the important custom information (paths, preferences, etc.) is at the top of the script.

Check out the most recent version of my fix_pdf_tags script; it resides in my bibtex_to_pdf Mercurial repository where you can view its change history.

P.S. I know that Calibre is an existing software package that has very similar aims and a nice graphical environment. However, it really is a poor choice for managing PDF research. Plus, the Calibre folks have basically written off Kindle users as poor schmucks with hobbled readers. More importantly for me, I'm much happier with scripted solutions that can be fired off quickly.

Tuesday, March 17, 2009

Perl script that generates CSV and BerkeleyDB versions of LTWA list

UPDATE: If you're wondering why I didn't just save the LTWA database and search it every time I needed to abbreviate a journal, it's because I wanted to optimize for speed downstream. That is, I did all the possible processing now to speed things up later. I've also augmented the script to do the opposite (and save the results in other smaller files) so that I can tell the downstream process to take longer (and possibly have better results with fewer spurious entries in the hash table). I've also updated the downstream script to save any successful lookups locally to speed up successive runs. Again, contact me if you want more details.

I'm sick of looking up ISO 4 standard journal abbreviations from the List of Title Word Abbreviations (LTWA) hosted at ISSN's LTWA online. The most annoying thing about LTWA online is that you can't get one big list unless you have them mail you a paper copy (for a price). So you have to resort to clicking each letter and waiting for the list for that letter to come up.

So I wrote a Perl script that automatically cURLs each LTWA online page down, processes it, and generates both CSV and BerkeleyDB (BDB) hash files containing a list of words and their associated official LTWA abbreviations. I use the BDB file in another script to automatically generate BibTeX database files for each of my journal papers (that script first checks a list of known-good journal abbreviations before trying to generate the abbreviation itself).

There were several challenges to such a task, and the list isn't perfect. I focused on one-word entries. For more complicated abbreviations, I figured I'd lean on my list of known-good journal abbreviations. That still left LTWA entries like "psycholog-" and "bulletin-" which use "-" to imply "and any other character." So I used a typical /usr/share/dict/words list to generate a list of English words that matched each pattern. Because such lists don't usually include plurals, I used Lingua::EN::Inflect to generate plurals and then took all of the plurals that included the singular (i.e., that would also match the LTWA pattern).
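
The pattern expansion described above can be sketched roughly like this. The real script is Perl and uses Lingua::EN::Inflect; this Python version swaps in a deliberately crude pluralizer (naive_plural, a hypothetical helper) just to show the filtering step, and the function names and sample abbreviation values are illustrative assumptions rather than anything taken from the script.

```python
def naive_plural(word):
    """Crude English pluralizer; a hypothetical stand-in for Lingua::EN::Inflect."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def expand_ltwa(entries, dictionary_words):
    """Expand LTWA entries into a flat {word: abbreviation} table.

    entries maps an LTWA word, or a prefix pattern ending in "-", to its
    abbreviation. dictionary_words is any iterable of words, for example
    the lines of /usr/share/dict/words.
    """
    words = sorted({w.strip().lower() for w in dictionary_words if w.strip()})
    table = {}
    for pattern, abbrev in entries.items():
        if not pattern.endswith("-"):
            table[pattern] = abbrev  # plain one-word entry
            continue
        prefix = pattern[:-1]
        for word in words:
            if not word.startswith(prefix):
                continue
            table[word] = abbrev
            plural = naive_plural(word)
            # Keep only plurals that contain the singular, which means
            # they also match the LTWA prefix pattern.
            if word in plural:
                table[plural] = abbrev
    return table


if __name__ == "__main__":
    # Sample values for illustration only.
    sample = {"psycholog-": "psychol.", "bulletin-": "bull."}
    vocab = ["bulletin", "psychology", "psychologist", "physics"]
    for word, abbrev in sorted(expand_ltwa(sample, vocab).items()):
        print(word, abbrev)
```

Note that a plural like "psychologies" gets dropped by the contains-the-singular rule even though it would match the prefix, which is one reason the resulting list isn't perfect.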

So that works well for me. Someday I might put the script and/or the files it produces on-line. For the moment, if you want any of these, contact me and let me know. I'll share.

Wednesday, July 18, 2007

MATLAB Quick Reference Cards (and more)

UPDATE: I list an AMSTeX reference card below. There is also an AMSLaTeX reference card available at refcards.com.

This is meant to be a follow-up to the "TeX Reference Card (and others)" post.

I found a list of a bunch more quick reference cards, which include applications/packages like MATLAB, MATLAB toolboxes, Perl, MFC, MySQL, Linux, UNIX, Vi, Vim, Windows, AMSTeX, TeX, and a bunch more...

However, I was just looking for MATLAB quick reference cards. So, here are these (some of which did not come from the above site):

So, that's nice. I recommend one of the bottom two [i.e., 1, 2].
