Friday, February 18, 2011

Dr. Bernoulli gets a job: Mathematics of the Job Search – Faculty Version

I recently found out that the Duke Computer Science department had 404 applicants for the open position in their department. I mentioned that to a CS professor from a different university, and he didn't seem surprised by that number. Moreover, when you think about how many "faculty candidate" lectures there usually are within a CS-like department each hiring season, and you consider that those interviewees are likely a small selection of the total applicants, then 404 starts sounding reasonable.

When there are 404 applicants who each have PhD degrees, publications, and possibly post-doctoral or existing faculty appointments, let's also assume that the objective function that each department is maximizing is pretty flat. If you don't like that assumption, then assume we have no prior information, and so we will maximize entropy and assume that each applicant has a 1/404 chance of being picked for the job (in reality, this probability is itself conditioned on whether the state steps in and imposes a hiring freeze... so the real probability might be closer to 1/1000). So that is a very low number. Can we fight low probability with high volume of applications?

Assume we apply to N schools where the probability of getting an offer is
p = 1/404
at each of them. Then the probability of not getting an offer from each of them is
1 - p = 403/404,
and so, assuming the schools decide independently, the probability of not getting an offer from any of them is
(1 - p)^N = (403/404)^N.
So finally we arrive at the probability of getting an offer from at least one of them, which is
1 - (1 - p)^N = 1 - (403/404)^N.
Hypothetically speaking, let's say you apply to N = 50 such positions. Then you have a
1 - (403/404)^N = 1 - (403/404)^50 ≈ 11.65%
probability of getting at least one offer. Of course, if you were paying attention, you remember that p (1/404) is very small in this example. Consequently, the 1 - (1 - p)^N curve looks linear for a wide region around the origin. So even though you remember your fourth-grade math teacher teaching you that you cannot additively accumulate probabilities (i.e., your probability of getting a job is not N × p), in this small-p case, it is a pretty decent approximation. In particular, even with our ostensibly large N, it is the case that
N × p = (50)(1/404) ≈ 12.38%,
which is pretty close to our slightly more dismal 11.65%.
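
One way to see why the linear estimate works here is the binomial expansion. This is a sketch added for illustration; the post itself only compares the two numbers.

```latex
\[
  1 - (1 - p)^N
  = Np - \binom{N}{2} p^2 + \binom{N}{3} p^3 - \cdots
\]
\[
  \binom{50}{2} \left( \frac{1}{404} \right)^2
  = \frac{1225}{163216} \approx 0.0075
\]
```

So for N = 50 and p = 1/404, the leading correction to N × p is about 0.75 percentage points, which is roughly the gap between 12.38% and 11.65%.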

In December, I ran into a woman who had just finished submitting all of her faculty applications. She said she was applying to just 10 positions because she was exhausted and figured she was just practicing this round. Setting N = 10 reduces your chances to 2.45%. Having said that, the distribution across the applicant pool is certainly not flat. Her home institution, research, adviser, and other factors make her a very attractive candidate who will likely do well even with such a low N... In fact, she was recently interviewed at a university near me (that, again, may have to deal with hiring freezes, etc., in the near future).
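
The percentages quoted in this post (11.65% and 12.38% for N = 50, 2.45% for N = 10) can be checked with a few lines of shell. This is only a minimal sketch: the function names offer_probability and linear_estimate are made up for illustration, and it keeps the post's assumptions of a flat 1/404 chance per position and independent decisions.

```sh
#!/bin/sh
# Chance (as a percentage) of at least one offer from N independent
# applications, each with success probability 1/applicants.
# $1 = N, $2 = applicants per position (defaults to 404, as in the post).
offer_probability() {
    LC_ALL=C awk -v n="$1" -v a="${2:-404}" \
        'BEGIN { printf "%.2f\n", 100 * (1 - (1 - 1/a)^n) }'
}

# Linear approximation N * p, also as a percentage.
linear_estimate() {
    LC_ALL=C awk -v n="$1" -v a="${2:-404}" \
        'BEGIN { printf "%.2f\n", 100 * n / a }'
}

for n in 10 50; do
    echo "N=$n: exact $(offer_probability "$n")%, linear $(linear_estimate "$n")%"
done
# N=10: exact 2.45%, linear 2.48%
# N=50: exact 11.65%, linear 12.38%
```

Passing a second argument, e.g. offer_probability 50 1000, tries the more pessimistic 1/1000 guess mentioned earlier.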

Now, in my case... Maybe I should burn my CV and dust off my résumé... I hope I'm not too old and outdated.

Wednesday, February 16, 2011

Microsoft's version of Google Scholar?

I accidentally ran into this today. Microsoft has their own version of Google Scholar: Microsoft Academic Search (MAS). Among other things, it will even generate aggregate statistics on a particular author you're interested in (number of publications, number of citations, etc.). In fact, it can even generate similar aggregates for journals and conferences (which augments the metrics you can already get from ISI databases).

Unfortunately, some of the candy tools (co-author graph and path – tools that have little functionality but lots of coolness) require Silverlight. I tried running them with Moonlight, which crashed Firefox but seemed to work in Chrome. I say "seemed" because the Silverlight/Moonlight applet loaded fine but was populated with no information. Moreover, doing searches within the applet also returned no information. However, I haven't tried it on a Windows (or Wine) machine for comparison, and so maybe co-author graphs/paths just aren't ready for production yet. I realized yesterday that it might be wrong to interpret MAS as a product for research; it may be better understood as a product still being developed within Microsoft Research.
LIBRARY ACCESS UPDATE: As of March 11, 2011, it is very possible that your university's library proxy is not yet configured to allow access through Microsoft Academic Search. If you try to access the search engine through your library proxy and it fails at the MAS address, try it again at the Journalogy address. Strangely, these two names resolve to the same address, but neither is a CNAME. Moreover, neither uses an HTTP redirect to the other. Regardless, many library proxies search their database of allowable hosts by name, and so trying either name may help. If neither name works, contact your library and have them add MAS.

natbib-compatible BibTeX style (BST) file for Springer LNCS publications

UPDATE: Additionally, if you prefer "References" instead of "Bibliography", you have to re-define the \bibname macro in your document (e.g., just after you include natbib). Just add a
\renewcommand\bibname{\refname}
or a
\renewcommand\bibname{References}
and you should be up and running with your desired bibliography name.
For some reason, the folks over at Springer do not like to make their BibTeX style (BST) files natbib compatible. This omission seems egregious when considering the kind of authors that would submit to journals and conferences that use LNCS-style formatting. Springer does provide LaTeX support files for LNCS, but the included BST file is not natbib compatible (because it was not built with natbib and author–year extensions turned on, which are needed in natbib even when numbered references are used). What makes things more difficult is that it was produced by hacking another BST (titto.bst) that was originally generated properly using makebst, and many of the hacks were already implemented natively in merlin.mbs. So much more of the postfix BST language was edited by hand than would have been necessary had the correct docstrip driver options been picked in the first place. That makes it more challenging to produce a refactored version with additional natbib functionality without worrying about introducing regressions.

Nevertheless, I've done my best, and I've made the natbib-compatible result splncsnat.bst available for download. I was able to remove the need for some of the manual editing by a smarter choice of docstrip options, but I still ended up having to create a patch on top of a stock docstrip-generated BST (the docstrip driver splncsnat-unpatched.dbj and patch splncsnat.patch are also available). Hopefully that helps someone out there.
  • splncsnat.bst: natbib-compatible BST file for Springer LNCS-type publications
    Download and place splncsnat.bst in the same directory as the document's TeX source code. In the TeX preamble of the document, use
    \usepackage[numbers]{natbib}
    \bibliographystyle{splncsnat}
    and then in the text use macros like
    \cite{smith77}       % to get a "[1]" in the text
    \citep{smith77}      % to get a "[1]" in the text
    \citet{smith77}      % to get a "Smith [1]" in the text
    \citeauthor{smith77} % to get a "Smith" in the text
    The normal \bibliography{FILENAME} can be used at the end of the text where the BBL will be inserted by BibTeX.
  • splncsnat-unpatched.dbj: docstrip driver used to generate splncsnat-unpatched.bst
  • splncsnat.patch: patch used to generate splncsnat.bst from splncsnat-unpatched.bst (assuming version 4.20 [2007/04/24 (PWD, AO, DPC)] of merlin.mbs)
Of course, you need only splncsnat.bst to get up and running.

More Quick Reference Cards

Remember that time when I was quick reference card happy [1, 2, 3, 4, 5]? Today, I accidentally found another good quickref card blog post by someone else: Refcards by Michael Goerz (see the original post for the source code and other versions of these cards). He also includes a few quick reference cards not written by him (Subversion (SVN), GDB, GNU Emacs, MySQL).

Friday, February 11, 2011

Scripts to get Mercurial up and running on OSU CSE machines (SunOS and Linux)

UPDATE: A day after I posted this, Mercurial and Python (and Git) were added as optional subscriptions for users of these machines. So log in to your desired machine and execute subscribe, then select MERCURIAL and whatever PYTHON is available (version 2.4.x or higher). Quit subscribe to save your changes, and log in again (of course, you can also do the same thing with GIT).
If you are a student, staff member, or faculty member in the Computer Science and Engineering department at The Ohio State University, you may have found yourself wanting to use a DVCS like Mercurial (hg) for SCM. Unfortunately, the version of Python that comes bundled on these enterprise systems prevents installing Mercurial, and some other issues on the SunOS system (like the lack of round() in the math library) prevent building a recent version of Python 2 that is needed for installing Mercurial. There are ways around this mess, and I have done my best to automate them within a script.

So give it a shot. Download the appropriate script (SunOS or Linux) to your desired target machine. Next, edit the script (e.g., using pico, nano, vi, or emacs) to verify that the INSTALLDIR location at the top of the script is what is desired – if you are going to run the script on both types of machines, your INSTALLDIR must be different in the two scripts. Then run the script on the machine (e.g., ./install_hg_osu_cse_sun.sh) and follow the instructions. The script is interactive, and so you will be able to manage its behavior as it runs. Be sure to follow its instructions at the end about setting your PATH and PYTHONPATH; if you ran the script on both types of machines, you will have to be clever in your shell RC/profile file to set these differently based on the machine you are on – I recommend using uname to detect the machine type.
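
As a rough illustration of the uname suggestion, here is a minimal sketch for a Bourne-compatible startup file such as ~/.profile. The function name hg_install_dir and both directory names are hypothetical; substitute the INSTALLDIR values you chose in each script, and use the exact PATH and PYTHONPATH entries that the script prints when it finishes. If your login shell on these machines is csh or tcsh, the syntax will differ.

```sh
# Pick the Mercurial install directory based on the kernel name.
# $1 = kernel name; defaults to what "uname -s" reports on this machine.
# The two directories are hypothetical placeholders for your INSTALLDIRs.
hg_install_dir() {
    case "${1:-$(uname -s)}" in
        SunOS) echo "$HOME/local/hg-sunos" ;;
        Linux) echo "$HOME/local/hg-linux" ;;
        *)     return 1 ;;   # unknown machine type: change nothing
    esac
}

if HG_DIR=$(hg_install_dir); then
    PATH="$HG_DIR/bin:$PATH"
    # Assumed layout; replace with the PYTHONPATH the install script reports.
    PYTHONPATH="$HG_DIR/lib/python${PYTHONPATH:+:$PYTHONPATH}"
    export PATH PYTHONPATH
fi
```

Keeping the detection in one small function means the same startup file can be shared between both kinds of machines, which is the situation on a networked home directory.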

After that, you should have a working Mercurial. In the Linux script, you may adjust the Mercurial and Python 2 versions downloaded, but in the SunOS script, you need to leave the Python 2 version alone as later versions of Python will not build on the SunOS machines (due to the problems with the old math library). On either machine, if you are adventurous, you can use the installed Mercurial to clone the stable Mercurial repository (hg-stable) and keep your installed Mercurial up-to-date with the very latest stable version.

Updated LaTeX document class for Ohio State University (OSU) graduate school dissertation and thesis documents

Back in 1996, The Ohio State University Electrical and Computer Engineering (ECE) department made available LaTeX2e support files including a document class that complied with the graduate school's format for dissertations (see sample pages, guidelines, templates, and other resources from the graduate school). The resulting osudissert96.cls and osudissert96-mods.sty from the ECE department were kept up to date through 1998, but they were left to lapse out of compliance after several format updates from the graduate school (including a recent one in 2009). Additionally, the graduate school only officially supports helping students with documents "typeset" in Microsoft Word (and even their Word templates may require a more recent version than they claim on the website).

So back when I put together my dissertation (which has source code available to review) in 2010, I updated those old ECE templates for the 2010 format. I tried to make them backward compatible with the old osudissert96 to make them nice drop-in replacements for anyone using the outdated versions. You can find them in osudissert10.zip and sample-osudissert10.zip. For the most part, the old osudissert96 documentation still applies. However, it might be better just to browse through the sample and/or use the sample as a template for your own document. To get the sample up and running,
  1. Unzip sample-osudissert10.zip.
  2. Unzip osudissert10.zip.
  3. Put the CLS and STY files from osudissert10.zip into the same directory as the files from sample-osudissert10.zip.
  4. Build the sample dissertation with:
    1. pdflatex Thesis.tex
    2. bibtex Thesis.aux
    3. pdflatex Thesis.tex
    4. pdflatex Thesis.tex
  5. Review the resulting Thesis.pdf file, which also includes documentation on how to get your own dissertation up and running.
There is also a README file in sample-osudissert10.zip that basically says the same as above. Experts may just need the files in osudissert10.zip, but it will still be useful to see the quick reference in Appendix B of the sample dissertation. Note that the documentclass is still called osudissert96.cls even though the zip file is called osudissert10.zip; this choice was made for compatibility with old dissertations using the old files.

I hope that helps someone out there. I probably won't be monitoring the graduate school format policies now that I am not in graduate school anymore, but I am usually happy to help with "how-to-modify" questions over e-mail (if I have time). Good luck!

Thursday, February 10, 2011

PlugBox Linux on my Pink PogoPlug: 4GB USB partition was the key

So I just installed Plugbox Linux on my pink PogoPlug v2, which was selling for about $50 on Amazon when I bought it last week.

The only problem I had was solved when I put my entire Plugbox Linux distribution on a partition of my USB stick that was only 4GB (or smaller). I used an ext3 file system, but an ext2 file system works as well (and probably FAT too). I made the partition bootable in the partition table, but that isn't necessary with the way the current version of u-Boot looks for the kernel. u-Boot looks for a partition with /boot/uImage on it. Others say that they've put /boot on a small 32MB partition by itself, but I had to put everything on one partition.

To figure all of this out, I did enable the netcat console. That let me watch u-Boot as it booted, before it handed things off to the kernel in uImage. However, if I had used a 4GB partition (or smaller) to begin with, I don't think I would have even bothered setting up the netcat console.

Hopefully everything keeps working!