Science, meet productivity


The R script that will change your life

If you use the R programming language, you probably know how much of a pain it is to keep your packages updated. You’ve run update.packages(...) on the few that you want to keep up to date, but it’s a pain in the neck to do that for every package, every time. Thankfully, where there’s a will, there’s a way!

When R starts up, it looks at your .Rprofile file (if you have one), and runs it. Mine looks like this:

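#!/usr/bin/Rscript

# Use the UCLA CRAN mirror for all installs and updates
options(repos = c(CRAN = "http://cran.stat.ucla.edu/"))
# utils is not attached yet when .Rprofile runs, so load it explicitly
library(utils)
# Update every installed package without prompting
update.packages(ask = FALSE)
# The "A-list": packages I always want installed
my.packages = c("CvM2SL2Test", "MASS", "verification", "gtools", "ROCR",
        "RColorBrewer", "heatmap.plus", "gmodels", "gplots",
        "profr", "proftools",
        "colorRamps")
# Install any A-list packages that are missing
to.download = which(!my.packages %in% rownames(installed.packages()))
if (length(to.download) > 0) {
    install.packages(my.packages[to.download], clean = TRUE, dependencies = TRUE)
}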

This script has three awesome features:

  1. It will update ALL the packages I have without asking.
  2. It has an A-list of packages that I always want to have.
  3. It checks the A-list and installs any packages that are missing.

It may be a little redundant, but having a few fail-safes never hurt anyone.

Feel free to steal this .Rprofile for your own usage.

    • #R
    • #programming
    • #Rprofile
    • #automation
    • #updating packages
  • 2 weeks ago

How to fail (and sometimes win) at graduate fellowships

The title of this post is rather tongue-in-cheek, but is also meant to be a reality check. In applying to fellowships, you should apply to as many as possible because in all likelihood, you will not get one. You need to try as hard as possible to get one, combing over your essays and reading them aloud before you submit, but keep this reality in mind.

This year I applied to six (6) graduate fellowships, won one, and was a Finalist for another. Last year I also applied to six (6) and won zero, though I got an Honorable Mention for one. Below is a table of all the fellowships I’ve applied to in the past two years, and their results. If the cell is empty, then I did not apply that year.

In the spirit of sharing, such as those PIs who have publicly posted their funded and unfunded grant proposals, I’m posting ALL my essays. For each year I applied, I’ve put all my documents associated with each application so you can read them and compare the successful and unsuccessful essays. Some are rather personal, but that’s the nature of the essay. Also, the NSF one includes a ratings sheet so you can delve into the mind of the assessors as well. I realize the file sizes are huge but blame Microsoft Word 2011, not me.

After the table, I’ll discuss advice for each fellowship.

Fellowship | 2011-2012 | 2012-2013
DOE Computational Science Grad F’ship | no award [16.1MB PDF] | no award [18.5MB PDF]
Ford Foundation Predoctoral F’ship | | no award [94KB PDF]
Hertz Foundation F’ship | no award [28.8MB PDF] | Finalist [33.2MB PDF]
National Defense Sci&Eng Grad F’ship | no award [114KB PDF] | Awarded [9.2MB PDF]
National Physical Science Consortium | | ?? [10.1MB PDF]
NSF Grad Research F’ship | Honorable Mention [580KB PDF] |
Paul & Daisy Soros F’ship for New Americans | no award [13.1MB] | no award [661KB PDF]
SMART Scholarship | no award [115KB PDF] |

Below are the tips I have for each of these fellowships, but keep in mind that I did not win most of them.

DOE Computational Science Graduate Fellowship

I can’t give advice for this one, other than to apply. The first time around I just talked about science, and the second year I emphasized how my research needs lots of awesome computers, but neither strategy seemed to work. Those people also require a course plan, so maybe they thought mine wasn’t serious enough. Or maybe my letters of recommendation sucked. No clue.

Honestly I thought I had a pretty good shot at this, since I got to the Finalist stage for the Hertz Fellowship, but it goes to show you that each of these fellowship associations has subtly different standards.

Ford Foundation Predoctoral Fellowship

This fellowship is aimed at those who are going to increase diversity in the professoriate, whether by being a minority themselves, or by participating in programs that encourage minorities to pursue research and academia. I applied as a member of the latter group.

I have to admit my mistake on this one. My undergrad institution can submit transcripts electronically, and I didn’t check with the Ford Foundation to see whether they accepted e-transcripts. I got an email in February saying my application was incomplete, which sent me into a panic, so I logged on to the transcripts website and overnighted a transcript to their office. But then I read the email again and saw that they “encourage you to apply and wish you continued success in all of your academic endeavors.” :(

Conclusion: check with the fellowship organization about their accepted transcript formats and whether they have received all your materials.

Hertz Foundation Fellowship

This is definitely one of the most prestigious fellowships. The first year I applied to this, I was not very scientifically mature, which you can see from my application. But in my 2012-2013 application, I had a vivid idea of what I would like to see in the future of science, and how we were going to get there.

There are very few resources (save for here, here, and here) for the Hertz so I’ll detail my journey below.

Note that the filename of the fellowship document is Olga.Botvinnik_Hertz_2012-2013_v6.pdf. I did, indeed, go through six versions.

The Interviews

Round 1 Interview

On Nov. 16 2012, I received an email indicating I had been selected for a first-round interview! Last year, I received the rejection notice on Nov. 11 2011. Given that their due date is usually Nov 1st, it was nice to at least get a quick turnaround. An excerpt from the “Reject” email:

Thank you for your application to the Hertz Foundation Fellowship Program for graduate study in the applied sciences.  We greatly appreciate your interest in the Hertz Foundation Fellowship Program and we strongly encourage your continuing contributions to science and technology.

We received over 600 applications this year and were able to choose only about 25% of those for personal interviews.  Unfortunately, we were not able to select you for an interview and therefore, you will not be part of the next round of our selection process.  We regret to inform you of this fact, particularly since the Foundation values the pursuit of applied science and engineering of creative young PhD candidates like yourself.

Your application was given serious consideration.  In addition to scholastic excellence, there are many other factors associated with the objectives of the Hertz Foundation that we take into account when selecting candidates for interviews.  Our decision was based on the total assessment.

Thank you for choosing a career in science and technology.  We are sure your efforts will make a difference in the world.

I asked for feedback, especially because one of my recommenders didn’t submit their letter on time, but they said they do not reconsider applications and provided this note from the president of the foundation about the decision process:

 We do not provide feedback on individual applications.   In our process, two reviewers read each application.  They look for not just academic excellence, but creativity, early demonstration of research capabilities, motivation and confidence, and the strong endorsement of those traits by referees.  We also look for leverage in the use of our resources.  If you are already well along in graduate school, we question what our contribution to an established research program may bring.

 If you were not selected for an interview, it does not mean that there was any defect in your application or that it was not considered and appreciated.  We rarely see an applicant that we would not choose to fund were resources available.   Like Rhodes, Marshall and Churchill, our success ratio is slightly under 3%.  Unlike them, we invest heavily in sequential one-on-one interviews that depend on the volunteer effort of many excellent scientists and engineers.  We must make a hard selection early on to optimize the use of these resources.

And the “Interview” email:

Congratulations! You have been chosen to advance to the interview stage of Hertz Fellowship selection process - as one of slightly more than 20% of this year’s applicants. This involves a formal technical interview with one or more of our interviewers - who are in most cases former Hertz Fellows, themselves. This interview generally lasts 45 to 60 minutes. It is patterned after the PhD oral exam and you may be asked to perform calculations, discuss your previous research work, and to demonstrate the breadth and depth of your technical knowledge. Please bring paper and pen in the event you’re asked to perform calculations during your interview.

If you have new scientific papers that were not submitted with your application, please bring 2 copies for your interviewer. The materials will then be added to your file before it is reviewed for consideration for a second interview.

The interview itself was in Los Angeles on Jan 26th 2013, and I carpooled there from San Diego with a couple of other people at UCSD who were also invited to interview. I dressed in a suit, since they said it would be a formal interview, and I was crazy nervous. But once I got into the room and started chatting with these crazy smart people, it wasn’t that bad. It just felt like a conversation about science. They asked me random questions, like how Purell works and why it’s preferred over hand washing, about hydrogen bonding in glass, and some statistical questions. I had a really good time talking to them! I think being relaxed and taking my time with answering their questions made a huge difference.

Round 2 Interview

On Feb 1st 2013, I received an email notifying me of my Finalist status. Here’s some of the email.

We are pleased to notify you that you have been designated a Hertz Fellowship Finalist - one of 50 selected from a pool of over 700 very talented applicants. 

I was so excited! The day that I found out was the first day of my lab’s ski/snowboard retreat at Big Bear Lake/Mountain, and we were driving up. I was refreshing my email on my phone constantly until I found out at 11:55am.

I used Zimride/Craigslist rideshares to get there (I’m having car troubles which is why I’m avoiding driving it long distances). I was a lot more nervous for this one, and I think it showed. They asked me to explain how I “got here,” as in what was my journey in science. The questions where I really faltered were about where I have applied my own creativity in research, and some of the physics/stats questions. Though they were impressed that I had applied two years in a row, and pleased that I completely changed my application, because I guess they get people who submit the same thing. Which doesn’t make sense to me because if it didn’t work the year before, it probably won’t work now…

As they probed me about where I applied my creativity in research, at first I asked if I could talk about a teaching experience because I’ve taken creative approaches to presenting material. But they said no they weren’t really interested because after all it wasn’t like I invented inquiry-based learning (!). So then I racked my brain for where I have done something original in my research, and it dawned on me that most of my research has been following someone else’s instructions. I’ve been creative in the implementation of these instructions and coded up nice solutions, but I haven’t been the originator of the idea. They said “Impress us. This is where we want you to show off your accomplishments.” And.. I didn’t.

After the interview I realized that in my Bioinformatics Algorithms class I formulated a really interesting biological problem as a solid computational problem, and this had never been thought of before. So I emailed that to them, but maybe it came off as too desperate, or not good enough in the face of the accomplishments of my peers. So in any case, I was in the process of doing something really creative.

Another question they asked was about where I saw the frontier of science being in the future, and I said “non-destructive single cell genomics.” And then they said “Great! So how would you do that?” And I had to invent all kinds of technologies to accomplish this wacky future, on the spot. That was pretty draining.

With the stats questions, I should have got them because they were simple binomial distribution and heat diffusion through spherical volumes problems, but my brain was so dead from feeling inadequate about my lack of creativity, that I took a really long time to answer them, and then I didn’t even get them in the end.

Tips from a Hertz Fellow (not me)

  • Practice some scenarios of questions they may ask about your past research, or a general “tell me about yourself” question. This will help you be less nervous.
  • Study the publications of your interviewers and try to find something in common. These are people with PhDs who are curious about the world, so it’s better to get them thinking about your topic than for them to sit and come up with “What can we ask that s/he probably doesn’t know?”-type questions, which are no fun.

Thanks to Christian for the tips!

Conclusions on Hertz Fellowship

  1. You need a strong vision for the future of science, supported by your current research.
  2. You need to have demonstrated creativity in your field by suggesting a new solution, approach, or technique that makes people go “wow, why didn’t I think of that before?” Simply following instructions to perform others’ research is not enough.
  3. Be quick on your feet with statistics, physics and differential equations.

National Defense Science and Engineering Graduate (NDSEG) Fellowship

Last year I adapted my NSF fellowship for the NDSEG. The NDSEG is more competitive, with ~200 awards out of ~3000 applicants (~6.7%, whereas the NSF is ~2k awards out of 20k applicants, 10%). While by stats the NDSEG is more “prestigious,” it is not as well-known. I’m not quite sure what makes a proposal for the NDSEG good, other than similar criteria to the NSF, so I suggest comparing my successful and unsuccessful fellowship applications. I’ve heard that you should describe potential military applications of your research in the NDSEG applications, but I didn’t do that so… who knows.
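As a quick sanity check on those percentages, here is a minimal Python sketch. The award and applicant counts are the approximate figures quoted above, not official statistics, and the helper name acceptance_rate is made up for illustration.

```python
def acceptance_rate(awards, applicants):
    """Return the acceptance rate as a percentage."""
    return 100.0 * awards / applicants

# Approximate figures quoted in the post, not official numbers
ndseg = acceptance_rate(200, 3000)
nsf = acceptance_rate(2000, 20000)

print("NDSEG: about %.1f%%" % ndseg)
print("NSF:   about %.1f%%" % nsf)
```

So on these rough numbers the NDSEG funds roughly two out of every thirty applicants, versus three out of thirty for the NSF.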

Update: Others have informed me that the NDSEG seems to go after recent accomplishments such as publications, whereas the NSF is more about your potential for future research. Thanks to Reid for the tip!

National Physical Science Consortium

This seems like a relatively unknown fellowship because I discovered it randomly. They also have a terrible website where you generate a key instead of a login, and then the key can change… I had to use 3 different keys for this application. And they basically say to use your NSF essays for this application, which is why the header in the file is “NPSC/NSF-Style.”

As for advice, no idea on this one … they still haven’t gotten back to me (?!), even after I emailed them. So I’m guessing this is a “No.”

National Science Foundation Graduate Research Fellowship Program (NSF GRFP)

In case you’re wondering, I didn’t apply to the NSF 2012-2013 because I have a Master’s degree, which makes the NSF GRFP hate me.

As the most well known of these fellowships, there are many fantastic resources for applying to the NSF GRFP (even more here, here, here, and here), so I will just give some general advice.

  1. Research Proposal
    1. Speak fluently about your topic. Sound like you actually know what you are doing. This will come when you have many people read your fellowship app and ask questions about the details. The actual topic doesn’t matter, but you need to convey that you are competent in your field.
    2. Broader Impacts: Your research should be benefitting the scientific community as a whole, whether you are depositing data publicly, contributing to open-source software, or training new people in science.
  2. Personal Statement
    1. Broader Impacts: Make sure you are doing some kind of outreach, either as a TA or a volunteer or something. And don’t just do it for the fellowship because that’s pretty obvious … You should have a history of outreach.
    2. Tell the story of WHO is going to be doing this research. Grad fellowships fund people, not projects.
  3. Previous Research
    1. If you feel “inadequate” because you didn’t do research in undergrad but had classes instead, talk about a class that opened your mind beyond the traditional curriculum and into the unanswered questions of research.
    2. Again, speak fluently about your topic.
    3. Broader Impacts: How have your research experiences benefitted the scientific community, either at large, or a few people? Talk about it.

Thanks to Alex for emphasizing Broader Impacts!

Outside of those three, the “topic” that you apply to is very important, as that determines who reads your application. In Bioinformatics, we’re in a bit of a no-man’s land since if we apply to Computer Science then they won’t understand the biology and say our algorithm is crap, and if we apply to Biology, then they’ll say we don’t understand the biology, and gloss over the computational details. So you have to find a delicate balance.

Update: Finally, the NSF is the only fellowship of these 8 (!) to give feedback, so if you only apply to one thing, apply to the NSF to get a feel for the process, and to receive feedback on your first application. Then in your next application, you will know what’s going on and can embark on a more serious application. Thanks to Reid for pointing this out!

Paul and Daisy Soros Fellowship for New Americans

The PDSoros Fellowship is very difficult to get, and prestigious. It is for immigrants and children of immigrants (I was born in Russia so I count as an immigrant). Like the Hertz Fellowship, they ask you to focus on times when you have applied creativity in your field, and since I’ve had that realization with the Hertz, they probably also saw that I haven’t been that creative. Unlike the Hertz, it is for people pursuing ANY graduate degree, so you are competing with people in law school, medical school, PhD programs, M.A., M.F.A., M.P.H., M.S., M.Eng., etc programs. So it is a very large pool.

Plus I didn’t proofread my essays enough. I looked through them later and saw typos and notes to myself, which is not professional.

Science, Mathematics, and Research for Transformation Scholarship

No idea. Didn’t get any feedback about the first time, only this email:

This message is to inform you that you were not chosen as one of the 130 Round 1 Finalists for this year’s award competition. Please note, a second round of up to 20 additional awards may be issued this month. Your application may be reconsidered for an award at that time. ASEE will keep you apprised of your status as new information becomes available. We appreciate your interest in the SMART Scholarship Program.

But I didn’t get an award for the second round either. I didn’t apply the second time because this fellowship requires a year of service in a Department of Defense lab after you graduate, and I wanted more freedom with my fellowships.

    • #graduate fellowships
    • #phd
    • #grad school
    • #essays
    • #writing
    • #nsf
    • #ndseg
    • #hertz
    • #doe csgf
    • #interviews
    • #paul and daisy soros
    • #ford foundation
  • 3 weeks ago

Beautiful boxplots and sassy small multiples

Last week, I posted about implementing design principles in Python’s graphic library, matplotlib. I received lots of great feedback, and made another tutorial, specifically about boxplots and “small multiples”. Small multiples are a concept coined by Edward Tufte about having many plots using the exact same axes, allowing the reader to discover patterns by eye, though here I also add linear regression to aid the eye.

Enjoy the tutorial here!

    • #python
    • #programming
    • #GraphicDesign
    • #edwardtufte
    • #boxplots
    • #small multiples
    • #matplotlib
  • 4 weeks ago

In India, 'no frills' hospitals offer $800 heart surgery

nerdneha:

Using pre-fabricated buildings, stripping out air-conditioning and even training visitors to help with post-operative care, the group believes it can cut the cost of heart surgery to an astonishing 800 dollars.


It’s so great to see people making an effort to make healthcare affordable. I can’t see this catching on in the states (because of malpractice concerns, etc.) but, after reading so many depressing articles about why healthcare costs are so high, people like Narayana Hrudayalaya give me hope!

  • 4 weeks ago > nerdneha

Implementing graphic design principles in Python's matplotlib

Tired of ugly 2d histograms and bar charts in matplotlib? Read about good chart design but not really sure how to implement it? Check out my talk/tutorial on implementing graphic design principles in Python’s matplotlib! I gave this talk on April 10th 2013 to UCSD’s Scientific Python User Group (aka “Python with Pizza”) and used principles from Edward Tufte’s books to improve plots in Python’s matplotlib.

Click the title to check it out and let me know what you think!

    • #python
    • #programming
    • #matplotlib
    • #tutorial
    • #edwardtufte
    • #graphicdesign
  • 1 month ago

Single-cell genomics will change the world

In the future, you will know exactly which strain of influenza, Hepatitis C, or HIV infected you, and if there are any other parasites floating around in your blood. If you’re not sure about the origins of your meat, you could sequence it to check for pathogens, the biodiversity of the cows it came from, and its environment to make sure it was truly grass-fed. When you get your hair dyed or permed, you’ll also get a scalp treatment which changes the genetics of your hair so it conforms to whatever change you’re getting. Bioterrorism officials will be able to detect virulent threats at the most minuscule amount.

All this sounds very science-fiction, but I believe we are hurtling towards an exciting future in which biology will play an unprecedented role, due to the current research in single-cell genomics.

Single-cell genomics is an exploding field right now, due to DNA amplification techniques introduced in 2001 and improved upon in 2012, and novel single-molecule sequencing methods which require no amplification at all. For most high-throughput sequencing, the genome-wide sequencing you may have heard about through large efforts such as the Personal Genome Project, the ENCODE project, or The Cancer Genome Atlas, you need a large amount of genetic starting material. This means that to get any usable results, you need to take a large population of cells (~100,000), grind them up, make thousands of copies of the DNA, then send the DNA for sequencing.

Right now we are treating tissues as if they are an entire country like the United States, making stereotypes about everyone that lives here based off a few interactions. But the US is composed of people, individuals who work hard and make tough decisions every day. We simply cannot assume everyone is exactly the same.

The advent of single-cell genomics completely changes this paradigm that we need thousands of cells to understand biology. Instead of treating a heterogeneous soup of cells such as a tissue as a homogeneous population, for example, assuming every cell in the heart or liver or kidney or cancer tumor is exactly the same (which we know from physiology to be completely false), we can study individual cells and their solo struggles.

From single-cell research, we can finally study tiny amounts of cells. And soon, technology will be good enough that individuals can afford their own sequencers. The Illumina MiSeq is the closest thing we have right now, but it’s still too expensive and the bioinformatics isn’t developed enough for a blood sample —> ??? —> profit!-type experience. In the future, we can sequence individual cells in your blood to predict your current viral load, a key component of health in someone living with HIV. You’ll be able to perform your own quality control of the food in your home, even checking which farm those “heirloom” tomatoes came from, or using an advanced protocol to check for environmental effects (pesticides, soil quality, etc) via epigenetics such as bisulfite methylation sequencing of DNA, histone modifications or methylated RNA (high-throughput methods not yet invented!). When you get your hair treated, you’ll provide your genome sequence and the hair technician will match your desired hair genes with your current genes, and use that to create a gene therapy so that you don’t have to come in for root touch-ups - your roots will already be the correct color or level of curliness! Finally, we’ll be able to detect any smidgen of genetic material lying around, and halt any bioterrorism threat in its tracks.

Single-cell genomics will change the world.

  • 2 months ago

The essence of the problem

Vanilla essence. Flickr: pejrm

In one of my classes, the main project is to formulate a biological problem as a computational problem. This formulation should be such that you can give your computational problem formulation to a “genius computer science slave” (aka you are the slave, and hopefully the genius will come) and they could come up with an algorithm to solve it.

I’m really excited about my problem formulation, which is a new method of sequence assembly, but every time I met with the professor to discuss my formulation, he wanted me to further simplify the problem. At first, I was too hung up on the quality of the sequencing reads, the fact that they are paired-end, etc. My professor prodded me towards a simpler and simpler formulation, the “essence of the problem.” And at this essence of the problem, the solution becomes obvious.

This reminds me of my post about Eric Lander, where he talked about “struggling with a problem” in his interview, and that at some point in the struggle, “the structure of the problem becomes clear, and the path through it becomes clear.” I couldn’t have figured out the essence of the problem without the struggle I went through.

Similarly, I see the “essence” of cello playing as playing these notes, in this order. I know this is a gross oversimplification, but this is what my cello teacher prescribes as the first step in learning a new piece. And I agree with him - if you’re not comfortable playing these notes in this order, no amount of musicality or phrasing will save you. Once you have these notes down, you can then add bowings, rhythm, phrasing, etc to create a beautiful piece. Maybe the phrasing and performance just fall into place because of the order and rhythm of the notes. But you had to start at the essence first.

  • 2 months ago

Startups, life, learning and happiness: The Anti-Todo List

Really like this concept of logging your finished items for the day rather than setting unreasonable goals to accomplish.

joelgascoigne:

For some time, I’ve gradually realised that my day is not only occupied by tasks from my todo list. Often, there are lots of other tasks which deserve time in my day just as much as those I have in my todo list. Previously, I found that these extra tasks detracted massively from my feeling…

  • 6 months ago > joelgascoigne

How to set Helvetica as the default sans-serif font in Matplotlib

If you’re a typography junkie like me, you’re probably sick of seeing Bitstream Vera Sans as the typesetting font for Python’s plotting library, Matplotlib.

Just look at it:

So ugly, I know.

While Helvetica is a controversial font, it is simple, clean, and undisputedly easy to read. In my scientific figures, I’m not going for originality in the typography, I just want it clean and readable. And, as I use Helvetica in my own documents, I want my plot text and my document text to match, without having to go into Adobe Illustrator and change it all.

If you’ve tried to add Helvetica by editing your .matplotlibrc file, you’ve probably seen this error:

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/font_manager.py:1216: UserWarning: findfont: Font family ['sans-serif'] not found. Falling back to Bitstream Vera Sans

Adding Helvetica to the default font list

Helvetica is stored in OS X as a .dfont file which is inaccessible to Matplotlib, so we need to make it accessible. We will do this in six (6) steps.

1. Download and install Fondu to convert Mac-Helvetica to ttf-Helvetica

EDIT: changed source .tgz install to homebrew install.

To install Fondu, use homebrew, the “missing package manager for OSX.” It’s really fantastic, downloading and installing dependencies automagically for each package.

After you install homebrew, the command to install Fondu is,

brew install fondu

You may need to brew update if you are getting an error.

2. Find Helvetica on your system

You can use ‘FontBook.app’ to find where Helvetica is on your system, but in all likelihood it’s here:

/System/Library/Fonts/Helvetica.dfont

We need to convert this font file to .ttf via Fondu.

3. Find where matplotlib stores its data

To find out where matplotlib stores its data, we use this command. Note that these outputs are specific to my machine and Python installation.

$ python
>>> import matplotlib ; matplotlib.matplotlib_fname()
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mpl-data/matplotlibrc

We need to put the .ttf in: matplotlib/mpl-data/fonts/ttf
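If you would rather not piece that path together by hand, the sketch below asks matplotlib for it directly. It assumes a reasonably recent matplotlib that provides matplotlib.get_data_path(); older installs like the one shown above may not have it, in which case take the directory of the matplotlibrc path and append fonts/ttf yourself.

```python
import os
import matplotlib

# matplotlib_fname() returns the path of the matplotlibrc file in use
print("matplotlibrc in use:", matplotlib.matplotlib_fname())

# get_data_path() points at the installed mpl-data directory
# (assumption: available in your matplotlib version)
ttf_dir = os.path.join(matplotlib.get_data_path(), "fonts", "ttf")
print("Bundled .ttf fonts live in:", ttf_dir)
print("Directory exists:", os.path.isdir(ttf_dir))
```

The printed directory is the one to cd into before running fondu.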

$ cd /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mpl-data/fonts/ttf/
$ sudo fondu -show /System/Library/Fonts/Helvetica.dfont  # need sudo access to get into the .dfont directory

If this is not your own machine, copy the Helvetica file to somewhere you can edit, and then run fondu:

$ mkdir ~/font_copies ;cp /System/Library/Fonts/Helvetica.dfont ~/font_copies
$ cd /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mpl-data/fonts/ttf/
$ sudo fondu -show ~/font_copies/Helvetica.dfont 

Because we specified -show, which in this program is the equivalent of --verbose, we see all the files that are created. I don’t know why there’s a bunch of “Untitled” files but we’ll just leave them there. Now you should have a bunch of Helvetica*.ttf files in this directory. Let’s double-check:

$ ls -1 Helvetica*   # The '-1' argument forces the output to be a single column
Helvetica.ttf
HelveticaBold.ttf
HelveticaBoldOblique.ttf
HelveticaLight.ttf
HelveticaLightOblique.ttf
HelveticaOblique.ttf

BUT WAIT. Before you can go on happily using Helvetica, you need to set Helvetica as the default font in your .matplotlibrc file. Remember we found the location of your matplotlibrc file using matplotlib.matplotlib_fname() ?

4. Edit your .matplotlibrc file

The matplotlibrc file gets read in every time you import matplotlib, and this is where you can set customizations such as default fonts and colors. Since we’re good computer scientists, we’re going to copy the original matplotlibrc file into a personal directory so it doesn’t get overwritten when we update matplotlib.

$ cp /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mpl-data/matplotlibrc ~/.matplotlib/matplotlibrc

Now use your terminal-based editor of choice (mine is Emacs) to edit the file:

$ emacs -q ~/.matplotlib/matplotlibrc  # the '-q' doesn't load my .emacs file which has settings that aren't compatible with my terminal

Find the line:

#font.sans-serif     : Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif

This list of fonts is ordered in decreasing priority: Bitstream Vera Sans is used first; if it’s not there, Lucida Grande is used; if that’s not there, Verdana; and so on. We want to uncomment the line (remove the #), and make Helvetica the first priority:

font.sans-serif     : Helvetica, Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif

You can remove Helvetica from the end of the list, but it shouldn’t matter because the program will stop searching once it finds a font it knows.
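
If you’d rather not hand-edit the file, the same change can be sketched in a few lines of Python. This is only an illustration: prioritize_font is a hypothetical helper, it works on the text of the file rather than the file itself, and it assumes the font.sans-serif line has no trailing comment.

```python
def prioritize_font(rc_text, font="Helvetica"):
    """Uncomment the font.sans-serif line and move `font` to the front."""
    out = []
    for line in rc_text.splitlines():
        stripped = line.lstrip("#").strip()
        if stripped.startswith("font.sans-serif"):
            # Everything after the first colon is the comma-separated font list
            value = stripped.partition(":")[2]
            names = [n.strip() for n in value.split(",") if n.strip()]
            # Put the chosen font first and drop any later duplicate of it
            names = [font] + [n for n in names if n != font]
            line = "font.sans-serif : " + ", ".join(names)
        out.append(line)
    return "\n".join(out)
```

Every other line, commented or not, passes through unchanged.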

5. Force matplotlib to re-scan the font lists and add Helvetica

Now we need to force matplotlib to re-create its font lists by removing the cache files.

$ rm ~/.matplotlib/fontList.cache ~/.matplotlib/fontManager.cache ~/.matplotlib/ttffont.cache
$ python -v  # watch all the imports fly by
>>> import matplotlib.pyplot as plt  # this will import and create the *.cache files we just removed

I was stuck at this:

dlopen("/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/ft2font.so", 2);
import matplotlib.ft2font # dynamically loaded from /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/ft2font.so

for quite some time, but be patient. It will finish :)
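
Depending on your matplotlib version, not all three cache files may exist, and rm will grumble about the missing ones. A minimal Python sketch that removes only the ones that are there (the function name clear_font_caches is my own, and the file names are the three from the rm command above):

```python
import os

CACHE_NAMES = ("fontList.cache", "fontManager.cache", "ttffont.cache")


def clear_font_caches(config_dir="~/.matplotlib"):
    """Remove the font cache files that exist; return the paths removed."""
    config_dir = os.path.expanduser(config_dir)
    removed = []
    for name in CACHE_NAMES:
        path = os.path.join(config_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

The returned list tells you which caches were actually cleared, so an empty list means there was nothing to rebuild.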

6. Plot your figures with Helvetica!

Helvetica!

What if you want to use something other than Helvetica?

If you have a .ttf file for the font

My favorite fixed-width font is Inconsolata-dz (this fixes the weird quotations of Inconsolata), which I use in all my terminal and code editing programs. If you installed the font before installing matplotlib, you can probably just use it. Let’s check the fontList file just to make sure:

$ grep Inconsolata ~/.matplotlib/fontList.cache
S'Inconsolata'
S'/Users/olgabotvinnik/Library/Fonts/Inconsolata.otf'
S'Inconsolata-dz'
S'/Users/olgabotvinnik/Library/Fonts/Inconsolata-dz.otf'
aS'/Users/olgabotvinnik/Library/Fonts/Inconsolata.otf'
aS'/Users/olgabotvinnik/Library/Fonts/Inconsolata-dz.otf'

Looks like Inconsolata’s there, so let’s try it out! To change your default font on the fly, add this to your code:

import matplotlib as mpl
# matplotlib calls the fixed-width font family 'monospace'
mpl.rcParams['font.monospace'] = ['Inconsolata-dz']
mpl.rcParams['font.family'] = 'monospace'

If you do not have a .ttf file for the font

Say you’re a huge fan of the sans-serif font Gotham and want to use it for your plots. It’s a proprietary font and will probably not appear as a .ttf file. Follow the same steps 1-6, but find the location of your font using FontBook.app. Search for your font, and then either hit Command-R or click File —> Show in Finder. Then drag that file into Terminal and you’ll get the full path name. I use this Finder-Terminal drag and drop trick all the time!

To edit your final .pdf files in another program such as Illustrator

After producing your matplotlib figures, you may still want to tweak the axis or typography. To be able to do that, edit your ~/.matplotlib/matplotlibrc file to switch the embedded fonts from Type 3 to Type 42 (TrueType). Change:

pdf.fonttype : 3

To:

pdf.fonttype : 42

And that’ll do it! Hat tip to Benjamin Reedlunn!

Appendix

The code and data used to create these files can be found here. To create the file, run the command:

python pondrfit_plots.py -f *.pondrfit -t 'Oct4 isoform 2' -c '#0080FF' -s

One of the hardest parts of this code was figuring out how to extract the start and end of the disordered regions (disorder score>0.5) when all I had was a sequential list of indices for each region. Turns out the trick is using both itertools and operator:

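from itertools import groupby
from operator import itemgetter

ranges = []
# Consecutive indices share the same (position - index) difference, so group on it
for k, g in groupby(enumerate(df_disordered.index), lambda pair: pair[0] - pair[1]):
    inds = list(map(itemgetter(1), g))
    ranges.append([inds[0], inds[-1]])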

Where df_disordered.index is the indices of all the rows whose disorder score is greater than 0.5. This code is quite magical to me, as I’m still figuring out groupby and map, but I will understand it!
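
To take some of the magic out of it, here is the same trick on a plain list instead of a DataFrame index. The function name consecutive_ranges and the sample numbers are made up for illustration. The key idea: within a run of consecutive integers, the position minus the value stays constant, so groupby puts each run into its own group.

```python
from itertools import groupby


def consecutive_ranges(indices):
    """Collapse sorted integers into [start, end] pairs of consecutive runs."""
    ranges = []
    # enumerate pairs each value with its position; (position - value)
    # is constant inside a consecutive run and changes when there is a gap.
    for _, group in groupby(enumerate(indices), lambda pair: pair[0] - pair[1]):
        run = [value for _, value in group]
        ranges.append([run[0], run[-1]])
    return ranges


print(consecutive_ranges([3, 4, 5, 9, 10, 15]))
# [[3, 5], [9, 10], [15, 15]]
```

For the sample list, the differences are -3, -3, -3, -6, -6, -10, which gives three groups and therefore three regions.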

OSX 10.7 (and older) Lion users

Apparently ‘fondu’ does not install on OSX 10.7 (I’m running 10.8) due to the XCode update in 10.8. One reader suggested FontForge as an alternative to convert between .dfont and .ttf formats. Based on Apple’s “Leave old OS’s behind” strategy, I’m guessing users of anything older than 10.7 will also need to use FontForge.

UPDATE: The same reader informed me that FontForge does not perform the full conversion necessary for matplotlib in iPython (below). If you are running into a similar problem, I suggest finding someone with OSX 10.8 and getting them to do it. I cannot post the files myself because Apple technically prohibits tampering with system fonts.

iPython

If you use iPython notebooks, you may need to take an extra step for your rendered figures in ipython notebook --pylab inline to appear with Helvetica. I did the following:

  1. Shutdown and re-open my notebook. This purges all the imports. This can also be accomplished by interrupting and restarting the kernel from within the notebook, but I wanted a full reboot.
  2. Edit both my ~/.bashrc and ~/.bash_profile (since I’m not sure which one gets read) to include the text: export MPLCONFIGDIR=$HOME/.matplotlib, which should tell matplotlib to look in that directory for the configuration files, such as your matplotlibrc file.
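
To check whether the export actually reached the Python process running your notebook, a small sketch like this can help. The function name describe_config is hypothetical, and it only inspects the environment variable; it does not ask matplotlib which directory it ended up using.

```python
import os


def describe_config(environ=None):
    """Report which config directory the environment points at."""
    environ = os.environ if environ is None else environ
    config_dir = environ.get("MPLCONFIGDIR")
    if config_dir is None:
        return "MPLCONFIGDIR is not set"
    rc_path = os.path.join(config_dir, "matplotlibrc")
    if os.path.isfile(rc_path):
        return "MPLCONFIGDIR=%s (matplotlibrc found)" % config_dir
    return "MPLCONFIGDIR=%s (no matplotlibrc there)" % config_dir


print(describe_config())
```

Run it in a notebook cell: if it reports that the variable is not set, the notebook was probably started from a shell that never read the file you edited.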

Hope that helps!

    • #matplotlib
    • #helvetica
    • #tutorial
  • 6 months ago

How to use Google’s internet caching to your advantage

You’ve run into this problem: you search for something on the supposedly all-powerful Google search, click the link, but it’s dead. Nothing’s there. Not even a speck of information.

But Google said there was!

I ran into this exact problem for an assignment this week. We were supposed to use the Disordered Protein database DisProt and the PONDR-FIT algorithm to find a protein that is predicted to be disordered, but is not in the DisProt database. However, the DisProt servers went down as of noon yesterday. With the assignment due today, what are we to do? Well, the due date was pushed back, but I figured out how to search DisProt even though it was down, and in the process learned a bit about Google’s internet caching (aka their plan to save all the knowledge ever publicly posted to the web).

Viewing cached pages: step-by-step

  1. Do a Google search as usual
  2. Hover your cursor over the result you want, and you should see two arrowheads pointing to the right: » Click on them.
  3. A preview of your page should appear on the right.

Below is an example:

How to view cached pages

Turn on “Verbatim” mode to narrow your results

But what if you need to validate negative results, as we did for our assignment? Use “Verbatim” mode, which stops Google from auto-correcting your spelling in an attempt to get you more result hits.

Without “Verbatim” mode

Without “Verbatim” mode, my “disprot oct4” query gets mangled to “disport oct4” and I get lyrics for some weird band I’ve never heard of.

To turn on “Verbatim” mode:

  1. Do a Google search as usual
  2. Click “Search Tools” in the bar below the search box
  3. Click “All results” (you want to filter so you don’t get all results)
  4. Click “Verbatim”

A visual tutorial is here:

With “Verbatim” mode

With “Verbatim” mode on, I get totally weird results, which means that “oct4” is indeed nowhere to be found on DisProt!

Final note

Someone may have removed content from the internet for a reason such as copyright infringement. Please respect the owner of the intellectual property and do not use this method to crawl for copyrighted data.

    • #google
    • #disprot
    • #internet caching
    • #search tricks
    • #tutorial
  • 6 months ago

About

I'm Olga Botvinnik, a cello-playing, loose-leaf tea drinking, Japanese pen-wielding matplotlib whisperer and bioinformatics and systems biology PhD student at UCSD.
