First Intro to Bioinformatics course

A few weeks ago I finished teaching the first Intro to Bioinformatics course at Fred Hutch Cancer Research Center as part of Erick Matsen's program. This ambitious seven-week class sought to teach biologists the basics of computational analyses of biological data. So far, feedback has been great, and the waiting list for future courses is around 140 long!

Herein I'd like to share some of the challenges and strategies that came up in putting this course together.

Reflecting on the class

The biggest challenge in designing the curriculum is the breadth and depth of bioinformatics. An adept bioinformatician should be able to

  • navigate the Unix shell
  • read/write a couple of programming languages
  • apply statistical methods
  • create data visualizations
  • deal with moderately sized to large data sets
  • above all, intuitively and iteratively apply all the above

Achieving this in full takes years. Consequently, it was clear a seven-week class would have to be strategic to be of any value.

The goals

The first step was deciding on a realistic but meaningful set of goals for a seven-week class:

  • A good feeling for the overall gestalt of bioinformatics
  • Some of the basic skills necessary to start working with data sets
    • Unix shell and philosophy
    • Intro to programming with Python
    • Git for version control of data and code
  • Enough context, guidance and resources that students could continue to teach themselves independently

If we couldn't make a bioinformatics expert out of someone in 7 weeks, we could at least put them in a good position to get there on their own.

Distilling the essence

The first problem I had to solve was how to convey the big picture of what bioinformatics is, and how we approach it. I spent a great deal of time reflecting on this.

An analogy from our bioinformatics text I particularly liked related the skills of a bioinformatician to those of a jazz musician. As a musician myself, it's possible I'm biased in admiration of this analogy, but I think it accurately highlights the open-ended nature of bioinformatics. Both require a sense of exploration, and evade proceduralization.

Thankfully, most of my students were already biologists and knew how to explore questions of science through data and experiment. Emphasizing this connection and framing bioinformatics as an extension of their existing knowledge made it easier to convey the gestalt to them.

Building curriculum around actions

I've spent years tutoring and teaching, but had never designed a curriculum from scratch. Luckily, my friend and cofounder Colin Megill has done this extensively (from middle school classroom to Fortune 500 company), and shared some valuable advice:

Build around what students do

While simple, this provided a sensible compass with which to guide the course, keeping students engaged, and ensuring the content stayed practical.

My specific approach:

  • In class, work through examples illustrating aspects of the conceptual material, encouraging students to work along as well.
  • Occasionally ask students simple questions to keep them engaged and thinking critically.
  • Assign homework challenges, with solutions at the end of the slides.
  • Treat any primary slides we didn't get to during class as homework as well.

These pieces did not instantly mesh smoothly. It took time to feel out how to balance and weave them together, but by the end we'd reached a nice flow.

Building on a single story

Learning bioinformatics is difficult enough without the incidental cognitive load of switching contexts in a very hands-on course. To avoid this, the classes focused on incrementally building analyses of a single real-world data set. Aside from reducing cognitive load, this

  • Gave students a sense of the iterative side of data analysis
  • Ensured the course material remained pragmatic
  • (Hopefully) gave students a sense of accomplishment, seeing what the covered skills could achieve

The data set we used was (shamelessly) taken from one of my Simian Foamy Virus papers. Besides being a rich data set to work with, it was one whose story I already knew intimately, which made it easier to weave the beginnings of that story into the curriculum.

Other details about the class

The book we worked through for the class was Bioinformatics Data Skills, by Vince Buffalo. This book did an excellent job of conveying a very data-centric, big-picture overview of bioinformatics, and in many ways was closely in line with the workflows I had used as a bioinformatician.

In particular, it did a great job of explaining the value of the Unix philosophy, the Unix environment and version control as they apply to bioinformatics. I loosely based the first several classes around the material from various chapters, pulling together just the most important high-level information and leaving the rest for students to explore on their own.

Python over R

My one qualm with the book was the focus on R as a programming language. While I have a strong appreciation for R (in particular for data visualization, table-centric programming, and canned stats), I don't think it's a good general purpose language. Moreover, Erick and I had been keen on teaching Python as part of the course before the book had even come into the picture; the Hutch already had a number of introductory R classes, and Python is an excellent complement to R for bioinformatics.

Unfortunately, I had a hard time finding the right material to work from for the course. I didn't want to pull in another book for such a short class, and I found that curricula from sources like Data Carpentry and Software Carpentry lacked the right focus or scope.

I decided to build from scratch in order to tailor the curriculum to better match the rest of the course. It was extra work, but the added flexibility was worth it, and it turned out to be a lot of fun.

Still, I wanted the students to have a resource to draw on for further explanation and examples. For this, I chose the Codecademy Python course. While I have some reservations about it as primary instruction material, the scope and interactive UI were perfect as a complementary resource.


The biggest problems we had in the course were technical issues, especially early on. Each glitch was rarely a big deal in isolation, but when multiplied by 15 participants, the pace quickly ground to a halt.

We realized early on that an instructor's assistant tending to students with technical issues (as opposed to conceptual questions) was vital to keeping things moving.

There was still some challenge in estimating the right amount of material for each hour-long class, but my feel for this got better as the classes went on.


I've really enjoyed getting back into teaching. For me, teaching has always been part of my learning process, helping me solidify what I already know by recapitulating it for others, and broadening my understanding by thinking about things from different perspectives.

We're scheduled to do the second iteration of this class in the Fall. I'm excited to go through the material again and smooth out the rough edges. With such a long waiting list, there'll be plenty of opportunity to rework and optimize.

If you're interested in the material from this course, check out the website (and its source code). While many of the slides are in need of additional context from the lecture, we've gotten feedback that folks with prior technical experience were able to benefit from working through them independently.