The Biologist Is In: How I Became a Computational Biologist.

My girlfriend was talking with a friend of hers who is trying to decide what major to go into in college. She is interested in biology and computer science, but is having a hard time deciding which specialty to go into. My girlfriend asked her, "Why not do both? Darren is a 'Computational Biologist'. Maybe I can get him to do a little write up of how he got where he is now."

Such a discussion falls under the broad heading of biology which I have as the topic of my blog, so here goes…

I developed a strong interest in biology and computer science from an early age. My mother was a science teacher and my father was an engineer. We had biology textbooks and an early personal computer around when I was young. It wasn't inevitable that I became a computational biologist. I have four siblings who went on to become a computer scientist, a lawyer, an artist, and a public servant.

During high school, I spent much of my free time writing programs exploring different aspects of biology. I wrote programs to let me play with various kinds of cellular automata (CA) systems, which show the complex dynamics that life is known for. I wrote programs to let me play with various kinds of fractal systems, which show the recursive patterns seen in many plant and animal organs. I wrote a simple evolutionary biology system and explored the ecological impacts of changing mutation rates, energy supply, and other characteristics of the system. I was learning the highly ordered thinking needed to program computers, while thinking about biology.
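For readers who haven't met a cellular automaton before, here is a minimal sketch of the idea in Python. It is not one of my original programs; it is a one-dimensional, two-state automaton using the standard rule-number encoding, and the function names (`step`, `run`) and the wrap-around edges are just choices made for this illustration.

```python
def step(cells, rule=110):
    """Advance a 1-D binary cellular automaton by one generation.

    Each cell's next state depends only on itself and its two neighbours.
    The edges wrap around (an assumption made for this sketch).
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbourhood bits form a number 0..7 that selects
        # one bit of the rule number.
        index = (left << 2) | (centre << 1) | right
        out.append((rule >> index) & 1)
    return out


def run(width=31, generations=15, rule=110):
    """Start from a single live cell and return every generation as a list."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(generations):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    # Rule 90 draws a Sierpinski-triangle-like pattern from a single cell.
    for row in run(rule=90):
        print("".join("#" if c else "." for c in row))
```

Changing the `rule` argument is the fun part: a tiny change in the local rule can give a completely different global pattern.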

In my first round of college, I majored in Biology. The computer science program at my school didn't allow non-CS majors to take any but the very intro class and the natural sciences program didn't allow students to double-major. I was stuck with biology course-work. This didn't bother me too much, as computers had always been a way for me to study biology. The idea of studying computers directly struck me as odd.

One summer during college at the University of Texas in Austin, I applied for a programming job at one of the university research labs. I was told that because I wasn't a CS-major, they wouldn't hire me. This seemed unfair. I knew I had more programming experience than most of the CS-major students I knew. I kept programming for my own purposes during the rest of school.

Flickr page for these images.

I began work on a class of CA system which captures some of the mathematics involved in biological pattern formation systems (at right). At this time, I started seeing the underlying mathematics behind the different pattern systems I observed throughout biology. The splotches of color on the back of toads, or the stripes/spots on fish, or the stripes on snakes, all began to appear very simple.

After completing school, I moved to Minnesota and started working for a clinical blood testing lab. Technicians who came in later in the day could always tell where I had been working, because I would scribble math on the blotting paper we used to cover work areas and contain any possible biohazardous spills. I continued to play with programming-biology.

Flickr page for these images.

I began implementing a class of CA systems called reaction-diffusion (RD) systems, which model differential equations that capture the physics of how chemicals diffuse and interact. Certain differential equation sets, like the Turing-RD system, spontaneously produce stripes and spots. Calculating the Turing-RD system with diffusional vectors that vary from place to place allowed me to generate the image at left. (It now strongly reminds me of the pattern of trails left by Caenorhabditis elegans worms as they travel over lab growth media.)
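To give a feel for how little code a reaction-diffusion system needs, here is a minimal sketch in Python. It is not my original code and it does not use the Turing equations mentioned above; as a stand-in it uses the well-known Gray-Scott model, in one dimension, with fixed diffusion rates. The function names and the parameter values (`du`, `dv`, `feed`, `kill`) are assumptions chosen for illustration only.

```python
def laplacian(values, i):
    """Discrete 1-D Laplacian at cell i, with wrap-around edges."""
    n = len(values)
    return values[(i - 1) % n] - 2.0 * values[i] + values[(i + 1) % n]


def rd_step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.060, dt=1.0):
    """One explicit Euler step of the Gray-Scott reaction-diffusion model.

    u is consumed by the reaction u + 2v -> 3v and replenished at rate feed;
    v is produced by the reaction and removed at rate (feed + kill).
    """
    new_u, new_v = [], []
    for i in range(len(u)):
        reaction = u[i] * v[i] * v[i]
        new_u.append(u[i] + dt * (du * laplacian(u, i) - reaction + feed * (1.0 - u[i])))
        new_v.append(v[i] + dt * (dv * laplacian(v, i) + reaction - (feed + kill) * v[i]))
    return new_u, new_v


def simulate(width=100, steps=2000):
    """Seed a small disturbance in a uniform field and let it evolve."""
    u = [1.0] * width
    v = [0.0] * width
    for i in range(width // 2 - 3, width // 2 + 3):
        u[i], v[i] = 0.5, 0.25
    for _ in range(steps):
        u, v = rd_step(u, v)
    return u, v


if __name__ == "__main__":
    u, v = simulate()
    shades = " .:-=+*#%@"
    top = max(v) or 1.0  # avoid dividing by zero if the pattern died out
    # Clamp so rounding can never index outside the shade string.
    print("".join(shades[min(len(shades) - 1, max(0, int(x / top * (len(shades) - 1))))] for x in v))
```

Letting the diffusion rates vary from place to place, as I did for the image, would mean replacing the constants `du` and `dv` with per-cell values; the sketch keeps them fixed to stay short.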

Flickr page for these images.

The next step was to work with sets of equations that describe more interesting biological systems. Hans Meinhardt's book, "The Algorithmic Beauty of Seashells" explores numerous biologically inspired RD systems which describe the patterns seen on various seashells.

All of these systems are calculated on rigid square pixel arrays, so I began developing a system for calculating RD systems on a more flexible cell network, where cells could distort, move, and replicate over time. I planned on using the system to explore concepts in development about how differential cell adhesion could drive the formation of tissue layers.

I continued working on these projects until I returned to school in the Genetics graduate program at the University of Minnesota. My personal programming-biology projects have been set aside until I complete my degree.

I struggled greatly with deciding which labs to apply for. I was then, and still am, interested in biology in a very wide sense. Limiting myself to one speciality seemed the opposite of how I had approached biology before then. After rotating through a few labs and not finding my place, I spent some time soul-searching to try to determine my next step.

I remembered a professor who lectured one of the core curriculum classes about microarray analysis. Microarrays basically start as glass microscope slides with spots of very precisely defined DNA bonded to them as probes to examine some experimental DNA sample. The experimental DNA is fluorescently labelled and the amount of DNA bound to each spot can then be determined by measuring the amount of fluorescence from each spot on the slide. Typical microarrays contain thousands of spots, so the analysis of the resulting data is heavily reliant on computers and programming. I realized that I like playing with large datasets and approached the professor about joining her lab.

Even though I had no official credentials to suggest I had computational skills, my previous (decades-long) programming experience allowed me to understand and extend the tools used by the lab to analyze previously collected microarray data.

Soon, I began applying my programming skills to other tasks in the lab. I developed a flow cytometry protocol to determine the genome size of the Candida albicans strains we were working with and developed software to process the resulting data down to simple descriptions of biological relevance. When I joined the lab, flow cytometry was an untested technique for this organism and now the lab relies heavily on flow cytometry to track changes during experiments. The technique and computational tools allowed the lab to discover the existence of rare haploid C. albicans cells in the species long thought to be an 'obligate diploid'.

I designed a new microarray to reduce the cost and time needed to analyze C. albicans strains for genomic structural changes. The ~80,000 probes of the microarray were designed to measure the number of DNA copies as well as the alleles present across the C. albicans chromosomes.

Lately I've been designing/implementing a data analysis pipeline to deal with the very large datasets which are produced from whole-genome sequencing experiments. Filtering and compressing the gigabytes of data into relatively simple-to-interpret figures illustrating changes in DNA copy number and allelic composition requires integrating several tools previously published by others with large chunks of custom code into a single analysis pipeline. We just submitted the paper for this project and are waiting for the review process to complete.

I'm now actively writing my thesis for my PhD, which will be the first official credential illustrating my computational skills. Once I graduate, I won't have to argue with people who think that because I'm a biologist I wouldn't know anything about computers. I still have to work out my next steps, but I'm not worried about my future employment opportunities. Computational biologists are in high demand, in large part because so few people have both the computational and biological intuitions needed for the work.

If you're starting college and are interested in biology and computers, your next steps probably won't include a specialized computational-biology program. In general, there just aren't such programs right now. If you're lucky, your university will have a class or two which covers the topic of computational biology at more than a very basic level.

During college, I studied computers as a side-line to my official biology studies while taking as many math courses for electives as they would let me. I know another computational biologist who was in a computer science program and took as many biology courses as he could for his electives, in addition to his own side studies. Your path will probably look more or less like one of ours, with an official focus on biology or computer science and lots of personal effort dedicated to the other subject in your off-hours. You won't be able to simply rely on college advisors or departments to get you the experience to become a computational biologist. You will have to study long and hard outside of what the college programs are designed for.

You will probably have to continue your schooling with a graduate degree, as the credential you get upon completion of undergrad won't be an indication of your interest and capabilities in this cross-discipline niche. Graduate programs in science pay you to go to school, so you won't have to struggle with another five years of student loans or other outside financing. They don't pay highly, but they do pay plenty enough to get by. When you're done, in addition to the credential of a graduate degree, you will have authorship on one or more papers which highlight your particular style of thinking and skills to potential future employers.

By this time, it will be obvious that you are a Computational Biologist.

Tuesday, August 19, 2014