An Interview with Two Bioinformaticians: Bioinformatics from the Computer Science and Biology Perspectives
Bioinformatics exists at the intersection of two main areas of specialization: computing science and biology. To give us an understanding of the field from both of these perspectives, I interviewed a bioinformatics expert from computing science, Anne Condon, Professor of Computer Science at UBC, and a bioinformatics expert from biology, Ryan Brinkman, from Xenon Genetics, Vancouver.
What does bioinformatics mean to you?
[Image: Dr. Anne Condon]
Anne Condon (AC): Using computational approaches to gain a better understanding of organisms at the molecular level, and applying that understanding to improve our health and our environment.
Ryan Brinkman (RB): Using computers to help understand biology. Of course it is more than just putting data into a spreadsheet and making a pretty graph. I think the real push for bioinformatics started when biologists started getting really large datasets; the kind that no longer fits well into a spreadsheet. What biologists needed were people who understood databases, and how to put data in, and more importantly get data out, so we can answer those questions that we are trying to answer. This led to the development and exponential growth of databases like GenBank. So by extension, I would say most of bioinformatics is using computers to understand or process large biological datasets that are probably sitting in some sort of database. Of course there are lots of specializations in bioinformatics that this definition doesn't cover, like algorithm development, but there are those who would argue that is more computational biology than bioinformatics.
Why do you think that bioinformatics is important?
AC: Today, we can use computers to access much more biological data than ever before. You can learn a lot by analysing this data. For example, you can identify genes by comparing genomic data across organisms and identifying patterns in the data. Insights as to the structure of proteins can be obtained through computer analyses of the protein sequences. These approaches are a lot faster and a lot cheaper than relying solely on wet lab or X-ray crystallographic techniques. Of course, computational techniques are often not as reliable as getting the first-hand view of the molecules. Clearly, you need both worlds: you need to take advantage of the computer tools when you know the predictions are reliable and use the more expensive techniques (in the wet lab) when you can't get away from it.
[Image: Dr. Ryan Brinkman]
RB: Biologists have become really good at generating data. Biology has grown beyond the stage of sticking pins into butterflies and placing them in a glass case. Now we sequence that butterfly's genome and put that data on the web for everyone to download. But once we have our huge pile of information (like DNA sequence), we need people who can put that butterfly sequence into a database, do the analysis and also put the raw data on the web so other biologists can also have access to it. Of course every biologist who wants that data probably needs to have at least a bit of bioinformatics experience before they can make much use of it. With all the data, you get more interesting ideas of research questions to ask, but since all that data is probably too much to go through by hand, you need to know computers at every stage of the game, from interacting with the database to get the data out, to writing a script to BLAST each of the thousand sequences one at a time, to another program which feeds your interesting hits into a primer design program.
What do you think are the more interesting areas of bioinformatics?
AC: I think one interesting area is understanding the networks and chains of interactions among molecules within the cell. How does everything happen within the cell? In computer science, at a very low level, computer processors are circuits; there are inputs and outputs and there is an information flow through logic gates and eventually the answer is computed. In a cell, something roughly similar is happening and we'd like to understand what those circuits are. I think that is a really interesting question.
Another important question is disease diagnosis. How might we classify diseases better using, say, genomic data from individuals? Can we provide better treatment if we have better classifications? And a third thing I'm interested in is using biological molecules like DNA and RNA to design and build new molecules that you don't find in nature but might be useful in, say, nanoscale technologies or maybe even as new drugs.
RB: Looking at what the academic world is publishing, microarray research is really hot right now, and people haven't quite figured out what the best way is (some might argue if we should even be trying) to store, process, and understand this data, but there are lots of interesting ideas.
In the next two or three years what will the important advances in the field be?
AC: The sequencing of the human genome has just been completed and in the next two or three years I expect progress will be made in identifying the genes. Right now, we don't even know how many genes we have! Also, in the next two or three years, we'll be learning more about the structure and function of proteins in the cell. Hopefully, in the longer term we'll be able to piece together that information to get a more complete picture of regulatory networks in the cell.
RB: Whole genome approaches. We are really good at working with the sequence of single genes at the moment. Technology is pushing towards whole genome sequencing/re-sequencing, and once we are there we will need tools to process that data.
3D protein structure prediction from amino acid sequence is another area of research where we are getting much better, and it could have big implications for pharmaceutical research, for example.
What are the different things computer scientists and biologists bring to bioinformatics? What can each tell the other?
AC: Biologists understand what the important problems are and what information would help solve these problems. They bring a very good sense of the molecular data on the computer, and what types of errors get introduced in acquiring the data. They can look at the alignment of molecular sequences and know, maybe, whether it is high quality or not. Nowadays, many biologists are also good at using computer tools and databases to find and analyze the information they need.
What computer scientists bring is, first of all, an understanding of the potential and the limitations of the computer tools. If you run your data through a computer tool, in order to assess the results you need to understand how the results are being produced by the tool. The computer scientist understands what the computer is doing and can say, "I wouldn't expect it to work well on this data" or "I wouldn't expect it to work well when the sequences are too long." The computer scientist is also able to design new tools or improve on current tools, keeping both quality and efficiency considerations in mind.
RB: I would propose that your typical computer scientist and biologist think in different ways. In many respects, most biologists, at heart, are collectors of information who then use these collections to make discoveries, like the butterfly collector. Computer scientists, by nature, have to be much more process and detail orientated.
What is a typical day at work for you?
AC: My typical day . . . A lot of my time is spent teaching during the academic year. I teach a bioinformatics course or integrate bioinformatics into non-bioinformatics courses, which is kind of fun. Because of the growth in bioinformatics, much of my time is spent at meetings: planning programs, planning for hiring, managing programs, selecting students for programs. Curriculum development and planning for new programs have been a big part of things for me.
When you get to the research . . . I spend a certain amount of time every day getting updates in the literature, finding out what is new, sharing that with students or them sharing it with me and thinking about particular problems we are interested in. Right now we are interested in getting better predictions of RNA secondary structure.
Travel is another big thing - going to meet people and trying to figure out what is important to work on in the bigger community and how our work stacks up with others' work.
RB: That is a tough question because one of the things I love about this job is that every day is so very different.
In the past that meant things like looking at a particular region of the genome, getting a list of all the genes, getting an idea of what they do, and, if they look interesting, getting all the exons for the region and designing primers: hands-on kind of stuff, as we have a small bioinformatics group here. Of course all that usually takes more than a day. There also may be sequence to look at from another region under investigation, to see if we have found any variations. There is also working with bench scientists to improve our programs that do things like automatically find mutations, or store patient data. I also try to make it a point every day to review some of the bioinformatics newsgroups and journals to keep up on things that will make my job easier. Finally, it seems I'm always writing tiny scripts of one sort or another to do one-off kinds of things like move a whole bunch of sequence data around that we just got from a collaborator, or BLAST another huge chunk of sequence looking for similarity to one gene of interest.
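As an illustration of the kind of tiny one-off script described above, here is a minimal Python sketch that reads a batch of sequences in FASTA format and reports the length of each one. It is not code from the interview: the function names and the sample records are hypothetical, and a real script would read a file received from a collaborator rather than an in-memory string.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across lines
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(handle: TextIO) -> Dict[str, int]:
    """Map each FASTA header to the length of its sequence."""
    return {header: len(seq) for header, seq in read_fasta(handle)}


# Hypothetical sample data standing in for a collaborator's file.
sample = io.StringIO(">seq1 hypothetical sample record\nATGC\nGGCC\n>seq2\nATAT\n")
lengths = sequence_lengths(sample)
for name, length in lengths.items():
    print(name, length)
```

To run it on a real file, the `io.StringIO` object could be replaced with an ordinary `open("some_file.fasta")` handle.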
What are the biggest challenges you face in your work?
AC: One of them is that, being in computer science, I have to get exposure to the biologists' perspective. I have to go out and find the biologist to talk with and make sure I keep their perspective at the front of things. They aren't in my building and so it takes time and effort to make this happen. There is never enough time! This is a fast-paced field, and keeping on top of what is going on, so that our work stays relevant, can be challenging. Those are probably the main challenges.
RB: Keeping on top of all the new advances in bioinformatics that would make my job easier if I only knew about them. It is all about doing the least amount of work to get the maximum impact. If I can find a tool that does what I want, or something close to it, that means I don't have to start from scratch.
How did you get into the field of bioinformatics?
AC: My background is in computer science, including design of algorithms and theoretical computer science. I got interested in this because of some work in an area called DNA computing. That idea is to build computers out of DNA. Len Adleman - the person who started that line of work - is a computer scientist who spent quite a bit of time in a biology lab. His work really caught the attention of many people in the computer science community. But then, as I got more interested in some of the algorithmic problems that come up, I realized they are also relevant in the more traditional biological areas. You have to talk to chemists and biochemists if you want to understand DNA and RNA - you can't get the right level of understanding just from reading papers. I got more interested in their perspectives and problems.
RB: It wasn't until I started doing bioinformatics that I finally knew what I wanted to do in my life. It wasn't something that I set out to learn, or had heard about. I did my undergraduate degree in Biology and Biotechnology (even that took me the first few years of undergrad to figure out), but I was also something of a proto-geek. That is, I liked messing about with computers, but I hadn't taken any classes or anything like that. My first real job out of undergrad was designing databases for a group of molecular biologists, but from there I was pointed to a job at the Genome Sequencing Center of St. Louis. This was ten years ago, and I think the job description was Scientific Programmer, but what I really was doing was bioinformatics. After doing that for a few years I knew I had found my true calling, but that I also wanted to do more interesting things than what I was doing, so I went back to school to get my Ph.D. in genetics and after that I was lucky enough to find the perfect job at Xenon, which lets me do really interesting science and get paid for it. I was motivated to get into bioinformatics because I found that I loved what I was doing. Everything is new, so there is no wrong way to do something because people haven't found the right way yet. That, and one of the wonderful things about being a scientist is the feeling of discovery. As a bioinformatician you get first crack at the data that the biologist worked so hard to generate.
What advice would you give to a first year university student?
AC: At the early stages, it is very important to get the fundamentals in both the biological fields and the computer science fields. There is no getting away from that. A lot of what people need is available right now in the traditional curricula; you can piece together classes in molecular biology, genetics, statistics, programming, databases, and algorithms. Personally, I think that it is the people who have the background in both of those areas that are going to be able to make the biggest contributions. It is sometimes a stretch for students to get depth in both areas, but I think this is important for students to do.
Another thing is networking. Find out where bioinformatics is done in the community and, if you find the time, go to the talks at VanBUG and other forums. Don't be shy to go up after someone has spoken, introduce yourself and let them know you are interested. Or, check the web pages of your professors to see what their research is about, and talk to them outside of class. Networking is really important. There are many opportunities for students to dive in and learn from the inside what is going on if they make the right contacts. This can be a better use of a student's time than loading up on extra courses. It is important to get the perspective from the inside.
RB: One of the great things about bioinformatics is its open nature of collaborations. There is so much to do, there are many ways even students can pitch in and give a hand. One of the leaders of BioPerl, one of the big bioinformatics projects, is a graduate student. Even as an undergrad there are lots of ways to get involved. Find out about open source projects; since they are, by definition, free, even undergrads can afford them. So install some software; I would say the Ensembl database is a great place to start. You don't need to download the entire human genome (though you can do that if you want; it's there, it's free and it fits on your hard drive), but at least get the tools that let you explore the genome. There are Perl scripts to learn, database techniques to delve into. Lots of goodies. Get familiar with the tools that are there to do bioinformatics. There are lots of interesting things out there and if you can pull something out of your bag of tricks down the road, that just means you don't need to do it yourself. Plus it will help you get that first job. If I saw a resume from someone who had taken the time to learn a bunch of tools themselves, which means more than just pasting a sequence into NCBI's BLAST page, but really understanding how BLAST works, they definitely have a leg up on the competition. If someone has installed Ensembl's database on their own (they do a pretty good job of telling you how in their documents, but it still takes you actually doing it rather than just reading how easy it is), well, that goes a long way. Write some code that parses data out of a MySQL database.
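To make that last suggestion concrete, here is a minimal sketch of pulling data out of a database with a short script. It is only an illustration under stated assumptions: it is written in Python rather than Perl, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server. The table name `genes`, its columns, and the sample rows are all made up for this example and do not come from Ensembl or any real schema.

```python
import sqlite3
from typing import List, Tuple

# In-memory SQLite database standing in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes (name TEXT PRIMARY KEY, chromosome TEXT, sequence TEXT)"
)
# Hypothetical sample rows, invented purely for illustration.
conn.executemany(
    "INSERT INTO genes (name, chromosome, sequence) VALUES (?, ?, ?)",
    [
        ("geneA", "1", "ATGCGGCC"),
        ("geneB", "2", "ATATAT"),
        ("geneC", "1", "GGCC"),
    ],
)
conn.commit()


def genes_on_chromosome(db: sqlite3.Connection, chromosome: str) -> List[Tuple[str, int]]:
    """Return (gene name, sequence length) pairs for one chromosome."""
    cursor = db.execute(
        "SELECT name, LENGTH(sequence) FROM genes "
        "WHERE chromosome = ? ORDER BY name",
        (chromosome,),  # parameterized query, not string pasting
    )
    return cursor.fetchall()


for name, length in genes_on_chromosome(conn, "1"):
    print(name, length)
```

The same pattern (connect, run a parameterized query, loop over the rows) carries over to a MySQL server with a suitable client library, though the connection details would differ.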
I think if you look at any bioinformatics job posting they are going to have three things in common: they'll want you to have a background in biology (and going to class is a great way to get this background), but they are also going to want you to be able to code (Perl is used a lot, but Java, C/C++ and PHP are all used for different things) and be able to interact with databases. Well, download Perl and MySQL (they are free) and learn these other skills on your own. This advice applies more to biologists interested in bioinformatics; for computer scientists it's of course different, since it's a bit more difficult to set up a molecular biology lab in your kitchen, though people have done it. For the computer scientists I would say it is really important to focus on the biology aspect of bioinformatics, not the tools side, since that is where they will tend to be the weakest. Of course it will likely be much harder going through a fourth year genetics textbook than, say, running BLAST, but it will pay off far more in the long run. The Genes series by Benjamin Lewin and Introduction to Genetic Analysis by Anthony Griffiths are good places to start; just get the latest versions, as, believe it or not, genetics technology and understanding change just as fast as computer technology does, if not faster.
Read. Do lots of reading. Perhaps a bioinformatics journal is a good place to start. This will give you an idea of the kinds of research people are doing in bioinformatics. Find something that you think is neat that you may want to learn more about.
Learn biology. I mention it twice because it is that important.
Finding a good mentor helps, but learn all you can on your own as well.
Do what you love doing and everything will fall into place. That may sound trite but I can say it does seem to work. People may be coming into bioinformatics because they heard it's the next hot job market. While that may not be as true as it once was, there is always room for smart people in any field. You have to like what you are doing though; otherwise, what's the point in having a "hot job" if you don't really love what you are doing? There are lots of different areas of bioinformatics to delve into; find one that excites you.
If Ryan Brinkman quit his day job he'd no longer have the pleasure of bringing all that is bioinformatics to Xenon Genetics, where he spends his time helping the company identify drug candidates through clinical genetics. He completed his Ph.D. in Genetics at UBC in 2001.
For more information on Xenon visit Bioteach's entry on this company or go to:
www.xenongenetics.com
Anne Condon is a Professor of Computer Science at U. British Columbia. Her research interests are in complexity theory, bioinformatics, and DNA computing. Condon received a B.Sc. degree from University College, Cork, Ireland in 1982 and a Ph.D. from the University of Washington in 1987, and spent 12 years as a faculty member at U. Wisconsin prior to coming to UBC in 1999. Her thesis, a study of game-theoretic complexity classes, won an ACM Distinguished Dissertation Award in 1988 and she received a National Young Investigator Award in 1992. She was named Distinguished Alumna of University College Cork in 2001.
For more information on Anne's work visit: