Chapter 1 Introduction
Bioinformatics can be defined as the use of computational methods to answer questions in Biology. Another way to define it is as the field concerned with the development of such tools. Another related name is Computational Biologist, which refers to a Biologist that use mainly Bioinformatics tools to tackle questions in Biology. All these terms can be confusing but here we will put our focus on the Biologist point of view. From that perspective Bioinformatics is just a tool, in the same way as Molecular Biology is, that can help us to get insight into Biology.
Another confusion comes from the overlap with other fields like Statistics and Computer Science. This is because researchers that use Bioinformatics tools may inevitably end up using also methods and tools from Statistics, Computer Science, Machine learning, etc. Here we will concentrate on the description of mainly purely Bioinformatics tools. However, it is unavoidable that reference and/or knowledge of these other fields will be required to understand how some Bioinformatics tools work. Often it will also be necessary to interpret the results obtained from the use of Bioinformatics tools.
Bioinformatics is a very broad terms covering and overlapping with other previously existing fields. Some of the topics touched by Bioinformatics include:
- Sequence analysis
- Omics analysis
- Structural analysis
- Phylogenetic analysis
Below we will briefly describe each of these topics. In this introductory course, however, we will focus on two of them, sequence analysis and omics analysis.
1.1 Sequence analysis
The main goal of sequence analysis is two answer questions regarding DNA, RNA and/or protein sequences. For example, we could be presented with an unknown sequence, like the one shown here:
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDL
PSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGC
VKIKKCIIM
The question is what can we learn about this unknown sequence identity from the sequence information alone. In particular, we can ask:
- Are there any other similar sequences?
- What parts of the sequences are similar and which one are not?
- Can we see a conservation pattern among a set of sequences?
- What is its function?
Sequence analysis deals with the methods used to answer some of these questions.
1.2 Omics analysis
In omics analysis we are concerned with the manipulation, processing and analysis of high-throughtput data from omics technologies, including microarrays, next generation sequencing (NGS) and mass-spectromemtry (MS). High-throughput here means that we are concerned with measuring the expression of, for example, all the transcripts in the cell, not just a small number of them. We may also be interested on the information about all the proteins, or all the metabolites, etc. Omics technologies enables this to be done, and generate large datasets with information about all the molecular components of the cell.
For example, we can consider an study in which the effect of some drug on the expression profile of some cells will be evaluated. Mice are treated with either the drug or some placebo. Cells are isolated from the tissue and RNA is extracted. After amplification and convertion into cDNA we hybridize the samples to microarrays containing probes for >55 thousand transcripts. This will give us an estimate of the expression profile for 55 thousand transcripts. If we have initially 3 samples per group, we will obtain a 55,000 x 6 matrix of expression values.
Omics analysis deals with problems associated with the manipulation, processing and analysis of omics datasets. How can we process and analyze this data in a way that avoids biases and gives us some information about the biological question of interest?
1.3 Structural analysis
Structural bioinformatics deals with methods predicting the structure of biopolymers and their interactions with one another as well as with small molecules. It can be very important in order to decipher the mechanism of action of some drugs.
1.4 Phylogenetic analysis
Phylogenetic analysis is concerned with the evolutionary relationships between biological sequences and what can that tell us about the evolution of species. It uses information directly related with the sequence analysis methods described above, and includes additional methods to talckle specific questions in the field.