Chapter 6 Learning Bioinformatics

6.1 Books

6.1.1 Sequence analysis

BLAST: An essential guide to the Basic Local Alignment Search Tool

Introductory book to the topic of sequence aligments in general and Blast in particular. Includes Perl code.

Blast. Ian Korf, ... O'Reilly, 2003

Figure 6.1: Blast. Ian Korf, … O’Reilly, 2003

Biological sequence analysis: Probabilistic models of proteins and nucleic acids

Advance level understanding of the theory of sequence aligment and Hidden Markov Models (HMM).

Biological sequence analysis: Probabilistic models of proteins and nucleic acids. R. Durbin, S. Eddy, ... Cambridge, 1998

Figure 6.2: Biological sequence analysis: Probabilistic models of proteins and nucleic acids. R. Durbin, S. Eddy, … Cambridge, 1998

6.1.1.1 Sequence Analysis in a Nutshell: A Guide to Tools and Databases

General introduction to many of the sequence analysis tools covered in this course.

Sequence Analysis in a Nutshell: A Guide to Tools and Databases. S. Markel, D. Leon , ... Oreilly, 2003

Figure 6.3: Sequence Analysis in a Nutshell: A Guide to Tools and Databases. S. Markel, D. Leon , … Oreilly, 2003

Molecular evolution and phylogenetics

Introductory level book on the topic of inferring phylogenies from sequence information.

Molecular evolution and phylogenetics. M. Nei, S. Kumar. Oxford University Press, 2000.

Figure 6.4: Molecular evolution and phylogenetics. M. Nei, S. Kumar. Oxford University Press, 2000.

6.1.2 Omics technologies

6.2 Online resources

6.2.1 Online courses

There are many free online courses by instructors from reputable universities at web sites like edX. You can find many topics related to Bioinformatics, Computational Biology, Statistics and Data analysis.

Screenshot of the edX home page at https://www.edx.org.

Figure 6.5: Screenshot of the edX home page at https://www.edx.org.

6.2.2 YouTube

Many Bioinformatics resources are putting a lot of resources into teaching how to use their services, with an emphasis on newcomers. Some of them provide introductory tutorials and case studies in videos hosted in their YouTube channels. For example there are channels for NCBI, Uniprot, EnsEMBL, and Bioconductor, to mention a few.

Screenshot of the Uniprot YouTube channel home page at https://www.youtube.com/user/uniprotvideos.

Figure 6.6: Screenshot of the Uniprot YouTube channel home page at https://www.youtube.com/user/uniprotvideos.

6.2.3 How to use the command line

Often using Bioinformatics tools requires knowledge of how to use the command line, specially in Linux/Unix environments. http://linuxcommand.org has a nice introduction to the Linux command line. Another interesting resource is https://www.codecademy.com, providing an interactive learning experience (It offers also paid courses). Also, here there is a tutorial on Bash scripting.

6.2.4 Programming languages

Learning a programming language is not absolutely necessary when using most of the bioinformatics tools described in this book. However, when analyzing high-throughput datasets it is neccessary to use them. In particular, Python and R are the most used languages in data science, including Bioinformatics. There are many free online resources to learn the basics of these programming languages. Check their web pages for tutorials for beginners.

Screenshot of the Python home page at https://www.python.org.

Figure 6.7: Screenshot of the Python home page at https://www.python.org.

Screenshot of the R home page at https://www.r-project.org.

Figure 6.8: Screenshot of the R home page at https://www.r-project.org.

6.2.5 Support Communities

Learning Bioinformatics, as with any other complex topic, can be daunting. Often Bioinformatics software come with good and detailed documentation and. It is also possible to frequently find short tutorials that can help newcomers get started. It is important to get a good understanding of the methods being used when the goal is to publish your results. The Bioinformatics community can sometimes help in the learning process by providing solutions to the most frequently encountered problems. Here I will highlight some of the most relevant online communities for the Bioinformatician.

Biostars

Biostars is an online community deboted to answering questions about “bioinformatics, computational genomics and biological data analysis”. Users can post questions, get answers in the same post. There is a rating system that enables to give credit to good questions and answers. It is possible to include comments for further clarification and discussion.

Bioinformatics Stack Exchange

Another useful resource for solving general questions related to Bioinformatics can be the Bioinformatics Stack Exchange network. These site follows the same phylosohpy of Biostars (indeed, Biostars originated in the old version of Stack Exchange) but offers, in my opinion, a better user inteface. The site is still in beta but most of the Biostars community can be found also in this new site.

Bioconductor support site

The Bioconductor support site provides support for questions related to the use of Bioconductor, using a similar interface to Biostars.

How to use the support sites

It is important to remember that online support sites are usually driven by voluntarees that lend their time and expertise to help others. Because the number of questions put in one of these support sites can be very large, it is very helpful if the questions themeselves are written in a way that simplify the work of those willing to help us with our doubts. For example, it is important to write simple yet meaningful titles that summarize what our problem is. This way the community can easily know if the question is related to a topic about which they have expertise. When describing your issues, be as explicit as possible. And whenever possible, include a minimal reproducible example. This is particularly important if you have problems with programming code but it can also apply to using a web server. For example, if you had some problem using a web server to generate a sequence aligment, include all the steps you performed to obtain the error (if any) you obtained. If possible, when needed, include a minimal example dataset to reproduce the error. In this example, it could be a small file with a few sequences that trigger the error.

The particulars will depend of the software and problem you are having. But as a rule of thumb it is always helpful to put yourself in the place of those reading your question, and imagining whether that is enough for them to understand your problem.