University of Colorado Anschutz Medical Campus

MOLB 7900: Practical Computational Biology for Biologists — Python

Spring Semester

A computational biology class aimed at biology PhD students. Topics covered include basic practices for coding in Python, analysis of standard high-throughput genomic data to study the regulation of gene expression, integration of multiple datasets for genomic analysis, and introduction to scientific computing in Python.

Overview

Classroom time will consist of a brief lecture on the theory and principles behind the day’s material, followed by interactive exploration of the same topics using Jupyter notebooks in a cloud-based CoCalc environment. These notebooks integrate explanatory text, blocks of code, and embedded graphics in a single document. Each classroom session will be built around one Jupyter notebook covering that day’s material. Near the end of each notebook is a coding exercise that students complete within the notebook.

Lectures and interaction will be conducted either in person or via Zoom. Additionally, a Slack channel will be created for the course to facilitate discussion both inside and outside of class time.

Goals and Learning Objectives

The goal of this course is to introduce students to the Python programming language and its application to computational biology. Students are not expected to have any prior experience with Python or other programming languages.

At the completion of the course, students will be familiar enough with the basic tenets of Python programming to write basic software and scripts that derive meaning from the large datasets typical of modern biology. Further, students will have the basic skills necessary to continue their computational development using tried-and-true crowd-sourced knowledge repositories such as Stack Overflow. In short, this course aims to “teach students to fish” by teaching them the necessary basics of the Python language and pointing them toward resources for further development, rather than “giving students a fish” by handing them pre-designed, cookie-cutter scripts that answer specific biological questions.

The primary learning objectives are therefore to work towards being able to:

Understand basic datatypes, their properties, limitations, and combinations
Master the use of different containers of these datatypes and understand their relative advantages and disadvantages
Understand and use iterating loops and their associated controls
Incorporate external packages into the workflow of student-designed software
Define and write functions, understand their scope, and incorporate Python’s powerful built-in functions where appropriate
Use the spreadsheet-mimicking package pandas to organize and manipulate tabular data
Create a computational workflow that identifies and analyzes the genes and sequences bound by a protein in a ChIP-seq experiment
Create a computational workflow that integrates ChIP-seq and RNA-seq data to analyze the biological consequences of chromatin-binding events

Prerequisites

While there are no strict prerequisites for this course, a basic knowledge of the Unix command line may be helpful. Additionally, a basic knowledge of the experimental techniques behind genomic experiments including ChIP-seq and RNA-seq may also be helpful.

The coursework must be performed on a laptop to which the student has continuous access. IF AVAILABLE, STUDENTS ARE STRONGLY ENCOURAGED TO USE UNIX-BASED SYSTEMS (e.g., Mac or Linux).

It is REQUIRED that all students configure their laptops for use with course materials. All computation will be done using the cloud-based platform CoCalc. To ensure smooth operation of CoCalc, it is REQUIRED that students attend a preliminary meeting prior to the first day of class.

Examinations and Grading

Coding exercises at the end of each lesson will be completed by students within their Jupyter notebook; notebook files must be turned in before the beginning of the next class period. Together, these exercises account for 50% of the student’s grade. The remaining 50% consists of a take-home final exam administered after the final classroom session. This exam will also be completed in Jupyter notebook format and must be turned in to the instructors by 5 PM on the last day of class. To complete the exercises and final exam, students are permitted to use any materials available to them, online or otherwise. Students must earn at least 50% of the possible points in the class in order to pass.

Letter Grade Thresholds

A 75-100%

A- 60-74.9%

B+ 55-59.9%

B 50-54.9%
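As an illustration of the grading arithmetic above, the sketch below combines the two 50% components into a single course percentage and maps it onto the letter-grade thresholds. It is a hypothetical helper, not the instructors’ grading script: the function names, the input format, and the assumption that exercise points are pooled across all lessons before weighting are illustrative only. Each cutoff is treated as a lower bound checked with `>=`, so a score between the listed bands (e.g., 74.95%) falls into the lower band, and anything below 50% is reported as not passing.

```python
# Hypothetical grading helper. The 50/50 weights and the letter-grade cutoffs
# come from the syllabus; the function names and the pooling of exercise
# points across lessons are illustrative assumptions.

EXERCISE_WEIGHT = 0.5
EXAM_WEIGHT = 0.5

# (minimum percentage, letter grade), checked from highest to lowest
THRESHOLDS = [(75.0, "A"), (60.0, "A-"), (55.0, "B+"), (50.0, "B")]


def course_percentage(exercise_points, exercise_possible, exam_points, exam_possible):
    """Combine pooled exercise points and the final exam into one percentage."""
    exercise_pct = 100 * sum(exercise_points) / sum(exercise_possible)
    exam_pct = 100 * exam_points / exam_possible
    return EXERCISE_WEIGHT * exercise_pct + EXAM_WEIGHT * exam_pct


def letter_grade(percentage):
    """Return the first letter grade whose lower cutoff the score meets."""
    for cutoff, letter in THRESHOLDS:
        if percentage >= cutoff:
            return letter
    # Below 50%: the passing requirement is not met
    return "not passing"


if __name__ == "__main__":
    # Three exercises scored 9, 10, and 8 out of 10 each; exam scored 70 out of 100
    pct = course_percentage([9, 10, 8], [10, 10, 10], 70, 100)
    print(pct, letter_grade(pct))
```

In this example the exercises average 90% and the exam is 70%, so the combined score is 80.0%, which falls in the A band.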

Instructors

Srinivas Ramachandran - srinivas.ramachandran@cuanschutz.edu

Matthew Taliaferro - matthew.taliaferro@cuanschutz.edu

Schedule

TBD

Course Materials

MOLB 7900
