
MOLB 7950: Informatics and Statistics for Molecular Biology

Fall Semester

A hands-on tutorial of skills and theory needed to process, analyze, and visualize output from large biological data sets. We emphasize command-line tools, Python programming, and the R statistical computing environment.


This course is divided into four blocks: Bootcamp, DNA, RNA, and Protein. We reinforce concepts with problem sets assigned at the end of class on Monday and Wednesday, each of which should take ~30 minutes to complete. Problem sets assigned on Friday will be more substantial, requiring ~2 hours to complete. Block exams will be assigned at the end of the Bootcamp, DNA, RNA, and Protein blocks and should take 3-4 hours to complete; these will be graded by Instructors/TAs. Final projects can be completed individually or in groups of up to 3 people. Projects will involve analysis of existing public data sets and end with a short presentation during the last week of class.

Goals and Learning Objectives


The Bootcamp Block covers the basics of shell, R, and Python programming. We will meet every day for 90 minutes to cover the fundamental concepts you will need throughout the course. We will also introduce the data types you will encounter in biological data analysis and approaches for analyzing them.

Main Blocks

After Bootcamp, instruction will cover the experimental approaches used to analyze DNA, RNA, and protein. Each block spans ~5 weeks, with each week focused on a particular type of experiment (see below). Each block covers statistical concepts needed for rigorous analysis and analytic approaches to process raw data into results (tables or figures). In most weeks we will discuss and analyze data from a publication. You are responsible for reading the week’s material before class begins on Monday.

Block Experiments:

  1. The DNA block covers genome sequencing for identifying mutations, and two approaches for analyzing chromatin state (ChIP-seq and MNase-seq).
  2. The RNA block covers RNA-seq, alternative splicing, differential gene expression, and RNA:protein interactions.
  3. The Protein block covers mass spectrometry, densitometry, and image analysis.

Recommended Reading

Data Processing and Visualization:



You will need to complete the following assignments before the relevant section of the bootcamp begins (listed under the assignments in your Datacamp workspace):

  1. Introduction to R (before week 2 of bootcamp)
  2. Introduction to the tidyverse (before week 2 of bootcamp)
  3. Introduction to Python (before week 3 of bootcamp)

Examinations and Grading

Class attendance in lecture and lab is a firm expectation; frequent absences or tardiness will be considered a legitimate cause for grade reduction.

Problem sets will be assigned periodically, usually on MWF. You may use online resources but must explicitly cite where you have obtained code (both code you used directly and “paraphrased” code/code used as inspiration). Any reused code that is not explicitly cited will be treated as plagiarism.

You may discuss the content of assignments with others in this class. If you do so, please acknowledge your collaborator(s) at the top of your assignment, for example: “Collaborators: Hillary and Bernie”. Failure to acknowledge collaborators will result in a grade of 0. You may not copy code and/or answers directly from another student. If you copy someone else’s work, both parties will receive a grade of 0. Rather than copying someone else’s work, ask for help. You are not alone in this course! Each student’s lowest homework score will be dropped.

Late work policy for homework assignments and labs:

Late, but within 24 hours of the due date/time = -50%

More than 24 hours after the due date/time = no credit
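As an illustration, the late policy can be sketched in a few lines of Python. This is only a sketch under stated assumptions: we read "-50%" as halving the score you earned, we treat a submission exactly 24 hours late as still inside the window, and the function name `apply_late_policy` is hypothetical. The instructors' interpretation is the one that counts.

```python
from datetime import datetime, timedelta


def apply_late_policy(score, due, submitted):
    """Return the credited score for one assignment.

    Assumptions (not stated in the syllabus):
    - "-50%" means the earned score is halved.
    - Exactly 24 hours late still counts as "within 24 hours".
    """
    if submitted <= due:
        return score            # on time: full credit
    if submitted - due <= timedelta(hours=24):
        return score * 0.5      # late, but within 24 hours
    return 0.0                  # more than 24 hours late: no credit


# Made-up due date and submission time, for illustration only.
due = datetime(2024, 9, 6, 17, 0)
print(apply_late_policy(90, due, datetime(2024, 9, 7, 9, 0)))
```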

All regrade requests must be discussed with the professor within one week of receiving your grade. There will be no grade changes after the final project.

This course assesses learning through daily problem sets (graded by your peers), block exams, a final project, and your participation, broken out as follows:

Problem Sets = 30%

Block Exams = 30%

Final Project = 20%

Participation = 20%
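To see how these weights combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale, that problem sets and block exams are each averaged with equal weight, and that the dropped homework is the single lowest problem set score. The function names and the example scores are made up for illustration; this is not the official grade calculation.

```python
# Component weights as listed in the syllabus.
WEIGHTS = {
    "problem_sets": 0.30,
    "block_exams": 0.30,
    "final_project": 0.20,
    "participation": 0.20,
}


def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    if not values:
        raise ValueError("need at least one score")
    return sum(values) / len(values)


def problem_set_average(scores):
    """Mean problem set score after dropping the single lowest score.

    Assumption: with only one score there is nothing to drop.
    """
    if len(scores) > 1:
        scores = sorted(scores)[1:]  # discard the lowest score
    return mean(scores)


def course_percentage(problem_sets, block_exams, final_project, participation):
    """Weighted course percentage on a 0-100 scale."""
    components = {
        "problem_sets": problem_set_average(problem_sets),
        "block_exams": mean(block_exams),
        "final_project": final_project,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Made-up scores, for illustration only.
example = course_percentage(
    problem_sets=[100, 80, 60],
    block_exams=[85, 95],
    final_project=90,
    participation=100,
)
print(round(example, 2))
```

In this example the lowest problem set (60) is dropped, so the total is 0.30 x 90 + 0.30 x 90 + 0.20 x 90 + 0.20 x 100 = 92.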

Grades will be assigned as follows:

>= 95        A

>= 90        A-

>= 85        B+

>= 80        B
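The cutoffs above can be read as a simple lookup, sketched here in Python. The function name `letter_grade` is hypothetical, and because the table lists no cutoffs below B, the sketch returns `None` for percentages under 80 rather than guessing a grade.

```python
# Cutoffs as listed in the syllabus, highest first.
CUTOFFS = [
    (95, "A"),
    (90, "A-"),
    (85, "B+"),
    (80, "B"),
]


def letter_grade(percentage):
    """Map a course percentage to a letter grade.

    Returns None below 80, since the syllabus does not list
    grades below B.
    """
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return None


# Made-up percentages, for illustration only.
for pct in (96.5, 92, 85, 79.9):
    print(pct, letter_grade(pct))
```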


Instructors

Jay Hesselberth, Neel Mukherjee, Maggie Lam, Suja Jagannathan, Srinivas Ramachandran, Matt Taliaferro

Teaching Assistants

Kent Riemondy, Rui Fu, Ryan Sheridan, Caitlin Winkler, Tyler Matheny


