Course materials for The Biologist’s Toolkit (BIOL 3872): scientific computing for biology in R

Develop fundamental skills for the practical use of computers in the biological sciences. The course focuses on computer-related techniques used in day-to-day biological work: best practices for scientific computing and data handling, the theory of visualisation, and scripting. Students learn how to create, store, and manipulate data using the object-oriented programming language R. Examples and projects will be drawn from a wide variety of biological areas and cover typical problems encountered in computer use. The course is a prerequisite for higher-level quantitative courses in biology.

Instructor: Aaron MacNeil, LSC 7088,

TA: Taylor Gorham, LSC 7087,

Class time and location: Wednesdays 08:30-11:30, Dunn 221C

Office hours: Wednesdays 11:30-12:30, LSC 7088

In-class exercises (tasks) must be submitted before leaving the labs to receive credit!

Computer setup

Mac: check your current installation of Python by opening a terminal window and entering python --version. If the version is 2.x, install the newest version of Python 3 by entering the following in the terminal:

  1. xcode-select --install
  2. ruby -e "$(curl -fsSL"
  3. export PATH=/usr/local/bin:/usr/local/sbin:$PATH
  4. brew install python3
  5. alias python='python3'
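The version check described above can also be scripted, for example to re-run after the installation steps. The sketch below assumes a POSIX shell and only reports what it finds; it installs nothing. Because shell aliases such as the one in step 5 are not inherited by scripts, it checks python3 directly instead of relying on the alias.

```shell
#!/bin/sh
# Sketch: report which Python versions are reachable from this shell.
# Aliases (e.g. alias python='python3') are not visible inside scripts,
# so python3 is queried directly.

# Python 2 prints its version to stderr, so stderr is merged into stdout.
default_ver=$(python --version 2>&1) || default_ver="not found"
py3_ver=$(python3 --version 2>&1) || py3_ver="not found"

echo "python:  $default_ver"
echo "python3: $py3_ver"

case "$default_ver" in
  "Python 2."*)
    echo "Default python is 2.x: install Python 3 with the steps above." ;;
esac
```

If python3 still reports "not found" after step 4, the PATH export in step 3 has probably not taken effect in the current terminal window.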

All users: Install each of:

  1. Download and set up Anaconda:      
  2. Download and set up R:
  3. Set up R kernel:                                    


Windows: Open the Anaconda command window (Anaconda Prompt) and run conda install -c r r-irkernel, then conda install r-essentials

Open a new Anaconda Prompt window and run jupyter notebook


Mac and Linux: Open a terminal window and run conda install -c r r-irkernel

Then run conda install -c r r-essentials

Open a terminal window and run jupyter notebook
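If R does not appear as a notebook option after launching Jupyter, the kernel registration can be checked from a terminal. This is a minimal sketch, assuming the conda r-irkernel package registers the kernel under the name ir (IRkernel's default); jupyter kernelspec list is Jupyter's own command for listing installed kernels.

```shell
#!/bin/sh
# Sketch: check whether Jupyter can see an R kernel.
# Assumption: the kernel was installed via conda's r-irkernel package,
# which registers it under the kernel name "ir".
if command -v jupyter >/dev/null 2>&1; then
  if jupyter kernelspec list | grep -qw "ir"; then
    status="R kernel registered"
  else
    status="R kernel missing: re-run conda install -c r r-irkernel"
  fi
else
  status="jupyter not found: check that Anaconda is on your PATH"
fi
echo "$status"
```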


Magic link:

Lecture schedule

Week 1: Introduction to Jupyter and R – survey of programming languages: Link

Week 2: R programming – functions, loops, and logic: Link, data1, data2
Week 3: Code and data storage – Git, naming, Excel from hell, CSV files, Tidyverse: Link, data
Week 4: Data wrangling – DataFrames, arrays, lists, data manipulation: Link

  • Data cleanup assignment: Link
  • Due 12 October at 23:59:59.

Week 5: Databasing – Database skills, dplyr, merges, filters: Link, data
Week 6: Temporal data – manipulating dates and times: Link
Week 7: String manipulation – working with text, genetics, wrangling webpages: Link

  • Webpage wrangling assignment start

Week 8: HTML – tags, tags, and more tags: Link

  • Webpage wrangling assignment continued: Link
  • Due 11 November at 23:59:59.

Week 9: Living documents – LaTeX, beamer, knitr, Markdown: Link

  • Reproducible document assignment: Link
  • Due 25 November at 23:59:59.

Week 10: Plotting – looking at data, base graphics, ggplot: Link, Murray Logan PDF

Week 11: Scientific graphics – Tufte, the data-ink ratio, small multiples, perception: Link
Week 12: Scientific graphics – Colours, transparency, symbols, vector graphics: Link

  • Beautiful graphics assignment: Link
  • Due 16 December at 23:59:59.

Grading scheme

Students will be assessed based on in-lab exercises (50%) and four assignments (50%). There will be no final exam.

Assignments are to be developed as Jupyter notebooks.

Academic integrity

Plagiarism, cheating, and other misconduct are serious violations of your contract as a Dalhousie student. You are expected to know and abide by Dalhousie’s policies regarding academic misconduct. Violations of these policies will be dealt with according to the Faculty Discipline Process.

For this course, plagiarism is defined as code that is identical or eerily similar to that of other students: programmers develop code that reflects their individual styles, and these conventions are easily recognized. You are absolutely encouraged to collaborate and to consult online forums such as Stack Overflow; however, submitted work must be your own effort, with sources of borrowed code clearly indicated in script comments.