Still Magic

It’s still magic even if you know how it’s done.

– Terry Pratchett

As research becomes more computing-intensive, researchers need more computing skills. Being able to write code and use version control is a start; the goal of this training is to help you take your next steps so that:

  • other people (including your future self) can re-do your analyses,
  • you and the people using your results can be confident that they’re correct,
  • and re-using your software is easier than rewriting it.

These lessons are part of Merely Useful, and are based on:

  • “A Quick Guide to Organizing Computational Biology Projects” [Nobl2009]
  • “Ten Simple Rules for Making Research Software More Robust” [Tasc2017]
  • “Best Practices for Scientific Computing” [Wils2014]
  • “Good Enough Practices in Scientific Computing” [Wils2017]
  • The Software Carpentry and Data Carpentry lessons

For notes on how these lessons were designed, please see s:design and [Wils2018].

Who Are These Lessons For?

Samira completed a Master’s in epidemiology five years ago, and has worked for a small NGO since then. She took a biostatistics course during her degree, and has learned more R and Python from a couple of online data science courses she did on her own, but she has no formal training in programming.

Samira would like to tidy up the scripts, data sets, and reports she has created over the past few years in order to share them with two junior staff the NGO has just hired. These lessons will show her what her first steps should be and what “done” looks like.

Jun completed an Insight Data Science fellowship last year, and now works for a company that does forensic audits. He has used a wide variety of machine learning and visualization software, and has collaborated on a couple of open source R packages.

Jun is supposed to spend 10% of his time on community projects, and has decided to help review research papers for a forensic accounting journal. This course will show him what a mature data science project should look like so that he can evaluate the papers he is sent. It will also show him what improvements his firm’s clients should make to their internal analysis pipelines.


For researchers and data scientists who have a basic understanding of the Unix shell, Python, and Git, and who want to be more productive and have more confidence in their results, this training course provides a pragmatic, tools-based introduction to program design and maintenance. Unlike academic software engineering courses and most books aimed at professional software developers, this course uses data analysis as a motivating example and assumes that the learner’s ultimate goal is to answer a question rather than ship an application.

Learners must be comfortable with the basics of the Unix shell, Python, and Git at the level covered by the core Software Carpentry lessons. They will need a personal computer with Internet access, the Bash shell, Python 3, and a GitHub account.