Still Magic

September 2018: This material is under very active development, and all of it should be considered part of the "brainstorming" phase of a rational lesson design process. Please see the design notes for an explanation of scope and plan. We would appreciate your help: please send us suggestions or file an issue in our GitHub repository. (We would particularly appreciate descriptions of common errors and how to fix them; there are subsections for these in every lesson.) Please note that all contributors are required to abide by our Code of Conduct.

It’s still magic even if you know how it’s done.

– Terry Pratchett

As research becomes more computing-intensive, researchers need more computing skills. Being able to write code and use version control is a start; the goal of this training is to help you take your next steps so that:

  • other people (including your future self) can re-do your analyses,
  • you and the people using your results can be confident that they’re correct,
  • and re-using your software is easier than rewriting it.

These lessons are the second of three parts of Merely Useful. Each part is designed as a one-semester course:

  1. One Extra Fact covers the fundamentals that every researcher ought to know (basically, what the Carpentries teach, but in more detail).
  2. This course, Still Magic, is more advanced material for people whose primary role is building research software. The learner profiles below describe its intended audience in more detail.
  3. Finally, Set On Fire is intended for those who find themselves running larger projects with external collaborators.

This material is based on:

  • “A Quick Guide to Organizing Computational Biology Projects” [Nobl2009]
  • “Ten Simple Rules for Making Research Software More Robust” [Tasc2017]
  • “Best Practices for Scientific Computing” [Wils2014]
  • “Good Enough Practices in Scientific Computing” [Wils2017]
  • The Software Carpentry and Data Carpentry lessons

To learn how these lessons were designed, please see the design notes and [Wils2018].

Who Are These Lessons For?

Samira completed a Master’s in epidemiology five years ago, and has worked since then for a small NGO. She took a biostatistics course during her degree, and has learned some more about R and Python by doing a couple of online data science courses on her own, but has no formal training in programming.

Samira would like to tidy up the scripts, data sets, and reports she has created over the past few years in order to share them with two junior staff the NGO has just hired. These lessons will show her what her first steps should be and what “done” looks like.

Jun completed an Insight Data Science fellowship last year, and now works for a company that does forensic audits. He has used a wide variety of machine learning and visualization software, and has collaborated on a couple of open source R packages.

Jun is supposed to spend 10% of his time on community projects, and has decided to help review research papers for a forensic accounting journal. This course will show him what a mature data science project should look like so that he can evaluate the papers he is sent. It will also show him what improvements his firm’s clients should make to their internal analysis pipelines.

Summary

For researchers and data scientists who have a basic understanding of the Unix shell, Python, and Git, and who want to be more productive and have more confidence in their results, this training course provides a pragmatic, tools-based introduction to program design and maintenance. Unlike academic software engineering courses and most books aimed at professional software developers, this course uses data analysis as a motivating example and assumes that the learner’s ultimate goal is to answer a question rather than ship an application.

Learners must be comfortable with the basics of the Unix shell, Python, and Git at the level covered by the core Software Carpentry lessons. They will need a personal computer with Internet access, the Bash shell, Python 3, and a GitHub account.