# C Key Points

This appendix lists the key points for each chapter.

## C.1 Getting Started

• Make tidiness a habit, rather than cleaning up your project files later.
• Include a few standard files in all your projects, such as README, LICENSE, CONTRIBUTING, CONDUCT and CITATION.
• Put runnable code in a bin/ directory.
• Put raw/original data in a data/ directory and never modify it.
• Put results in a results/ directory. This includes cleaned-up data and figures (i.e. everything created using what’s in bin and data).
• Put documentation and manuscripts in a docs/ directory.
• Refer to The Carpentries software installation guide if you’re having trouble.

## C.2 The Basics of the Unix Shell

• A shell is a program that reads commands and runs other programs.
• The filesystem manages information stored on disk.
• Information is stored in files, which are located in directories (folders).
• Directories can also store other directories, which forms a directory tree.
• pwd prints the user’s current working directory.
• / on its own is the root directory of the whole filesystem.
• ls prints a list of files and directories.
• An absolute path specifies a location from the root of the filesystem.
• A relative path specifies a location in the filesystem starting from the current directory.
• cd changes the current working directory.
• .. means the parent directory.
• . on its own means the current directory.
• mkdir creates a new directory.
• cp copies a file.
• rm removes (deletes) a file.
• mv moves (renames) a file or directory.
• * matches zero or more characters in a filename.
• ? matches any single character in a filename.
• wc counts lines, words, and characters in its inputs.
• man displays the manual page for a given command; some commands also have a --help option.

## C.3 Building Tools with the Unix Shell

• cat displays the contents of its inputs.
• head displays the first few lines of its input.
• tail displays the last few lines of its input.
• sort sorts its inputs.
• Use the up-arrow key to scroll up through previous commands to edit and repeat them.
• Use history to display recent commands and !number to repeat a command by number.
• Every process in Unix has an input channel called standard input and an output channel called standard output.
• > redirects a command’s output to a file, overwriting any existing content.
• >> appends a command’s output to a file.
• < redirects a command’s input so that it is read from a file.
• A pipe | sends the output of the command on the left to the input of the command on the right.
• A for loop repeats commands once for each item in a list.
• Every for loop must have a variable to refer to the item it is currently operating on and a body containing commands to execute.
• Use $name or ${name} to get the value of a variable.

## C.4 Going Further with the Unix Shell

• Save commands in files (usually called shell scripts) for re-use.
• bash filename runs the commands saved in a file.
• $@ refers to all of a shell script’s command-line arguments.
• $1, $2, etc., refer to the first command-line argument, the second command-line argument, etc.
• Place variables in quotes if the values might have spaces or other special characters in them.
• find lists files with specific properties or whose names match patterns.
• $(command) inserts a command’s output in place.
• grep selects lines in files that match patterns.
• Use the .bashrc file in your home directory to set shell variables each time the shell runs.
• Use alias to create shortcuts for things you type frequently.

## C.5 Building Command-Line Programs in Python

• Write command-line Python programs that can be run in the Unix shell like other command-line tools.
• If the user does not specify any input files, read from standard input.
• If the user does not specify any output files, write to standard output.
• Place all import statements at the start of a module.
• Use the value of __name__ to determine if a file is being run directly or being loaded as a module.
• Use argparse to handle command-line arguments in standard ways.
• Use short options for common controls and long options for less common or more complicated ones.
• Use docstrings to document functions and scripts.
• Place functions that are used across multiple scripts in a separate file that those scripts can import.
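
Several of these points can be seen together in a small sketch. The line-counting task and the names `count_lines`, `parse_args`, and `main` are assumptions made for illustration, not code from the chapter; the optional `stdin` and `stdout` parameters are likewise an assumed convenience that makes the function easy to test.

```python
"""Count the lines in one or more files, or in standard input."""
import argparse
import sys


def count_lines(reader):
    """Return the number of lines in an open text stream."""
    return sum(1 for _ in reader)


def parse_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv[1:])."""
    parser = argparse.ArgumentParser(
        description='Count the lines in files or standard input.')
    parser.add_argument('infiles', nargs='*',
                        help='input files (default: standard input)')
    parser.add_argument('-o', '--outfile', default=None,
                        help='output file (default: standard output)')
    return parser.parse_args(argv)


def main(argv=None, stdin=None, stdout=None):
    """Count lines and report the total; streams can be swapped for testing."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    args = parse_args(argv)
    if args.infiles:
        total = 0
        for fname in args.infiles:
            with open(fname, 'r') as reader:
                total += count_lines(reader)
    else:
        # No input files given, so fall back to standard input.
        total = count_lines(stdin)
    if args.outfile:
        with open(args.outfile, 'w') as writer:
            writer.write(f'{total}\n')
    else:
        # No output file given, so fall back to standard output.
        stdout.write(f'{total}\n')
    return total


# When saved as a script, end the file with the usual guard so that
# importing it as a module does not run it:
#
#     if __name__ == '__main__':
#         main()
```

Saved under a hypothetical name such as `bin/countlines.py`, this could be used in a pipeline (`cat data/*.txt | python bin/countlines.py`) or given filenames directly (`python bin/countlines.py data/a.txt -o results/count.txt`).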

## C.6 Using Git at the Command Line

• Use git config with the --global option to configure your user name, email address, and other preferences once per machine.
• git init initializes a repository.
• Git stores all repository management data in the .git subdirectory of the repository’s root directory.
• git status shows the status of a repository.
• git add puts files in the repository’s staging area.
• git commit saves the staged content as a new commit in the local repository.
• git log lists previous commits.
• git diff shows the difference between two versions of the repository.
• Synchronize your local repository with a remote repository on a forge such as GitHub.
• git remote manages bookmarks pointing at remote repositories.
• git push copies changes from a local repository to a remote repository.
• git pull copies changes from a remote repository to a local repository.
• git restore and git checkout recover old versions of files.
• The .gitignore file tells Git what files to ignore.

## C.7 Going Further with Git

• Use a branch-per-feature workflow to develop new features while leaving the master branch in working order.
• git branch creates a new branch.
• git checkout switches between branches.
• git merge merges changes from another branch into the current branch.
• Conflicts occur when files or parts of files are changed in different ways on different branches.
• Version control systems do not allow people to overwrite changes silently; instead, they highlight conflicts that need to be resolved.
• Forking a repository makes a copy of it on a server.
• Cloning a repository with git clone creates a local copy of a remote repository.
• Create a remote called upstream to point to the repository a fork was derived from.
• Create pull requests to submit changes from your fork to the upstream repository.

## C.8 Working in Teams

• Welcome and nurture community members proactively.
• Create an explicit Code of Conduct for your project modelled on the Contributor Covenant.
• Include a license in your project so that it’s clear who can do what with the material.
• Create issues for bugs, enhancement requests, and discussions.
• Label issues to identify their purpose.
• Triage issues regularly and group them into milestones to track progress.
• Include contribution guidelines in your project that specify its workflow and its expectations of participants.
• Make rules about governance explicit.
• Use common-sense rules to make project meetings fair and productive.
• Manage conflict between participants rather than hoping it will take care of itself.

## C.9 Automating Analyses with Make

• Make is a widely used build manager.
• A build manager re-runs commands to update files that are out of date.
• A build rule has targets, prerequisites, and a recipe.
• A target can be a file or a phony target that simply triggers an action.
• When a target is out of date with respect to its prerequisites, Make executes the recipe associated with its rule.
• Make executes as many rules as it needs to when updating files, but always respects prerequisite order.
• Make defines automatic variables such as $@ (target), $^ (all prerequisites), and $< (first prerequisite).
• Pattern rules can use % as a placeholder for parts of filenames.
• Makefiles can define variables using NAME=value.
• Makefiles can also use functions such as $(wildcard ...) and $(patsubst ...).
• Use specially-formatted comments to create self-documenting Makefiles.
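
The idea of a target being "out of date" can be sketched in a few lines of Python. This is a deliberately simplified model written for illustration, not how Make is implemented: it compares file modification times only and ignores phony targets, chained rules, and pattern rules. The function name `is_out_of_date` is hypothetical.

```python
import os


def is_out_of_date(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        # A target that does not exist always needs to be built.
        return True
    target_time = os.path.getmtime(target)
    # The target is stale if any prerequisite was modified after it.
    return any(os.path.getmtime(p) > target_time for p in prerequisites)
```

A build manager applies a check like this to each rule and runs the rule's recipe only when the check succeeds, which is why editing one input file triggers only the steps that depend on it.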

## C.10 Configuring Programs

• Overlay configuration specifies settings for a program in layers, each of which overrides previous layers.
• Use a system-wide configuration file for general settings.
• Use a user-specific configuration file for personal preferences.
• Use a job-specific configuration file with settings for a particular run.
• Use command-line options to change things that commonly change.
• Use YAML or some other standard syntax to write configuration files.
• Save configuration information to make your research reproducible.
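
The layering described above can be sketched with plain dictionaries. The setting names and values here are invented for illustration, and the sketch assumes each layer has already been loaded (for example from a YAML file) into a flat dictionary; nested settings would need a recursive merge.

```python
def overlay(*layers):
    """Merge configuration layers; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical layers, from most general to most specific.
system_wide = {'fontsize': 10, 'color': 'black', 'saveconfig': False}
user = {'color': 'blue'}
job = {'fontsize': 14}
command_line = {'saveconfig': True}

config = overlay(system_wide, user, job, command_line)
print(config)
```

Saving the final merged dictionary alongside the results records exactly which settings a run used.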

## C.11 Testing Software

• Test software to convince people (including yourself) that software is correct enough and to make tolerances on “enough” explicit.
• Add assertions to code so that it checks itself as it runs.
• Write unit tests to check individual pieces of code.
• Write integration tests to check that those pieces work together correctly.
• Write regression tests to check that things that used to work still do.
• A test framework finds and runs tests written in a prescribed fashion and reports their results.
• Test coverage is the fraction of lines of code that are executed by a set of tests.
• Continuous integration re-builds and/or re-tests software every time something changes.
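
As a minimal sketch of the first few points, the following pairs a small function containing an assertion with two unit tests. The function `count_words` and its tests are hypothetical examples. The tests are plain functions whose names start with `test_`, the naming convention that a framework such as pytest uses to discover them.

```python
def count_words(text):
    """Return a dict mapping each lowercase word in text to its frequency."""
    # The assertion makes the function check its own input as it runs.
    assert isinstance(text, str), 'text must be a string'
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def test_count_words_repeated():
    """A word that appears twice is counted twice, ignoring case."""
    assert count_words('The cat the hat') == {'the': 2, 'cat': 1, 'hat': 1}


def test_count_words_empty():
    """An empty string contains no words."""
    assert count_words('') == {}
```

If this were saved in a file whose name starts with `test_`, running `pytest` in the same directory would find and run both tests and report any failures.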

## C.12 Handling Errors

• Signal errors by raising exceptions.
• Use try/except blocks to catch and handle exceptions.
• Python organizes its standard exceptions in a hierarchy so that programs can catch and handle them selectively.
• “Throw low, catch high”, i.e., raise exceptions immediately but handle them at a higher level.
• Write error messages that help users figure out what to do to fix the problem.
• Store error messages in a lookup table to ensure consistency.
• Use a logging framework instead of print statements to report program activity.
• Separate logging messages into DEBUG, INFO, WARNING, ERROR, and CRITICAL levels.
• Use logging.basicConfig to define basic logging parameters.
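
The following sketch combines these points under some assumptions: the `ERRORS` table, the functions `check_filename` and `process_files`, and the rule that filenames must end in `.csv` are all hypothetical. The low-level function raises an exception as soon as it sees a problem, and the higher-level function decides what to do about it.

```python
import logging

# Lookup table of error messages, kept in one place for consistency.
ERRORS = {
    'not_csv_suffix': '{fname}: file must end in .csv',
}

# Report messages at INFO level and above.
logging.basicConfig(level=logging.INFO)


def check_filename(fname):
    """Raise OSError if fname does not end in .csv (throw low)."""
    if not fname.endswith('.csv'):
        raise OSError(ERRORS['not_csv_suffix'].format(fname=fname))
    return fname


def process_files(fnames):
    """Return the usable filenames, logging and skipping the rest (catch high)."""
    processed = []
    for fname in fnames:
        try:
            check_filename(fname)
        except OSError as error:
            logging.warning('skipping file: %s', error)
            continue
        logging.info('processing %s', fname)
        processed.append(fname)
    return processed
```

Calling `process_files(['a.csv', 'notes.txt'])` returns `['a.csv']` and logs a warning about `notes.txt` instead of stopping the whole run.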

## C.13 Tracking Provenance

• Publish data and code as well as papers.
• Use DOIs to identify reports, datasets, and software releases.
• Use an ORCID to identify yourself as an author of a report, dataset, or software release.
• Data should be FAIR: findable, accessible, interoperable, and reusable.
• Put small datasets in version control repositories; store large ones on data sharing sites.
• Describe your software environment, analysis scripts, and data processing steps in reproducible ways.
• Make your analyses inspectable as well as reproducible.

## C.14 Creating Packages with Python

• Use setuptools to build and distribute Python packages.
• Create a directory named mypackage containing a setup.py script as well as a subdirectory also called mypackage containing the package’s source files.
• Use semantic versioning for software releases.
• Use a virtual environment to test how your package installs without disrupting your main Python installation.
• Use pip to install Python packages.
• The default repository for Python packages is PyPI.
• Use TestPyPI to test the distribution of your package.
• Use a README file for package-level documentation.
• Use Sphinx to generate documentation for a package.
• Use Read The Docs to host package documentation online.
• Create a DOI for your package using GitHub’s Zenodo integration.
• Publish the details of your package in a software journal so that others can cite it.