Chapter 2 The Basics of the Unix Shell

Ninety percent of most magic merely consists of knowing one extra fact.

— Terry Pratchett

Computers do four basic things: store data, run programs, talk with each other, and interact with people. They do the last of these in many different ways, of which graphical user interfaces (GUIs) are the most widely used. The computer displays icons to show our files and programs, and we tell it to copy or run those by clicking with a mouse. GUIs are easy to learn but hard to automate, and don’t create a record of what we did.

In contrast, when we use a command-line interface (CLI) we communicate with the computer by typing commands, and the computer responds by displaying text. CLIs existed long before GUIs; they have survived because they are efficient, easy to automate, and automatically record what we have done.

The heart of every CLI is a read-evaluate-print loop (REPL). When we type a command and press Return (also called Enter) the CLI reads the command, evaluates it (i.e., executes it), prints the command’s output, and loops around to wait for another command. If you have used an interactive console for R or Python, you have already used a simple CLI.

This lesson introduces another CLI that lets us interact with our computer’s operating system. It is called a “command shell”, or just shell for short, and in essence is a program that runs other programs on our behalf (Figure 2.1). Those “other programs” can do things as simple as telling us the time or as complex as modeling global climate change; as long as they obey a few simple rules, the shell can run them without having to know what language they are written in or how they do what they do.


Figure 2.1: The Shell

The shell’s greatest strength is that it lets us combine programs to create pipelines that can handle large volumes of data. Sequences of commands can be saved in a script, just as commands for R or Python can be saved in programs, which makes our workflows more reproducible. Finally, the shell is often the easiest way to interact with remote machines—in fact, the shell is practically essential for working with clusters and the cloud. We won’t need this much power in our Zipf’s Law examples, but as we will see, being able to combine commands and save our work makes life easier even when working on small problems.

What’s in a Name?

Programmers have written many different shells over the last forty years, just as they have created many different text editors and plotting packages. The most popular shell today is called Bash (an acronym of Bourne Again SHell, and a weak pun on the name of its predecessor, the Bourne shell). Other shells may differ from Bash in minor ways, but the core commands and ideas remain the same. In particular, the most recent versions of MacOS use a shell called the Z Shell or zsh; we will point out a few differences as we go along.

Please see Appendix E for instructions on how to launch the shell on your computer.

2.1 Exploring Files and Directories

When Bash runs it presents us with a prompt to indicate that it is waiting for input. By default, this prompt is a simple dollar sign:

$

However, different shells may use a different symbol: in particular, the zsh shell that is the default on newer versions of MacOS uses %. As we’ll see in Chapter 3, we can customize the prompt to give us more information.

Don’t Type the Dollar Sign

We show the $ prompt so that it’s clear what you are supposed to type, particularly when several commands appear in a row, but you should not type it yourself.

Let’s run a command to find out who the shell thinks we are:

$ whoami
amira

Learn by Doing

Amira is one of the learners described in Section 1.2. For the rest of the book, we’ll present code and examples from her perspective. You should follow along on your own computer, though what you see might deviate in small ways because of differences in operating system (and because your name probably isn’t Amira).

Now that we know who we are, we can explore where we are and what we have. The part of the operating system that manages files and directories (also called folders) is called the filesystem. Some of the most commonly-used commands in the shell create, inspect, rename, and delete files and directories. Let’s start exploring them by running the command pwd, which stands for print working directory. The “print” part of its name is straightforward; the “working directory” part refers to the fact that the shell keeps track of our current working directory at all times. Most commands read and write files in the current working directory unless we tell them to do something else, so knowing where we are before running a command is important.

$ pwd
/Users/amira

Here, the computer’s response is /Users/amira, which tells us that we are in a directory called amira that is contained in a top-level directory called Users. This directory is Amira’s home directory; to understand what that means, we must first understand how the filesystem is organized. On Amira’s computer it looks like Figure 2.2.


Figure 2.2: Sample Filesystem

At the top is the root directory that holds everything else, which we can refer to using a slash character, / on its own. Inside that directory are several other directories, including bin (where some built-in programs are stored), data (for miscellaneous data files), tmp (for temporary files that don’t need to be stored long-term), and Users (where users’ personal directories are located). We know that /Users is stored inside the root directory / because its name begins with /, and that our current working directory /Users/amira is stored inside /Users because /Users is the first part of its name. A name like this is called a path because it tells us how to get from one place in the filesystem (e.g., the root directory) to another (e.g., Amira’s home directory).

Slashes

The / character means two different things in a path. At the front of a path or on its own, it refers to the root directory. When it appears inside a name, it is a separator. Windows uses backslashes (\) instead of forward slashes as separators.

Underneath /Users, we find one directory for each user with an account on this machine. Jun’s files are stored in /Users/jun, Sami’s in /Users/sami, and Amira’s in /Users/amira. This is where the name “home directory” comes from: when we first log in, the shell puts us in the directory that holds our files.

Home Directory Variations

Our home directory will be in different places on different operating systems. On Linux it may be /home/amira, and on Windows it may be C:\Documents and Settings\amira or C:\Users\amira (depending on the version of Windows). Our examples show what we would see on MacOS.

Now that we know where we are, let’s see what we have using the command ls (short for “listing”), which prints the names of the files and directories in the current directory:

$ ls
Applications Documents    Library      Music        Public         todo.txt
Desktop      Downloads    Movies       Pictures     zipf

Again, our results may be different depending on our operating system and what files or directories we have.

We can make the output of ls more informative using the -F option (also sometimes called a switch or a flag). Options are exactly like arguments to a function in R or Python; in this case, -F tells ls to decorate its output to show what things are. A trailing / indicates a directory, while a trailing * tells us something is a runnable program. Depending on our setup, the shell might also use colors to indicate whether each entry is a file or directory.

$ ls -F
Applications/ Documents/    Library/      Music/        Public/        todo.txt
Desktop/      Downloads/    Movies/       Pictures/     zipf/

Here, we can see that almost everything in our home directory is a subdirectory; the only thing that isn’t is a file called todo.txt.

Spaces Matter

1+2 and 1 + 2 mean the same thing in mathematics, but ls -F and ls-F are very different things in the shell. The shell splits whatever we type into pieces based on spaces, so if we forget to separate ls and -F with at least one space, the shell will try to find a program called ls-F and (quite sensibly) give an error message like ls-F: command not found.

Some options tell a command how to behave, but others tell it what to act on. For example, if we want to see what’s in the /Users directory, we can type:

$ ls /Users
amira   jun     sami

We often call the file and directory names that we give to commands arguments to distinguish them from the built-in options. We can combine options and arguments:

$ ls -F /Users
amira/  jun/    sami/

but we must put the options (like -F) before the names of any files or directories we want to work on, because once the command encounters something that isn’t an option it assumes there aren’t any more:

$ ls /Users -F
ls: -F: No such file or directory
amira   jun     sami

Command Line Differences

Code can sometimes behave in unexpected ways on different computers, and this applies to the command line as well. For example, the following code actually does work on some Linux operating systems:

$ ls /Users -F

Some people think this is convenient; others (including us) believe it is confusing, so it’s best to avoid doing this.

2.2 Moving Around

Let’s run ls again. Without any arguments, it shows us what’s in our current working directory:

$ ls -F
Applications/ Documents/    Library/      Music/        Public/        todo.txt
Desktop/      Downloads/    Movies/       Pictures/     zipf/

If we want to see what’s in the zipf directory we can ask ls to list its contents:

$ ls -F zipf
data/

Notice that zipf doesn’t have a leading slash before its name. This absence tells the shell that it is a relative path, i.e., that it identifies something starting from our current working directory. In contrast, a path like /Users/amira is an absolute path: it is always interpreted from the root directory down, so it always refers to the same thing. Using a relative path is like telling someone to go two kilometers north and then half a kilometer east; using an absolute path is like giving them the latitude and longitude of their destination.

We can use whichever kind of path is easiest to type, but if we are going to do a lot of work with the data in the zipf directory, the easiest thing would be to change our current working directory so that we don’t have to type zipf over and over again. The command to do this is cd, which stands for change directory. This name is a bit misleading because the command doesn’t change the directory; instead, it changes the shell’s idea of what directory we are in. Let’s try it out:

$ cd zipf

cd doesn’t print anything. This is normal: many shell commands run silently unless something goes wrong, on the theory that they should only ask for our attention when they need it. To confirm that cd has done what we asked, we can use pwd:

$ pwd
/Users/amira/zipf
$ ls -F
data/
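The difference between relative and absolute paths can be tried safely in a scratch directory. The following is a minimal sketch rather than part of Amira's session: sandbox, project, and data are made-up names, mktemp -d is assumed to be available to create an empty temporary directory, and mkdir (covered in Section 2.3) is used to create something to look at.

```shell
# Scratch area so nothing real is touched. mktemp -d makes an empty temporary
# directory and prints its absolute path; "sandbox", "project", and "data"
# are hypothetical names used only for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir project
mkdir project/data

ls -F project              # relative: looked up from the current directory
ls -F "$sandbox/project"   # absolute: starts from /, gives the same listing

cd project/data
# From here the relative path "project" would no longer be found,
# but the absolute path still names the same directory:
ls -F "$sandbox/project"
```

Both forms show data/ while we are in the scratch directory; only the absolute form keeps working after we move.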

Missing Directories and Unknown Options

If we give a command an option that it doesn’t understand, it will usually print an error message and (if we’re lucky) tersely remind us of what we should have done:

$ cd -j
-bash: cd: -j: invalid option
cd: usage: cd [-L|-P] [dir]

On the other hand, if we get the syntax right but make a mistake in the name of a file or directory, it will tell us that:

$ cd whoops
-bash: cd: whoops: No such file or directory

We now know how to go down the directory tree, but how do we go up? This doesn’t work:

$ cd amira
cd: amira: No such file or directory

because amira on its own is a relative path meaning “a file or directory called amira below our current working directory”. To get back home, we can either use an absolute path:

$ cd /Users/amira

or a special relative path called .. (two periods in a row with no spaces), which always means “the directory that contains the current one”. The directory that contains the one we are in is called the parent directory, and sure enough, .. gets us there:

$ cd ..
$ pwd
/Users/amira

ls usually doesn’t show us this special directory—since it’s always there, displaying it every time would be a distraction. We can ask ls to include it using the -a option, which stands for “all”. Remembering that we are now in /Users/amira:

$ ls -F -a
./              Applications/   Documents/      Library/        Music/          Public/         todo.txt
../             Desktop/        Downloads/      Movies/         Pictures/       zipf/

The output also shows another special directory called . (a single period), which refers to the current working directory. It may seem redundant to have a name for it, but we’ll see some uses for it soon.

Combining Options

You’ll occasionally need to use multiple options in the same command. In most command line tools, multiple options can be combined with a single - and no spaces between the options:

$ ls -Fa

This command is synonymous with the previous example. While you may see commands written like this, we don’t recommend you use this approach in your own work. This is because some commands take long options with multi-letter names, and it’s very easy to mistake --no (meaning “answer ‘no’ to all questions”) for -no (meaning -n -o).

The special names . and .. don’t belong to cd: they mean the same thing to every program. For example, if we are in /Users/amira/zipf, then ls .. will display a listing of /Users/amira. When the meanings of the parts are the same no matter how they’re combined, programmers say they are orthogonal. Orthogonal systems tend to be easier for people to learn because there are fewer special cases to remember.

Other Hidden Files

In addition to the hidden directories .. and ., we may also come across files with names like .jupyter or .Rhistory. These usually contain settings or other data for particular programs; the leading . keeps them from cluttering up the output when we run ls. We can always use the -a option to display them.

cd is a simple command, but it allows us to explore several new ideas. First, several .. can be joined by the path separator to move higher than the parent directory in a single step. For example, cd ../.. will move us up two directories (e.g., from /Users/amira/zipf to /Users), while cd ../Movies will move us up from zipf and back down into Movies.
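To see how .. behaves without touching real files, we can rebuild a small piece of Amira's tree in a temporary directory. This is only a sketch: the sandbox variable and the home directory inside it are hypothetical stand-ins for /Users/amira, and mkdir is explained in Section 2.3.

```shell
# Build a tiny stand-in for /Users/amira inside a temporary directory.
# "sandbox" and "home" are made-up names for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir home
mkdir home/zipf
mkdir home/Movies

cd home/zipf
ls ..            # lists the parent directory (home): Movies and zipf
cd ../Movies     # up one level, then down into Movies
pwd
cd ../..         # up two levels, which lands in the sandbox itself
pwd
```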

What happens if we type cd on its own without giving a directory?

$ pwd
/Users/amira/Movies
$ cd
$ pwd
/Users/amira

No matter where we are, cd on its own always returns us to our home directory. We can achieve the same thing using the special directory name ~, which is a shortcut for our home directory:

$ ls ~
Applications    Documents       Library         Music           Public          todo.txt
Desktop         Downloads       Movies          Pictures        zipf

(ls doesn’t show any trailing slashes here because we haven’t used -F.) We can use ~ in paths, so that (for example) ~/Downloads always refers to our Downloads directory.

Finally, cd interprets the shortcut - (a single dash) to mean the last directory we were in. Using this is usually faster and more reliable than trying to remember and type the path, but unlike ~, it only works with cd: ls - tries to print a listing of a directory called - rather than showing us the contents of our previous directory.
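The cd - shortcut is easiest to understand by switching back and forth between two directories. The sketch below uses a temporary directory with made-up zipf and Movies subdirectories so that it does not depend on what is in our real home directory.

```shell
# Two scratch directories to hop between; all names here are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir zipf
mkdir Movies

cd zipf
cd ../Movies
cd -      # back to zipf; cd prints the directory it switches to
cd -      # and back to Movies again
pwd
```

Each cd - swaps the current and previous directories, so running it twice leaves us where we started.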

2.3 Creating New Files and Directories

We now know how to explore files and directories, but how do we create them? To find out, let’s go back to our zipf directory:

$ cd ~/zipf
$ ls -F
data/

To create a new directory, we use the command mkdir (short for make directory):

$ mkdir analysis

Since analysis is a relative path (i.e., does not have a leading slash) the new directory is created below the current working directory:

$ ls -F
analysis/  data/

Using the shell to create a directory is no different than using a graphical tool. If we look at the current directory with our computer’s file browser we will see the analysis directory there too. The shell and the file explorer are two different ways of interacting with the files; the files and directories themselves are the same.

Naming Files and Directories

Complicated names of files and directories can make our life painful. Following a few simple rules can save a lot of headaches:

  1. Don’t use spaces. Spaces can make a name easier to read, but since they are used to separate arguments on the command line, most shell commands interpret a name like My Thesis as two names My and Thesis. Use - or _ instead, e.g., My-Thesis or My_Thesis.

  2. Don’t begin the name with - (dash) to avoid confusion with command options like -F.

  3. Stick with letters, digits, . (period or ‘full stop’), - (dash) and _ (underscore). Many other characters mean special things in the shell. We will learn about some of these during this lesson, but these are always safe.

If we need to refer to files or directories that have spaces or other special characters in their names, we can surround the name in quotes (""). For example, ls "My Thesis" will work where ls My Thesis does not.
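The effect of quoting can be checked directly. This sketch, run in a temporary directory (the sandbox variable is our own invention), creates directories with and without quotes so we can compare the results.

```shell
# Work in a throwaway directory; "sandbox" is a made-up variable name.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir "My Thesis"     # quoted: one directory whose name contains a space
mkdir My Thesis       # unquoted: two directories, My and Thesis
mkdir My-Thesis       # the recommended style: no space, nothing to quote
ls -F
```

The listing contains four directories, which shows that the unquoted command really did receive two separate names.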

Since we just created the analysis directory, ls doesn’t display anything when we ask for a listing of its contents:

$ ls -F analysis

Let’s change our working directory to analysis using cd, then use a very simple text editor called Nano to create a file called draft.txt (Figure 2.3):

$ cd analysis
$ nano draft.txt

Figure 2.3: The Nano Editor

When we say “Nano is a text editor” we really do mean “text”: it can only work with plain character data, not spreadsheets, images, Microsoft Word files, or anything else invented after 1970. We use it in this lesson because it runs everywhere, and because it is as simple as something can be and still be called an editor. However, that last trait means that we shouldn’t use it for larger tasks like writing a program or a paper.

Recycling Pixels

Unlike most modern editors, Nano runs inside the shell window instead of opening a new window of its own. This is a holdover from an era when graphical terminals were a rarity and different applications had to share a single screen.

Once Nano is open we can type in a few lines of text, then press Ctrl+O (the Control key and the letter ‘O’ at the same time) to save our work. Nano will ask us what file we want to save it to; press Return to accept the suggested default of draft.txt. Once our file is saved, we can use Ctrl+X to exit the editor and return to the shell.

Control, Ctrl, or ^ Key

The Control key, also called the “Ctrl” key, can be described in a bewildering variety of ways. For example, Control plus X may be written as:

  • Control-X
  • Control+X
  • Ctrl-X
  • Ctrl+X
  • C-x
  • ^X

When Nano runs it displays some help in the bottom two lines of the screen using the last of these notations: for example, ^G Get Help means “use Control+G to get help” and ^O WriteOut means “use Control+O to write out the current file”.

Nano doesn’t leave any output on the screen after it exits, but ls will show that we have indeed created a new file draft.txt:

$ ls
draft.txt

Dot Something

All of Amira’s files are named “something dot something”. This is just a convention: we can call a file mythesis or almost anything else. However, both people and programs use two-part names to help them tell different kinds of files apart. The part of the filename after the dot is called the filename extension and indicates what type of data the file holds: .txt for plain text, .pdf for a PDF document, .png for a PNG image, and so on. The extension does not change what a file contains: saving a PNG image of a whale as whale.mp3 doesn’t somehow magically turn it into a recording of whalesong, though it might cause the operating system to try to open it with a music player when someone double-clicks it.

2.4 Moving Files and Directories

Let’s go back to our zipf directory:

$ cd ~/zipf

The analysis directory contains a file called draft.txt. That isn’t a particularly informative name, so let’s change it using mv (short for move):

$ mv analysis/draft.txt analysis/prior-work.txt

The first argument tells mv what we are “moving”, while the second is where it’s to go. “Moving” analysis/draft.txt to analysis/prior-work.txt has the same effect as renaming the file:

$ ls analysis
prior-work.txt

We must be careful when specifying the destination because mv will overwrite existing files without warning. The -i option (for “interactive”) makes mv ask us for confirmation before overwriting. mv also works on directories, so mv analysis first-paper would rename the directory without changing its contents.
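A small experiment shows both renaming and silent overwriting. This is a sketch in a temporary directory with made-up file contents; it uses printf with > to create files and cat to display them, both of which are only introduced properly in Section 2.9.

```shell
# Throwaway directory with two small files; names and contents are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'first draft\n' > draft.txt    # writes one line of text into draft.txt
printf 'older notes\n' > notes.txt

mv draft.txt prior-work.txt   # rename: draft.txt is gone, prior-work.txt appears
ls

mv prior-work.txt notes.txt   # notes.txt already exists, so it is silently replaced
cat notes.txt                 # shows "first draft"; "older notes" is lost
```

Adding -i to the second mv would have made it ask before replacing notes.txt.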

Now suppose we want to move prior-work.txt into the current working directory. If we don’t want to change the file’s name, just its location, we can provide mv with a directory as a destination and it will move the file there. In this case, the directory we want is the special name . that we mentioned earlier:

$ mv analysis/prior-work.txt .

ls now shows us that analysis is empty:

$ ls analysis

and that our current directory now contains our file:

$ ls
analysis  data  prior-work.txt

If we only want to check that the file exists, we can give its name to ls just like we can give the name of a directory:

$ ls prior-work.txt
prior-work.txt

2.5 Copying Files and Directories

The cp command copies files. It works like mv except it creates a file instead of moving an existing one (and no, we don’t know why the creators of Unix seemed to be allergic to vowels):

$ cp prior-work.txt analysis/section-1.txt

We can check that cp did the right thing by giving ls two arguments to ask it to list two things at once:

$ ls prior-work.txt analysis/section-1.txt
analysis/section-1.txt  prior-work.txt

Notice that ls shows the output in alphabetical order. If we leave off the second filename and ask it to show us a file and a directory (or multiple directories) it lists them one by one:

$ ls prior-work.txt analysis
prior-work.txt

analysis:
section-1.txt

Copying a directory and everything it contains is a little more complicated. If we use cp on its own, we get an error message:

$ cp analysis backup
cp: analysis is a directory (not copied).

If we really want to copy everything, we must give cp the -r option (meaning recursive):

$ cp -r analysis backup

Once again we can check the result with ls:

$ ls analysis backup
analysis:
section-1.txt

backup:
section-1.txt
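To convince ourselves that cp -r produces an independent copy, we can try it on a scratch directory. The sketch below uses made-up names (sandbox, figures, section-2.txt) and creates files with printf and >, which are covered later in this chapter.

```shell
# Throwaway directory containing a small tree to copy; names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir analysis
mkdir analysis/figures
printf 'notes\n' > analysis/section-1.txt

cp -r analysis backup    # copies the directory and everything below it
ls -F backup             # figures/ and section-1.txt

printf 'more notes\n' > backup/section-2.txt   # change the copy...
ls analysis                                     # ...and the original is unaffected
```

One thing to watch for: if backup already exists as a directory, cp -r analysis backup puts the copy inside it as backup/analysis instead.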

2.6 Deleting Files and Directories

Let’s tidy up by removing the prior-work.txt file we created in our zipf directory. The command to do this is rm (for remove):

$ rm prior-work.txt

We can confirm the file is gone using ls:

$ ls prior-work.txt
ls: prior-work.txt: No such file or directory

Deleting is forever: unlike most GUIs, the Unix shell doesn’t have a trash bin that we can recover deleted files from. Tools for finding and recovering deleted files do exist, but there is no guarantee they will work, since the computer may recycle the file’s disk space at any time. In most cases, when we delete a file it really is gone.

In a half-hearted attempt to stop us from erasing things accidentally, rm refuses to delete directories:

$ rm analysis
rm: analysis: is a directory

We can tell rm we really want to do this by giving it the recursive option -r:

$ rm -r analysis

rm -r should be used with great caution: in most cases, it’s safest to add the -i option (for interactive) to get rm to ask us to confirm each deletion. As a halfway measure, we can use -v (for verbose) to get rm to print a message for each file it deletes. These options work the same way with mv and cp.
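Because deletion cannot be undone, a temporary directory is a good place to practice. This sketch uses invented names and leaves out -i so that it runs without prompting; in real work we would usually add it.

```shell
# Throwaway directory with something to delete and something to keep;
# all names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir analysis
printf 'notes\n' > analysis/section-1.txt
printf 'keep me\n' > todo.txt

rm -r analysis    # removes the directory and everything inside it
ls                # only todo.txt is left
```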

2.7 Wildcards

zipf/data contains the text files for several ebooks from Project Gutenberg:

$ ls data
README.md         moby_dick.txt
dracula.txt       sense_and_sensibility.txt
frankenstein.txt  sherlock_holmes.txt
jane_eyre.txt     time_machine.txt

The wc command (short for word count) tells us how many lines, words, and characters there are in one file:

$ wc data/moby_dick.txt
 22331  215832 1253891 data/moby_dick.txt

What’s in a Word?

wc only considers spaces to be word breaks: if two words are connected by a long dash—like “dash” and “like” in this sentence—then wc will count them as one word.

We could run wc more times to find out how many lines there are in the other files, but that would be a lot of typing and we could easily make a mistake. We can’t just give wc the name of the directory as we do with ls:

$ wc data
wc: data: read: Is a directory

Instead, we can use wildcards to specify a set of files at once. The most commonly-used wildcard is * (a single asterisk). It matches zero or more characters, so data/*.txt matches all of the text files in the data directory:

$ ls data/*.txt
data/dracula.txt       data/sense_and_sensibility.txt
data/frankenstein.txt  data/sherlock_holmes.txt
data/jane_eyre.txt     data/time_machine.txt
data/moby_dick.txt

while data/s*.txt only matches the two whose names begin with an ‘s’:

$ ls data/s*.txt
data/sense_and_sensibility.txt  data/sherlock_holmes.txt

Wildcards are expanded to match filenames before commands are run, so they work exactly the same way for every command. This means that we can use them with wc to (for example) count the number of words in the books with names that contain an underscore:

$ wc data/*_*.txt
  21054  188460 1049294 data/jane_eyre.txt
  22331  215832 1253891 data/moby_dick.txt
  13028  121593  693116 data/sense_and_sensibility.txt
  13053  107536  581903 data/sherlock_holmes.txt
   3582   35527  200928 data/time_machine.txt
  73048  668948 3779132 total

or the number of words in Frankenstein:

$ wc data/frank*.txt
  7832  78100 442967 data/frankenstein.txt

The exercises will introduce and explore other wildcards. For now, we only need to know that it’s possible for a wildcard expression to not match anything. In this case, the command will usually print an error message:

$ wc data/*.csv
wc: data/*.csv: open: No such file or directory
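Since the shell expands wildcards before the command runs, we can see exactly what a command receives by handing the pattern to echo, a command not otherwise used in this chapter that simply prints its arguments. The sketch below assumes Bash with its default settings (zsh reports an error for a pattern that matches nothing) and uses a scratch data directory with a few made-up, nearly empty files.

```shell
# Scratch copy of a data directory; file names echo the chapter's examples
# but the contents are placeholders.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir data
printf 'x\n' > data/dracula.txt
printf 'x\n' > data/jane_eyre.txt
printf 'x\n' > data/moby_dick.txt
printf 'x\n' > data/README.md

echo data/*.txt      # data/dracula.txt data/jane_eyre.txt data/moby_dick.txt
echo data/*_*.txt    # data/jane_eyre.txt data/moby_dick.txt
echo data/*.csv      # no match: Bash passes the pattern through unchanged
```

The last line explains the error message above: wc was given the literal name data/*.csv and could not find a file with that name.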

2.8 Reading the Manual

wc displays lines, words, and characters by default, but we can ask it to display only the number of lines:

$ wc -l data/s*.txt
  13028 data/sense_and_sensibility.txt
  13053 data/sherlock_holmes.txt
  26081 total

wc has other options as well. We can use the man command (short for manual) to find out what they are:

$ man wc

Paging Through the Manual

If our screen is too small to display an entire manual page at once, the shell will use a paging program called less to show it piece by piece. We can use ↑ and ↓ to move line-by-line or Ctrl+Spacebar and Spacebar to skip up and down one page at a time. (B and F also work.)

To search for a character or word, use / followed by the character or word to search for. If the search produces multiple hits, we can move between them using N (for “next”). To quit, press Q.

Manual pages contain a lot of information, often more than we really want. Figure 2.4 includes excerpts from the manual on your screen, and highlights a few features useful for beginners.


Figure 2.4: Manual highlights

Some commands have a --help option that provides a succinct summary of possibilities, but the best place to go for help these days is probably the TLDR website. The acronym stands for “too long, didn’t read”, and its help for wc displays this:

wc
Count words, bytes, or lines.

Count lines in file:
wc -l {{file}}

Count words in file:
wc -w {{file}}

Count characters (bytes) in file:
wc -c {{file}}

Count characters in file (taking multi-byte character sets into account):
wc -m {{file}}

edit this page on github

As the last line suggests, all of its examples are in a public GitHub repository so that users like you can add the examples you wish it had. For more information, we can search on Stack Overflow or browse the GNU manuals (particularly those for the core GNU utilities, which include many of the commands introduced in this lesson). In all cases, though, we need to have some idea of what we’re looking for in the first place: someone who wants to know how many lines there are in a data file is unlikely to think to look for wc.
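The wc options summarized in the TLDR entry can be tried on a file small enough to count by hand. This is a sketch using a made-up two-line file called sample.txt in a temporary directory.

```shell
# A tiny file we can count by hand: 2 lines, 5 words, 24 bytes
# (including the two newline characters). The file name is hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'one two\nthree four five\n' > sample.txt

wc -l sample.txt    # 2 lines
wc -w sample.txt    # 5 words
wc -c sample.txt    # 24 bytes
wc sample.txt       # all three counts at once
```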

2.9 Combining Commands

Now that we know a few basic commands, we can introduce one of the shell’s most powerful features: the ease with which it lets us combine existing programs in new ways. Let’s go into the zipf/data directory and count the number of lines in each file once again:

$ cd ~/zipf/data
$ wc -l *.txt

  15975 dracula.txt
   7832 frankenstein.txt
  21054 jane_eyre.txt
  22331 moby_dick.txt
  13028 sense_and_sensibility.txt
  13053 sherlock_holmes.txt
   3582 time_machine.txt
  96855 total

Which of these books is shortest? We can check by eye when there are only seven files, but what if there were eight thousand?

Our first step toward a solution is to run this command:

$ wc -l *.txt > lengths.txt

The greater-than symbol > tells the shell to redirect the command’s output to a file instead of printing it. Nothing appears on the screen; instead, everything that would have appeared has gone into the file lengths.txt. The shell creates this file if it doesn’t exist, or overwrites it if it already exists. ls lengths.txt confirms that the file exists:

$ ls lengths.txt
lengths.txt

We can print the contents of lengths.txt using cat, which is short for concatenate (because if we give it the names of several files it will print them all in order):

$ cat lengths.txt
  15975 dracula.txt
   7832 frankenstein.txt
  21054 jane_eyre.txt
  22331 moby_dick.txt
  13028 sense_and_sensibility.txt
  13053 sherlock_holmes.txt
   3582 time_machine.txt
  96855 total
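The way > creates and overwrites files, and the way cat joins several files, can be seen with a few short files. The following sketch uses invented file names in a temporary directory.

```shell
# Throwaway directory; first.txt, second.txt, and both.txt are made-up names.
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'alpha\nbeta\n' > first.txt     # first.txt does not exist yet, so it is created
printf 'gamma\n' > second.txt
cat first.txt second.txt               # prints alpha, beta, gamma in that order

printf 'delta\n' > first.txt           # first.txt exists: its old contents are replaced
cat first.txt                          # now prints only delta

cat first.txt second.txt > both.txt    # cat's output can be redirected as well
```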

We can now use sort to sort the lines in this file:

$ sort lengths.txt
   3582 time_machine.txt
   7832 frankenstein.txt
  13028 sense_and_sensibility.txt
  13053 sherlock_holmes.txt
  15975 dracula.txt
  21054 jane_eyre.txt
  22331 moby_dick.txt
  96855 total

Just to be safe, we should use sort’s -n option (for example, sort -n lengths.txt) to specify that we want to sort numerically. Without it, sort would order things alphabetically so that 10 would come before 2.
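The difference between alphabetical and numerical sorting shows up as soon as the numbers have different lengths. Here is a small sketch with a made-up file of four numbers.

```shell
# numbers.txt is a hypothetical file holding one number per line.
sandbox=$(mktemp -d)
cd "$sandbox"
printf '10\n2\n33\n4\n' > numbers.txt

sort numbers.txt      # character by character: 10, 2, 33, 4
sort -n numbers.txt   # by numeric value: 2, 4, 10, 33
```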

sort does not change lengths.txt. Instead, it sends its output to the screen just as wc did. We can therefore put the sorted list of lines in another temporary file called sorted-lengths.txt using > once again:

$ sort lengths.txt > sorted-lengths.txt

Redirecting to the Same File

It’s tempting to send the output of sort back to the file it reads:

$ sort -n lengths.txt > lengths.txt

However, all this does is wipe out the contents of lengths.txt. The reason is that when the shell sees the redirection, it opens the file on the right of the > for writing, which erases anything that file contained. It then runs sort, which finds itself reading from a newly-empty file.
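We can watch this happen on a file we don't mind losing. The sketch below also shows one possible alternative that the text above does not cover: sort has its own -o option for naming an output file, and that file may be the same as the input. The file names are made up for the example.

```shell
# Two copies of the same hypothetical data file in a throwaway directory.
sandbox=$(mktemp -d)
cd "$sandbox"
printf '10\n2\n33\n' > lengths.txt
cp lengths.txt risky.txt

sort -n risky.txt > risky.txt    # the shell empties risky.txt before sort reads it
wc -c risky.txt                  # 0 bytes: the data is gone

# sort's -o option names the output file and may safely point at the input.
sort -n -o lengths.txt lengths.txt
cat lengths.txt                  # 2, 10, 33
```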

Creating intermediate files with names like lengths.txt and sorted-lengths.txt works, but keeping track of those files and cleaning them up when they’re no longer needed is a burden. Let’s delete the two files we just created:

$ rm lengths.txt sorted-lengths.txt

We can produce the same result more safely and with less typing using a pipe:

$ wc -l *.txt | sort -n
   3582 time_machine.txt
   7832 frankenstein.txt
  13028 sense_and_sensibility.txt
  13053 sherlock_holmes.txt
  15975 dracula.txt
  21054 jane_eyre.txt
  22331 moby_dick.txt
  96855 total

The vertical bar | between the wc and sort commands tells the shell that we want to use the output of the command on the left as the input to the command on the right.

Running a command with a file as input has a clear flow of information: the command performs a task on that file and prints the output to the screen (Figure 2.5a). When using pipes, however, the information flows differently after the first (upstream) command. The downstream command doesn’t read from a file. Instead, it reads the output of the upstream command (Figure 2.5b).


Figure 2.5: Piping commands

We can use | to build pipes of any length. For example, we can use the command head to get just the first three lines of sorted data, which shows us the three shortest books:

$ wc -l *.txt | sort -n | head -n 3
   3582 time_machine.txt
   7832 frankenstein.txt
  13028 sense_and_sensibility.txt

Options Can Have Values

When we write head -n 3, the value 3 is not input to head. Instead, it is associated with the option -n. Many options take values like this, such as the names of input files or the background color to use in a plot.

We could always redirect the output to a file by adding > shortest.txt to the end of the pipeline, thereby retaining our answer for later reference.

In practice, most Unix users would create this pipeline step by step, just as we have: by starting with a single command and adding others one by one, checking the output after each change. The shell makes this easy by letting us move up and down in our command history with the ↑ and ↓ keys. We can also edit old commands to create new ones, so a very common sequence is:

  • Run a command and check its output.
  • Use ↑ to bring it up again.
  • Add the pipe symbol | and another command to the end of the line.
  • Run the pipe and check its output.
  • Use ↑ to bring it up again.
  • And so on.

2.10 How Pipes Work

In order to use pipes and redirection effectively, we need to know a little about how they work. When a computer runs a program—any program—it creates a process in memory to hold the program’s instructions and data. Every process in Unix has an input channel called standard input and an output channel called standard output. (By now you may be surprised that their names are so memorable, but don’t worry: most Unix programmers call them “stdin” and “stdout”, which are pronounced “stuh-Din” and “stuh-Dout”.)

The shell is a program like any other, and like any other, it runs inside a process. Under normal circumstances its standard input is connected to our keyboard and its standard output to our screen, so it reads what we type and displays its output for us to see (Figure 2.6a). When we tell the shell to run a program it creates a new process and temporarily reconnects the keyboard and screen to that process’s standard input and output (Figure 2.6b).

Figure 2.6: Standard I/O

If we provide one or more files for the command to read, as with sort lengths.txt, the program reads data from those files. If we don’t provide any filenames, though, the Unix convention is for the program to read from standard input. We can test this by running sort on its own, typing in a few lines of text, and then pressing Ctrl+D to signal the end of input. sort will then sort and print whatever we typed:

$ sort
one
two
three
four
^D
four
one
three
two

Redirection with > tells the shell to connect the program’s standard output to a file instead of the screen (Figure 2.6c).

When we create a pipe like wc *.txt | sort, the shell creates one process for each command so that wc and sort will run simultaneously, and then connects the standard output of wc directly to the standard input of sort (Figure 2.6d).
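
One informal way to see that both sides of a pipe run at the same time is to time a pipeline of two commands that each take two seconds. This is only a sketch: sleep ignores its input and output, so the pipe here carries no data, and date +%s (seconds since 1970) is an extension that GNU and BSD date both provide rather than something every system guarantees.

```shell
start=$(date +%s)
sleep 2 | sleep 2        # two processes, started together
end=$(date +%s)

elapsed=$((end - start))
echo "pipeline took about $elapsed seconds"
```

If the commands ran one after the other, the total would be about four seconds. Because they run simultaneously, it should be close to two.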

wc doesn’t know whether its output is going to the screen, another program, or to a file via >. Equally, sort doesn’t know if its input is coming from the keyboard or another process; it just knows that it has to read, sort, and print.
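
The sketch below illustrates the point from sort's side: it is given the same four words once through a filename and once through a pipe, and produces the same output both times. The file words.txt and the scratch directory are hypothetical.

```shell
# Hypothetical input file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\ntwo\nthree\nfour\n' > words.txt

sort words.txt > from_file.out          # input comes from a named file
cat words.txt | sort > from_pipe.out    # input comes from another process

cmp -s from_file.out from_pipe.out && echo "identical output"
cat from_pipe.out
```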

Why Isn’t It Doing Anything?

What happens if a command is supposed to process a file but we don’t give it a filename? For example, what if we type:

$ wc -l

but don’t type *.txt (or anything else) after the command? Since wc doesn’t have any filenames, it assumes it is supposed to read from the keyboard, so it waits for us to type in some data. It doesn’t tell us this: it just sits and waits.

This mistake can be hard to spot, particularly if we put the filename at the end of the pipeline:

$ wc -l | sort moby_dick.txt

In this case, sort ignores standard input and reads the data in the file, but wc still just sits there waiting for input.

If we make this mistake, we can end the program by typing Ctrl+C. We can also use this to interrupt programs that are taking a long time to run or are trying to connect to a website that isn’t responding.

Just as we can redirect standard output with >, we can connect standard input to a file using <. In the case of a single file, this has the same effect as providing the file’s name to the command:

$ wc < moby_dick.txt
    22331  215832 1276222
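
There is one visible difference between the two forms, which the output above hints at: when wc opens a file itself it knows the file's name and prints it, but when the shell connects the file to standard input, wc never sees a name. A minimal sketch, using a made-up three-line file:

```shell
# Hypothetical three-line file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > sample.txt

wc -l sample.txt        # prints the count followed by the filename
wc -l < sample.txt      # prints only the count

# Keep both results so they can be compared later.
with_name=$(wc -l sample.txt)
without_name=$(wc -l < sample.txt)
```

The exact spacing in front of the count varies between systems, so it is safer not to rely on it.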

If we try to use redirection with a wildcard, though, the shell doesn’t concatenate all of the matching files:

$ wc < *.txt
-bash: *.txt: ambiguous redirect

It also doesn’t print the error message to standard output, which we can prove by redirecting:

$ wc < *.txt > all.txt
-bash: *.txt: ambiguous redirect
$ cat all.txt
cat: all.txt: No such file or directory

Instead, every process has a second output channel called standard error (or stderr). Programs use it for error messages so that their attempts to tell us something has gone wrong don’t vanish silently into an output file. There are ways to redirect standard error, but doing so is almost always a bad idea.
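
Purely to show that the two output channels are separate, and not as a recommendation, the sketch below sends each one to its own file using 2>, the shell's syntax for redirecting standard error. The missing file name is deliberate and hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# no_such_file.txt does not exist, so ls has nothing to print to standard
# output and writes its complaint to standard error instead.
ls no_such_file.txt > out.txt 2> err.txt || echo "ls reported a failure"

[ -s out.txt ] || echo "out.txt is empty"
[ -s err.txt ] && echo "err.txt holds the error message"
```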

2.11 Repeating Commands on Many Files

A loop is a way to repeat a set of commands for each item in a list. We can use them to build complex workflows out of simple pieces, and (like wildcards) they reduce the typing we have to do and the number of mistakes we might make.

Let’s suppose that we want to take a section out of each book whose name starts with the letter “s” in the data directory. More specifically, suppose we want to get the first 8 lines of each book after the 9 lines of license information that appear at the start of the file. If we only cared about one file, we could write a pipeline to take the first 17 lines and then take the last 8 of those:

$ head -n 17 sense_and_sensibility.txt | tail -n 8
Title: Sense and Sensibility

Author: Jane Austen
Editor:
Release Date: May 25, 2008 [EBook #161]
Posting Date:
Last updated: February 11, 2015
Language: English

If we try to use a wildcard to select files, we only get 8 lines of output, not the 16 we expect:

$ head -n 17 s*.txt | tail -n 8
Title: The Adventures of Sherlock Holmes

Author: Arthur Conan Doyle
Editor:
Release Date: April 18, 2011 [EBook #1661]
Posting Date: November 29, 2002
Latest Update:
Language: English

The problem is that head is producing a single stream of output containing 17 lines for each file (along with a header telling us the file’s name):

$ head -n 17 s*.txt
==> sense_and_sensibility.txt <==
The Project Gutenberg EBook of Sense and Sensibility, by Jane Austen

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net



Title: Sense and Sensibility

Author: Jane Austen
Editor:
Release Date: May 25, 2008 [EBook #161]
Posting Date:
Last updated: February 11, 2015
Language: English

==> sherlock_holmes.txt <==
Project Gutenberg's The Adventures of Sherlock Holmes, by Arthur Conan Doyle

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net



Title: The Adventures of Sherlock Holmes

Author: Arthur Conan Doyle
Editor:
Release Date: April 18, 2011 [EBook #1661]
Posting Date: November 29, 2002
Latest Update:
Language: English

Let’s try this instead:

$ for filename in sense_and_sensibility.txt sherlock_holmes.txt
> do
>   head -n 17 $filename | tail -n 8
> done
Title: Sense and Sensibility

Author: Jane Austen
Editor:
Release Date: May 25, 2008 [EBook #161]
Posting Date:
Last updated: February 11, 2015
Language: English
Title: The Adventures of Sherlock Holmes

Author: Arthur Conan Doyle
Editor:
Release Date: April 18, 2011 [EBook #1661]
Posting Date: November 29, 2002
Latest Update:
Language: English

As the output shows, the loop runs our pipeline once for each file. There is a lot going on here, so we will break it down into pieces:

  1. The keywords for, in, do, and done create the loop, and must always appear in that order.

  2. filename is a variable just like a variable in R or Python. At any moment it contains a value, but that value can change over time.

  3. The loop runs once for each item in the list. Each time it runs, it assigns the next item to the variable. In this case filename will be sense_and_sensibility.txt the first time around the loop and sherlock_holmes.txt the second time.

  4. The commands that the loop executes are called the body of the loop and appear between the keywords do and done. Those commands use the current value of the variable filename, but to get it, we must put a dollar sign $ in front of the variable’s name. If we forget and use filename instead of $filename, the shell will think that we are referring to a file that is actually called filename.

  5. The shell prompt changes from $ to a continuation prompt > as we type in our loop to remind us that we haven’t finished typing a complete command yet. We don’t type the >, just as we don’t type the $. The continuation prompt > has nothing to do with redirection; it’s used because there are only so many punctuation symbols available.
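
The difference the dollar sign makes can be seen without touching any files, because echo only prints its arguments. In this small sketch the two names are just words in a list; the braces in ${filename} are an alternative spelling that marks where the variable's name ends.

```shell
for filename in sense_and_sensibility.txt sherlock_holmes.txt
do
  echo "without the dollar sign: filename"
  echo "with the dollar sign:    $filename"
  echo "with braces:             ${filename}.bak"
done

# The variable keeps the last value it was given once the loop has finished.
echo "after the loop: $filename"
```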

It is very common to use a wildcard to select a set of files and then loop over that set to run commands:

$ for filename in s*.txt
> do
>   head -n 17 $filename | tail -n 8
> done
Title: Sense and Sensibility

Author: Jane Austen
Editor:
Release Date: May 25, 2008 [EBook #161]
Posting Date:
Last updated: February 11, 2015
Language: English



Title: The Adventures of Sherlock Holmes

Author: Arthur Conan Doyle
Editor:
Release Date: April 18, 2011 [EBook #1661]
Posting Date: November 29, 2002
Latest Update:
Language: English
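
Our book files have simple names, but if a filename contains a space, an unquoted $filename is split into two arguments. Putting the variable in double quotes avoids this. The sketch below uses two made-up files, one with a space in its name, and appends each file's second line to a results file.

```shell
# Hypothetical files in a scratch directory; one name contains a space.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line 1\nline 2\nline 3\n' > 'short story.txt'
printf 'line 1\nline 2\nline 3\n' > sonnet.txt

for filename in s*.txt
do
  # The quotes keep "short story.txt" together as a single argument to head.
  head -n 2 "$filename" | tail -n 1 >> second_lines.out
done

cat second_lines.out
```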

2.12 Variable Names

We should always choose meaningful names for variables, but we should remember that those names don’t mean anything to the computer. For example, we have called our loop variable filename to make its purpose clear to human readers, but we could equally well write our loop as:

$ for x in s*.txt
> do
>   head -n 17 $x | tail -n 8
> done

or as:

$ for username in s*.txt
> do
>   head -n 17 $username | tail -n 8
> done

Don’t do this. Programs are only useful if people can understand them, so meaningless names like x and misleading names like username increase the odds of misunderstanding.

2.13 Redoing Things

Loops are useful if we know in advance what we want to repeat, but if we have already run commands, we can still repeat them. One way is to use ↑ and ↓ to go up and down in our command history as described earlier. Another is to use history to get a list of the last few hundred commands we have run:

$ history
  551  wc -l *.txt | sort -n
  552  wc -l *.txt | sort -n | head -n 3
  553  wc -l *.txt | sort -n | head -n 1 > shortest.txt

We can use an exclamation mark ! followed by a number to repeat a recent command:

$ !552
wc -l *.txt | sort -n | head -n 3
   3582 time_machine.txt
   7832 frankenstein.txt
  13028 sense_and_sensibility.txt

The shell prints the command it is going to re-run to standard error before executing it, so that (for example) !552 > results.txt puts the command’s output in a file without also writing the command to the file.

Having an accurate record of the things we have done and a simple way to repeat them are two of the main reasons people use the Unix shell. In fact, being able to repeat history is such a powerful idea that the shell gives us several ways to do it:

  • !head re-runs the most recent command starting with head, while !wc re-runs the most recent starting with wc.
  • If we type Ctrl+R (for reverse search) the shell searches backward through its history for whatever we type next. If we don’t like the first thing it finds, we can type Ctrl+R again to go further back.

If we use history, ↑, or Ctrl+R we will quickly notice that loops don’t have to be broken across lines. Instead, their parts can be separated with semi-colons:

$ for filename in s*.txt ; do head -n 17 $filename | tail -n 8; done

This is fairly readable, although even experienced users have a tendency to put the semi-colon after do instead of before it. If our loop contains multiple commands, though, the multi-line format is much easier to read. For example, compare this:

$ for filename in s*.txt
> do
>   echo $filename
>   head -n 17 $filename | tail -n 8
> done

with this:

$ for filename in s*.txt; do echo $filename; head -n 17 $filename | tail -n 8; done

(The echo command simply prints its arguments to the screen. It is often used to keep track of progress or for debugging.)
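
To confirm that the two layouts are the same loop, the sketch below runs each one on a pair of made-up files and compares what they print. It relies on one thing not shown above: a redirect placed after done captures everything the loop prints.

```shell
# Hypothetical sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\n2\n3\n' > sample_a.txt
printf '4\n5\n6\n' > sample_b.txt

# Multi-line form.
for filename in s*.txt
do
  echo "$filename"
  head -n 2 "$filename" | tail -n 1
done > multi_line.out

# One-line form: the same parts, separated by semi-colons.
for filename in s*.txt; do echo "$filename"; head -n 2 "$filename" | tail -n 1; done > one_line.out

cmp -s multi_line.out one_line.out && echo "same output"
```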

2.14 Creating New Filenames Automatically

Suppose we want to create a backup copy of each book whose name ends in “e”. If we don’t want to change the files’ names, we can do this with cp:

$ cd ~/zipf
$ mkdir backup
$ cp data/*e.txt backup
$ ls backup
jane_eyre.txt  time_machine.txt

Warnings

If you attempt to re-execute the code chunk above, you’ll end up with an error after the second line:

mkdir: backup: File exists

This warning isn’t necessarily a cause for alarm. It lets you know that the command couldn’t be completed, but will not prevent you from proceeding.
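
If we would rather not see this message when re-running commands, mkdir has a standard -p option that creates the directory only if it is missing and does not complain if it already exists. A small sketch in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir backup                                      # first run: creates the directory
mkdir backup || echo "plain mkdir fails the second time"
mkdir -p backup && echo "mkdir -p succeeds whether or not backup exists"
```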

But what if we want to append the extension .bak to the files’ names? cp can do this for a single file:

$ cp data/time_machine.txt backup/time_machine.txt.bak

but not for all the files at once:

$ cp data/*e.txt backup/*e.txt.bak
cp: target 'backup/*e.txt.bak' is not a directory

backup/*e.txt.bak doesn’t match anything—those files don’t yet exist—so after the shell expands the * wildcards, what we are actually asking cp to do is:

$ cp data/jane_eyre.txt data/time_machine.txt backup/*e.txt.bak

This doesn’t work because cp only understands how to do two things: copy a single file to create another file, or copy a bunch of files into a directory. If we give it more than two names as arguments, it expects the last one to be a directory. Since backup/*e.txt.bak is not, cp reports an error.

Instead, let’s use a loop to copy files to the backup directory and append the .bak suffix:

$ cd data
$ for filename in *e.txt
> do
>   cp $filename ../backup/$filename.bak
> done
$ ls ../backup
jane_eyre.txt.bak  time_machine.txt.bak
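
The same loop can be tried safely on placeholder files before running it on real data. The sketch below builds a throwaway data and backup directory (the files contain only the word "placeholder") and writes the variable as "${filename}" so that names with spaces would also be handled correctly.

```shell
# Throwaway directories and placeholder files standing in for the real books.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data backup
echo "placeholder" > data/jane_eyre.txt
echo "placeholder" > data/time_machine.txt
echo "placeholder" > data/dracula.txt

cd data || exit 1
for filename in *e.txt
do
  cp "$filename" "../backup/${filename}.bak"
done

ls ../backup
```

Only the two files whose names end in "e" are copied; dracula.txt is left alone.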

2.15 Summary

The original Unix shell was created in 1971, and will soon celebrate its fiftieth anniversary. Its commands may be cryptic, but few programs have remained in daily use for so long. The secret to its success is the way it combines a few powerful ideas: command history, wildcards, redirection, and above all pipes. The next chapter will explore how we can go beyond these basics.

2.16 Exercises

The exercises below involve creating and moving new files, as well as considering hypothetical files. Please note that if you create or move any files or directories in your Zipf’s Law project, you may want to reorganize your files following the outline at the beginning of the next chapter. If you accidentally delete necessary files, you can start with a fresh copy of the data files by following the instructions in Appendix E.

2.16.1 Exploring more ls flags

What does the command ls do when used with the -l option?

What happens if you use two options at the same time, such as ls -l -h?

2.16.2 Listing recursively and by time

The command ls -R lists the contents of directories recursively, which means the subdirectories, sub-subdirectories, and so on at each level are listed. The command ls -t lists things by time of last change, with most recently changed files or directories first.

In what order does ls -R -t display things? Hint: ls -l uses a long listing format to view timestamps.

2.16.3 Absolute and relative paths

Starting from a hypothetical directory located at /Users/amira/data, which of the following commands could Amira use to navigate to her home directory, which is /Users/amira?

  1. cd .
  2. cd /
  3. cd /home/amira
  4. cd ../..
  5. cd ~
  6. cd home
  7. cd ~/data/..
  8. cd
  9. cd ..
  10. cd ../.

2.16.4 Relative path resolution

Using the filesystem shown in Figure 2.7, if pwd displays /Users/sami, what will ls -F ../backup display?

  1. ../backup: No such file or directory
  2. final original revised
  3. final/ original/ revised/
  4. data/ analysis/ doc/

Figure 2.7: Exercise Filesystem

2.16.5 ls reading comprehension

Using the filesystem shown in Figure 2.7, if pwd displays /Users/backup, and -r tells ls to display things in reverse order, what command(s) will result in the following output:

doc/ data/ analysis/

  1. ls pwd
  2. ls -r -F
  3. ls -r -F /Users/backup

2.16.6 Creating files a different way

What happens when you execute touch my_file.txt? (Hint: use ls -l to find information about the file)

When might you want to create a file this way?

2.16.7 Using rm safely

What would happen if you executed rm -i my_file.txt on a hypothetical file? Why would we want this protection when using rm?

2.16.8 Moving to the current folder

After running the following commands, Amira realizes that she put the (hypothetical) files chapter1.dat and chapter2.dat into the wrong folder:

$ ls -F
  processed/ raw/
$ ls -F processed
  chapter1.dat chapter2.dat appendix1.dat appendix2.dat
$ cd raw/

Fill in the blanks to move these files to the current folder (i.e., the one she is currently in):

$ mv ___/chapter1.dat  ___/chapter2.dat ___

2.16.9 Renaming files

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: statstics.txt

After creating and saving this file you realize you misspelled the filename! Which of the following commands could you use to correct the mistake?

  1. cp statstics.txt statistics.txt
  2. mv statstics.txt statistics.txt
  3. mv statstics.txt .
  4. cp statstics.txt .

2.16.10 Moving and copying

Assuming the following hypothetical files, what is the output of the closing ls command in the sequence shown below?

$ pwd
/Users/amira/data
$ ls
books.dat
$ mkdir doc
$ mv books.dat doc/
$ cp doc/books.dat ../books-saved.dat
$ ls

  1. books-saved.dat doc
  2. doc
  3. books.dat doc
  4. books-saved.dat

2.16.11 Copy with multiple filenames

This exercise explores how cp responds when attempting to copy multiple things.

What does cp do when given several filenames followed by a directory name?

$ mkdir backup
$ cp dracula.txt frankenstein.txt backup/

What does cp do when given three or more file names?

$ cp dracula.txt frankenstein.txt jane_eyre.txt

2.16.12 List filenames matching a pattern

When run in the data directory, which ls command(s) will produce this output?

jane_eyre.txt sense_and_sensibility.txt

  1. ls ??n*.txt
  2. ls *e_*.txt
  3. ls *n*.txt
  4. ls *n?e*.txt

2.16.13 Organizing directories and files

Amira is working on a project and she sees that her files aren’t very well organized:

$ ls -F
books.txt    data/    results/   titles.txt

The books.txt and titles.txt files contain output from her data analysis. What command(s) does she need to run to produce the output shown?

$ ls -F
data/   results/
$ ls results
books.txt    titles.txt

2.16.14 Reproduce a directory structure

You’re starting a new analysis, and would like to duplicate the directory structure from your previous experiment so you can add new data.

Assume that the previous experiment is in a folder called ‘2016-05-18’, which contains a data folder that in turn contains folders named raw and processed that contain data files. The goal is to copy the folder structure of the 2016-05-18 folder into a folder called 2016-05-20 so that your final directory structure looks like this:

2016-05-20/
└── data
    ├── processed
    └── raw

Which of the following sets of commands would achieve this objective?

What would the other commands do?

$ mkdir 2016-05-20
$ mkdir 2016-05-20/data
$ mkdir 2016-05-20/data/processed
$ mkdir 2016-05-20/data/raw

$ mkdir 2016-05-20
$ cd 2016-05-20
$ mkdir data
$ cd data
$ mkdir raw processed

$ mkdir 2016-05-20/data/raw
$ mkdir 2016-05-20/data/processed

$ mkdir 2016-05-20
$ cd 2016-05-20
$ mkdir data
$ mkdir raw processed

2.16.15 What does >> mean?

We have seen the use of >, but there is a similar operator >> which works slightly differently. We’ll learn about the differences between these two operators by printing some strings. We can use the echo command to print strings e.g.

$ echo The echo command prints text
The echo command prints text

Now test the commands below to reveal the difference between the two operators:

$ echo hello > testfile01.txt

and:

$ echo hello >> testfile02.txt

Hint: Try executing each command twice in a row and then examining the output files.

2.16.16 Appending data

Given the following commands, what will be included in the file extracted.txt:

$ head -n 3 dracula.txt > extracted.txt
$ tail -n 2 dracula.txt >> extracted.txt

  1. The first three lines of dracula.txt
  2. The last two lines of dracula.txt
  3. The first three lines and the last two lines of dracula.txt
  4. The second and third lines of dracula.txt

2.16.17 Piping commands

In our current directory, we want to find the 3 files that have the fewest lines. Which command listed below would work?

  1. wc -l * > sort -n > head -n 3
  2. wc -l * | sort -n | head -n 1-3
  3. wc -l * | head -n 3 | sort -n
  4. wc -l * | sort -n | head -n 3

2.16.18 Why does uniq only remove adjacent duplicates?

The command uniq removes adjacent duplicated lines from its input. Consider a hypothetical file genres.txt containing the following data:

science fiction
fantasy
science fiction
fantasy
science fiction
science fiction

Running the command uniq genres.txt produces:

science fiction
fantasy
science fiction
fantasy
science fiction

Why do you think uniq only removes adjacent duplicated lines? (Hint: think about very large datasets.) What other command could you combine with it in a pipe to remove all duplicated lines?

2.16.19 Pipe reading comprehension

A file called titles.txt contains the following data:

Sense and Sensibility,1811
Frankenstein,1818
Jane Eyre,1847
Wuthering Heights,1847
Moby Dick,1851
The Adventures of Sherlock Holmes,1892
The Time Machine,1895
Dracula,1897
The Invisible Man,1897

What text passes through each of the pipes and the final redirect in the pipeline below?

$ cat titles.txt | head -n 5 | tail -n 3 | sort -r > final.txt

Hint: build the pipeline up one command at a time to test your understanding.

2.16.20 Pipe construction

For the file titles.txt from the previous exercise, consider the following command:

$ cut -d , -f 2 titles.txt

What does the cut command (and its options) accomplish?

2.16.21 Which pipe?

Consider the same titles.txt from the previous exercises.

The uniq command has a -c option which gives a count of the number of times a line occurs in its input. If titles.txt was in your working directory, what command would you use to produce a table that shows the total count of each publication year in the file?

  1. sort titles.txt | uniq -c
  2. sort -t, -k2,2 titles.txt | uniq -c
  3. cut -d, -f 2 titles.txt | uniq -c
  4. cut -d, -f 2 titles.txt | sort | uniq -c
  5. cut -d, -f 2 titles.txt | sort | uniq -c | wc -l

2.16.22 Wildcard expressions

Wildcard expressions can be very complex, but you can sometimes write them in ways that only use simple syntax, at the expense of being a bit more verbose. In your data/ directory, the wildcard expression [st]*.txt matches all files beginning with s or t and ending with .txt. Imagine you forgot about this.

  1. Can you match the same set of files with basic wildcard expressions that do not use the [] syntax? Hint: You may need more than one expression.

  2. The expression that you found and the expression from the lesson match the same set of files in this example. What is the small difference between the outputs?

  3. Under what circumstances would your new expression produce an error message where the original one would not?

2.16.23 Removing unneeded files

Suppose you want to delete your processed data files, and only keep your raw files and processing script to save storage. The raw files end in .txt and the processed files end in .csv. Which of the following would remove all the processed data files, and only the processed data files?

  1. rm ?.csv
  2. rm *.csv
  3. rm * .csv
  4. rm *.*

2.16.24 Doing a dry run

A loop is a way to do many things at once—or to make many mistakes at once if it does the wrong thing. One way to check what a loop would do is to echo the commands it would run instead of actually running them.

Suppose we want to preview the commands the following loop will execute without actually running those commands (analyze is a hypothetical command):

$ for file in *.txt
> do
>   analyze $file > analyzed-$file
> done

What is the difference between the two loops below, and which one would we want to run?

# Version 1
$ for file in *.txt
> do
>   echo analyze $file > analyzed-$file
> done
# Version 2
$ for file in *.txt
> do
>   echo "analyze $file > analyzed-$file"
> done

2.16.25 Variables in loops

Given the files in data/, what is the output of the following code?

$ for datafile in *.txt
> do
>    ls *.txt
> done

Now, what is the output of the following code?

$ for datafile in *.txt
> do
>   ls $datafile
> done

Why do these two loops give different outputs?

2.16.26 Limiting sets of files

What would be the output of running the following loop in your data/ directory?

$ for filename in d*
> do
>    ls $filename
> done

How would the output differ from using this command instead?

$ for filename in *d*
> do
>    ls $filename
> done

2.16.27 Saving to a file in a loop

Consider running the following loop in the data/ directory:

$ for book in *.txt
> do
>     echo $book
>     head -n 16 $book > headers.txt
> done

Why would the following loop be preferable?

$ for book in *.txt
> do
>     head -n 16 $book >> headers.txt
> done

2.16.28 Why does history record commands before running them?

If you run the command:

$ history | tail -n 5 > recent.sh

the last command in the file is the history command itself, i.e., the shell has added history to the command log before actually running it. In fact, the shell always adds commands to the log before running them. Why do you think it does this?

2.16.29 Other wildcards

The shell provides several wildcards beyond the widely-used *. To explore them, explain in plain language what files the expression novel-????-[ab]*.{txt,pdf} matches and why.

2.17 Key Points

  • A shell is a program that reads commands and runs other programs.
  • The filesystem manages information stored on disk.
  • Information is stored in files, which are located in directories (folders).
  • Directories can also store other directories, which forms a directory tree.
  • pwd prints the user’s current working directory.
  • / on its own is the root directory of the whole filesystem.
  • ls prints a list of files and directories.
  • An absolute path specifies a location from the root of the filesystem.
  • A relative path specifies a location in the filesystem starting from the current directory.
  • cd changes the current working directory.
  • .. means the parent directory; . on its own means the current directory.
  • mkdir creates a new directory.
  • cp copies a file.
  • rm removes (deletes) a file.
  • mv moves (renames) a file or directory.
  • * matches zero or more characters in a filename.
  • ? matches any single character in a filename.
  • wc counts lines, words, and characters in its inputs.
  • man displays the manual page for a given command; some commands also have a --help option.
  • Every process in Unix has an input channel called standard input and an output channel called standard output.
  • > redirects a command’s output to a file, overwriting any existing content.
  • >> appends a command’s output to a file.
  • < redirects a command’s input so that it is read from a file.
  • A pipe | sends the output of the command on the left to the input of the command on the right.
  • cat displays the contents of its inputs.
  • head displays the first few lines of its input.
  • tail displays the last few lines of its input.
  • sort sorts its inputs.
  • A for loop repeats commands once for every thing in a list.
  • Every for loop must have a variable to refer to the thing it is currently operating on and a body containing commands to execute.
  • Use $name or ${name} to get the value of a variable.
  • Use the up-arrow key to scroll up through previous commands to edit and repeat them.
  • Use history to display recent commands and !number to repeat a command by number.