Introducing the Shell
Last updated on 2024-12-12 | Edit this page
Estimated time: 5 minutes
Overview
Questions
- What is a command shell and why would I use one?
Objectives
- Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.
- Explain when and why command-line interfaces should be used instead of graphical interfaces.
HPCBio Notes
If you’re signed up for an HPCBio workshop, this episode/chapter is required reading. You may watch a recording of this episode by viewing the video named “Episode 1: Introducing the Shell” on our video playlist. Please see your email for a URL to the playlist.
Background
Humans and computers commonly interact in many different ways, such as through a keyboard and mouse, touch screen interfaces, or using speech recognition systems. The most widely used way to interact with personal computers is called a graphical user interface (GUI). With a GUI, we give instructions by clicking a mouse and using menu-driven interactions.
While the visual aid of a GUI makes it intuitive to learn, this way of delivering instructions to a computer scales very poorly. Imagine the following task: for a literature search, you have to copy the third line of one thousand text files in one thousand different directories and paste it into a single file. Using a GUI, you would not only be clicking at your desk for several hours, but you could potentially also commit an error in the process of completing this repetitive task. This is where we take advantage of the Unix shell. The Unix shell is both a command-line interface (CLI) and a scripting language, allowing such repetitive tasks to be done automatically and fast. With the proper commands, the shell can repeat tasks with or without some modification as many times as we want. Using the shell, the task in the literature example can be accomplished in seconds.
The Shell
The shell is a program where users can type commands. With the shell, it’s possible to invoke complicated programs like climate modeling software or simple commands that create an empty directory with only one line of code. The most popular Unix shell is Bash (the Bourne Again SHell — so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows. Note that ‘Git Bash’ is a piece of software that enables Windows users to use a Bash like interface when interacting with Git.
Using the shell will take some effort and some time to learn. While a GUI presents you with choices to select, CLI choices are not automatically presented to you, so you must learn a few commands like new vocabulary in a language you’re studying. However, unlike a spoken language, a small number of “words” (i.e. commands) gets you a long way, and we’ll cover those essential few today.
The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows.
In addition, the command line is often the easiest way to interact with remote machines and supercomputers. Familiarity with the shell is near essential to run a variety of specialized tools and resources including high-performance computing systems. As clusters and cloud computing systems become more popular for scientific data crunching, being able to interact with the shell is becoming a necessary skill. We can build on the command-line skills covered here to tackle a wide range of scientific questions and computational challenges.
Let’s get started.
When the shell is first opened, you are presented with a prompt, indicating that the shell is waiting for input.
The shell typically uses $
as the prompt, but may use a
different symbol. In the examples for this lesson, we’ll show the prompt
as $
. Most importantly, do not type the prompt
when typing commands. Only type the command that follows the prompt.
This rule applies both in these lessons and in lessons from other
sources. Also note that after you type a command, you have to press the
Enter key to execute it.
The prompt is followed by a text cursor, a character that indicates the position where your typing will appear. The cursor is usually a flashing or solid block, but it can also be an underscore or a pipe. You may have seen it in a text editor program, for example.
Note that your prompt might look a little different. In particular,
most popular shell environments by default put your user name and the
host name before the $
. Such a prompt might look like,
e.g.:
The prompt might even include more than this. Do not worry if your
prompt is not just a short $
. This lesson does not depend
on this additional information and it should also not get in your way.
The only important item to focus on is the $
character
itself and we will see later why.
So let’s try our first command, ls
, which is short for
listing. This command will list the contents of the current
directory:
OUTPUT
Desktop Downloads Movies Pictures
Documents Library Music Public
Nelle’s Pipeline: A Typical Problem
Nelle Nemo, a marine biologist, has just returned from a six-month
survey of the North Pacific
Gyre, where she has been sampling gelatinous marine life in the Great
Pacific Garbage Patch. She has 1,520 samples that she’s run through
an assay machine to measure the relative abundance of 300 proteins. She
needs to run these 1,520 files through an imaginary program called
goostats.sh
. In addition to this huge task, she has to
write up results by the end of the month, so her paper can appear in a
special issue of Aquatic Goo Letters.
If Nelle chooses to run the steps found in goostats.sh
by hand using a GUI, she’ll have to select options and open a file 1,520
times. If goostats.sh
takes 30 seconds to run each file,
the whole process will take more than 12 hours of Nelle’s attention.
Additionally, it will be nearly impossible for her to avoid mistakes
running these steps 1,520 times. With the shell, Nelle can instead
assign her computer this mundane task, which it will execute perfectly,
while she focuses her attention on writing her paper.
The next few lessons will explore the ways Nelle can achieve this.
More specifically, the lessons explain how she can use a command shell
to run the goostats.sh
program, using loops to automate the
repetitive steps of entering file names, so that her computer can work
while she writes her paper.
As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.
In order to achieve her task, Nelle needs to know how to:
- navigate to a file/directory
- create a file/directory
- check the length of a file
- chain commands together
- retrieve a set of files
- iterate over files
- run a shell script containing her pipeline
Key Points
- A shell is a program whose primary purpose is to read commands and run other programs.
- This lesson uses Bash, the default shell in many implementations of Unix.
- Programs can be run in Bash by entering commands at the command-line prompt.
- The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
- A significant challenge when using the shell can be knowing what commands need to be run and how to run them.