What is the UNIX Shell?
The UNIX shell is a command-line interface, similar in some ways to the old
DOS prompt on the PC platform. It is, however, much more powerful,
customizable, and easy to use.
In addition to providing a command-line interface to the operating system,
the shell also provides a programming language. Shell programs (also commonly
called shell scripts) are collections of commands listed one after another
inside of a text file. Creating shell scripts provides a simple way to
automate repetitive tasks or generate various reports, among a myriad of other
possibilities.
For simple tasks like program and file management, there is often a
choice between using the GUI or the shell to accomplish the same thing. In
general, becoming familiar with the shell will make you more productive and
efficient. Relying on the GUI is slower, but may require less effort for
novices. But even in the latter case it will at some point be necessary to use
the shell for something that is not readily available in the GUI. So it is a
necessary and worthwhile investment of time to become familiar with the
shell's basic operation.
Manual Pages
Nearly every program in UNIX, including graphical ones, has an online manual
page ("man page" for short) associated with it. The man page for a given
program is a help file that describes in detail how to use the program, what
arguments it can accept, what files it expects, what other commands are related
to it, and anything else you need to know. There is an extremely useful
program called man, which provides access to these manual pages via the
command line. In addition, on some systems, the graphical
program xman provides a rudimentary graphical interface to the manual pages.
The importance of the man program cannot be stressed enough. UNIX
is so flexible and vast that not even the experts can keep all the details of
every command in their head. For this reason, we have man.
To use man, you type "man", followed by the name of the command whose
manual page you wish to view. For example:
% man ls
will give you great detail about the ls command, its options, and
its usage.
man usually uses the program more (or its counterpart, less) to
view the contents of each man page. To page forward through a manual page,
you can use the spacebar. To page backward, you can usually use <ctrl>-b.
To quit back to the shell prompt, press "q". For more information, you can
consult the man pages for man and more:
% man man
% man more
Input/Output Redirection
Most command-line UNIX programs can read from something called
standard input for their input and write to something called
standard output for their output. Standard input (STDIN for short) is
normally connected to the keyboard - that's how you normally provide input to
programs via the shell. Standard output (STDOUT for short) is normally
connected to the terminal window - the terminal is normally where the output
will appear from commands you issued via the shell.
However, the shell also allows you to redirect STDIN and STDOUT with the
special characters >, >>, <, and |. For example, the STDOUT of the ls
command can be redirected to a file instead of the terminal window by typing:
% ls -l > list.txt
In this example, the output of ls -l is redirected into a file
called list.txt instead of being printed to your terminal. If list.txt
does not exist, it is created. If it does exist, its contents are replaced with
the contents of STDOUT. Note that > is a special character to the
shell. As such, it and subsequent characters on the command line are not
interpreted as arguments to the ls command.
A variant of the above is:
% ls -l >> list.txt
>> works just like > except that if list.txt already
exists, STDOUT is appended to the bottom of the file rather than replacing
its contents.
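The difference between > and >> can be tried out safely with a small script like the one below. This is only a sketch: the directory name redirect_demo and the words being written are made up for illustration, echo stands in for any command that writes to STDOUT, and it assumes the shell's noclobber option is off (with it on, > refuses to replace an existing file). The commands use only syntax common to tcsh and Bourne-style shells, so the script can be run with either.

```shell
# Work in a scratch directory (hypothetical name) so nothing important is replaced.
mkdir -p redirect_demo
cd redirect_demo

# The first > creates list.txt.
echo first > list.txt
# A second > replaces the contents, so the word written above is gone.
echo second > list.txt
# >> appends to the bottom of the file instead of replacing it.
echo third >> list.txt

# list.txt now holds two lines: second, then third.
cat list.txt
```

Running the script a second time gives the same result, because the first > starts the file over each time.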
STDIN can be taken from a file instead of the keyboard in an analogous way:
% sort < list.txt
In this example, the sort program reads the file list.txt on STDIN and
sorts it into alphabetical order on STDOUT, which will be output to our
terminal. If we instead type:
% sort < list.txt > sorted_list.txt
then list.txt will be passed to sort on STDIN, and the sorted
result will be placed in the file sorted_list.txt on STDOUT.
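To see input and output redirection working together, the following sketch builds a small unsorted file and sorts it. The directory name sort_demo and the three words in the file are invented for the example; printf is used only as a convenient way to produce several lines of output. The syntax is common to tcsh and Bourne-style shells.

```shell
# Scratch directory with a hypothetical name.
mkdir -p sort_demo
cd sort_demo

# printf writes three unsorted lines to STDOUT, which > sends to list.txt.
printf 'pear\napple\nfig\n' > list.txt

# sort reads list.txt on STDIN and writes the sorted lines to sorted_list.txt.
sort < list.txt > sorted_list.txt

# Show the result: apple, fig, pear, one per line.
cat sorted_list.txt
```

Note that list.txt itself is left unchanged; only sorted_list.txt receives the sorted lines.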
Finally, we introduce the | (pipe) character, which is used to make
so-called pipelines. The | character takes STDOUT of the
command that appears on the left, and passes it to STDIN of the command that
appears on the right. For example:
% ls -l | sort | lpr
takes STDOUT of ls -l, sends it to STDIN of sort, and finally
sends STDOUT of sort to STDIN of lpr which prints out a hardcopy
of the result via the default printer.
This is the equivalent of:
% ls -l > list.txt
% sort < list.txt > sorted_list.txt
% lpr sorted_list.txt
except that no intermediate files are used to pass the output of the
preceding commands to the input of the succeeding commands.
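The equivalence between a pipeline and a chain of intermediate files can be checked directly. The sketch below does the same job both ways and compares the results. Since a printer may not be available, head -n 1 (which keeps only the first line of its input) stands in for lpr as the last stage; the directory and file names are hypothetical. The syntax is common to tcsh and Bourne-style shells.

```shell
# Scratch directory with a hypothetical name.
mkdir -p pipe_demo
cd pipe_demo
printf 'pear\napple\nfig\n' > fruit.txt

# Step by step, passing data through an intermediate file.
sort < fruit.txt > sorted.txt
head -n 1 < sorted.txt > first_via_files.txt

# The same work as one pipeline, with no intermediate file.
sort < fruit.txt | head -n 1 > first_via_pipe.txt

# cmp prints nothing and exits with status 0 when the two files are identical.
cmp first_via_files.txt first_via_pipe.txt
```

Both result files contain the single line apple; the pipeline version simply never creates sorted.txt.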
Filenames and Naming Files
There are very few restrictions placed on naming files on most flavors of UNIX.
You can put just about any character you want in a filename - including
spaces and other special characters like %, $, ^, &, !, etc. This is normally
considered to be bad form, however, because, as mentioned above, some of these
characters are interpreted by the shell to have special meaning. This can be
worked around by preceding these characters with the \ character, which
prevents whatever comes next from being interpreted as a special character
by the shell. For example, say I had created a file named "is 1>2?" and I
wanted to remove it with the rm command:
% rm is 1>2?
The above command has several problems. First of all, the shell
uses spaces to separate arguments from each other, so rm will not
interpret this string of characters as a single filename. Second, as we've
already seen, the shell
interprets the > character as a redirection of STDOUT, not part of the
filename. Finally, the ? character has special meaning to the shell as a
wildcard character (see the discussion of wildcards below).
To actually remove this file (or use it as an argument to any UNIX command),
you would need to escape the space and the special characters with the \
character like so:
% rm is\ 1\>2\?
or put the whole filename in quotes:
% rm "is 1>2?"
As you can see, this can be rather confusing to look at or at least tedious
to remember. To avoid these headaches, it is recommended that you keep spaces
and other non-alphanumeric characters except for - (dash), _ (underscore), and .
(dot) out of your filenames. UNIX will let you use the other characters, but
the question should be "Do I really want to?"
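Both ways of protecting an awkward filename can be tried in a scratch directory. In this sketch the directory name quote_demo is made up, and touch is used simply to create an empty file with the troublesome name from the text. The quoting and escaping shown here behave the same way in tcsh and in Bourne-style shells.

```shell
# Scratch directory with a hypothetical name.
mkdir -p quote_demo
cd quote_demo

# Create the awkward filename; the quotes make it a single argument.
touch "is 1>2?"
ls

# Remove it by escaping the space and each special character with a backslash.
rm is\ 1\>2\?

# Recreate it, then remove it again by quoting the whole name instead.
touch "is 1>2?"
rm "is 1>2?"

# The directory is empty again, so this prints nothing.
ls
```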
Finally, a word about files beginning with the "." (period, or dot)
character: In UNIX, there is nothing special about "." in a filename per se.
But it is convention to use the "." character to define file extensions which
give the user a hint about what type of file we are dealing with (such as .txt
for text files, .jpg or .jpeg for jpeg image files, etc.). An additional
convention in UNIX is that files named with a "." in the first position, such
as ".cshrc", are configuration files. For instance, the csh and tcsh shells
use the file named .cshrc in your home directory to configure your command-line
environment. As such, these so-called "dot-files" are not displayed by default
when using the ls command to list files. In addition, there are two
special files in every directory called "." and "..", which point to the current
directory and parent directory (the directory above the current one),
respectively. To see files and directories that start with the "." character,
use the "-a" argument to ls.
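The effect of a leading dot on ls can be seen with two empty files. This is a sketch only: dot_demo, notes.txt, and .hidden_config are hypothetical names chosen for the example. The commands work the same way in tcsh and in Bourne-style shells.

```shell
# Scratch directory with a hypothetical name.
mkdir -p dot_demo
cd dot_demo

# One ordinary file and one dot-file.
touch notes.txt .hidden_config

# Plain ls skips names that begin with a dot, so only notes.txt is listed.
ls

# ls -a also lists .hidden_config, along with the . and .. entries.
ls -a
```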
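Wildcards
The shell treats certain characters on the command line as wildcards and
replaces any word containing them with the list of existing filenames that
match. The * character matches any string of characters (including none), and
the ? character matches any single character.
To be more specific about the identity of the characters that you wish to match,
you can use the [...] construct. For example, the pattern "[0123].doc"
would specifically match the four filenames "0.doc", "1.doc", "2.doc", and
"3.doc". This construct can also accept a range of characters (specified by
a "-") instead of requiring you to list every member. For example, "[0-3].doc"
would be equivalent to the previous example which used "[0123]". You can
also combine ranges and single characters by writing them one after another
inside the brackets, as in "[a-jsw-z].doc".
The above example would pick up all files "a.doc" through "j.doc", "s.doc",
and "w.doc" through "z.doc". No separator is needed between the members; a
comma placed inside the brackets would itself be matched literally.
Finally, in tcsh a "^" immediately after the opening bracket negates the
set, so the pattern "G?[^234]*foo*.doc"
would match any filename that starts with "G", has anything in the second
position, has anything except 2, 3, or 4 in the third position, has
"foo" anyplace between the fourth and last character before finally ending
in ".doc".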
As you can see, these constructs can be combined in clever ways to select
very specific lists of filenames in short-hand. This activity is also referred
to as globbing or file globbing. This is very powerful and useful
when dealing with large numbers of files. Learning to use wildcards
effectively can save you a lot of time compared to using a GUI file
manager or worse, typing them all by hand!
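A harmless way to experiment with globbing is to let echo print whatever the shell expands a pattern into. The sketch below assumes a scratch directory named glob_demo and a handful of made-up filenames. It sticks to the *, ?, and [...] forms, which behave the same way in tcsh and in Bourne-style shells.

```shell
# Scratch directory with a hypothetical name and some empty test files.
mkdir -p glob_demo
cd glob_demo
touch 0.doc 1.doc 2.doc 3.doc 4.doc a.doc notes.txt

# One character from an explicit list: 0.doc 1.doc 2.doc 3.doc
echo [0123].doc

# The same four files, with the list written as a range.
echo [0-3].doc

# A question mark matches exactly one character, so 4.doc and a.doc match too.
echo ?.doc

# A star matches any run of characters: every name ending in .doc
echo *.doc
```

Because the shell does the expansion before the command runs, echo never sees the pattern itself, only the matching filenames. The same is true of any other command, such as ls or rm.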
Variables
Variables are words to which values (either string or numeric) can be assigned
for later use. The shell provides two types of variables, shell
variables and environment variables. Variables are used in a variety
of ways. Some programs (like the shell itself, for example) can be configured
by setting particular variables to user-specified values. Many of these
variables are set to default values for you, but can be customized later to
suit your needs. A program's man page usually indicates which variables (if any)
can be manipulated to affect its operation.
In tcsh, some key differences between shell and environment variables are:
- Shell variables are valid only in the shell that created them.
- Any environment variables that exist in a shell are inherited by child
processes of that shell at the time the child process is created. Subsequently
created or modified variables are not inherited by existing children, however.
- Shell variables must start with a letter, but can contain numbers,
letters and underscores.
- Environment variables can start with numbers or letters and can contain
letters, numbers, and special characters like -, _, %, etc.
In tcsh, shell variables are created with the set command, and
destroyed with the unset command:
% set path=(/bin /usr/bin /usr/local/bin)
% set a=100
% set docdir=/home/jake/docs
% unset a
Environment variables are created with the setenv command, and
destroyed with the unsetenv command:
% setenv LM_LICENSE_FILE /usr/local/flexlm/license.dat
% setenv B 100
% setenv EDITOR /bin/vi
% unsetenv B
Note that when using the set command to create shell variables there is an
= sign between the variable and its value, but when using setenv to
create environment variables there is only a blank space. Also, it is
convention (but not enforced) that environment variables are named in
all-caps, while shell variables are named in lowercase.
To use a variable in a command, precede it with the $ character.
Assuming the docdir variable contained "/home/jake/docs", the command:
% ls -l $docdir
would be the equivalent of:
% ls -l /home/jake/docs
To view all current shell and environment variables (and their values), use
the set and printenv commands alone, respectively:
% set
% printenv
To view the value of a specific environment variable, use its name alone as
the argument to printenv:
% printenv LM_LICENSE_FILE
There is no equivalent of the above for shell variables using the set
command, but you can use echo for this:
% echo $docdir
You can invent your own variable names to use in your own commands and
scripts. Here is a partial list of variables that are reserved for
the shell, the system as a whole, or common applications. Many of these are
set for you by default and are important for the normal operation of your
shell and/or the system.
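- $path or $PATH - The list of directories in which the shell will look for
programs that you type from the command line. In tcsh, the shell variable
$path is a space-separated list, while the environment variable $PATH holds
the same directories separated by colons; tcsh keeps the two in sync.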
- $MANPATH - A colon-separated list of directories in which the
man program will look for manual pages.
- $HOME - Your home directory.
- $SHELL - Your default shell.
- $TERM - The type of terminal you are using (this is important for
certain keyboard definitions and other terminal display characteristics).
- $EDITOR - Your default text editor (see the section on text editors
below).
- $DISPLAY - The display to which the X Window System should
attempt to draw.
- $LD_LIBRARY_PATH - A colon-separated list of directories in
which the loader/linker will look for dynamic libraries (in addition to the
system-default places it looks).
- $LM_LICENSE_FILE - The place where flexlm will look for a license
file. Flexlm is license management software used by some commercial software
vendors as an authorization mechanism for their applications.
Filename Completion
Using a simple command-line interface can be tedious and repetitive. Fortunately,
modern UNIX shells have many features which make using the command line very
efficient. In fact, for many tasks it is much more efficient than using a
GUI with a mouse. Three features which make using the shell efficient are called
filename completion, history, and aliases.
Filename completion is a mechanism by which you can type just part of a
command or filename and have the shell figure out what you mean and complete
it for you automatically. Learning this technique can save you many hundreds
or even thousands of keystrokes per day.
In tcsh (and bash as well), when you are typing the first word of a command,
you can press the <tab> key when you are part way through and the shell
will attempt to finish typing the word for you by looking for commands
in your $PATH variable that match the text you've typed so far.
If there is only one possibility, the shell will finish typing the word for you.
If there are multiple possibilities, it will complete
as much of the word as it can and then wait for you to type more.
Similarly, when you are typing arguments to a command, tcsh (and bash)
tries to interpret the argument as a filename when you press <tab>, filling
in as much of the filename as it can based on what you've typed.
For example, let us assume that there is a jpeg image called 1.jpg in the
current directory that we wish to convert to the RGB format with the convert
program. Let us also assume that it is the only file in this directory that
starts with the character "1", and that the convert command is the only
program in our PATH that starts with the characters "conv":
% conv<tab> 1<tab> 1.rgb
Would produce:
% convert 1.jpg 1.rgb
In cases where there are multiple possibilities (for example, if the
convert program was not the only thing in my PATH that started with
"conv"), you can press
<ctrl>-d and tcsh will print the list of possibilities, then put you
back where you were in the command that you are typing. Example:
% conv<ctrl>-d
convert convolve
% conv
I typed conv followed by <ctrl>-d and all the commands in
my PATH that started with "conv" appeared. Then I was returned to the command
line as if I had not pressed <ctrl>-d.
History
As mentioned above, history is another mechanism that's designed to make
using the UNIX command-line interface more efficient. History remembers
the last commands you've typed so that you can reuse them (or portions of them)
in subsequent commands without the tedium of retyping. In tcsh, the
$history variable defines how many commands history will remember:
% set history=50
will tell history to remember up to the last 50 commands you've typed in
this shell (normally, each running shell keeps track of its own history).
Typing "history" by itself at the command line will report a numbered
list of commands that you've typed along with the time each one was issued:
% history
1 15:04 cd /home/jake/notes/121103
2 15:04 ls
3 15:05 rm junk
4 15:10 cd /usr/local/bin
5 15:10 ls
6 15:10 file dx
7 15:14 history
%
The ! character is another special character to tcsh - it is the history
substitution character. What you type after the "!" is interpreted by tcsh's
history mechanism. If it is a number, history invokes the command associated
with that number (see the first column of the output of the history
command above), if it exists. If it is text, history finds the last command
that started with that text and invokes that command.
For example, if the above output from history was my current history
and I wanted to remove the file "junk" in the current directory,
I could simply type:
% !3
or:
% !r
which would both have the effect of issuing the command "rm junk" from
history.
A related command is the !! command, which invokes the previous command,
whatever it was. Depending on your configuration, it may also be possible to
browse through the shell's history using up/down arrow keys on your keyboard.
There are other history substitutions which can extract just pieces
of the commands residing in history as well. The !$ history substitution,
for example, expands to the last argument of the previous command:
% ls /usr/local/bin
% cd !$
is equivalent to:
% ls /usr/local/bin
% cd /usr/local/bin
Finally, the : character can be used in conjunction with !
to recall just the command or argument part from any entry in the history as
well:
% !6:0 !3:1
would issue the command part (:0) of command #6, and pass to it
argument 1 (:1) of command #3. Using our history from the first example
in this section, the above would be the equivalent of typing:
% file junk
There are many variations on this. Check the
tcsh man page for more details.
Aliases
Aliases are yet another mechanism to make using the UNIX shell
more efficient and automated. Aliases provide a way to create your own
UNIX commands that are built up from other commands and options. For example,
perhaps you would like to define a shorthand way to invoke ls with all
your favorite options without having to type the options every time. With aliases,
you can do the following (tcsh):
% alias myls "ls -FC"
The above will define a new command called myls. With this alias
defined, typing "myls" will have the same effect as typing "ls -FC".
This can be extended further to
include multiple commands, options, etc. (note that you can concatenate
multiple shell commands on the same line with the ; character).
Here are some examples:
% alias myalias "source /usr/myapp/setup.csh; cd ~/myapp; myapp"
% alias calcs "cd /home/jake/projects/121103/final/calcs"
% alias mako "ssh -l jake mako.structbio.vanderbilt.edu"
To view all currently defined aliases:
% alias
To destroy an alias called "myalias":
% unalias myalias
Taking the time to define some key aliases for your most commonly used
UNIX commands and/or directories can save a tremendous amount of repetitive typing.
Text Editors
Many files in UNIX are plain text files. You will encounter textual
configuration files, input files, script files, mail files and program code,
just to name a few. It is therefore a common thing in UNIX to edit text files.
This requires users to be familiar with at least one UNIX-based text editor.
The most ubiquitous text editor in UNIX is called vi. vi is highly
recommended for users who will spend a lot of time on UNIX systems. It has
a steep learning curve, but once you have spent a few weeks mastering it, vi
becomes a very powerful tool that will help you through many tasks. For a
good description and tutorial, we recommend reading the vi life preserver.
For more casual users, vi is overkill. We recommend nedit, which is a
graphical text editor with built-in mouse support (not unlike Windows
notepad.exe). Nedit has
ease-of-use features that make it extremely simple to pick up, if not
extremely powerful. If nedit is not installed on your workstation, contact
your system administrator.
Customizing the Shell Environment
We've now covered several concepts related to the UNIX shell.
As you've seen, many aspects of the shell environment can be customized via
environment variables, shell variables, and aliases. There are also
likely to be customizations required by various applications that you will
encounter.
It is desirable to have these customizations automatically applied each
time you log in and/or start a new shell. In tcsh, this is accomplished
via the .cshrc file. When tcsh (or csh) starts, it looks in your home
directory for a file called .cshrc and automatically executes the commands
it contains just as if you typed them at the command line.
Setting up your .cshrc file requires a text editor and some idea of
the customizations that you would like to make and that are appropriate for your
site. Usually you will be supplied with a good default .cshrc file from which
you can build your own customizations. If in doubt, contact your system
administrator for guidance.
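As a rough illustration, a very small .cshrc might collect a few of the settings shown earlier in this document. The fragment below is a sketch, not a recommended configuration: every path, value, and alias in it is a placeholder taken from the earlier examples, and the defaults supplied at your site may differ.

```csh
# Sample ~/.cshrc sketch for tcsh. Every value here is a placeholder.

# Shell variables: the command search path and the history length.
set path=(/bin /usr/bin /usr/local/bin)
set history=50

# Environment variable, inherited by programs started from this shell.
setenv EDITOR /bin/vi

# Alias for a frequently used command.
alias myls "ls -FC"
```

Changes to .cshrc take effect in shells started afterward. To apply them to a shell that is already running, the file can be read in with the source command (source ~/.cshrc).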