Introduction to UNIX - Part 4: The Command Line

What is the UNIX Shell?

The UNIX shell is a command-line interface, similar in some ways to the old DOS prompt on the PC platform. However, it is much more powerful, customizable, and easy to use.

In addition to providing a command-line interface to the operating system, the shell also provides a programming language. Shell programs (also commonly called shell scripts) are collections of commands listed one after another inside of a text file. Creating shell scripts provides a simple way to automate repetitive tasks or generate various reports, among a myriad of other possibilities.

For simple tasks like program and file management, there is often a choice between using the GUI or the shell to accomplish the same thing. In general, becoming familiar with the shell will make you more productive and efficient. Relying on the GUI is slower, but may require less effort for novices. But even in the latter case it will at some point be necessary to use the shell for something that is not readily available in the GUI. So it is a necessary and worthwhile investment of time to become familiar with the shell's basic operation.

Which Shell Should I Use?

There are different versions of the UNIX shell and you were assigned a default shell when your user account was created. The two most commonly used shells today are called "tcsh" and "bash". tcsh is an expanded version of the older "csh" (csh is short for "C shell"). bash is an expanded version of the older "sh" (the full name of sh is the "Bourne Shell"). For the most part, they provide similar features and functionality, with a slightly different syntax.

Here we will focus on "tcsh" since csh and tcsh are historically more prevalent in academic environments than sh and bash. For this reason, user accounts in Structural Biology are set up with tcsh as the default shell.

Starting a Shell

The shell itself actually runs inside of another piece of software called a "terminal". The first thing you should do is become familiar with the way to start a terminal window on your UNIX operating system. It will usually be fairly close at hand in the GUI, either via one of the window manager's default menus (SGI's 4dwm on IRIX) or via a button on a toolbar (Gnome or KDE on Linux).

Once the terminal window starts, the shell loads inside of it and you are presented with a prompt, followed by a cursor. The prompt usually ends with a >, $, or % character (I will use the "%" character as the prompt in the examples below). The cursor is usually a box or underline and may or may not flash, depending on the terminal program and its settings. When the terminal window is selected by the GUI, any keystrokes you make will appear in the terminal at the shell prompt. Selecting a window may require you to click in it, or to simply keep the mouse pointer inside the window while you type. This is dependent on your window manager's settings and is usually a customizable preference.

Issuing Commands

Shell commands consist of one or more words separated by blank spaces. The first word to appear in a command line is the name of a program or built-in shell command that is to be executed. The remaining words get passed as "arguments" to this program or command. Arguments can be options that tell the program how to behave, files that the program should operate on, processes the command should affect, or a number of other possibilities.

For example, typing the command:

% ls

executes the program ls, which prints the names of the files in the current directory. If instead we type:

% ls -l

then ls gets "-l" as an argument. The -l argument tells ls to include information about the size, modification date, owner, and permissions of each file along with its name.

Manual Pages

Nearly every program in UNIX, including graphical ones, has an online manual page ("man page" for short) associated with it. The man page for a given program is a help file that describes in detail how to use the program, what arguments it can accept, what files it expects, what other commands are related to it, and anything else you need to know. There is an extremely useful program called man which provides access to these manual pages via the command line. In addition, on some systems, the graphical program xman provides a rudimentary graphical interface to the manual pages.

The importance of the man program cannot be stressed enough. UNIX is so flexible and vast that not even the experts can keep all the details of every command in their head. For this reason, we have man.

To use man, you type "man", followed by the name of the command whose manual page you wish to view. For example:

% man ls

will give you great detail about the ls command, its options, and its usage.

man usually uses the program more (or its counterpart, less) to display each man page. To page forward through a manual page, press the spacebar. To page backward, you can usually press <ctrl>-b. To quit back to the shell prompt, press "q". For more information, see the man pages for man and more:

% man man
% man more

Input/Output Redirection

Most command-line UNIX programs can read from something called standard input for their input and write to something called standard output for their output. Standard input (STDIN for short) is normally connected to the keyboard - that's how you normally provide input to programs via the shell. Standard output (STDOUT for short) is normally connected to the terminal window - the terminal is normally where the output will appear from commands you issued via the shell.

However, the shell also allows you to redirect STDIN and STDOUT with the special characters >, >>, <, and |. For example, STDOUT in our above example can be redirected to a file instead of the terminal window by typing:

% ls -l > list.txt

In this example, the output of ls -l is redirected into a file called list.txt instead of being printed to your terminal. If list.txt does not exist, it is created. If it does exist, its contents are replaced with the contents of STDOUT. Note that > is a special character to the shell. As such, it and subsequent characters on the command line are not interpreted as arguments to the ls command.

A variant of the above is:

% ls -l >> list.txt

>> works just like > except that if list.txt already exists, STDOUT is appended to the bottom of the file rather than replacing its contents.

STDIN can be taken from a file instead of the keyboard in an analogous way:

% sort < list.txt

In this example, the sort program reads the file list.txt on STDIN and sorts it into alphabetical order on STDOUT, which will be output to our terminal. If we instead type:

% sort < list.txt > sorted_list.txt

then list.txt will be passed to sort on STDIN, and the sorted result will be placed in the file sorted_list.txt on STDOUT.

Finally, we introduce the | (pipe) character, which is used to make so-called pipelines. The | character takes STDOUT of the command that appears on the left, and passes it to STDIN of the command that appears on the right. For example:

% ls -l | sort | lpr

takes STDOUT of ls -l, sends it to STDIN of sort, and finally sends STDOUT of sort to STDIN of lpr which prints out a hardcopy of the result via the default printer.

This is the equivalent of:

% ls -l > list.txt
% sort < list.txt > sorted_list.txt
% lpr sorted_list.txt

except that no intermediate files are used to pass the output of the preceding commands to the input of the succeeding commands.
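
The redirection operators described in this section can be tried safely in a scratch directory. The following is a minimal sketch rather than part of the original examples: the directory /tmp/redirect_demo and the file names are hypothetical, and wc -l (which counts lines) stands in for lpr so that nothing is sent to a printer. The commands avoid shell-specific syntax, so they should behave the same way in tcsh and bash, whether saved in a script file or typed one at a time without the comment lines.

```shell
# Work in a scratch directory (hypothetical path) so no existing
# files are overwritten.
mkdir -p /tmp/redirect_demo
cd /tmp/redirect_demo

# > creates (or replaces) fruit.txt; >> appends to it.
echo pear   > fruit.txt
echo apple >> fruit.txt
echo mango >> fruit.txt

# < feeds fruit.txt to sort on STDIN; > captures the sorted STDOUT.
sort < fruit.txt > sorted_fruit.txt

# | connects STDOUT of sort directly to STDIN of wc,
# with no intermediate file.
sort < fruit.txt | wc -l
```

Afterwards, sorted_fruit.txt should hold the three names in alphabetical order, and the final pipeline should report a count of 3.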

Filenames and Naming Files

There are very few restrictions placed on naming files on most flavors of UNIX. You can put just about any character you want in a filename - including spaces and other special characters like %, $, ^, &, !, etc. This is normally considered to be bad form however, because as mentioned above some of these characters are interpreted by the shell to have special meaning. This can be worked around by preceding these characters with the \ character, which prevents whatever comes next from being interpreted as a special character by the shell. For example, say I had created a file named "is 1>2?" and I wanted to remove it with the rm command:

% rm is 1>2?

The above command has several problems. First of all, the shell uses spaces to separate arguments from each other, so rm will not interpret this string of characters as a single filename. Second, as we've already seen, the shell interprets the > character as a redirection of STDOUT, not part of the filename. Finally, the ? character has special meaning to the shell as a wildcard character (see the Wild Cards section below).

To actually remove this file (or use it as an argument to any UNIX command), you would need to escape the space and the special characters with the \ character like so:

% rm is\ 1\>2\?

or put the whole filename in quotes:

% rm "is 1>2?"

As you can see, this can be rather confusing to look at or at least tedious to remember. To avoid these headaches, it is recommended that you keep spaces and other non-alphanumeric characters except for - (dash), _ (underscore), and . (dot) out of your filenames. UNIX will let you use the other characters, but the question should be "Do I really want to?"
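
To see both workarounds in action without risking real files, something like the following can be run in a scratch directory. This is only a sketch: /tmp/quote_demo is a hypothetical location, and touch (which creates an empty file) is used here just to have something to remove.

```shell
# Scratch directory (hypothetical) so that nothing important is removed.
mkdir -p /tmp/quote_demo
cd /tmp/quote_demo

# Create the awkward file; the quotes keep the shell from interpreting
# the space, the > and the ? itself.
touch "is 1>2?"
ls

# Remove it by escaping each special character with a backslash...
rm is\ 1\>2\?

# ...or create it again and remove it by quoting the whole name.
touch "is 1>2?"
rm "is 1>2?"
ls
```

The first ls should show the single file "is 1>2?", and the second should show that it is gone.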

Finally, a word about files beginning with the "." (period, or dot) character: In UNIX, there is nothing special about "." in a filename per se. But it is convention to use the "." character to define file extensions which give the user a hint about what type of file we are dealing with (such as .txt for text files, .jpg or .jpeg for jpeg image files, etc.). An additional convention in UNIX is that files named with a "." in the first position, such as ".cshrc", are configuration files. For instance, the csh and tcsh shells use the file named .cshrc in your home directory to configure your command-line environment. As such, these so-called "dot-files" are not displayed by default when using the ls command to list files. In addition, there are two special files in every directory called "." and "..", which point to the current directory and parent directory (the directory above the current one), respectively. To see files and directories that start with the "." character, use the "-a" argument to ls.
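
The effect of a leading "." on ls can be checked with a short experiment. The sketch below assumes a hypothetical scratch directory /tmp/dot_demo and made-up file names; touch simply creates empty files.

```shell
# Scratch directory and file names are hypothetical.
mkdir -p /tmp/dot_demo
cd /tmp/dot_demo
touch .hidden_config visible.txt

# Plain ls leaves out names that start with a dot.
ls

# ls -a also lists the dot-file, along with "." and "..".
ls -a
```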

Wild Cards

Many UNIX commands accept multiple filenames as arguments. Wild cards are a way to specify multiple filenames to the shell at once in a sort of "short-hand" notation. There are three base constructs from which wild card expressions can be created in the UNIX shell:

*	Matches any string of characters
?	Matches any single character
[...]	Matches any one of the characters listed inside the brackets

For example, the command:

% ls *.doc

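asks for a listing of all the files that end in the string ".doc". The * character at the beginning tells the shell that the filenames you are looking for can have any number of any character at that position of the expression. Note that it is the shell, not ls, that expands the expression into the list of matching filenames; ls simply receives those filenames as its arguments.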

Similarly, if you were to type:

% ls ?.doc

then ls would return all the files that have names consisting of any single character followed by the string ".doc". For example, filenames like "1.doc", "a.doc", or "A.doc" would be matched. Filenames like "01.doc" or "party.doc" would not be matched.

If you were to type:

% ls ??.doc

then ls would return all the files that have names consisting of a single character followed by another single character, followed by the string ".doc".

To be more specific about the identity of the characters that you wish to match, you can use the [...] construct. For example:

% ls [0123].doc

would specifically match the four filenames "0.doc", "1.doc", "2.doc", and "3.doc". This construct can also accept a range of characters (specified by a "-") instead of requiring you to list every member. For example,

% ls [0-3].doc

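would be equivalent to the previous example which used "[0123]". You can also combine ranges and individual characters by listing them one after another inside the brackets, with no separator (a comma inside the brackets would be treated as just another character to match):

% ls [a-jsw-z].doc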

The above example would pick up all files "a.doc" through "j.doc", "s.doc", and "w.doc" through "z.doc".

Finally, you can use the ^ character along with [...] to negate certain classes of characters as well. Also any of these constructs can be combined any number of times. For example:

% ls G?[^2-4]*foo*.doc

would match any filename that starts with "G", has anything in the second position, has anything except 2, 3, or 4 in the third position, has "foo" anyplace between the fourth and last character before finally ending in ".doc".

As you can see, these constructs can be combined in clever ways to select very specific lists of filenames in short-hand. This activity is also referred to as globbing or file globbing. This is very powerful and useful when dealing with large numbers of files. Learning to use wild cards effectively can save you a lot of time compared to using a GUI file manager or, worse, typing them all by hand!
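
A quick way to get a feel for wild cards is to create a handful of empty files and see which names each expression selects. The following sketch assumes a hypothetical scratch directory /tmp/glob_demo and made-up file names. It uses echo, which simply prints its arguments, so the output is exactly the list of filenames the shell produced from each expression.

```shell
# Scratch directory and file names are hypothetical.
mkdir -p /tmp/glob_demo
cd /tmp/glob_demo
touch 0.doc 1.doc 2.doc 3.doc 4.doc a.doc s.doc 01.doc party.doc notes.txt

# Every name ending in ".doc" (notes.txt is left out).
echo *.doc

# Exactly one character before ".doc": 01.doc and party.doc are left out.
echo ?.doc

# Exactly two characters before ".doc".
echo ??.doc        # prints: 01.doc

# One character from the range 0 through 3.
echo [0-3].doc     # prints: 0.doc 1.doc 2.doc 3.doc

# One character from an explicit list.
echo [as].doc      # prints: a.doc s.doc
```

Previewing an expression with echo before handing it to a command such as rm is a cheap way to confirm that it selects only the files you intend.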

Variables

Variables are words to which values (either string or numeric) can be assigned for later use. The shell provides two types of variables, shell variables and environment variables. Variables are used in a variety of ways. Some programs (like the shell itself, for example) can be configured by setting particular variables to user-specified values. Many of these variables are set to default values for you, but can be customized later to suit your needs. A program's man page usually indicates which variables (if any) can be manipulated to affect its operation.

In tcsh, some key differences between shell and environment variables are:

Shell variables are valid only in the shell that created them.
Any environment variables that exist in a shell are inherited by child processes of that shell at the time the child process is created. Subsequently created or modified variables are not inherited by existing children, however.
Shell variables must start with a letter, but can contain numbers, letters and underscores.
Environment variables can start with numbers or letters and can contain letters, numbers, and special characters like -, _, %, etc.

In tcsh, shell variables are created with the set command, and destroyed with the unset command:

% set path=(/bin /usr/bin /usr/local/bin)
% set a=100
% set docdir=/home/jake/docs
% unset a

Environment variables are created with the setenv command, and destroyed with the unsetenv command:

% setenv LM_LICENSE_FILE /usr/local/flexlm/license.dat
% setenv B 100
% setenv EDITOR /bin/vi
% unsetenv B

Note that when using the set command to create shell variables there is an = sign between the variable and its value, but when using setenv to create environment variables there is only a blank space. Also, it is convention (but not enforced) that environment variables are named in all-caps, while shell variables are named in lowercase.

To use a variable in a command, precede it with the $ character. Assuming the docdir variable contained "/home/jake/docs", the command:

% ls -l $docdir

would be the equivalent of:

% ls -l /home/jake/docs

To view all current shell and environment variables (and their values), use the set and printenv commands alone, respectively:

% set
% printenv

To view the value of a specific environment variable, use its name alone as the argument to printenv:

% printenv LM_LICENSE_FILE

There is no equivalent of the above for shell variables using the set command, but you can use echo for this:

% echo $docdir
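
The inheritance of environment variables by child processes, described earlier in this section, can be observed directly. The sketch below is an illustration under a few assumptions: it relies on the standard env utility (not otherwise covered here), which runs a single command with an extra environment variable added, and DEMO_VAR is a made-up name. Because it avoids set and setenv, it should behave the same way in tcsh and bash.

```shell
# HOME is an environment variable, so both commands print the same directory.
echo $HOME
printenv HOME

# env runs one command with an extra environment variable added.
# DEMO_VAR is a made-up name used only for this illustration.
env DEMO_VAR=100 printenv DEMO_VAR        # prints: 100

# A child shell started the same way inherits the variable too.
env DEMO_VAR=100 sh -c 'echo $DEMO_VAR'   # prints: 100
```

The single quotes in the last command keep the current shell from expanding $DEMO_VAR itself, so the value printed really comes from the child shell's inherited environment.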

You can invent your own variable names to use in your own commands and scripts. Here is a partial list of variables that are reserved for the shell, the system as a whole, or common applications. Many of these are set for you by default and are important for the normal operation of your shell and/or the system.

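$path or $PATH - The list of directories in which the shell will look for programs that you type from the command line. In tcsh, the shell variable $path is a space-separated list, while the environment variable $PATH holds the same directories separated by colons.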
$MANPATH - A colon-separated list of directories in which the man program will look for manual pages.
$HOME - Your home directory.
$SHELL - Your default shell.
$TERM - The type of terminal you are using (this is important for certain keyboard definitions and other terminal display characteristics).
$EDITOR - Your default text editor (see the section on text editors below).
$DISPLAY - The display to which the X Window System should attempt to draw.
$LD_LIBRARY_PATH - A colon-separated list of directories in which the loader/linker will look for dynamic libraries (in addition to the system-default places it looks).
$LM_LICENSE_FILE - The place where flexlm will look for a license file. Flexlm is license management software used by some commercial software vendors as an authorization mechanism for their applications.

Filename Completion

Using a simple command-line interface can be tedious and repetitive. Fortunately, modern UNIX shells have many features which make using the command line very efficient. In fact, for many tasks it is much more efficient than using a GUI with a mouse. Three features which make using the shell efficient are called filename completion, history, and aliases.

Filename completion is a mechanism by which you can type just part of a command or filename and have the shell figure out what you mean and complete it for you automatically. Learning this technique can save you many hundreds or even thousands of keystrokes per day.

In tcsh (and bash as well), when you are typing the first word of a command, you can press the <tab> key when you are part way through and the shell will attempt to finish typing the word for you by looking for commands in your $PATH variable that match the text you've typed so far. If there is only one possibility, the shell will finish typing the word for you. If there are multiple possibilities, it will complete as much of the word as it can and then wait for you to type more.

Similarly, when you are typing arguments to a command, tcsh (and bash) tries to interpret the argument as a filename when you press <tab>, filling in as much of the filename as it can based on what you've typed.

For example, let us assume that there is a jpeg image called 1.jpg in the current directory that we wish to convert to the RGB format with the convert program. Let us also assume that it is the only file in this directory that starts with the character "1", and that the convert command is the only program in my PATH that starts with the characters "conv":

% conv<tab> 1<tab> 1.rgb

would produce:

% convert 1.jpg 1.rgb

In cases where there are multiple possibilities (for example, if the convert program was not the only thing in my PATH that started with "conv"), you can press <ctrl>-d and tcsh will print the list of possibilities, then put you back where you were in the command that you are typing. Example:

% conv<ctrl>-d
convert   convolve
% conv

I typed conv followed by <ctrl>-d and all the commands in my PATH that started with "conv" appeared. Then I was returned to the command line as if I had not pressed <ctrl>-d.

History

As mentioned above, history is another mechanism that's designed to make using the UNIX command-line interface more efficient. History remembers the last commands you've typed so that you can reuse them (or portions of them) in subsequent commands without the tedium of retyping. In tcsh, the $history variable defines how many commands history will remember:

% set history=50

will tell history to remember up to the last 50 commands you've typed in this shell (normally, each running shell keeps track of its own history). Typing "history" by itself at the command line will report a numbered list of commands that you've typed along with the time each one was issued:

% history
     1  15:04   cd /home/jake/notes/121103
     2  15:04   ls
     3  15:05   rm junk
     4  15:10   cd /usr/local/bin
     5  15:10   ls
     6  15:10   file dx
     7  15:14   history
%

The ! character is another special character to tcsh - it is the history substitution character. What you type after the "!" is interpreted by tcsh's history mechanism. If it is a number, history invokes the command associated with that number (see the first column of the output of the history command above), if it exists. If it is text, history finds the last command that started with that text and invokes that command.

For example, if the above output from history was my current history and I wanted to remove the file "junk" in the current directory, I could simply type:

% !3

or:

% !r

which would both have the effect of issuing the command "rm junk" from history.

A related command is the !! command, which invokes the previous command, whatever it was. Depending on your configuration, it may also be possible to browse through the shell's history using up/down arrow keys on your keyboard.

There are other history substitutions which can extract just pieces of the commands residing in history as well. The !$ history command, for example, expands to the last argument of the previous command:

% ls /usr/local/bin
% cd !$

is equivalent to:

% ls /usr/local/bin
% cd /usr/local/bin

Finally, the : character can be used in conjunction with ! to recall just the command or argument part from any entry in the history as well:

% !6:0 !3:1

would issue the command part (:0) of command #6, and pass to it the first argument (:1) of command #3. Using our history from the first example in this section, the above would be the equivalent of typing:

% file junk

There are many variations on this. Check the tcsh man page for more details.

Aliases

Aliases are yet another mechanism to make using the UNIX shell more efficient and automated. Aliases provide a way to create your own UNIX commands that are built up from other commands and options. For example, perhaps you would like to define a shorthand way to invoke ls with all your favorite options without having to type the options every time. With aliases, you can do the following (tcsh):

% alias myls "ls -FC"

The above will define a new command called myls. With this alias defined, typing "myls" will have the same effect as typing "ls -FC".

This can be extended further to include multiple commands, options, etc (note that you can concatenate multiple shell commands on the same line with the ; character). Here are some examples:

% alias myalias "source /usr/myapp/setup.csh; cd ~/myapp; myapp"
% alias calcs "cd /home/jake/projects/121103/final/calcs"
% alias mako "ssh -l jake mako.structbio.vanderbilt.edu"

To view all currently defined aliases:

% alias

To destroy an alias called "myalias":

% unalias myalias

Taking the time to define some key aliases for your most commonly used UNIX commands and/or directories can save a tremendous amount of repetitive typing.

Text Editors

Many files in UNIX are plain text files. You will encounter textual configuration files, input files, script files, mail files and program code, just to name a few. It is therefore a common thing in UNIX to edit text files. This requires users to be familiar with at least one UNIX-based text editor.

The most ubiquitous text editor in UNIX is called vi. vi is highly recommended for users who will spend a lot of time on UNIX systems. It has a steep learning curve, but once you have spent a few weeks mastering it, vi becomes a very powerful tool that will help you through many tasks. For a good description and tutorial, we recommend reading the vi life preserver.

For more casual users, vi is overkill. We recommend nedit which is a graphical text editor with built-in mouse support (not unlike Windows notepad.exe). Nedit has ease-of-use features that make it extremely simple to pick up, if not extremely powerful. If nedit is not installed on your workstation, contact your system administrator.

Customizing the Shell Environment

We've now covered several concepts related to the UNIX shell. As you've seen, many aspects of the shell environment can be customized via environment variables, shell variables, and aliases. There are also likely to be customizations required by various applications that you will encounter.

It is desirable to have these customizations automatically applied each time you log in and/or start a new shell. In tcsh, this is accomplished via the .cshrc file. When tcsh (or csh) starts, it looks in your home directory for a file called .cshrc and automatically executes the commands it contains just as if you typed them at the command line.

Setting up your .cshrc file requires a text editor and some idea of the customizations you would like to make, bearing in mind what is appropriate for your site. Usually you will be supplied with a good default .cshrc file from which you can build your own customizations. If in doubt, contact your system administrator for guidance.
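
As an illustration only, a few lines added to a .cshrc might look like the fragment below. It is a sketch assembled from commands shown earlier in this tutorial, not a recommended configuration: the paths and alias names are hypothetical examples, and the default file supplied at your site may already set some of these.

```csh
# Fragment of a hypothetical ~/.cshrc (tcsh syntax).
# Lines starting with # are comments.

# Remember the last 50 commands in each shell.
set history=50

# Environment variable inherited by programs started from the shell.
setenv EDITOR /bin/vi

# Personal shortcuts; the names and the directory are made-up examples.
alias myls "ls -FC"
alias calcs "cd /home/jake/projects/121103/final/calcs"
```

Because tcsh reads .cshrc only when it starts, changes take effect in newly started shells; an already running shell can pick them up by typing "source ~/.cshrc".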

Useful Commands

Manual pages are great, but you have to know the command exists first! The list of commands below is a good place to start getting familiar with UNIX. Browsing through man pages with a tool like xman can also be beneficial. Visit the man pages of the following commands to learn more about each of them:

File Management

ls - List files
cd - Change directories
pwd - Print current (working) directory
mkdir - Make a new, empty directory
rmdir - Remove an empty directory
cp - Copy files
rm - Remove files
mv - Move files or directories
find - Find files or folders based on name, date, size, ownership or other parameters

Permissions

chown - Change ownership of files/directories
chgrp - Change group ownership of files/directories
chmod - Change permissions (mode) of files/directories
groups - Report the group(s) you belong to.
id - Report your username, userid, group(s) and group id(s).
newgrp - Set your default group for the current shell.

Resource Monitoring

w - See who is logged on and what they are doing
top - See the top resource-hungry processes
df - See how much disk space is free
du - Report the disk space used by files and directories

Printing

lpr - Send text or postscript files to the printer
lpq - View the print queue
lprm - Remove your print jobs

Job Control

which - Display the full path of commands
ps - View active processes
<ctrl>-z - Suspend the current job
bg - Put a suspended job in the background
fg - Bring a suspended job into the foreground
kill - Kill a process

Filtering/Reporting

grep - Search for substrings in a file or pipeline
sort - Sort lines alphabetically or numerically
wc - Count lines, words, characters
cat - Concatenate a file or files and print the result to standard output
more - Terminal-based text viewing program
