Introduction to UNIX - Part 4: The Command Line
What is the UNIX Shell?
The UNIX shell is a command-line interface, similar in some ways to the old DOS prompt on the PC platform. It is, however, much more powerful and customizable, and easier to use.
In addition to providing a command-line interface to the operating system, the shell also provides a programming language. Shell programs (also commonly called shell scripts) are collections of commands listed one after another inside of a text file. Creating shell scripts provides a simple way to automate repetitive tasks or generate various reports, among a myriad of other possibilities.
For simple tasks like program and file management, there is often a choice between using the GUI or the shell to accomplish the same thing. In general, becoming familiar with the shell will make you more productive and efficient. Relying on the GUI is usually slower, although it may require less effort for novices. Even novices, however, will at some point need the shell for something that is not readily available in the GUI, so becoming familiar with the shell's basic operation is a necessary and worthwhile investment of time.
Which Shell Should I Use?
There are different versions of the UNIX shell and you were assigned a default shell when your user account was created. The two most commonly used shells today are called "tcsh" and "bash". tcsh is an expanded version of the older "csh" (csh is short for "C shell"). bash is an expanded version of the older "sh" (the full name of sh is the "Bourne Shell"). For the most part, they provide similar features and functionality, with a slightly different syntax.
Here we will focus on "tcsh" since csh and tcsh are historically more prevalent in academic environments than sh and bash. For this reason, user accounts in Structural Biology are set up with tcsh as the default shell.
Starting a Shell
The shell itself actually runs inside of another piece of software called a "terminal". The first thing you should do is become familiar with the way to start a terminal window on your UNIX operating system. It will usually be fairly close at hand in the GUI, either via one of the window manager's default menus (SGI's 4dwm on IRIX) or via a button on a toolbar (Gnome or KDE on Linux).
Once the terminal window starts, the shell loads inside of it and you are presented with a prompt, followed by a cursor. The prompt usually ends with a >, $, or % character (I will use the "%" character as the prompt in the examples below). The cursor is usually a box or underline and may or may not flash, depending on the terminal program and its settings. When the terminal window is selected by the GUI, any keystrokes you make will appear in the terminal at the shell prompt. Selecting a window may require you to click in it, or to simply keep the mouse pointer inside the window while you type. This is dependent on your window manager's settings and is usually a customizable preference.
Issuing Commands
Shell commands consist of one or more words separated by blank spaces. The first word to appear in a command line is the name of a program or built-in shell command that is to be executed. The remaining words get passed as "arguments" to this program or command. Arguments can be options that tell the program how to behave, files that the program should operate on, processes the command should affect, or a number of other possibilities.
For example, typing the command:
    % ls
executes the program ls, which prints the names of the files in the current directory. If instead we type:
    % ls -l
then ls gets "-l" as an argument. The -l argument tells ls to include information about the size, modification date, owner, and permissions of each file along with its name.
Manual Pages
Nearly every program in UNIX, including graphical ones, has an online manual page ("man page" for short) associated with it. The man page for a given program is a help file that describes in detail how to use the program, what arguments it can accept, what files it expects, what other commands are related to it, and anything else you need to know. There is an extremely useful program called man which provides access to these manual pages via the command line. In addition, on some systems, the graphical program xman provides a rudimentary graphical interface to the manual pages.
The importance of the man program cannot be stressed enough. UNIX is so flexible and vast that not even the experts can keep all the details of every command in their head. For this reason, we have man.
To use man, you type "man", followed by the name of the command whose manual page you wish to view. For example:
    % man ls
will give you great detail about the ls command, its options, and its usage.
man usually uses the program more (or its counterpart, less) to view the contents of each man page. To page forward through a manual page, you can use the spacebar. To page backward, you can usually use <ctrl>-b. To quit back to the shell prompt, press "q". For more information, you can visit the man pages for man and more:
    % man man
    % man more
Input/Output Redirection
Most command-line UNIX programs can read from something called standard input for their input and write to something called standard output for their output. Standard input (STDIN for short) is normally connected to the keyboard - that's how you normally provide input to programs via the shell. Standard output (STDOUT for short) is normally connected to the terminal window - the terminal is normally where the output will appear from commands you issued via the shell.
However, the shell also allows you to redirect STDIN and STDOUT with the special characters >, >>, <, and |. For example, STDOUT in our above example can be redirected to a file instead of the terminal window by typing:
    % ls -l > list.txt
In this example, the output of ls -l is redirected into a file called list.txt instead of being printed to your terminal. If list.txt does not exist, it is created. If it does exist, its contents are replaced with the contents of STDOUT. Note that > is a special character to the shell. As such, it and the filename that follows it are handled by the shell itself and are not passed as arguments to the ls command.
A variant of the above is:
    % ls -l >> list.txt
>> works just like > except that if list.txt already exists, STDOUT is appended to the bottom of the file rather than replacing its contents.
STDIN can be taken from a file instead of the keyboard in an analogous way:
    % sort < list.txt
In this example, the sort program reads the contents of list.txt on STDIN and writes them in alphabetical order to STDOUT, which here is our terminal. If we instead type:
    % sort < list.txt > sorted_list.txt
then list.txt will be passed to sort on STDIN, and the sorted result will be placed in the file sorted_list.txt on STDOUT.
Finally, we introduce the | (pipe) character, which is used to make so-called pipelines. The | character takes STDOUT of the command that appears on the left, and passes it to STDIN of the command that appears on the right. For example:
    % ls -l | sort | lpr
takes STDOUT of ls -l, sends it to STDIN of sort, and finally sends STDOUT of sort to STDIN of lpr, which prints out a hardcopy of the result via the default printer.
This is the equivalent of:
    % ls -l > list.txt
    % sort < list.txt > sorted_list.txt
    % lpr < sorted_list.txt
except that no intermediate files are used to pass the output of the preceding commands to the input of the succeeding commands.
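The redirection characters described above can be tried out safely in a scratch directory. The following is a minimal sketch, not taken from the original tutorial: the directory name redirect_demo and the words written to the files are made up for illustration, and fixed words are used instead of the output of ls -l so that the result is the same on any machine. It avoids shell-specific syntax, so it should run as a script under tcsh as well as under a Bourne-style shell.

```shell
# Work in a scratch directory so that nothing important is overwritten.
mkdir -p redirect_demo
cd redirect_demo

# ">" sends STDOUT to a file, replacing any previous contents.
printf 'cherry\napple\nbanana\n' > list.txt

# ">>" appends to the end of the file instead of replacing it.
echo date >> list.txt

# "<" connects a file to STDIN; the sorted result goes to a second file.
sort < list.txt > sorted_list.txt

# "|" connects STDOUT of one command to STDIN of the next, so no
# intermediate file is needed. This prints the alphabetically first line.
sort < list.txt | head -n 1

# Return to the starting directory.
cd ..
```

Afterwards, list.txt holds the four words in the order they were written and sorted_list.txt holds them in alphabetical order. Replacing head with lpr in the last pipeline would send the sorted list to the default printer, as in the example from the text.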
Filenames and Naming Files
There are very few restrictions placed on naming files on most flavors of UNIX. You can put just about any character you want in a filename - including spaces and other special characters like %, $, ^, &, !, etc. This is normally considered bad form, however, because, as mentioned above, some of these characters are interpreted by the shell to have special meaning. This can be worked around by preceding these characters with the \ character, which prevents whatever comes next from being interpreted as a special character by the shell. For example, say I had created a file named "is 1>2?" and I wanted to remove it with the rm command:
    % rm is 1>2?
The above command has several problems. First of all, the shell uses spaces to separate arguments from each other, so rm will not interpret this string of characters as a single filename. Second, as we've already seen, the shell interprets the > character as a redirection of STDOUT, not part of the filename. Finally, the ? character has special meaning to the shell as a wildcard character (see the section on wild cards below).
To actually remove this file (or use it as an argument to any UNIX command), you would need to escape the space and the special characters with the \ character like so:
    % rm is\ 1\>2\?
or put the whole filename in quotes:
    % rm "is 1>2?"
As you can see, this can be rather confusing to look at or at least tedious to remember. To avoid these headaches, it is recommended that you keep spaces and other non-alphanumeric characters except for - (dash), _ (underscore), and . (dot) out of your filenames. UNIX will let you use the other characters, but the question should be "Do I really want to?"
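To see the difference in practice, the awkward filename from the text can be created and removed in a scratch directory. This is only a sketch: the directory name names_demo and the "clean" filename is_1_gt_2.txt are invented for the example. The quoting and backslash escapes used here behave the same way in tcsh and in Bourne-style shells.

```shell
# Work in a scratch directory.
mkdir -p names_demo
cd names_demo

# Create the awkward file. The quotes keep the shell from splitting the
# name at the space or treating ">" and "?" as special characters.
touch "is 1>2?"
ls

# Remove it again, this time escaping each special character with "\".
rm is\ 1\>2\?

# A name built only from letters, digits, "-", "_" and "." needs no
# quoting or escaping at all.
touch is_1_gt_2.txt
rm is_1_gt_2.txt

# Return to the starting directory.
cd ..
```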
Finally, a word about files beginning with the "." (period, or dot) character: In UNIX, there is nothing special about "." in a filename per se. But it is convention to use the "." character to define file extensions which give the user a hint about what type of file we are dealing with (such as .txt for text files, .jpg or .jpeg for jpeg image files, etc.). An additional convention in UNIX is that files named with a "." in the first position, such as ".cshrc", are configuration files. For instance, the csh and tcsh shells use the file named .cshrc in your home directory to configure your command-line environment. To keep directory listings uncluttered, these so-called "dot-files" are not displayed by default when using the ls command to list files. In addition, there are two special files in every directory called "." and "..", which point to the current directory and parent directory (the directory above the current one), respectively. To see files and directories that start with the "." character, use the "-a" argument to ls.
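The effect of the -a argument is easy to check in a scratch directory. In this small sketch the directory name dotfile_demo and both filenames are hypothetical.

```shell
# Work in a scratch directory containing one ordinary file and one dot-file.
mkdir -p dotfile_demo
cd dotfile_demo
touch notes.txt .hidden_settings

# Plain ls skips every name that begins with ".".
ls

# With -a, ls also shows the dot-file and the special entries "." and "..".
ls -a

# Return to the starting directory.
cd ..
```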
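Wild Cards
Many UNIX commands accept multiple filenames as arguments. Wild cards are a way to specify multiple filenames to the shell at once in a sort of "short-hand" notation. There are three base constructs from which wild card expressions can be created in the UNIX shell:
    *      matches any string of characters, including none
    ?      matches any single character
    [...]  matches any one of the characters listed between the brackets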
For example, the command:
    % ls *.doc
tells ls that you want a listing of all the files that end in the string ".doc". The * character at the beginning tells the shell that the filenames you are looking for can have any number of any character at that position of the expression.
Similarly, if you were to type:
    % ls ?.doc
then ls would return all the files that have names consisting of any single character followed by the string ".doc". For example, filenames like "1.doc", "a.doc", or "A.doc" would be matched. Filenames like "01.doc" or "party.doc" would not be matched.
If you were to type:
    % ls ??.doc
then ls would return all the files that have names consisting of a single character followed by another single character, followed by the string ".doc".
To be more specific about the identity of the characters that you wish to match, you can use the [...] construct. For example:
    % ls [0123].doc
would specifically match the four filenames "0.doc", "1.doc", "2.doc", and "3.doc". This construct can also accept a range of characters (specified by a "-") instead of requiring you to list every member. For example:
    % ls [0-3].doc
would be equivalent to the previous example, which used "[0123]". You can also combine several ranges and single characters by writing them one after another inside the brackets. No separator is needed; a comma placed between them would itself be treated as a character to match:
    % ls [a-jsw-z].doc
The above example would pick up all files "a.doc" through "j.doc", "s.doc", and "w.doc" through "z.doc".
Finally, you can use the ^ character as the first character inside [...] to match any character except those listed. Any of these constructs can also be combined any number of times. For example:
    % ls G?[^234]*foo*.doc
would match any filename that starts with "G", has any single character in the second position, has anything except 2, 3, or 4 in the third position, contains "foo" somewhere after that, and ends in ".doc".
As you can see, these constructs can be combined in clever ways to select very specific lists of filenames in short-hand. This activity is also referred to as globbing or file globbing. It is very powerful and useful when dealing with large numbers of files. Learning to use wild cards effectively can save you a lot of time compared to using a GUI file manager or, worse, typing all the filenames by hand!
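A convenient way to experiment with wild cards is to hand them to echo, which simply prints the list of filenames the shell produced from the pattern. The sketch below assumes a scratch directory called glob_demo and a handful of invented filenames. It sticks to the *, ? and [...] constructs, which behave the same way in tcsh and in Bourne-style shells (negation with ^ is left out because not every shell supports that spelling).

```shell
# Work in a scratch directory with a few empty files to match against.
mkdir -p glob_demo
cd glob_demo
touch 0.doc 1.doc 2.doc 3.doc a.doc 01.doc party.doc notes.txt

# "*" matches any run of characters: every name ending in .doc.
echo *.doc

# "?" matches exactly one character: 0.doc, 1.doc, 2.doc, 3.doc and a.doc.
echo ?.doc

# Two "?" match exactly two characters: only 01.doc.
echo ??.doc

# A bracketed range matches one character from the set: 0.doc through 3.doc.
echo [0-3].doc

# Constructs can be combined: .doc names that start with "p" or "a".
echo [pa]*.doc

# Return to the starting directory.
cd ..
```

Because the shell expands the pattern before the command runs, echo, ls, rm and any other program all receive the same list of matching names.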
Variables
Variables are words to which values (either string or numeric) can be assigned for later use. The shell provides two types of variables, shell variables and environment variables. Variables are used in a variety of ways. Some programs (like the shell itself, for example) can be configured by setting particular variables to user-specified values. Many of these variables are set to default values for you, but can be customized later to suit your needs. A program's man page usually indicates which variables (if any) can be manipulated to affect its operation.
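In tcsh, the key difference between shell and environment variables is scope: a shell variable exists only in the shell in which it was set, whereas an environment variable is also passed on to every program (including other shells) started from that shell.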
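In tcsh, shell variables are created with the set command, and destroyed with the unset command:
    % set docdir = /home/jake/docs
    % unset docdir
Environment variables are created with the setenv command, and destroyed with the unsetenv command (the name DOCDIR is just an example):
    % setenv DOCDIR /home/jake/docs
    % unsetenv DOCDIR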
Note that when using the set command to create shell variables there is an = sign between the variable and its value, but when using setenv to create environment variables there is only a blank space. Also, it is convention (but not enforced) that environment variables are named in all-caps, while shell variables are named in lowercase.
To use a variable in a command, precede it with the $ character. Assuming the docdir variable contained "/home/jake/docs", the command:
    % ls $docdir
would be the equivalent of:
    % ls /home/jake/docs
To view all current shell and environment variables (and their values), use the set and printenv commands alone, respectively:
    % set
    % printenv
To view the value of a specific environment variable, use its name alone as the argument to printenv:
    % printenv PATH
There is no equivalent of the above for shell variables using the set command, but you can use echo for this:
    % echo $docdir
You can invent your own variable names to use in your own commands and scripts. Be aware, however, that some variable names are reserved for the shell, the system as a whole, or common applications (for example the PATH environment variable and the tcsh history shell variable, both discussed below). Many of these are set for you by default and are important for the normal operation of your shell and/or the system.
Filename Completion
Using a simple command-line interface can be tedious and repetitive. Fortunately, modern UNIX shells have many features which make using the command line very efficient. In fact, for many tasks it is much more efficient than using a GUI with a mouse. Three features which make using the shell efficient are called filename completion, history, and aliases.
Filename completion is a mechanism by which you can type just part of a command or filename and have the shell figure out what you mean and complete it for you automatically. Learning this technique can save you many hundreds or even thousands of keystrokes per day.
In tcsh (and bash as well), when you are typing the first word of a command, you can press the <tab> key when you are part way through and the shell will attempt to finish typing the word for you by looking in the directories listed in your PATH variable for commands that match the text you've typed so far. If there is only one possibility, the shell will finish typing the word for you. If there are multiple possibilities, it will complete as much of the word as it can and then wait for you to type more.
Similarly, when you are typing arguments to a command, tcsh (and bash) tries to interpret the argument as a filename when you press <tab>, filling in as much of the filename as it can based on what you've typed.
For example, let us assume that there is a jpeg image called 1.jpg in the current directory that we wish to convert to the RGB format with the convert program. Let us also assume that it is the only file in this directory that starts with the character "1", and that the convert command is the only program in my PATH that starts with the characters "conv". Typing:
    % conv<tab> 1<tab> 1.rgb
would then appear on the command line as:
    % convert 1.jpg 1.rgb
In cases where there are multiple possibilities (for example, if the convert program were not the only thing in my PATH that started with "conv"), you can press <ctrl>-d and tcsh will print the list of possibilities, then put you back where you were in the command that you are typing. For example, if I type conv followed by <ctrl>-d, all the commands in my PATH that start with "conv" are listed, and I am then returned to the command line as if I had not pressed <ctrl>-d.
History
As mentioned above, history is another mechanism that's designed to make using the UNIX command-line interface more efficient. History remembers the last commands you've typed so that you can reuse them (or portions of them) in subsequent commands without the tedium of retyping. In tcsh, the history shell variable defines how many commands history will remember:
    % set history = 50
will tell history to remember up to the last 50 commands you've typed in this shell (normally, each running shell keeps track of its own history). Typing "history" by itself at the command line will report a numbered list of commands that you've typed along with the time each one was issued. An illustrative listing (the commands and times are made up for this example) might look like:
     1  10:02   cd /home/jake/docs
     2  10:02   ls
     3  10:03   cp notes.txt backup.txt
     4  10:05   rm junk
     5  10:06   man ls
     6  10:07   ls -l
     7  10:08   history
The ! character is another special character to tcsh - it is the history substitution character. What you type after the "!" is interpreted by tcsh's history mechanism. If it is a number, history invokes the command associated with that number (see the first column of the output of the history command above), if it exists. If it is text, history finds the last command that started with that text and invokes that command.
For example, if "rm junk" were recorded as command number 4 in my current history and I wanted to remove the file "junk" in the current directory again, I could simply type either:
    % !4
or:
    % !rm
both of which would have the effect of issuing the command "rm junk" from history.
A related command is the !! command, which invokes the previous command, whatever it was. Depending on your configuration, it may also be possible to browse through the shell's history using up/down arrow keys on your keyboard.
There are other history substitutions which can extract just pieces of the commands residing in history as well. The !$ history substitution, for example, expands to the last argument of the previous command. If the previous command was:
    % ls -l /home/jake/docs
then typing:
    % cd !$
is equivalent to:
    % cd /home/jake/docs
Finally, the : character can be used in conjunction with ! to recall just the command or argument part from any entry in the history as well:
    % !6:0 !3:2
would issue the command part (:0) of command #6, and pass to it argument 2 (:2) of command #3. For instance, if command #6 in the history were "ls -l" and command #3 were "cp notes.txt backup.txt", the above would be the equivalent of typing:
    % ls backup.txt
There are many variations on this. Check the tcsh man page for more details.
Aliases
Aliases are yet another mechanism to make using the UNIX shell more efficient and automated. Aliases provide a way to create your own UNIX commands that are built up from other commands and options. For example, perhaps you would like to define a shorthand way to invoke ls with all your favorite options without having to type the options every time. With aliases, you can do the following (tcsh):
    % alias myls ls -FC
The above will define a new command called myls. With this alias defined, typing "myls" will have the same effect as typing "ls -FC".
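This can be extended further to include multiple commands, options, etc. (note that you can concatenate multiple shell commands on the same line with the ; character). Here are some illustrative examples (the alias names and the directory are placeholders):
    % alias ll 'ls -l | more'
    % alias docs 'cd /home/jake/docs ; ls'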
To view all currently defined aliases:
    % alias
To destroy an alias called "myalias":
    % unalias myalias
Taking the time to define some key aliases for your most commonly used UNIX commands and/or directories can save a tremendous amount of repetitive typing.
Text Editors
Many files in UNIX are plain text files. You will encounter textual configuration files, input files, script files, mail files and program code, just to name a few. It is therefore a common thing in UNIX to edit text files. This requires users to be familiar with at least one UNIX-based text editor.
The most ubiquitous text editor in UNIX is called vi. vi is highly recommended for users who will spend a lot of time on UNIX systems. It has a steep learning curve, but once you have spent a few weeks mastering it, vi becomes a very powerful tool that will help you through many tasks. For a good description and tutorial, we recommend reading the vi life preserver.
For more casual users, vi is overkill. We recommend nedit which is a graphical text editor with built-in mouse support (not unlike Windows notepad.exe). Nedit has ease-of-use features that make it extremely simple to pick up, if not extremely powerful. If nedit is not installed on your workstation, contact your system administrator.
Customizing the Shell Environment
We've now covered several concepts related to the UNIX shell. As you've seen, many aspects of the shell environment can be customized via environment variables, shell variables, and aliases. There are also likely to be customizations required by various applications that you will encounter.
It is desirable to have these customizations automatically applied each time you log in and/or start a new shell. In tcsh, this is accomplished via the .cshrc file. When tcsh (or csh) starts, it looks in your home directory for a file called .cshrc and automatically executes the commands it contains just as if you typed them at the command line.
Setting up your .cshrc file requires a text editor and some idea of which customizations you would like to make and which are appropriate for your site. Usually you will be supplied with a good default .cshrc file from which you can build your own customizations. If in doubt, contact your system administrator for guidance.
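As a rough illustration of how the pieces covered in this part fit together, a small .cshrc might contain lines like the following. This is only a sketch in tcsh syntax, not a recommended site default: the docdir variable, the directory /home/jake/docs, the choice of vi as EDITOR, and the alias names are all placeholders to be replaced with your own.

```shell
# ~/.cshrc: read by csh and tcsh each time a new shell starts.
# Every name and path below is a placeholder, not a site default.

# Shell variable: remember the last 50 commands in the history list.
set history = 50

# Shell variable used as a shortcut in later commands (e.g. "ls $docdir").
set docdir = /home/jake/docs

# Environment variable: passed on to programs started from this shell.
setenv EDITOR vi

# Aliases: shorthand for commands that are typed often.
alias myls 'ls -FC'
alias docs 'cd $docdir'
```

After editing the file, the changes take effect in every new shell you start. Keeping a copy of the default .cshrc you were given makes it easy to go back if a customization misbehaves.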
Useful Commands
Manual pages are great, but you have to know the command exists first! The list of commands below is a good place to start getting familiar with UNIX. Browsing through man pages with a tool like xman can also be beneficial. Visit the man pages of the following commands to learn more about each of them: