The Unix File System

Steven Zeil

Last modified: Aug 24, 2023
1 Files

What’s in a file? Broadly speaking, we can divide files into two categories: text files and binary files.

Text files are files that consist entirely of human-readable (more or less) text, while binary files are files that encode data in a fashion intended only for interpretation by a machine.

1.1 File Names

Unix file names can be almost any length [2] and may contain almost any characters. As a practical matter, however, you should avoid punctuation characters other than the hyphen, the underscore, and the period. Also avoid blanks and non-printable characters within file names. These characters have special meanings when you are typing commands and so would be very hard to enter within a file name.

Some things to keep in mind about Unix file names that may be different from other file systems you have used:

1.2 Text Files

A lot of what we store in files is just text. Text is represented in files, much like it is stored in memory, by placing one character in each successive byte of the file. Of course, bytes actually hold numbers (in the range 0..255), so we use a character set mapping to assign each character a numeric value.

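Since the mid-1960s, the dominant character set has been ASCII. ASCII encodes 128 different characters, so it needs only 7 bits per character; technically, you can say that it wastes one bit of every 8-bit byte. The characters encoded are the upper- and lower-case English letters, the digits, common punctuation marks, the space, and a group of non-printing control characters such as the line feed.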

So a data file containing a single line of text with the word “Hello” would actually be encoded in a file as

72 101 108 108 111 10

i.e., 72 is the ASCII code for ‘H’, 101 is the code for ‘e’, 108 the code for ‘l’, 111 the code for ‘o’, and 10 is the line feed (end of line) control character.
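
One way to check these codes yourself is sketched below, assuming a POSIX shell with the standard printf and od utilities. The -An option suppresses od's offset column and -tu1 prints each byte as an unsigned decimal number. The variable name hello_codes is just an illustration, and the exact column spacing of od's raw output varies between systems.

```shell
# Write the word "Hello" followed by a line feed, then show each byte
# of that text as an unsigned decimal number.
printf 'Hello\n' | od -An -tu1

# The unquoted $(...) lets the shell collapse od's column padding,
# leaving the codes separated by single spaces.
hello_codes=$(echo $(printf 'Hello\n' | od -An -tu1))
echo "$hello_codes"    # 72 101 108 108 111 10
```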

Here is a dump of the opening bytes of the text file from which this particular document was generated.

Compare the ASCII codes you see here to the opening paragraphs of this document.

1.2.1 What Can You Do With ASCII Text Files?

Certainly we will be able to do everything to a text file that we can do with generic binary files: copying, renaming, etc.

In addition, we will learn that Linux has quite a few commands for working with text, including commands for viewing, changing, editing, and measuring properties of text.

1.2.2 Unicode

Now, the 128 characters defined in ASCII are good enough for basic purposes, particularly if you speak (and write in) English.

But before long, pressure built to expand the available characters. Some of this pressure came from specialized applications. For example, there are numerous symbols in mathematics and the sciences that aren’t in the ASCII code. Even outside of technical fields, typesetters might desire specialized symbols of their own.

For a while, developers tried to stem the tide by defining character sets that used all 256 possible values of a byte, but even that was little more than a temporary respite.

Even more pressure came from different languages. Anyone whose native language was Spanish, for example, would regret the absence of characters like á or ¿. But that’s only the tip of the iceberg. Greeks and Russians have their own entire alphabets. And once we get past Europe, there are entire families of alphabets for Asian, Middle Eastern, and African languages.

Unicode was introduced in the 1980s to provide a 16-bit character set, later expanded in the mid-1990s to 21 bits, which provides for over a million possible characters. Unicode now incorporates not only the traditional ASCII characters (as Unicode values 0..127, so an ASCII text file is also a valid Unicode text file) but also lots of specialized symbols, many different international alphabets, and, for better or worse, emoticons and emoji.

Complicating matters, Unicode allows for a variety of different ways to arrange the numbers within a stream of bytes. These are called character encoding schemes, or “encodings” for short. For example, one encoding, UTF-32, stores a single (21-bit) Unicode character in a block of 4 bytes (32 bits). This is fairly simple, but if 99.9% of your characters are actually drawn from the original ASCII set, then this wastes nearly 75% of the file storage. Another, more popular, scheme, UTF-8, stores all characters from the original ASCII set in a single byte, but uses byte values outside of the ASCII 0..127 range to encode each non-ASCII Unicode character as a sequence of 2, 3, or 4 bytes.
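
To make the variable-length idea concrete, the sketch below counts the bytes that UTF-8 uses for four sample characters. It assumes a POSIX shell with printf, wc, and tr. The helper name utf8_len is hypothetical, and the characters are written as octal byte escapes so that the example does not depend on how the script itself is encoded.

```shell
# Count the bytes in one UTF-8 encoded character.
# $1 is a printf format string holding the character's bytes.
utf8_len() {
    printf "$1" | wc -c | tr -d ' '
}

utf8_len 'A'                   # U+0041, plain ASCII: 1 byte
utf8_len '\303\241'            # U+00E1, a with acute accent: 2 bytes
utf8_len '\342\202\254'        # U+20AC, euro sign: 3 bytes
utf8_len '\360\237\230\200'    # U+1F600, an emoji: 4 bytes
```

In UTF-32 each of these four characters would occupy 4 bytes, which is where the "nearly 75%" figure for mostly-ASCII text comes from.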

1.2.3 What Can You Do With Unicode Text Files?

Certainly we will be able to do everything to a text file that we can do with generic binary files: copying, renaming, etc.

Some of the Linux commands for working with ASCII text files will also work with Unicode. Others will not. Some will work with Unicode files encoded in UTF-8 but not in other encodings. Presumably, more and more of the text processing commands will support Unicode in the future.

1.3 Binary Files

A binary file is a sequence of bytes that can contain almost anything. In practice, some software developer working on a program defined a file format for holding the data needed by that program. The file format was probably designed to be compact and easily processed by that program.

Here is a dump of the opening bytes of the file containing the first picture in section 1 of this document.

You’ll notice that the ASCII column on the far right contains lots of ‘.’ characters, which are actually used by the hexdump program to indicate a byte that contains a non-ASCII character value or an ASCII value that does not have a visible representation (e.g., line terminators). Where ASCII characters are displayed, they appear to be almost random. That’s because, for the most part, they pretty much are. Any binary data file is bound to contain some bytes that just happen to match an ASCII character code.
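
The sketch below imitates that right-hand column. It is not how hexdump itself is implemented; it simply uses the standard tr utility to replace every byte that is not a printable ASCII character with a period. The helper name sample and its bytes are made up for illustration.

```shell
# Six printable ASCII bytes followed by three non-printable bytes and a
# line feed (made-up sample data).
sample() {
    printf 'GIF89a\001\002\377\n'
}

# Replace every byte that is not printable ASCII with '.'.
# LC_ALL=C makes tr treat its input as single bytes.
sample | LC_ALL=C tr -c '[:print:]' '.'
echo    # finish the line, since the line feed was replaced too
```

The result is GIF89a followed by four periods: one for each of the three non-printable bytes and one for the line feed.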

1.3.1 What Can You Do With Binary Files?

We can do operations that don’t rely on interpreting or understanding the contents of the file: e.g., copying the file, renaming it, or moving it to a different directory.

Typically, though, the contents of that binary file can only be processed by that one program or by other programs written later with the specific goal of processing that same file format. So you can use binary files only as input to a program specifically designed to handle their file format.

For example, every operating system defines a file format for executable programs. (For all practical purposes, we “run” an executable program by supplying it as input to the operating system.) For the most part, a program is just a block of machine code instructions encoded in binary. Load that block into memory, point the CPU at the address where it was loaded, and it runs. But to facilitate the process of loading that block into memory, an operating system may specify that each executable program file starts with a “header” of information indicating where it can be loaded, what other resources need to be available, etc. Because these headers are operating-system specific, trying to run a program designed for one operating system on a different operating system will usually result in a quick error message: the second operating system soon realizes that the header is not in the proper format. (This is by no means the only reason why executables for one operating system won’t run under another. It does, however, account for the fact that if you try this, you will usually get stopped before doing any real damage.)

As another example, in 1987 Steve Wilhite, a developer at CompuServe, defined a new format for holding images called the Graphics Interchange Format, or GIF for short. Originally, the only programs that could interpret the GIF format were conversion programs provided by CompuServe for converting between GIF and older existing graphics file formats. Eventually, web browsers added code designed to interpret and render GIF, and now almost every program that deals in graphics includes code designed to handle GIF.

You cannot hand a GIF file to the operating system to be executed like a program, nor can you ask a web browser or graphics viewer to render a program as if it were an image. Doing so will result in an error message at best, garbage output if you are not so lucky, a hung system if you are still less lucky, or a corrupted file system if you are really having a bad day.
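
As a rough illustration of how a program recognizes its own file format, the sketch below tests whether a file begins with one of the two GIF signatures, GIF87a or GIF89a. The function name is_gif, the variable gif_demo, and the sample files are all hypothetical, and a real image viewer checks far more than these six bytes.

```shell
# Succeed if the named file starts with a GIF signature.
is_gif() {
    # Read the first six bytes; drop NUL bytes, which shell variables cannot hold.
    signature=$(head -c 6 "$1" | tr -d '\000')
    case $signature in
        GIF87a|GIF89a) return 0 ;;
        *)             return 1 ;;
    esac
}

# Build two small sample files in a scratch directory.
gif_demo=$(mktemp -d)
printf 'GIF89a\001\000\001\000' > "$gif_demo/tiny.gif"    # signature plus made-up bytes
printf 'Hello\n'                > "$gif_demo/hello.txt"   # ordinary text

for f in "$gif_demo/tiny.gif" "$gif_demo/hello.txt"; do
    if is_gif "$f"; then
        echo "$f: starts with a GIF signature"
    else
        echo "$f: not a GIF"
    fi
done
```

The scratch directory can be deleted afterwards with rm -r "$gif_demo".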

1.4 What’s Text and What’s Binary?

We’ll talk more about this in a later lesson, but some examples might be useful now.

2 Directories

Files in Unix are organized by collecting them into directories. (In Windows these are more commonly known as “folders”.) Directories are themselves files, and so may appear within other directories.

The result is a tree-like hierarchy. At the root of this tree is a directory known simply as “/” [1]. This directory lists various others:

The bin directory contains many of the programs for performing common Unix commands. The usr directory contains many of the data files that are required by those and other commands. Of particular interest, however, is the home directory, which contains all of the files associated with individual users like you and me. Each individual user gets a directory within home bearing their own login name. My login name is zeil.

We can expand our view of the Unix files then as:

cd and ls are two common Unix commands, as will be explained later.

Within my own home directory, I have a directory also named “bin”, containing my own personal programs. Two of these are called “clpr” and “psnup”. So these files are arranged as:

3 Paths

3.1 How do you give someone directions?

We’ve all done this from time to time – asked someone for directions on how to get to someplace. Some people are very good at giving directions, others not so much. Some people are good at following directions, others not so much.

How do I get to the White House?

Look for the Washington Monument. It should be easy to spot.

From the Washington Monument, head north along the path until you come to a fork. Turn right, walk about 500 ft. then make a sharp left and head north towards the intersection of 15th St. and Constitution Ave. NW.

From that intersection, continue north along 15th St. until you reach Pennsylvania Ave. Turn left and proceed west along Pennsylvania Ave. until you reach the gate of the White House.

This is an example of absolute directions. They rely on your starting from a well-known, easily reached landmark and proceeding from there.

If you asked how to get to my office on the ODU campus, I might give you absolute directions by assuming that you knew how to start from Webb Center, or, more likely, from the abandoned monorail track that passes through much of the campus.

How do I get to the White House?

“Well, where are you now?”

“I’m in Lafayette Square.”

OK, walk along the southeast path until you come to Pennsylvania Ave NW.

Turn right and proceed west along Pennsylvania Ave. until you reach the gate of the White House.

This is an example of relative directions. Relative directions can often (though not always) be shorter and simpler than absolute directions.

3.2 File Paths

The full “name” of any file is given by listing the entire path from the root of the directory tree down to the file itself, with “/” characters separating each directory from what follows.

For example, the full names (paths) of the four programs in the above diagram are

   /bin/cd
   /bin/ls
   /home/zeil/bin/clpr
   /home/zeil/bin/psnup

3.3 Paths Supply Directions

It’s important to recognize that a path is a step-by-step set of instructions on how to find a specific file. For example, /home/zeil/bin/psnup means:

  1. Start at the root of the Unix file system, /
  2. There you should see a directory named home. Look in that directory.
  3. In that directory, you should see a directory named zeil. Look in that directory.
  4. In that directory, you should see a directory named bin. Look in that directory.
  5. In that directory, you should see a file named psnup. That’s the file you want.
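
The same reading of a path can be sketched as a small shell function. This is only an illustration of the idea, not how the operating system actually resolves paths; the name describe_path is hypothetical, and the function just splits the text of the path at each ‘/’ without checking that anything exists.

```shell
# Print the step-by-step directions encoded in an absolute path.
# The body runs in a subshell, so the IFS and globbing changes stay local.
describe_path() (
    step=1
    echo "$step. Start at the root of the file system, /"
    IFS=/     # split the path at each slash
    set -f    # do not expand wildcard characters while splitting
    for name in $1; do
        if [ -n "$name" ]; then
            step=$((step + 1))
            echo "$step. Look there for an entry named $name"
        fi
    done
)

describe_path /home/zeil/bin/psnup
```

For this path the function prints five numbered steps, ending with the entry named psnup.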

In the assignments for this course, I will often give instructions like

Copy the file /home/zeil/bin/psnup into the directory...

and then will get email from students saying something like

I can't find the file /home/zeil/bin/psnup. Where do I find it?

which is rather like asking “What’s the address of the house at 221B Baker St., London, Eng.?” or “How heavy is a 5 pound bag of flour?”

The answer is literally right there in the question!

Now, when I say that a path is a step-by-step set of directions, understand that we seldom have to follow those directions step by step. Almost any time and any place I need to name a file in Unix, I can simply give a path to it and let the operating system follow those step-by-step directions.

 
Example 1: Try This: Exploring Directories
  1. Log in to your Unix account.

  2. Upon logging in, your working directory should be your home directory. The command pwd will print the working directory. Give the command

    pwd

    You should see something like

    /home/yourname
    

    This is a path. What does this path tell you?

  3. The command file can tell you what kind of file you have. Give the command:

    file /home/yourName

    substituting your own login name for yourName.

    Does the response you get from file make sense?

  4. The command cd will let you change your current working directory.

    Give the following commands and observe the results:

    cd /
    pwd
    file /
    cd /usr
    pwd
    file /usr
    
  5. The ls command is used to list the contents of a directory.

    Give the following commands.

    ls /
    ls /usr
    ls /home
    

    You should recognize one of the entries in the last of those listings as being your own login name.

  6. Try using the up arrow and down arrow keys. You should be able to move back and forth through the history of commands you have already tried out.

    Use the arrows to revisit the ls /usr command. Hit the Enter key to re-issue that command.

  7. Many command shells will offer to auto-complete file and folder names after you type a few characters.

    Type

    ls /usr/lo

    and then hit the Tab key instead of Enter. The command shell should guess that you were on your way to typing “local”, and fill in the remaining characters for you.

     

    Hit Enter to issue the command.

  8. Type

    ls /usr/li

    and then hit the Tab key instead of Enter. The command shell will look and discover that there are several possible entries in /usr that begin with “li”. As it happens, all of them begin with “lib”, so it will add the ‘b’ and then pause. You may hear a beep that indicates that it could go no further.

    Hit the Tab key twice more and the shell will show you all of the possible continuations of /usr/lib…

    Hit Enter to list /usr/lib.

  9. Give the command

    exit

    to close out this session.

3.4 Absolute and Relative Paths

File paths give step-by-step directions on how to reach a file.

Just like when we give directions in the real world, we can give paths that are relative or absolute.

Absolute paths start from a “landmark”, namely the file system root /.

Relative paths start from “wherever we are now”, our current working directory. In the Try This exercise earlier, you saw how to change your current working directory with the cd command and how to find out what it is with the pwd command.

  • If a path starts with a ‘/’, it is absolute.

    • Later we will see that an absolute path can also start with ‘~’.

  • If a path starts with anything besides ‘/’ (or ‘~’), it is relative.
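
The rule above can be written down as a short shell function. This is a minimal sketch; the name path_kind is hypothetical, and it looks only at the first character of the text it is given.

```shell
# Classify a path by its first character.
path_kind() {
    case $1 in
        /* | "~"*) echo absolute ;;
        *)         echo relative ;;
    esac
}

path_kind /usr/include      # absolute
path_kind net/ethernet.h    # relative
path_kind '~/playing'       # absolute
path_kind ../cs252          # relative
```

The ‘~’ example is quoted because the shell would otherwise replace the ‘~’ with the full path of your home directory before path_kind ever saw it.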

Example 2: Try This

Log in to your Unix account.

Give the commands:

cd /usr
pwd

Is that cd command using a relative or an absolute path?

The command ls is used to list the files contained in a directory. With no path in the command, it lists the contents of the current working directory. We can also give it one or more paths and, if those paths lead to directories, it will list the contents of those directories.

Give the commands:

ls
ls /usr

Why do these commands produce identical output?

If we are following directions in the real world, we often walk a little way, then stop and look around to be sure that we are in the right place, then follow a little more of the directions, stop and look around, and so on.

We often use relative paths in Unix commands to accomplish much the same thing.

Try the command:

file /usr/include/net/ethernet.h

Good enough, but how did I even know that file was there? For that matter, how did I know that the directory /usr/include/net was there? And how did I know that I was typing it correctly with no misspellings or other goofs?

In many cases, I would approach it step by step, using relative paths to move one directory at a time.

Try the commands:

cd /
pwd
ls

How do I know which of these are directories and which are ordinary files? We could use the file command, but there’s a nice shortcut available in ls. The -F option will attach a punctuation character to the end of “unusual” files to tell us what they are:

  • a ‘/’ on the end of directories
  • a ‘*’ on the end of commands and programs that can be executed
  • a ‘@’ on the end of “symbolic links” – we won’t use these in this course, but they are a kind of shortcut tunnel from one directory to another.

Just remember if you use this option that the appended punctuation is not really part of the file name!
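
If you would like to see all three markers side by side, the sketch below builds a scratch directory holding one entry of each kind and lists it. It assumes the standard mktemp, mkdir, touch, chmod, and ln commands; the variable name demo and the entry names are made up for illustration.

```shell
# Build a small scratch directory with one entry of each kind.
demo=$(mktemp -d)
mkdir "$demo/sub"                   # a directory
touch "$demo/notes.txt"             # an ordinary file
touch "$demo/prog"
chmod +x "$demo/prog"               # a file marked as executable
ln -s notes.txt "$demo/shortcut"    # a symbolic link to notes.txt

ls -F "$demo"
```

The listing should include sub/, prog*, and shortcut@, while notes.txt appears with nothing added. The scratch directory can be deleted afterwards with rm -r "$demo".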

Continuing on, try the commands:

ls -F
cd usr
pwd

Why does usr work in the cd command above?

Continuing on, try the commands:

ls -F
cd include
pwd
ls -F
cd net
pwd
ls -F
file ethernet.h
file /usr/include/net/ethernet.h

Notice how each cd command adds another step to the path of your current working directory.

If we know the absolute path to a file, we can get there immediately. Otherwise we can step our way, one step at a time, until we get where we want.

Now, there are lots of intermediate possibilities in between those two extremes. For example, try these commands:

cd /usr/include
ls -F
ls -F net
file net/ethernet.h
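
The system you are using may not have exactly these files, so here is a sketch that builds a scratch copy of that layout and reaches the same directory in each of the ways just described. The variable names root, all_at_once, and step_by_step are made up for illustration.

```shell
# Build a scratch tree that mimics part of the layout used above.
root=$(mktemp -d)
mkdir -p "$root/usr/include/net"
touch "$root/usr/include/net/ethernet.h"

# One absolute path, all at once.
cd "$root/usr/include/net" && all_at_once=$(pwd)

# One relative step at a time.
cd "$root" && cd usr && cd include && cd net && step_by_step=$(pwd)

# Both routes end in the same working directory.
echo "$all_at_once"
echo "$step_by_step"

# Part of the way by cd, the rest by a relative path.
cd "$root/usr/include" && ls net
```

The two echo commands print the same directory, and the final ls prints ethernet.h.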

3.5 Abbreviating Paths

There are some common abbreviations that can be used to shorten paths.

Example 3: Try This

Try the following commands. See if you can predict what each pwd command will print.

cd ~
pwd
ls -F
cd /usr
cd include
pwd
cd ..
pwd
ls -F
cd /usr/include/..
pwd
cd /usr/include/.
pwd
cd /usr/../home
pwd
ls -F
cd ~
pwd
cd ~cs252
pwd
cd ~
pwd
cd ../cs252
pwd
cd ../../usr/include
pwd
cd ./.
pwd

Any surprises? Any results that you just could not explain?
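
If some of the results were hard to predict, it may help to repeat the ‘.’ and ‘..’ parts of the experiment in a scratch tree where nothing else is going on. In a path, ‘.’ stands for the directory you have reached so far and ‘..’ for its parent. The sketch below assumes a POSIX shell; the variable name tree is made up for illustration.

```shell
# Build a scratch tree with a usr/include branch and a home branch.
tree=$(mktemp -d)
mkdir -p "$tree/usr/include" "$tree/home"

cd "$tree/usr/include/.." && pwd    # ends in /usr
cd "$tree/usr/include/."  && pwd    # ends in /usr/include
cd "$tree/usr/../home"    && pwd    # ends in /home
cd ../usr/include/./.     && pwd    # ends in /usr/include
```

The last command uses a relative path, so it starts from the home directory of the scratch tree, where the previous cd left off.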

Some miscellaneous notes:

4 File Systems on Other Operating Systems

Much of what we have covered here is actually applicable to other operating systems as well. All operating systems have files, both text and binary. All operating systems have directories, though they may be called “folders” instead.

And proficient use of any operating system will eventually require you to work with paths in that operating system.

4.1 macOS File Systems

If you do not have access to an Apple OS X or macOS computer, skip to the next section.

macOS is a Unix operating system, so it should not be surprising that almost everything we have looked at will carry over directly to Macs.

In fact, the most visible difference is that in most Unix systems, your home directory would be /home/yourLoginName/, but in macOS your home directory is /Users/yourLoginName/

Example 4: Try This

Open a Terminal window on your Mac. In that window, try the following commands. Try to predict what each pwd and ls is going to show you.

cd /
pwd
ls -F
cd usr
pwd
ls -F
cd /Users
pwd
ls -F
cd ~
pwd
ls -F

Now, it’s true that you can usually use the Finder to examine your directories and files with less effort than working through the command line.

But this course is “…for Programmers”, and there are many tasks that programmers, unlike more casual Mac users, will need to perform that involve paths and other command line concepts.

4.2 Windows File Systems

If you do not have access to a Windows PC, skip this section.

Windows is the only commonly used operating system today that is not a Unix variant, so it will have lots of differences from Linux. Still, it has files, directories (folders), and paths.

The major differences to watch out for:

Example 5: Try This

Open a CMD window on your Windows PC. (Click the Start/Windows button on the left end of the task bar and type cmd, then hit enter.)

In that window, try the following commands. Try to predict what each cd and dir is going to show you.

cd
dir
cd \
cd
dir
cd \Windows\Help
cd
dir
cd ..
cd
dir
cd \Users
cd
dir
cd yourWindowsLoginName
cd
dir

Now, it’s true that you can usually use the File Explorer to examine your directories and files with less effort than working through the command line.

But this course is “…for Programmers”, and there are many tasks that programmers, unlike more casual Windows users, will need to perform that involve paths and other command line concepts.

5 Commands Glossary

Command      Example        Explanation
file path    file foo.txt   indicate what kind of data is stored in a file
cd path      cd ~/playing   change your current working directory
ls           ls             list the files in your current working directory
ls path      ls /usr        list the files at that path
ls -F path   ls -F ~        list the files at that path with simple indicators of whether they are directories, links, executable commands, or just “ordinary” files
pwd          pwd            print your current working directory

1: It may be more precise to say that this directory’s name is the empty string "".

2: As we will see, one almost never needs to type an entire file name in a Unix command, so long file names are no harder to work with than short ones.