File System Implementations

Traditionally

Devices:

Tapes (usually back-up only)

Disks

Flash Memory

File system:

         Application Programs
                |
                V
         Logical file systems  <--- As provided by OS; some OSs provide
                |                   several types, some few.
                V
         File organization modules <- again, variety depends on OS
                |                     e.g.  provides "virtual
                V                     files" -- may be either in
         Basic file system            memory or in disk -- up to OS
                |
                V
         I/O control
                |
                V
         physical devices
System usually maintains an open-file table: list of currently open files, with permissions and locations.

OS usually provides mount commands: ties file names to physical device.

Disks usually have several recording surfaces per drive. Disk space is divided into

sectors (several/track; ~1k to 4k in size)
tracks (several/cylinder with around 30 sectors/trk)
cylinders (several/drive; ~200 to 36000)

I/O transfers are done in multiples of sectors; a sector is addressed by cylinder number, surface (track) number, and sector number.
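
As a rough illustration of this three-part addressing, the sketch below converts between a (cylinder, surface, sector) address and a single linear sector number. The function names and the choice to number sectors from 0 are assumptions made for the example, not something a particular drive or OS is known to do.

```python
def chs_to_block(cylinder, surface, sector, surfaces_per_cylinder, sectors_per_track):
    """Map a (cylinder, surface, sector) address to a linear sector number.

    Assumption: cylinders, surfaces, and sectors are all numbered from 0.
    """
    return (cylinder * surfaces_per_cylinder + surface) * sectors_per_track + sector


def block_to_chs(block, surfaces_per_cylinder, sectors_per_track):
    """Inverse mapping: linear sector number back to (cylinder, surface, sector)."""
    track, sector = divmod(block, sectors_per_track)
    cylinder, surface = divmod(track, surfaces_per_cylinder)
    return cylinder, surface, sector
```

With 4 surfaces and 30 sectors per track, cylinder 2, surface 1, sector 5 maps to linear sector 275, and the inverse function maps 275 back to (2, 1, 5).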

Files divided into blocks, with blocks usually same size as sector.

Disk Allocation Methods

Contiguous -- blocks allocated sequentially. Blocks from same file are contiguous.

Space allocation can be discussed just as for contiguous memory allocation: first fit, best fit, worst fit. The same fragmentation problems arise.
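
A minimal sketch of the three placement strategies follows. It assumes free space is kept as a list of (start_block, length) holes; that representation and the function names are hypothetical and chosen only for illustration.

```python
def choose_hole(holes, size, strategy="first"):
    """Pick a free extent that can hold `size` blocks.

    holes: list of (start_block, length) tuples in disk order (an assumed layout).
    Returns the index of the chosen hole, or None if nothing fits.
    """
    candidates = [i for i, (_, length) in enumerate(holes) if length >= size]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                                   # first hole big enough
    if strategy == "best":
        return min(candidates, key=lambda i: holes[i][1])      # smallest hole that fits
    if strategy == "worst":
        return max(candidates, key=lambda i: holes[i][1])      # largest hole
    raise ValueError(f"unknown strategy: {strategy}")


def allocate(holes, size, strategy="first"):
    """Allocate `size` contiguous blocks; return the start block or None."""
    i = choose_hole(holes, size, strategy)
    if i is None:
        return None
    start, length = holes[i]
    if length == size:
        del holes[i]                                # hole used up exactly
    else:
        holes[i] = (start + size, length - size)    # shrink the hole
    return start
```

With holes [(0, 5), (10, 12), (30, 6)], a request for 6 blocks starts at block 10 under first fit and worst fit, and at block 30 under best fit. A request for 20 blocks fails even though 23 blocks are free in total, which is the external fragmentation problem.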

Linked -- Last several bytes of each block contain location of next block (usually).
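
The sketch below simulates linked allocation on an in-memory "disk" (a dictionary from block number to bytes). The 16-byte block size, the 4-byte pointer, and the end-of-chain marker are all assumptions picked to keep the example small.

```python
import struct

BLOCK_SIZE = 16             # tiny block, for illustration only
POINTER_SIZE = 4            # last 4 bytes of each block hold the next block number
END_OF_FILE = 0xFFFFFFFF    # hypothetical end-of-chain marker
PAYLOAD = BLOCK_SIZE - POINTER_SIZE


def write_linked(disk, free_blocks, data):
    """Store `data` in blocks taken from free_blocks; return the first block number."""
    chunks = [data[i:i + PAYLOAD] for i in range(0, len(data), PAYLOAD)] or [b""]
    blocks = [free_blocks.pop(0) for _ in chunks]
    for n, (block, chunk) in enumerate(zip(blocks, chunks)):
        nxt = blocks[n + 1] if n + 1 < len(blocks) else END_OF_FILE
        # Pad the payload, then append the pointer to the next block.
        disk[block] = chunk.ljust(PAYLOAD, b"\0") + struct.pack("<I", nxt)
    return blocks[0]


def read_linked(disk, first_block, length):
    """Follow the chain from first_block and return the first `length` bytes."""
    out = bytearray()
    block = first_block
    while block != END_OF_FILE:
        raw = disk[block]
        out += raw[:PAYLOAD]
        (block,) = struct.unpack("<I", raw[PAYLOAD:])
    return bytes(out[:length])
```

Note that the blocks need not be adjacent on disk, but reaching the n-th block means reading every block before it.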

Indexed -- File starts with index block which contains address of each block in file.
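
For comparison, here is a similar sketch of indexed allocation, again on a dictionary standing in for the disk. Storing the index block as a Python list and the 8-byte block size are simplifying assumptions.

```python
def write_indexed(disk, free_blocks, data, block_size=8):
    """Store data using an index block; return the index block's number."""
    index_block = free_blocks.pop(0)
    addresses = []
    for i in range(0, len(data), block_size):
        b = free_blocks.pop(0)
        disk[b] = data[i:i + block_size]
        addresses.append(b)
    disk[index_block] = addresses      # the index block lists every data block
    return index_block


def read_block(disk, index_block, n):
    """Random access: fetch the n-th data block with a single index lookup."""
    return disk[disk[index_block][n]]


def read_indexed(disk, index_block):
    """Read the whole file by walking the index block in order."""
    return b"".join(disk[b] for b in disk[index_block])
```

Unlike the linked scheme, any block can be reached directly through the index, at the cost of one extra block per file.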

Directories

(and free space management)

Disk contains special files (often at sector 00001) which contain a table of the files on the disk. May also contain a "bad sector table" (since the technology often produces a recording surface with tiny blemishes -- hence spots that can't be used). Disk is usable unless this first sector is flawed, since the file table and bad block table must be readable. (On many systems, the "format" command checks each sector -- write something, then see if it can be read -- and builds these tables.)
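
The write-then-read check can be sketched as follows. The write_sector and read_sector callbacks are hypothetical stand-ins for a device driver, and the test pattern is arbitrary; a real format utility is considerably more involved.

```python
def format_disk(num_sectors, write_sector, read_sector, pattern=b"\xA5" * 512):
    """Write a test pattern to every sector and read it back.

    Returns (bad_sectors, free_sectors). write_sector(n, data) and
    read_sector(n) are assumed callbacks, not a real driver interface.
    """
    bad, free = [], []
    for n in range(num_sectors):
        try:
            write_sector(n, pattern)
            ok = read_sector(n) == pattern
        except OSError:
            ok = False                  # an I/O error also marks the sector bad
        (free if ok else bad).append(n)
    return bad, free
```

The first list becomes the bad sector table and the second seeds the free space list.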

Free space list

Directories

UNIX: consider the following partial output of a ls -li command:

 41345 drwxr-xr-x 20 price    daemon       1024 Mar 26 13:57 .
     2 drwxr-xr-x  7 root     wheel         512 Jul  7  2016 ..
 41396 -rw-------  1 price    staff         264 Jul  7  2015 .Xauthority
 41397 -rw-r--r--  1 price    staff         262 Sep 25  2016 .Xdefaults
 41397 -rw-r--r--  1 price    staff         905 Feb 23 18:53 .cshrc
  2435 -rw-r--r--  1 price    staff          63 Mar 20  2017 .emacs
 41349 -rw-------  1 price    staff         596 Mar  8 15:06 .history
 41372 -rw-r--r--  1 price    staff         587 Jul  8  2017 .login
  3648 -rw-r--r--  1 price    staff         588 Feb 15 18:11 .xinitrc
 41348 -rw-r--r--  1 price    staff         517 Feb 17 19:04 .xplaces
 40198 drwxr-xr-x  2 price    staff         512 Mar 12 16:53 bin
 26778 drwxr-xr-x 14 price    staff         512 Jun 30  2016 classes
  9758 drwxr-xr-x  2 price    staff         512 Feb 18 13:01 correspondence
 24395 drwxrwxr-x  2 price    staff         512 Aug  1  2017 mailfolders
  6178 drwxr-xr-x  4 price    staff         512 Jan 21 18:24 misc
 35352 drwxr-xr-x  3 price    staff         512 Feb 18 13:00 misc.sailing
 12160 drwxr-sr-x  6 price    staff         512 Jul 28  2017 papers
  6165 drwxr-xr-x  2 price    staff         512 Jan 15  2017 presentations
 20726 drwxr-xr-x  6 price    staff         512 May 30  2017 proposals
 25592 drwxr-xr-x  3 price    staff         512 Mar 26 13:57 research.projects
 31695 drwxr-xr-x  7 price    staff         512 Sep 29  2017 src
   ^     ^  ^  ^   ^  ^         ^            ^        ^      ^
   |     |  |  |   |  |         |            |        |      |
   |     |  |  |   |  |         |            |        |       - name of file
   |     |  |  |   |  |         |            |         - last chg date (UNIX
   |     |  |  |   |  |         |            |           also keeps last access and
   |     |  |  |   |  |         |            |           last i-node change dates)
   |     |  |  |   |  |         |             -- size (in bytes)
   |     |  |  |   |  |          -- group owner
   |     |  |  |   |   -- owner
   |     |  |  |    -- number of directories in which this file appears (so OS
   |     |  |  |       knows when it can delete the file.  You can delete from
   |     |  |  |       your directory, but others may still have it in theirs.)
   |     |  |   -- Others permissions.
   |     |   -- Group permissions.
   |      -- Owner permissions.
    -- i-number: index into i-node table.  If two directories share a file, then
       the two directory entries have the same i-number.
Permissions are changed with chmod; group owner is changed with chgrp. Groups on the system are listed in the /etc/group file. Files are put in multiple directories with the ln (for link) command.
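
The relationship between links, i-numbers, and the link count can be observed from a program. The sketch below assumes a UNIX-like system and a file system that supports hard links; the file names are made up for the example.

```python
import os
import tempfile


def link_demo():
    """Create a file, hard-link it under a second name, and report i-node info."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("shared contents\n")
        os.link(original, alias)            # same effect as: ln original.txt alias.txt
        a, b = os.stat(original), os.stat(alias)
        same_inode = a.st_ino == b.st_ino   # both names refer to one i-node
        links_before = a.st_nlink           # link count with two directory entries
        os.remove(original)                 # removes one entry, not the data
        links_after = os.stat(alias).st_nlink
        with open(alias) as f:
            contents = f.read()
        return same_inode, links_before, links_after, contents
```

The data stays readable through the second name after the first is removed, because the OS frees the file only when the link count drops to zero.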

OS responsible for enforcing "consistency semantics". If several users are using the same file at the same time, what happens? If all users are on the same machine, this is nontrivial, but easier than when users are on different machines accessing the file over a network. This is tricky because pieces of the file are sitting in buffers (with generally more and bigger buffers or caches on networks) to improve performance. More on this when we get to distributed systems.

Disk Scheduling

On a PC, disk scheduling is not a consideration. However, on a large multiuser system, since I/O tends to be very slow (relative to CPU speeds), and since a single drive may have a list of outstanding I/O requests, the order in which the requests are serviced can significantly affect performance. CPU speeds are significantly higher than disk drive speeds, so sufficient time exists to reorder requests to improve performance. What we are doing is potentially changing the movement of the heads (since sectors cannot be reordered in response to read requests). Different strategies affect the amount of head motion.

The simplest approach: FCFS, i.e., don't reorder. Appropriate for single user systems.

SSTF (shortest seek time first) -- move the heads to the closest track with an outstanding I/O request. As new requests arrive, read order is changed accordingly. Disadvantage: can lead to starvation.
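
To make the difference in head motion concrete, the sketch below computes the service order under FCFS and SSTF for a fixed batch of requests and totals the cylinders crossed. It assumes no new requests arrive during the batch, and the function names are illustrative.

```python
def fcfs(start, requests):
    """Service order for first-come, first-served: simply the arrival order."""
    return list(requests)


def sstf(start, requests):
    """Service order for shortest seek time first, for a fixed batch of requests."""
    pending = list(requests)
    order, head = [], start
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def head_movement(start, order):
    """Total number of cylinders the head crosses while servicing `order`."""
    total, head = 0, start
    for cyl in order:
        total += abs(cyl - head)
        head = cyl
    return total
```

For a head at cylinder 53 and an illustrative queue 98, 183, 37, 122, 14, 124, 65, 67, FCFS moves the head 640 cylinders, while SSTF services 65, 67, 37, 14, 98, 122, 124, 183 for a total of 236.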

SCAN -- to avoid the starvation possible with SSTF, once the heads start moving in one direction they keep going in that direction, servicing requests along the way, and reverse only at the end.

C-SCAN -- to provide more uniform response times, always move heads in one direction, then reset.
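
A sketch of SCAN and C-SCAN for a fixed batch of requests is shown below. It returns the sequence of head stops, including the edge of the disk where the head turns around or resets. One simplifying assumption: the trip to the edge is listed only when requests remain on the other side. Cylinders are assumed to run from 0 to max_cylinder.

```python
def scan(start, requests, max_cylinder, direction="up"):
    """SCAN: sweep in one direction to the edge, then reverse. Returns head stops."""
    if direction == "up":
        first = sorted(c for c in requests if c >= start)
        second = sorted((c for c in requests if c < start), reverse=True)
        edge = max_cylinder
    else:
        first = sorted((c for c in requests if c <= start), reverse=True)
        second = sorted(c for c in requests if c > start)
        edge = 0
    stops = list(first)
    if second:
        if not stops or stops[-1] != edge:
            stops.append(edge)          # strict SCAN runs to the edge before reversing
        stops += second
    return stops


def c_scan(start, requests, max_cylinder):
    """C-SCAN: service requests only while moving up, then reset to cylinder 0."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    stops = list(upper)
    if lower:
        if not stops or stops[-1] != max_cylinder:
            stops.append(max_cylinder)  # finish the upward sweep
        if lower[0] != 0:
            stops.append(0)             # reset without servicing on the way back
        stops += lower
    return stops
```

With the head at 53 on a 200-cylinder disk and the queue 98, 183, 37, 122, 14, 124, 65, 67, SCAN moving down stops at 37, 14, 0 and then sweeps up through 65 to 183; C-SCAN sweeps up to 199, resets to 0, and then services 14 and 37.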

LOOK and C-LOOK -- strict SCAN means that once the head starts in a direction, it continues until it gets to the end (the innermost or the outermost track). LOOK modifies this by stopping motion in one direction if no more requests remain in that direction; C-LOOK applies the same change to C-SCAN.
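
The corresponding sketch for LOOK and C-LOOK differs from strict SCAN only in that the head never travels past the last pending request. As before, it assumes a fixed batch of requests and uses illustrative function names.

```python
def look(start, requests, direction="up"):
    """LOOK: like SCAN, but reverse at the last request instead of the disk edge."""
    if direction == "up":
        first = sorted(c for c in requests if c >= start)
        second = sorted((c for c in requests if c < start), reverse=True)
    else:
        first = sorted((c for c in requests if c <= start), reverse=True)
        second = sorted(c for c in requests if c > start)
    return first + second


def c_look(start, requests):
    """C-LOOK: service upward only, then jump back to the lowest pending request."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    return upper + lower
```

For a head at 53 and the queue 98, 183, 37, 122, 14, 124, 65, 67, LOOK moving up services 65, 67, 98, 122, 124, 183 and then 37, 14, without ever visiting the edge of the disk.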

FIFO (FCFS) is appropriate if individual drives (many systems will have several drives) generally have few outstanding requests, as on a PC. C-SCAN and C-LOOK appropriate for heavily loaded drives.

Swap-Space Management

Disks are also used for memory management (for paging, swapping, or paged swapping). Since this so affects overall system performance, it is given special consideration. Sometimes, swap space has its own preallocated disk partition, so there are no directories to traverse or maintain. This improves speed, but the space is permanently allocated, and if it is sometimes too small, the OS will run out of memory. Changing its size will require halting the system and rebuilding the file system from backups, then rebooting.

Performance

Disk speeds (as measured by the rate at which data can be written or read) can be improved by use of disk striping: each block is divided into several subblocks which can be transferred in parallel to separate drives. This technique means that, for example, a system designer might choose to use several slow speed (and hence cheaper) drives instead of one high speed drive -- and still provide high speed disk transfers.
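
A small sketch of the idea: one block is split into sub-blocks, one per drive, and later reassembled. Distributing the bytes round-robin is an assumption made for simplicity; real systems may stripe at the bit, byte, or block level.

```python
def stripe(block, drives):
    """Split one block into `drives` sub-blocks, distributing bytes round-robin."""
    return [block[i::drives] for i in range(drives)]


def unstripe(subblocks, length):
    """Reassemble the original block of `length` bytes from its sub-blocks."""
    drives = len(subblocks)
    out = bytearray(length)
    for i, sub in enumerate(subblocks):
        out[i::drives] = sub        # put each drive's bytes back in place
    return bytes(out)
```

Each sub-block is roughly 1/n the size of the original, so n drives transferring in parallel can move the block in about 1/n of the time.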

This technique can also be used to improve reliability of disks (a common weak point in system reliability) by adding parity bits and spreading each byte across several drives in parallel (so that no speed penalty results). If a single drive fails, then the parity can be used, along with the good data from the other drives, to compute the failed drive's bits.
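
The parity computation can be sketched with byte-wise XOR. The example assumes equal-length sub-blocks and a single failed drive; the function names are illustrative.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(subblocks):
    """Parity sub-block: XOR of all the data sub-blocks."""
    return reduce(xor_bytes, subblocks)


def rebuild(survivors, parity_block):
    """Recover the one missing sub-block from the survivors and the parity."""
    return reduce(xor_bytes, survivors, parity_block)
```

Because XOR is its own inverse, XORing the parity with the surviving sub-blocks cancels them out and leaves exactly the contents of the failed drive.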



Copyright ©2014, G. Hill Price
Send comments to G. Hill Price