CS 771/871 Operating Systems
Distributed File Systems
File Systems and Operating Systems
QUESTION: Which of these layers belongs in the OS?
Different File System Architectures
- Embed all layers in the kernel (mainframe approach)
- Embed the bottom two layers in the kernel (UNIX approach)
- Only the bottom layer in the kernel; other layers provided by a library
  Problem: no one owns the logical disk, so user programs must be trusted not to overwrite other users' data
- Only the bottom layer in the kernel; a server provides the other layers
  - servers own the logical disk
  - all requests must go through the server
  - different servers can provide different file system architectures and semantics
Design Issues
- Centralized vs Distributed Data
  - The common tradeoffs exist
  - Consistency of global state is the major difficulty in distributed approaches
  - If distributed: duplication or division of the data
- Naming (Directory Services)
  - Tree
  - Directed Acyclic Graph (DAG)
  - Graph
  - Forest
  - Symbolic Links (file system pointers)
- File Sharing
  - UNIX semantics (a read after a write returns the data just written): strict time ordering. Process management (e.g. pipes) assumes passing of the environment, including open file pointers
  - Session semantics (updates only visible at the end of a session): can have race conditions
  - Immutable files (write once, like CD-ROMs)
  - Transaction oriented (semantics guarantee serializability)
  - Server stateless
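The difference between UNIX and session semantics can be made concrete with a small sketch. The classes below are illustrative (not from any real file system): one shared copy gives UNIX semantics, while per-open snapshots published on close give session semantics, including the race condition noted above.

```python
# Sketch (illustrative names): UNIX vs session semantics as two toy
# in-memory file stores.

class UnixSemanticsFile:
    """Single shared copy: a read after a write returns the data just written."""
    def __init__(self):
        self.data = b""

    def write(self, data):
        self.data = data          # immediately visible to every client

    def read(self):
        return self.data


class SessionSemanticsFile:
    """Each open() gets a private copy; writes are published only on close()."""
    def __init__(self):
        self.data = b""

    def open(self):
        return _Session(self)

class _Session:
    def __init__(self, f):
        self.f = f
        self.copy = f.data        # snapshot taken at open time

    def write(self, data):
        self.copy = data          # private until close

    def read(self):
        return self.copy

    def close(self):
        self.f.data = self.copy   # last close wins -> race condition


u = UnixSemanticsFile()
u.write(b"new")
assert u.read() == b"new"         # strict time ordering

s = SessionSemanticsFile()
a, b = s.open(), s.open()
a.write(b"from A")
assert b.read() == b""            # A's update not yet visible to B
a.close()
b.close()                         # B's unmodified copy overwrites A's write
assert s.data == b""              # the race condition the notes warn about
```

The final assertion shows why session semantics is ambiguous under sharing: whichever session closes last silently wins.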
Observed File Usage Patterns
- Small files (less than 10K)
- Reading more common than writing
- Access mostly sequential; random access rare
- Most files short lived
- Sharing rare
- A process uses only a few files
- Distinct file classes
If these patterns prove universal in distributed systems, they can be exploited (see the later discussion on new hardware implications).
Comparison of Stateless and Stateful Servers
Advantages of Stateless Servers    | Advantages of Stateful Servers
Fault tolerance                    | Shorter request messages
No Open/Close needed (less set up) | Better performance with buffering
No tables needed for state         | Readahead possible
No limits on "open" files          | Idempotency easier
Client crashes no problem          | File locking possible
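A minimal sketch can show where these tradeoffs come from. All names here are illustrative (loosely NFS-style, not any real API): a stateless request names the file, offset, and count on every call, so it is longer but safely repeatable; a stateful server keeps an open-file table, so requests are shorter but the table is lost on a crash.

```python
# Sketch (illustrative names): stateless vs stateful read requests.

FILES = {"/etc/motd": b"welcome to CS 771/871\n"}

# --- Stateless server: every request carries the full state (path,
# offset, count). Repeating a lost request is harmless (idempotent),
# and a server reboot loses nothing.
def stateless_read(path, offset, count):
    return FILES[path][offset:offset + count]

# --- Stateful server: open() builds a table entry; read() then sends
# only a short handle, and the server remembers the current offset.
open_table = {}      # handle -> [path, offset]; lost if the server crashes
next_handle = 0

def stateful_open(path):
    global next_handle
    handle = next_handle
    next_handle += 1
    open_table[handle] = [path, 0]
    return handle

def stateful_read(handle, count):
    path, offset = open_table[handle]
    data = FILES[path][offset:offset + count]
    open_table[handle][1] += len(data)   # server-side state advances
    return data

# Repeating a stateless read returns the same bytes...
assert stateless_read("/etc/motd", 0, 7) == stateless_read("/etc/motd", 0, 7)

# ...but naively repeating a stateful read does not: the offset moved.
h = stateful_open("/etc/motd")
first, second = stateful_read(h, 7), stateful_read(h, 7)
assert first != second
```

The last assertion is the "fault tolerance" and "idempotency" rows of the table in miniature: the stateless request can be retried blindly, while the stateful one needs extra machinery (e.g. sequence numbers in the table) to detect duplicates.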
Caching In Uniprocessor
- Cache in user process
  - Manage a pool of buffers; try there first
  - Tailor to usage patterns
  - Avoid some system calls
- Cache in kernel (UNIX)
  - Can manage the pool of buffers over all processes (better utilization of memory)
  - Can share buffers between processes (UNIX semantics)
- Cache server: a user process which caches for all
  - Simpler programming than the first option
  - Can tailor
  - More overhead
- No cache
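"Manage a pool of buffers, try there first" can be sketched as a tiny fixed-size block cache with LRU eviction. This is a toy model (the disk is a dictionary, and the names are invented for illustration), not any real buffer cache implementation.

```python
# Sketch (illustrative): a buffer pool consulted before the "disk".

from collections import OrderedDict

DISK = {n: f"block-{n}".encode() for n in range(100)}  # fake disk
disk_reads = 0                                         # count of cache misses

class BufferCache:
    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.pool = OrderedDict()          # block number -> buffer contents

    def read_block(self, n):
        global disk_reads
        if n in self.pool:                 # try the pool first: cache hit
            self.pool.move_to_end(n)       # mark as most recently used
            return self.pool[n]
        disk_reads += 1                    # miss: go to the "disk"
        if len(self.pool) >= self.nbuffers:
            self.pool.popitem(last=False)  # evict the least recently used
        self.pool[n] = DISK[n]
        return self.pool[n]

cache = BufferCache(nbuffers=2)
cache.read_block(0)     # miss
cache.read_block(1)     # miss
cache.read_block(0)     # hit: no disk access
cache.read_block(2)     # miss; evicts block 1 (least recently used)
assert disk_reads == 3
```

Whether this pool lives in the user process, the kernel, or a cache-server process is exactly the design choice listed above; the lookup logic is the same in each case.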
Caching In Distributed Server
The above caching approaches can be done on either the client or the server.
- Server caches avoid disk access but incur network overhead.
- Client caches lead to cache consistency problems.
Client Cache Write Policy | Consequences
Write Through             | Works, but no help for reads
Delayed Write             | Ambiguous semantics
Write on Close            | Session semantics
Centralized Control       | UNIX semantics, but centralized problems
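The first two rows of the table can be contrasted in a short sketch. The classes and the shared `server` dictionary are illustrative stand-ins, not a real protocol: write-through pushes every write to the server immediately, while delayed write batches updates and leaves the server copy stale in between.

```python
# Sketch (illustrative names): write-through vs delayed-write client caches.

server = {}                        # the authoritative server copy

class WriteThroughCache:
    def __init__(self):
        self.local = {}
    def write(self, name, data):
        self.local[name] = data
        server[name] = data        # every write also goes to the server

class DelayedWriteCache:
    def __init__(self):
        self.local = {}
        self.dirty = set()
    def write(self, name, data):
        self.local[name] = data    # server copy is now stale
        self.dirty.add(name)
    def flush(self):               # e.g. periodically, or at file close
        for name in self.dirty:
            server[name] = self.local[name]
        self.dirty.clear()

wt = WriteThroughCache()
wt.write("a", b"1")
assert server["a"] == b"1"         # immediately visible to other clients

dw = DelayedWriteCache()
dw.write("b", b"2")
assert "b" not in server           # ambiguous: other clients see old data
dw.flush()
assert server["b"] == b"2"
```

Calling `flush()` only when the file is closed turns delayed write into the "write on close" row, i.e. session semantics.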
Replication
- Increase reliability (data and processor redundancy)
- Split the workload
Replication can be:
- Explicit (controlled by the programmer): the directory may permit multiple file handles to be associated with one name
- Lazy replication: copies made by the system in slack time (like system backups)
- Group communication: requests broadcast to all processors holding a copy (usually done just for writes)
Primary Copy Replication
- Writes are sent to the primary server
- The primary writes its intention to stable storage
- The primary orders the secondary servers to update
- If the primary crashes, it rereads stable storage and continues the update
Question: What to do if a secondary crashes?
Notice the similarity to commit protocols.
No updates are possible while the primary is down (but reads from the secondaries still work).
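The steps above can be sketched as follows. This is a toy model with invented names: a list stands in for stable storage, and `recover()` shows why logging the intention first lets a restarted primary finish an interrupted update.

```python
# Sketch (illustrative names): primary-copy replication with an
# intention log on stable storage.

class Secondary:
    def __init__(self):
        self.store = {}
    def update(self, key, value):
        self.store[key] = value

class Primary:
    def __init__(self, secondaries):
        self.stable_log = []       # stands in for stable storage
        self.store = {}
        self.secondaries = secondaries

    def write(self, key, value):
        self.stable_log.append((key, value))   # 1. intention to stable storage
        self.store[key] = value
        for s in self.secondaries:             # 2. order secondaries to update
            s.update(key, value)

    def recover(self):
        """After a crash: reread stable storage and continue the updates."""
        for key, value in self.stable_log:
            self.store[key] = value
            for s in self.secondaries:
                s.update(key, value)

secs = [Secondary(), Secondary()]
p = Primary(secs)
p.write("x", 1)
assert all(s.store["x"] == 1 for s in secs)

# Simulate a crash after logging but before propagating to secondaries:
p.stable_log.append(("y", 2))      # the intention was logged...
p.recover()                        # ...so recovery completes the update
assert all(s.store["y"] == 2 for s in secs)
```

Replaying the whole log on recovery is safe here because applying the same update twice is idempotent, the same property discussed for stateless servers above.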
Voting Algorithms
- A majority must agree to an update
- Updates are assigned unique version numbers
- Reads first request the version number
- A read is OK if a majority returns the same version number
Gifford's Algorithm
- Read quorum Nr: at least Nr servers must agree on the version
- Write quorum Nw: at least Nw servers must agree to update to the new version
- Nr + Nw > N
Consider the following cases:
- Nr = 1, Nw = N: all servers must agree to a write; can read from any one
- Nw = 1, Nr = N: no reads until all servers are (eventually) updated
- Nr = N/2, Nw > N/2: just the majority algorithm above
- Nw = N/2, Nr > N/2: again must wait for the eventual update
- Nr small, Nw nearly N: best when reads are more common
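A small sketch shows why Nr + Nw > N works: any read quorum must overlap any write quorum in at least one server, so the highest version number seen by a read is always current. The code below is illustrative (invented names, in-memory "servers"), not a full protocol with locking or partial-write handling.

```python
# Sketch (illustrative): Gifford-style weighted voting with version numbers.

N, NR, NW = 5, 3, 3                # 3 + 3 > 5, so quorums must intersect

servers = [{"version": 0, "data": None} for _ in range(N)]

def write(data, quorum):
    assert len(quorum) >= NW
    # New version exceeds every version in the write quorum.
    new_version = max(servers[i]["version"] for i in quorum) + 1
    for i in quorum:
        servers[i] = {"version": new_version, "data": data}

def read(quorum):
    assert len(quorum) >= NR
    # Ask each server for its version; the highest version is current.
    best = max((servers[i] for i in quorum), key=lambda s: s["version"])
    return best["data"]

write(b"v1", quorum=[0, 1, 2])     # only servers 0, 1, 2 are updated

# Every read quorum of size 3 includes at least one of {0, 1, 2},
# so every read sees the latest data:
assert read(quorum=[2, 3, 4]) == b"v1"
assert read(quorum=[0, 3, 4]) == b"v1"
```

With NR = 1 and NW = N this degenerates to "write to all, read from any", matching the first case above.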
Exploiting New Hardware: Matching Usage Patterns
Copyright Chris Wild 1996.
For problems or questions regarding this web site, contact [Dr. Wild].
Last updated: October 09, 1996.