Fall 2000: CS 771/871 Operating Systems
Amoeba Distributed Operating System
System Architecture
- Location Transparent Non-dedicated
Processor Pool
(possibly different architectures)
- X-terminals dedicated to
individual users
- Set of Services (some dedicated,
some dynamic)
- Underlying network (LAN or WAN)
- Microkernel + CLIENT/SERVER = OS
Microkernel
- Manages processes and threads
(with synchronization)
- Low-level memory management
(Segments)
- Communications (RPC and group)
- Low-level I/O (But using
client/server model)
Client Server Model
- Objects managed by servers
- Objects accessed by capabilities
- General approach supporting
- files
- directories
- windows
- memory
- processors
- I/O devices
- Files are immutable (bullet server)
Capabilities
CREATE-OBJECT RPC to Object server
returns capability
- Server Port
- logical address of service
(not a machine address)
- Object
- Like Unix i-node
- Rights
- Particular to an object
type
- Check
- Validates capability
Object Protection
- On CREATION, object is assigned a
random Check kept
at object server and put in capability.
- Rights field is all bits set
(OWNER CAPABILITY), returned to client
- To RESTRICT capability, owner
sends it back with bit mask
bit mask XOR with RIGHT field
- Server XORs new rights mask with
old check field and passes through a one way function.
This is new check field.
- The server returns new capability
with new rights and check field to client, which may be
safely passed to another process.
QUESTION: How safe is this?
How do we know it is owner trying to restrict the capability?
Can another process make up a capability with owner rights?
Can rights be increased?
Can an impostor steal the owner's capability?
Can a process with a restricted capability, increase the rights?
Is XOR reversible?
Give an example of a one way function.
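The restriction scheme above can be sketched in C as follows. This is a minimal illustration, not Amoeba's actual code: the field widths are simplified, the names `cap_valid` and `cap_restrict` are hypothetical, combining the mask with the old rights by AND (so rights can only shrink) is an assumption of the sketch, and `one_way` is a toy bit mixer that is easy to invert. A real server would use a cryptographic one-way function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_RIGHTS 0xFFu

typedef struct {
    uint64_t port;    /* server put-port */
    uint32_t object;  /* object number (like a UNIX i-node number) */
    uint8_t  rights;  /* one bit per permitted operation */
    uint64_t check;   /* validates the capability */
} capability;

/* Toy stand-in for the one-way function. It is a bijective bit mixer and
   is easy to invert, so it is NOT secure; it only shows the data flow. */
static uint64_t one_way(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* Server side: the check field a capability with these rights must carry.
   stored_check is the random number kept at the object server. */
static uint64_t expected_check(uint64_t stored_check, uint8_t rights)
{
    if (rights == ALL_RIGHTS)
        return stored_check;                /* owner capability */
    return one_way(stored_check ^ rights);  /* restricted capability */
}

static bool cap_valid(const capability *c, uint64_t stored_check)
{
    return c->check == expected_check(stored_check, c->rights);
}

/* Server side: derive a restricted capability from a valid one. */
static capability cap_restrict(const capability *c, uint8_t mask,
                               uint64_t stored_check)
{
    capability r = *c;
    assert(cap_valid(c, stored_check));     /* refuse forged capabilities */
    r.rights = (uint8_t)(c->rights & mask); /* assumption: AND with mask */
    r.check = expected_check(stored_check, r.rights);
    return r;
}
```

A holder of a restricted capability who flips rights bits cannot produce the matching check field without knowing the server's stored random number, which is the point of several of the questions above.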
Standard Operations on Objects
Operation | Description
Age | Perform garbage collection cycle
Copy | Duplicate object and return capability to new object
Destroy | Destroy object and reclaim storage
Getparams | Get parameters
Info | Get ASCII string describing object
Restrict | Produce restricted capability to this object
Setparams | Set server parameters
Status | Get status from server
Touch | Pretend object was just used (garbage collection)
NOTES:
- Age/Touch used in garbage collection of orphaned objects (like a LEASE)
- Copy is done at server, no object-related traffic to
client needed,
can have remote machines as targets
- Get/Setparams: used by administrator to control object
manager
Process Management
- Process is an object
- Parents get capability to child
process objects
- suspend
- restart
- signal
- destroy
- Differs from UNIX clone method of
fork and exec
- Three Levels
- RPC to process server
kernel thread on specific machine
- library functions which
call RPC
- Run Server which finds a
processor
- Process Descriptor
- architecture
- owner's capability (for
reporting)
- memory segments
- thread descriptors
- PC
- register save area
- stack pointer
- other state info
/*
** Process descriptor.
** This is followed by pd_nseg segment descriptors (segment_d),
** reachable through PD_SD(p)[i], for 0 <= i < p->pd_nseg.
** The index in the segment array is also the segment identifier.
** Following the segments are pd_nthread variable-length thread descriptors.
** Sample code to walk through the threads:
** thread_d *t = PD_TD(p);
** for (i = 0; i < p->pd_nthread; ++i, t = TD_NEXT(t))
** <here *t points to thread number i>;
*/
typedef struct {
char pd_magic[ARCHSIZE]; /* Architecture */
capability pd_self; /* Process capability (if running) */
capability pd_owner; /* Default checkpoint recipient */
uint16 pd_nseg; /* Number of segments */
uint16 pd_nthread; /* Number of threads */
} process_d;
- API
- EXEC (capability of
process server, process descriptor)
- GETLOAD (of a processor)
- STUN
- normal: terminate
outstanding RPCs
- emergency: stops
immediately, RPC's are orphans
- NEWPROC: high level API
builds process descriptor from binary file name,
argument and environment
Thread Management
- Initially one thread, but can
start any number of additional ones
- GLOCAL variables (locally global
to that thread)
- Synchronization by
- SIGNALS (asynchronous
interrupts)
- MUTEX (binary semaphore),
time-out LOCK
fair
- SEMAPHORES, counting with
a time-out WAIT
Memory Management
- entirely in physical memory
- no page faults
- can read/write directly
into user space
- Assumes cheap main memory
- segments contiguous in address
space
- segments are objects
- any number, any where
- Any process with capability to
segment could read/write it (with proper permissions)
- shared memory
communications (need not be on same machine)
- main memory file server
Communications in Amoeba
A port address is a 48-bit number chosen at random
by the listening thread;
it is the first field of a capability.
RPC, point to point,
request/block/reply
- get_request(&header, buffer,
bytes)
(server listens to port), header Contains the PORT (6
bytes)
- put_reply (&header, buffer,
bytes)
server sends reply to client
- trans (&header1, buffer1,
bytes1, &header2, buffer2, bytes2)
send message from client to server
To prevent impersonating a server,
ports are assigned in pairs
- get-port (private) known only to
the server
- put-port (public) used by any
client
These are related by a one-way
function.
put-port = F(get-port)
Since get_request uses
get-port, an impostor cannot issue one in place of a server.
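A small C sketch of the get-port/put-port relationship. Everything here is illustrative: `make_put_port`, `port_listened_on` and `receives_requests_for` are hypothetical names, ports are shown as 64-bit integers rather than 6-byte fields, and the function F is modular squaring with a tiny prime, which is trivially breakable and only stands in for a real one-way function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PRIME 1000003ULL  /* far too small for real use */

typedef uint64_t port_t;

/* Toy F: squaring modulo a small prime. Not secure. */
static port_t one_way(port_t x)
{
    x %= TOY_PRIME;
    return (x * x) % TOY_PRIME;
}

/* The server derives the public put-port from its private get-port. */
static port_t make_put_port(port_t get_port)
{
    assert(get_port % TOY_PRIME > 1);  /* 0 and 1 map to themselves */
    return one_way(get_port);
}

/* Kernel side of get_request(port): the caller ends up listening on
   F(port), never on the port value it passed in. */
static port_t port_listened_on(port_t port_in_header)
{
    return one_way(port_in_header);
}

/* Would a request addressed to put_port reach a process that called
   get_request with port_in_header? */
static bool receives_requests_for(port_t port_in_header, port_t put_port)
{
    return port_listened_on(port_in_header) == put_port;
}
```

The server, which knows the get-port, receives requests sent to the put-port. An impostor who only knows the public put-port and passes it to get_request ends up listening on F(put-port), which is a different port.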
Group Communications
Closed groups, but anyone can send RPC
message to any member for group broadcast.
- CreateGroup: specify degree of
fault tolerance
- JoinGroup: includes greeting
message to existing members
- LeaveGroup: includes good bye
message to existing members
- SendToGroup: atomic reliable
broadcast with total ordering
implements sequential consistency model
- ReceiveFromGroup: blocks waiting
for message
- ResetGroup: reestablishes group
with minimum number of members
Reliable Broadcast: Initiation
- User process traps to kernel,
passing message
- kernel blocks user process
- kernel sends point-to-point
message to SEQUENCER
- kernel message contains
unique number to detect duplicates
- also contains number of
last broadcast message received by kernel
(piggybacked acknowledgment)
- starts timer
- Sequencer
allocates next message number and broadcasts message
- Upon seeing broadcast, sending
kernel
- stops timer
- unblocks user process
Failure Modes:
- Sending kernel times out because
sequencer did not receive
Just resends
- Sending kernel times out because
kernel did not receive broadcast
- Resends as above
- Sequencer notices
duplicate request from unique number
- Sequencer notifies sending
kernel only that all is OK
- Sending kernel sees wrong
broadcast
- graciously accepts that it
was beaten to the sequencer
Reliable Broadcast: Sequencer
- Checks unique number to catch
retransmission
- If retransmission, just notifies
sending kernel all is OK
- If new,
- Updates sequence number
- assigns to this broadcast
- stores message in history
buffer
- updates acknowledge state
of sending kernel
- broadcasts message
- sends message to any
processes on this processor in that group
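The sequencer steps above can be sketched as a single C function. This is an assumption-laden illustration: the names, the fixed table sizes, the one-outstanding-request-per-kernel duplicate check, and numbering messages from 1 are all choices made for the sketch, and the actual broadcast is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_KERNELS   8
#define HISTORY_SLOTS 64
#define MSG_LEN       32

typedef enum { SEQ_DUPLICATE, SEQ_BROADCAST } seq_action;

typedef struct {
    uint32_t next_seq;               /* next sequence number to assign */
    bool     seen[MAX_KERNELS];      /* has this kernel sent anything yet? */
    uint32_t last_uid[MAX_KERNELS];  /* unique number of its last request */
    uint32_t last_seq[MAX_KERNELS];  /* sequence number given to that request */
    uint32_t acked[MAX_KERNELS];     /* piggybacked acknowledgment state */
    char     history[HISTORY_SLOTS][MSG_LEN];
} sequencer;

static void seq_init(sequencer *s)
{
    memset(s, 0, sizeof *s);
    s->next_seq = 1;                 /* assumption: 0 means "none received" */
}

/* Handle one point-to-point request from a sending kernel. Returns
   SEQ_DUPLICATE if the sender only needs to be told all is OK, or
   SEQ_BROADCAST if the caller should now broadcast msg as *seq_out. */
static seq_action seq_handle(sequencer *s, int sender, uint32_t uid,
                             uint32_t piggy_ack, const char *msg,
                             uint32_t *seq_out)
{
    assert(sender >= 0 && sender < MAX_KERNELS);
    s->acked[sender] = piggy_ack;    /* update acknowledge state */

    if (s->seen[sender] && s->last_uid[sender] == uid) {
        *seq_out = s->last_seq[sender];
        return SEQ_DUPLICATE;        /* retransmission */
    }

    uint32_t seq = s->next_seq++;    /* assign next sequence number */
    char *slot = s->history[seq % HISTORY_SLOTS];
    strncpy(slot, msg, MSG_LEN - 1); /* store message in history buffer */
    slot[MSG_LEN - 1] = '\0';

    s->seen[sender] = true;
    s->last_uid[sender] = uid;
    s->last_seq[sender] = seq;
    *seq_out = seq;
    return SEQ_BROADCAST;
}
```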
Reliable Broadcast: Receiving Kernel
- Compares sequence number to last
one received
- If exactly one higher, then accepts
- If process in group is
waiting, then copies into user process address
space and unblocks
- If process not yet
waiting, buffers message
- NOTE: there may be several
processes in that group on this machine
- If message is out of synch
(sequence number too high).
- sends point-to-point
message to sequencer notifying of lost message(s)
- sequencer transmits lost
messages from history buffer.
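The receiving kernel's decision can be sketched as follows. The type and function names are hypothetical, and the RX_DUPLICATE case (a sequence number that was already accepted) is an assumption added for completeness; the notes above only describe the "exactly one higher" and "too high" cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RX_ACCEPT, RX_DUPLICATE, RX_REQUEST_RETRANSMIT } rx_action;

typedef struct {
    uint32_t last_seq;  /* sequence number of last broadcast accepted */
} rx_state;

/* Decide what to do with an incoming broadcast numbered seq. On
   RX_REQUEST_RETRANSMIT, *first_missing is the first lost message the
   kernel should ask the sequencer to resend from its history buffer. */
static rx_action rx_on_broadcast(rx_state *k, uint32_t seq,
                                 uint32_t *first_missing)
{
    assert(k != NULL && first_missing != NULL);
    if (seq == k->last_seq + 1) {
        k->last_seq = seq;   /* in order: deliver or buffer for the group */
        return RX_ACCEPT;
    }
    if (seq <= k->last_seq)
        return RX_DUPLICATE; /* already seen; nothing to do */
    *first_missing = k->last_seq + 1;
    return RX_REQUEST_RETRANSMIT;
}
```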
Reliable Broadcast: History Buffer
- To prevent overflow, sequencer
needs to delete old messages
- If all kernels involved in this
group have acknowledged message "k", then
sequencer can discard all messages from 0 to k.
- Normally piggybacked ACKs keep status
reasonably up to date
- If no traffic out, processor sends
status periodically
- RequestforStatus by sequencer can
also be used in rare cases
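The discard rule reduces to taking the minimum acknowledged sequence number over all member kernels. A minimal sketch, with a hypothetical function name and the acknowledgment state held in a plain array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* acked[i] is the highest sequence number kernel i has acknowledged.
   Every message numbered up to the returned value has been seen by all
   kernels in the group, so the sequencer may drop it from the history. */
static uint32_t history_discard_up_to(const uint32_t acked[], size_t n)
{
    assert(n > 0);
    uint32_t k = acked[0];
    for (size_t i = 1; i < n; ++i) {
        if (acked[i] < k)
            k = acked[i];
    }
    return k;
}
```

One slow or silent kernel holds the minimum down, which is why the periodic status messages and the explicit request for status matter.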
COMPLEXITY of GROUP COMMUNICATIONS:
slightly more than 2 messages per broadcast, increasing slightly with N
Fault Tolerant Group Communications
- Processor crash discovered by lack
of ACKs by some processor
- All subsequent group
communications on that processor fail
- User process getting error return,
calls ResetGroup
- Phase one, elects coordinator
- Upon ResetGroup, kernel
sends message to all member kernels inviting
participation in recovery.
- Upon receipt of recover
invitation, processor sends back highest sequence
number seen
- If contention, choose one
with highest sequence number seen
- If still contention,
choose one arbitrarily (highest network address)
- Phase two, coordinator rebuilds
group
- Gets any message it may
have missed into its history buffer.
- Sends Results
message, announcing
- it is coordinator
(and hence new sequencer)
- members of
reformed group
- highest sequence
number seen
- Each member can request
unseen messages from sequencer
- Once ACK received from all
members of new group, sequencer can discard
history buffer and resume.
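The phase-one tie-breaking rule can be written as a comparison over the candidates. The struct and function names are hypothetical, and network addresses are shown as plain integers for the sake of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t net_addr;     /* network address of the candidate */
    uint32_t highest_seq;  /* highest sequence number it has seen */
} candidate;

/* Returns the index of the coordinator: the candidate that has seen the
   highest sequence number, with ties broken by highest network address. */
static size_t elect_coordinator(const candidate c[], size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (c[i].highest_seq > c[best].highest_seq ||
            (c[i].highest_seq == c[best].highest_seq &&
             c[i].net_addr > c[best].net_addr))
            best = i;
    }
    return best;
}
```

Preferring the highest sequence number means the new sequencer starts with the most complete view of what was broadcast before the crash.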
Fault tolerance of history buffer is
achieved when setting up group by specifying how many machines
maintain a copy ("k" fault tolerance).
To sync "k" copies:
- User process kernel broadcasts
message directly
- Sequencer waits for "k"
lowest-numbered kernels to ACK broadcast
- Then sequencer broadcasts
"ACCEPT" message
- message is "official"
only upon receipt of ACCEPT from sequencer (which also
includes the sequence number assigned).
- INVARIANT: ACCEPT message implies
"k+1" machines have a copy of message
Measurements on 68030 CPUs with 10 Mbps
Ethernet: 800 reliable transmissions per second.
FLIP (Fast Local Internet Protocol)
Why another protocol at the network
layer?
- Need to support RPC
- Need to support group
communications
- Process migration should be
location transparent at the address level
- Processes should not impersonate
others
- Support automatic network reconfig
- Should work on WANs
Each Process has a unique randomly
chosen 64 bit FLIP address
this address migrates with process
For security, consists of public and
private parts
Public-address =
DES(private-address)
Use the private address as a DES key to
encrypt a block of 0 bits.
Servers listen on private addresses,
but clients send on public ones.
(analogous to put/get ports but at lower level).
FLIP Functions
- INIT: allocate slot in table with
two call-back procedure addresses (interrupt handlers)
- END: deallocates slot
- REGISTER: sets FLIP address
- UNREGISTER: unsets FLIP address
- UNICAST, MULTICAST, BROADCAST: no
guarantees on delivery
- RECEIVE:
- NOTDELIVER: messages sent back to
this machine as undeliverable
FLIP Routing Table
FLIP Address | Network address | Hop count | Trusted bit | Age
- Upon receipt of packet
- If new, generates new
entry in routing table
- updates NETWORK ADDRESS
and HOP COUNT
- TRUSTED BIT is managed by
gateways as is HOP COUNT
- AGE is reset to 0 when
packet from FLIP address is received;
periodically it is incremented (used by
replacement algorithms)
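A sketch of how such a routing table might be maintained. The field widths, the fixed-size array, and evicting the entry with the largest AGE when the table is full are assumptions of this illustration; the notes only say that AGE feeds the replacement algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     used;
    uint64_t flip_addr;  /* FLIP address of the sender */
    uint64_t net_addr;   /* network address it was last seen at */
    uint8_t  hop_count;
    bool     trusted;
    uint32_t age;        /* ticks since a packet from flip_addr arrived */
} route;

/* Record a packet from flip: update the matching entry, or create one in
   a free slot, or (assumption) replace the oldest entry. */
static route *route_on_packet(route t[], size_t n, uint64_t flip,
                              uint64_t net, uint8_t hops, bool trusted)
{
    assert(n > 0);
    route *slot = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (t[i].used && t[i].flip_addr == flip) { slot = &t[i]; break; }
    }
    if (slot == NULL) {
        slot = &t[0];
        for (size_t i = 0; i < n; ++i) {
            if (!t[i].used) { slot = &t[i]; break; }
            if (t[i].age > slot->age) slot = &t[i];
        }
    }
    slot->used = true;
    slot->flip_addr = flip;
    slot->net_addr = net;
    slot->hop_count = hops;
    slot->trusted = trusted;
    slot->age = 0;       /* reset on every packet from this address */
    return slot;
}

/* Called periodically: every entry in use gets one tick older. */
static void route_tick(route t[], size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (t[i].used)
            t[i].age++;
    }
}
```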
Locating Put-Ports in Amoeba
Let's look at how client A communicates
to server B.
- When B is created it is assigned a
random FLIP address which is registered with the FLIP
layer
- B does a get_request on its
get_port, traps to kernel
- kernel gets or computes put_port
and notes that this process is listening to the put_port,
blocks B
- A does a TRANS on the put_port,
traps to kernel
- kernel looks up FLIP address for
that put_port
- If not found, RPC layer broadcasts
request to find put_port FLIP
- RPC layer sets timer
- To limit impact over WANs,
sets maximum HOP COUNT to broadcast out to.
- Gateways discard broadcasts
which have reached HOP COUNT,
else increase HOP COUNT and send to next
network
- If time-out, then
rebroadcasts with higher HOP COUNT
- At B's machine, RPC layer sends
back FLIP address
- Now A's machine knows network
address at FLIP layer and RPC layer knows FLIP address
- At A, RPC layer sends message to
that FLIP
NOTE: for redundancy, there may be
several processes listening to a put_port
if several respond to a broadcast, RPC layer chooses one.
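The locate step with a growing hop limit can be sketched as a retry loop. The callback type, the function names, and raising the limit by one hop per time-out are assumptions of this sketch; the notes only say the request is rebroadcast with a higher HOP COUNT. `fake_broadcast` is a stand-in for the network so the loop can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Broadcasts a locate request limited to max_hops and waits for the
   timer. Returns true if some server replied before the time-out. */
typedef bool (*broadcast_fn)(unsigned max_hops, void *ctx);

/* Returns the hop limit at which a server answered, or -1 if none
   answered within hop_limit. */
static int locate_put_port(broadcast_fn broadcast, void *ctx,
                           unsigned start_hops, unsigned hop_limit)
{
    assert(start_hops >= 1 && start_hops <= hop_limit);
    for (unsigned hops = start_hops; hops <= hop_limit; ++hops) {
        if (broadcast(hops, ctx))
            return (int)hops;  /* reply received: FLIP address now known */
        /* time-out: rebroadcast with a higher hop count */
    }
    return -1;
}

/* Stand-in network: ctx points to the server's distance in hops. */
static bool fake_broadcast(unsigned max_hops, void *ctx)
{
    const unsigned *distance = ctx;
    return max_hops >= *distance;
}
```

Starting with a small limit keeps most locate traffic on the local network and only reaches across gateways when nearby searches fail.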
Separating FLIP from put_ports:
- Allows nonFLIP networks to be used
- prevents impostor servers from
listening on a public put_port
- Allows restarts of servers (which
will have a different FLIP but the same put_port)
- Because of new FLIP, RPC can
detect restarts of servers and can abort transaction
This gives AT-MOST-ONCE semantics
- Of course client can just try
again, using new server if it chooses
Amoeba's File System
Amoeba allows arbitrary file servers
to coexist.
The standard file system consists of
- BULLET server: handles file
storage
- Directory server: maps names to
capabilities
- Replication server: copies files
BULLET server
- Designed for machines with large
primary memories and huge disks
- Files are IMMUTABLE
- Files occupy a contiguous segment
of memory
- Can be swapped to/from
disk in one I/O transfer
- Can be sent to client in
one RPC transfer
- Conceptually, files are created
fully loaded with information
- Also allows UNCOMMITTED files to
be created
- Allows changes until
COMMITTED
- Cannot be seen by other
processes until COMMITTED
- Size must still be known
at CREATION
- Files are accessed by capabilities
not names
Implementation of BULLET Server
- File Table is memory resident
- Pointer to file in main
memory
- Pointer to file on disk
- Length of file
- Object number in capability is
used as index in file table
- Randomly assigned check number
kept in file table must match that in capability.
- Files not in main memory are read
in one access
- Uncommitted files are deleted
after 10 minutes of inactivity.
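A sketch of the file table lookup described above. The struct layout and the name `bullet_lookup` are hypothetical, and the check comparison is simplified to the owner-capability case (a restricted capability would first go through the one-way function as described under Object Protection).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     in_use;
    void    *in_memory;   /* pointer to file in main memory, or NULL */
    uint32_t disk_block;  /* where the contiguous file starts on disk */
    size_t   length;      /* length of file in bytes */
    uint64_t check;       /* randomly assigned check number */
} file_entry;

/* The object number from the capability indexes the table directly.
   Returns the entry, or NULL if the capability does not match. */
static const file_entry *bullet_lookup(const file_entry table[], size_t n,
                                       uint32_t object, uint64_t check)
{
    assert(table != NULL);
    if (object >= n || !table[object].in_use)
        return NULL;      /* no such file */
    if (table[object].check != check)
        return NULL;      /* check number does not match */
    return &table[object];
}
```

If `in_memory` is NULL, the server would read `length` bytes starting at `disk_block` in a single transfer, since the file is contiguous.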
Garbage Collection
- Every file has a counter,
initialized to MAX_LIFETIME
- Periodically, a daemon asks the
bullet server to AGE the files (by decrementing the
counter)
- Any file whose counter goes to 0
is deleted.
- However, another process
periodically issues a TOUCH command for all files in the
directories which resets the counter to MAX_LIFETIME
- In this manner, orphan files are
garbage collected.
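The AGE/TOUCH cycle can be sketched with a per-object counter. The value of MAX_LIFETIME and the names below are illustrative assumptions only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LIFETIME 3u  /* illustrative value */

typedef struct {
    bool     in_use;
    unsigned lifetime;   /* remaining AGE cycles before deletion */
} gc_object;

/* TOUCH: pretend the object was just used. */
static void gc_touch(gc_object *o)
{
    o->in_use = true;
    o->lifetime = MAX_LIFETIME;
}

/* AGE: decrement every counter and delete objects that reach 0.
   Returns the number of objects deleted in this cycle. */
static size_t gc_age(gc_object t[], size_t n)
{
    size_t deleted = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!t[i].in_use)
            continue;
        assert(t[i].lifetime > 0);
        if (--t[i].lifetime == 0) {
            t[i].in_use = false;  /* orphan: reclaim its storage */
            ++deleted;
        }
    }
    return deleted;
}
```

An object reachable from some directory is touched before its counter runs out; an orphan is never touched and disappears after MAX_LIFETIME aging cycles.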
Directory Server
Maps ASCII names to capabilities using
directory tables
Can implement different flavors of directory management
Typically, a UNIX like directory service
- ASCII String
- Capability Set: one for each copy
of that object
- Set of columns, one for each
protection domain (shares everything but RIGHTS field of
capability)
- Capability in the directory server
is for one protection domain (column)
- Of course capabilities can be to
other directory objects
(even of a different type of directory service!)
- Allows general graph structures
(which may be better suited to distributed systems
anyway).
- Can access other objects as well
as files
- processor pools
- hosts
- printers
Every user has his own root, so the system
looks like a forest from the user's point of view.
Directory Server Calls
- Create: returns capability to
directory object
- Delete: deleting a directory or
entry does not delete the object
- Append: adds a new directory entry
in an existing directory object
- Replace: existing entry
- Lookup: given capability of
directory and ASCII string, returns capability set of
object
- GetMasks: return RIGHTS mask for
object
- Chmod: change RIGHTS mask
Implementation of Directory Server
- Each directory object is stored
twice in two bullet files on different bullet servers
- Changes to directories are stored
in new bullet files.
- Primary copy is made first,
background thread makes secondary copy
- After both copies made, old
directory objects are destroyed
- Directory servers themselves are
duplicated on separate disks
Replication Server
- Use LAZY REPLICATION to provide
multiple copies of objects
- Also runs garbage collection
system, tracing through directories to TOUCH all objects
found there
Run Server
- Decides which architecture/which
machine
- Manages pools of processors,
sorted by architecture
- A program may be compiled for
multiple architectures and so when it is looked up, finds
a directory of executables
- Run Server looks at appropriate
pools
- Using GETLOAD calls, it
knows approximate loads
- Each potential CPU
estimates how much compute power it can spare to
this process (using processor speed and number of
threads running)
- Server chooses processor with
highest available processing bandwidth
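A sketch of the Run Server's choice. The estimate used here (processor speed divided by the number of running threads plus one) is an assumption made for illustration; the notes only say that the estimate uses processor speed and the number of threads. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *arch;     /* architecture of this pool processor */
    double      speed;    /* relative processor speed */
    unsigned    threads;  /* threads running, as learned via GETLOAD */
} pool_cpu;

/* Assumed estimate of compute power available to one more process. */
static double spare_power(const pool_cpu *c)
{
    return c->speed / (double)(c->threads + 1);
}

/* Returns the index of the processor of the requested architecture with
   the highest spare power, or -1 if the pool has none. */
static int choose_cpu(const pool_cpu pool[], size_t n, const char *arch)
{
    assert(arch != NULL);
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(pool[i].arch, arch) != 0)
            continue;
        if (best < 0 || spare_power(&pool[i]) > spare_power(&pool[best]))
            best = (int)i;
    }
    return best;
}
```

Because the loads are only approximate, a choice made this way is a heuristic: the picked processor may no longer be the least loaded by the time the process starts.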
Boot Server
- Servers interested in being
automatically restarted
register with BOOT SERVER
- BOOT SERVER periodically sends
"are you alive" messages to server process
- If no response after a certain
time, tries to reboot on current processor
- If fails to reboot, chooses new
processor and tries to restart there.
Other Servers
- TCP/IP Server
- I/O Servers
- Time Servers
- Random Number Servers
- Mail Servers
Copyright chris wild 2000.
Last updated: 01 Nov 2000