CS 771/871 Operating Systems
Distributed Memory
Multiprocessors and MultiComputers Revisited
- Multiprocessors (shared memory):
  - Complicated hardware
  - Does not scale well
  - Expensive
  - unified architectural design
  - memory access communications model
  - Easy to program (has process semantics)
  - well understood synchronization
  - can implement strict consistency
- Multicomputers (private memory):
  - easy to build (buy as many as you need off the shelf)
  - message passing communications model
  - programming is usually more difficult (except the RPC model)
- Hybrid Models
- distributed shared
memory
- shared variables
- shared objects
Virtual Distributed Memory
- Virtual Memory = not all of the memory space need be physically
present in main memory.
- Diskless workstations have a remote paging store
- Could share a file server's paging store among a set of processors
- Could distribute the file server's paging store among a set of
processors
- Thus distributed processes could share a virtual address space.
Tightly to Loosely Coupled Memory Architectures
- On-chip memory: direct connection between CPU and memory;
could have multiple CPUs on chip sharing memory
- Bus-accessed memory with no cache (no consistency problem, but the
bus limits capacity)
- Bus with caches (snoopy cache)
- Ring Based
- Switched
- NUMA
- Paged
- Shared Variable
- Object Based
Write Through Protocol
- Maintains cache consistency by performing all writes over the bus
to memory as well as to the cache.
- Other processors see write
request and invalidate or update their copy (if
cached)
- Caching does not speed up
writes.
QUESTION: how
does this fit with expected usage?
What do you need to know?
How could you measure?
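One way to see the behavior (and to start measuring it) is to model the protocol. The sketch below is a toy simulation of write-through with snooping invalidation; all class and method names are invented for illustration, not taken from any real machine.

```python
# Toy write-through snoopy protocol: every write goes over the bus to
# memory, and all other caches snoop the bus and invalidate their copy.
# Illustrative only -- names and structures are invented.

class Cache:
    def __init__(self):
        self.lines = {}                 # addr -> value (read cache)

    def snoop_write(self, addr):
        # Another CPU wrote this address: drop our copy if cached.
        self.lines.pop(addr, None)

class Bus:
    def __init__(self, n_cpus):
        self.memory = {}
        self.caches = [Cache() for _ in range(n_cpus)]

    def read(self, cpu, addr):
        c = self.caches[cpu]
        if addr not in c.lines:         # miss: fetch from memory
            c.lines[addr] = self.memory.get(addr, 0)
        return c.lines[addr]

    def write(self, cpu, addr, value):
        # Every write goes to memory over the bus (caching does not
        # speed up writes); other caches snoop and invalidate.
        self.memory[addr] = value
        self.caches[cpu].lines[addr] = value
        for i, c in enumerate(self.caches):
            if i != cpu:
                c.snoop_write(addr)

bus = Bus(2)
bus.write(0, 'x', 1)
print(bus.read(1, 'x'))   # CPU 1 misses, fetches 1 from memory
bus.write(1, 'x', 2)      # invalidates CPU 0's copy
print(bus.read(0, 'x'))   # CPU 0 misses again, sees 2
```

A simulation like this makes the expected-usage question concrete: counting bus transactions per read and per write under a given reference trace is one way to measure the protocol's fit.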
Write Once Protocol
- Multiple read caches allowed
(CLEAN)
- First write invalidates
(INVALID) read caches and makes writer owner (DIRTY)
- Subsequent writes are done to
cache only
- Subsequent read from other
processor
- must go to bus
(because cache is invalidated)
- owner intercepts
before memory can satisfy
- owner provides value
from its cache
- owner invalidates its
cache
- Reader's cache marked
DIRTY (it is owner)
QUESTION:
What if another processor writes without reading?
When is memory updated?
Does it ever need to be?
Does this suggest another shared memory protocol?
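The write-once state machine above can be sketched directly, assuming (as the notes state) that ownership transfers to the reader on a remote read. All names are invented for illustration.

```python
# Sketch of the write-once protocol as described above. Illustrative
# only; real write-once hardware has additional states.

INVALID, CLEAN, DIRTY = 'INVALID', 'CLEAN', 'DIRTY'

class CPU:
    def __init__(self):
        self.state = INVALID
        self.value = None

class Machine:
    def __init__(self, n):
        self.cpus = [CPU() for _ in range(n)]
        self.memory = 0

    def read(self, i):
        c = self.cpus[i]
        if c.state != INVALID:
            return c.value
        # Miss goes on the bus; a DIRTY owner intercepts before memory.
        for o in self.cpus:
            if o.state == DIRTY:
                c.value, c.state = o.value, DIRTY  # reader becomes owner
                o.state = INVALID                  # owner drops its copy
                return c.value
        c.value, c.state = self.memory, CLEAN      # memory satisfies it
        return c.value

    def write(self, i, v):
        c = self.cpus[i]
        if c.state != DIRTY:
            # First write: invalidate all other copies, become owner.
            for j, o in enumerate(self.cpus):
                if j != i:
                    o.state = INVALID
        c.value, c.state = v, DIRTY   # subsequent writes hit cache only

m = Machine(3)
m.read(0)          # CPU 0 gets a CLEAN copy from memory
m.write(1, 42)     # CPU 1 becomes DIRTY owner; CPU 0 invalidated
print(m.read(2))   # owner intercepts: prints 42, ownership moves to CPU 2
print(m.cpus[1].state, m.cpus[2].state)   # INVALID DIRTY
```

Note that memory itself is never updated here, which is exactly what the questions above are probing.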
Ring Based Multiprocessor (MEMNET)
- Address divided into private
and shared.
- Shared memory is distributed.
- Token ring runs at 160 Mbps
- memory is divided into 32-byte blocks
- Each block has a home processor but can be cached elsewhere;
it need not reside on the home processor.
- Multiple read copies but only
one write copy (INVARIANT)
- Each processor has a block
table with
- VALID: if cached on
this machine
- EXCLUSIVE: if write is
allowed
- HOME: if this machine is the block's home
Memnet Protocol
- Read
- If cache VALID, local
access possible
- If not, wait for token
and send request on ring
- Upon receiving token,
processor checks if block cached
- If cached,
puts block in token
- Marks request
as satisfied
- sends token
- clears
exclusive bit (if set)
- Eventually the requester sees the token again with the desired block
INVARIANT: every block exists in at least one
processor
- If space needed, sends
non-homed block home
QUESTION: When? Where? How?
- Write
  - If local exclusive copy, just write
  - If cached for read, send invalidate message with the token.
    Upon a complete circuit, set the exclusive bit and write.
  - If not cached, send request/invalidate message with the token
    - First machine with the block sends it and invalidates
    - Other machines with copies invalidate
    - When the requester receives the block, it copies it, marks it
      exclusive, and writes
QUESTION: how
would you evaluate the effectiveness of this architecture?
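One way to start evaluating it is to model the protocol and count token circuits per reference. Below is a minimal sketch of the Memnet read path, assuming block tables with VALID/EXCLUSIVE bits as described above; the structures are invented for illustration.

```python
# Sketch of the Memnet read path: a request circulates with the token
# until some processor holding a valid copy satisfies it. Illustrative.

class Node:
    def __init__(self, nid):
        self.id = nid
        self.table = {}   # block -> {'valid':, 'exclusive':, 'data':}

    def cache(self, block, data, valid=True, exclusive=False):
        self.table[block] = {'valid': valid, 'exclusive': exclusive,
                             'data': data}

def ring_read(nodes, requester, block):
    me = nodes[requester].table.get(block)
    if me and me['valid']:
        return me['data']                  # local access, no token needed
    n = len(nodes)
    for step in range(1, n):               # request rides the token around
        node = nodes[(requester + step) % n]
        entry = node.table.get(block)
        if entry and entry['valid']:
            entry['exclusive'] = False     # a second copy now exists
            nodes[requester].cache(block, entry['data'])
            return entry['data']           # satisfied; token returns home
    raise KeyError('block cached nowhere -- violates the invariant')

nodes = [Node(i) for i in range(4)]
nodes[2].cache('B7', 'payload', exclusive=True)
print(ring_read(nodes, 0, 'B7'))            # prints: payload
print(nodes[2].table['B7']['exclusive'])    # False: exclusive bit cleared
```

Driving a model like this with a reference trace and counting circuits, invalidations, and evictions is one plausible evaluation approach.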
Gupta/Wild Ring Architecture
- Implements the data flow computational model
- Each computation packet circulates around the ring looking for its
data
- Each data packet knows how many computation packets it must
populate
- When a computation packet has all its data, it is enabled
- Any idle processor can execute an enabled computation packet.
- Computed data stays on its processor, waiting for computation
packets which need that value to pass by.
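The data flow model above can be sketched as a toy, single-circuit simulation; all names are invented for illustration.

```python
# Toy data-flow sketch: a computation packet circulates, picking up
# operands as it passes waiting data; once enabled, any idle processor
# may fire it. Illustrative only.

class ComputePacket:
    def __init__(self, op, needed):
        self.op = op                 # function to run once enabled
        self.needed = set(needed)    # operand names still missing
        self.operands = {}

    def offer(self, name, value):
        if name in self.needed:      # pick up matching data in passing
            self.operands[name] = value
            self.needed.remove(name)

    @property
    def enabled(self):
        return not self.needed

# Data packets waiting on their processors for computations to pass by
data = {'a': 3, 'b': 4}

pkt = ComputePacket(lambda a, b: a + b, ['a', 'b'])
for name, value in data.items():     # one circuit around the ring
    pkt.offer(name, value)

if pkt.enabled:                      # any idle processor may execute it
    print(pkt.op(**pkt.operands))    # prints: 7
```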
Switched MultiProcessors
When communications channels
saturate, add more (in parallel, as a tree, in a hierarchy).
DASH
- Cluster = 4 CPUs with snoopy cache and memory
- Clusters connected by an intercluster bus (mesh)
- memory is distributed, with each cluster holding 16 MB
- A directory kept at each cluster records who else has a copy
- state of each block can be uncached, clean, or dirty
- uncached and clean blocks are owned by the home cluster
- dirty blocks are owned by the cluster holding the one and only copy
- See figure 6-8, number of bus
transfers
READ: r = local request, R = global request, d = local data, D =
global data, s = state change to local home directory, S = state
change globally

Block State | R's cache | Intracluster cache | Home memory | Intercluster cache
------------|-----------|--------------------|-------------|-------------------
UNCACHED    | NA        | NA                 | 1R+1D+2s    | NA
CLEAN       | 0         | 1r+1d              | 1r+1R+1D+1s | NA
DIRTY       | 0         | 1r+1d+1D+2s        | NA          | 1r+2R+2D+2s
WRITE: n = number of cached copies

Block State | R's cache | Intracluster cache | Home memory    | Intercluster cache
------------|-----------|--------------------|----------------|-------------------
UNCACHED    | NA        | NA                 | 1R+1D+2s       | NA
CLEAN       | 2R+1s+nR  | 1r+1d+2R+1s+nR     | 1r+1R+1D+1s+nR | NA
DIRTY       | 0         | 1r+1d              | NA             | 1r+2R+1D+2s
Considerable overhead in memory for directories and bus
traffic.
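The directory bookkeeping can be sketched as follows; this is an illustrative model of a single directory entry, not the DASH hardware.

```python
# Illustrative DASH-style directory entry: the home cluster tracks a
# block's state (UNCACHED / CLEAN / DIRTY) and which clusters hold
# copies. Invented structures, for illustration only.

class Directory:
    def __init__(self):
        self.state = 'UNCACHED'
        self.sharers = set()     # clusters currently holding a copy

    def read(self, cluster):
        # A dirty copy would be fetched from its owner and written back
        # to home; either way the block ends up CLEAN with one more sharer.
        self.state = 'CLEAN'
        self.sharers.add(cluster)

    def write(self, cluster):
        # Invalidate all other sharers; the writer holds the one and
        # only (dirty) copy afterwards.
        invalidations = self.sharers - {cluster}
        self.state = 'DIRTY'
        self.sharers = {cluster}
        return invalidations     # n invalidations: the nR terms above

d = Directory()
d.read(0); d.read(1); d.read(2)
print(sorted(d.sharers))        # [0, 1, 2]
print(sorted(d.write(0)))       # [1, 2]: two sharers invalidated
print(d.state)                  # DIRTY
```

The sharer set is exactly the per-block memory overhead the notes mention, and the returned invalidation set is the source of the n-proportional bus traffic.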
NonUniform Memory Access (NUMA)
Just eliminate caches. Easy but
order of magnitude overhead on remote accesses.
Examples Cm* (Bus), BBN butterfly
(Switched)
Initial location of memory is
critical to performance.
But could remap periodically using
various adaptive algorithms
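A toy cost model makes the placement argument concrete; the 10x local-to-remote ratio below is an assumed figure for illustration, matching the "order of magnitude" claim above.

```python
# Toy NUMA page-placement cost model. The cost ratio is assumed for
# illustration; real machines vary.

LOCAL_COST, REMOTE_COST = 1, 10

def access_cost(page_home, accesses):
    """Total cost of a page's accesses given its home node.

    accesses: list of (cpu, count) pairs.
    """
    return sum(count * (LOCAL_COST if cpu == page_home else REMOTE_COST)
               for cpu, count in accesses)

accesses = [(0, 90), (1, 10)]      # CPU 0 touches the page far more often
print(access_cost(0, accesses))    # 90*1 + 10*10 = 190
print(access_cost(1, accesses))    # 90*10 + 10*1 = 910
# An adaptive remapper would migrate this page to CPU 0's memory.
```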
Consistency Models
A contract which specifies the
permissible interactions between competing processors over
communications channels with delays.
Strict Consistency
Any Read to a
memory location x returns the value stored by the most recent
write to x.
Assumes instantaneous
communications
QUESTION:
is absolute global time order sufficient?
How about GPS clocks to 100 nanoseconds?
Why
is strict consistency important anyway?
Consider two processes on a single processor. Since the
order in which they run is arbitrary, we can make no
guarantees about relative ordering even though absolute total
ordering is possible.
- So if I care about the order of reads and writes, I should control
it explicitly. If not, it is not an issue.
- What is important, perhaps, is causality.
- Consider that Read and Write are the only two "events" on a memory.
- Any Read by process i can potentially affect the result of a
subsequent Write by process i.
- Any Read of a memory location x is affected by a previous Write of
x by any process.
- People live with out-of-date information all the time.
- What exactly is a Read or a Write? When the CPU issues the
instruction? When at least one memory is accessed? When all copies
are updated?
Sequential Consistency
The result of any execution is
the same as if the operations of all processors were executed
in some sequential order, and the operations of each
individual processor appear in this sequence in the order
specified by its program. (LAMPORT)
Note similarity to serializability.
Implies that all processes see same
SEQUENCE of memory references
While strict consistency is nearly impossible
on a distributed system, sequential consistency is possible, but with
a penalty: r + w >= t, where r and w are the read and write times and
t is the transfer time between nodes.
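For tiny programs, Lamport's definition can be checked by brute force: enumerate every interleaving that preserves each process's program order and collect the results. The two-process example below (each writes one variable, then reads the other's) is illustrative.

```python
# Brute-force check of sequential consistency for a tiny program:
# an outcome is sequentially consistent iff some interleaving that
# preserves each process's program order produces it.

from itertools import permutations

progs = [[('w', 'x', 1), ('r', 'y')],      # process 0: x = 1; read y
         [('w', 'y', 1), ('r', 'x')]]      # process 1: y = 1; read x

def sc_outcomes():
    ops = [(p, i) for p, prog in enumerate(progs)
                  for i in range(len(prog))]
    results = set()
    for order in permutations(ops):
        # program order within each process must be preserved
        if any(order.index((p, 0)) > order.index((p, 1)) for p in (0, 1)):
            continue
        mem, reads = {'x': 0, 'y': 0}, []
        for p, i in order:
            op = progs[p][i]
            if op[0] == 'w':
                mem[op[1]] = op[2]
            else:
                reads.append((p, op[1], mem[op[1]]))
        results.add(tuple(sorted(reads)))
    return results

outs = sc_outcomes()
# Under sequential consistency, the two reads can never both return 0:
print(any(all(v == 0 for (_, _, v) in o) for o in outs))   # False
```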
Causal Consistency
Not time ordering but potential
information transfer
- a write of any variable
following a read in one process is potentially linked
- a read of some variable
following a write from any process is linked
Writes that are potentially
causally related must be seen by all processes in the same
order. Concurrent writes may be seen in a different order on
different machines.
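The "potentially linked" relation can be captured with vector clocks, one counter per process; writes whose clocks are incomparable are concurrent and may be seen in different orders. This is an illustrative sketch, not part of any protocol described above.

```python
# Vector-clock sketch of causal ordering: write A happens-before
# write B iff A's clock is componentwise <= B's and they differ.
# Clock values here are invented for illustration.

def happens_before(vc_a, vc_b):
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

w_x = (1, 0)   # P0 writes x
w_y = (1, 1)   # P1 read w_x first, then wrote y: causally after w_x
w_z = (0, 1)   # P1 wrote without reading w_x: concurrent with it

print(happens_before(w_x, w_y))   # True: all processes must agree
print(happens_before(w_x, w_z) or happens_before(w_z, w_x))
# False: concurrent writes, order may differ per machine
```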
Other Consistency Models
- PRAM Consistency
  - Writes done by a single processor are received by all other
    processors in the order issued
Weak Consistency: Need not propagate changes made inside a
critical section or atomic action. Synchronization variable:
when synchronized, all writes are propagated out and writes from
others brought in.
- Accesses to sync variable are
sequentially consistent
- no access to sync variable
until previous writes propagated
- no data access until all
previous access to sync variables are performed
Release Consistency: Acquire and
Release actions.
- Before ordinary access to
shared variable, all previous acquires must be
complete
- Before release, all previous
reads and writes by this process must be done
- acquire/release must be
processor consistent
Can be eager or lazy.
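The eager variant can be sketched as follows, assuming ordinary writes are buffered locally and pushed out only at release; all names are invented for illustration.

```python
# Sketch of eager release consistency: writes inside the critical
# section stay in a local buffer and become globally visible only at
# release(). Illustrative structures, not a real DSM implementation.

import threading

class ReleaseConsistentStore:
    def __init__(self):
        self.shared = {}               # the "globally visible" copy
        self.lock = threading.Lock()
        self.local = {}                # per-critical-section buffer

    def acquire(self):
        self.lock.acquire()            # previous acquires complete first
        self.local = dict(self.shared) # bring others' writes in

    def write(self, key, value):
        self.local[key] = value        # invisible to others until release

    def read(self, key):
        return self.local.get(key)

    def release(self):
        self.shared.update(self.local) # propagate all writes out
        self.lock.release()

store = ReleaseConsistentStore()
store.acquire()
store.write('x', 10)
store.release()                 # x becomes visible here, not before
store.acquire()
print(store.read('x'))          # 10
store.release()
```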
Entry Consistency: Associate
a set of shared variables with a synchronization lock. These
locks are owned and carry exclusive write permission.
- At acquire, all guarded
variables are brought up to date for that process
- There can be only one process
owning with exclusive mode
- Acquiring non-exclusive mode
implies update by the owner of exclusive mode.
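The per-lock update rule can be sketched as follows; only the variables guarded by a given lock are synchronized at acquire time. Names are invented for illustration.

```python
# Sketch of entry consistency: each lock guards a set of variables,
# and an acquire brings only those variables up to date from the
# previous owner's values. Illustrative only.

class EntryConsistentLock:
    def __init__(self, guarded_keys, backing):
        self.keys = guarded_keys
        self.backing = backing    # up-to-date values for guarded vars
        self.owner = None

    def acquire(self, process, local_view):
        for k in self.keys:       # sync only the guarded variables
            local_view[k] = self.backing[k]
        self.owner = process

    def release(self, local_view):
        for k in self.keys:       # publish guarded updates
            self.backing[k] = local_view[k]
        self.owner = None

backing = {'x': 0, 'y': 0}
lock_x = EntryConsistentLock({'x'}, backing)

p1_view = {'x': None, 'y': None}
lock_x.acquire('P1', p1_view)
p1_view['x'] = 5
lock_x.release(p1_view)

p2_view = {'x': None, 'y': None}
lock_x.acquire('P2', p2_view)
print(p2_view['x'])   # 5: guarded variable updated at acquire
print(p2_view['y'])   # None: y is not guarded by this lock
```

The payoff is reduced traffic: an acquire moves only the lock's associated variables, not every shared write since the last synchronization.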
Copyright chris wild 1996.
For problems or questions regarding this web site contact [Dr. Wild].
Last updated: October 22, 1996.