CS 771/871 Operating Systems
Analysis of Processor Pools
Agustin's point about the difference between an "N-times faster"
processor and a processor pool of N processors is well made.
While the arrival rate is N times the individual rate, the
service rate is not, unless the pool is constantly maximally
busy (which implies an unstable queuing system). Whenever the
number of requests falls below N, the pool will service them
more slowly than a fast processor (but faster than N isolated
systems).
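The claim can be checked with standard queueing formulas. The following is a rough sketch, not part of the original notes: it assumes Poisson arrivals and exponential service, and uses the Erlang C formula for the M/M/k waiting probability; the rates are illustrative.

```python
from math import factorial

def mm1_response(lam, mu):
    # mean time in system for an M/M/1 queue (requires lam < mu)
    return 1.0 / (mu - lam)

def mmk_response(lam, mu, k):
    # mean time in system for an M/M/k queue, via the Erlang C formula
    a = lam / mu                                   # offered load
    wait_term = (a ** k / factorial(k)) * (k / (k - a))
    p_wait = wait_term / (sum(a ** i / factorial(i) for i in range(k))
                          + wait_term)
    return 1.0 / mu + p_wait / (k * mu - lam)

lam, mu, n = 0.5, 1.0, 4                 # per-node arrival and service rates
w_isolated = mm1_response(lam, mu)       # one of N isolated M/M/1 systems
w_fast = mm1_response(n * lam, n * mu)   # a single N-times-faster processor
w_pool = mmk_response(n * lam, mu, n)    # pool of N processors (M/M/N)
assert w_fast < w_pool < w_isolated
```

With these example rates the pool's mean response time falls strictly between the fast processor's and the isolated systems', exactly as the argument above predicts.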
Let's analyze N isolated systems
(adapted from Singhal):
Probability of Idle When Waiting
Components of Load Distribution Algorithms
- Transfer Policy
  - local, based on thresholds
  - global, based on system load
- Selection Policy (pre-emptive or not)
- Location Policy
- Information Policy
Comparison of Load Distribution Algorithms
- M/M/1: distributed system, no load distribution
- RECV: Receiver initiated
- RAND: Sender transfers to a random receiver
- SEND: Sender polls randomly to find a receiver
- SYM: Symmetrically initiated (both sender and receiver)
- ADSYM: Adaptive symmetric
- ADSEND: Adaptive sender-initiated
- M/M/K: Ideal load sharing with no overhead
RECV: Receiver Initiated
- Transfer Policy: when a task departs the queue and the queue length is below threshold T
- Selection Policy: various at the sender site, probably pre-emptive
- Location Policy: select a sender at random, up to a poll limit
- Information Policy: demand driven by the receiver
The algorithm is stable under load.
Question:
How can receiver-initiated transfer avoid pre-emption at the sender?
SYM: Symmetrically Initiated (Above-Average Algorithm)
Goal: maintain load close to the system average.
Problem: could cause thrashing.
- Transfer Policy: two adaptive thresholds on either side of the estimated system average
- Location Policy:
  - Sender-initiated
    - Sender broadcasts a TOO_HIGH message, sets a TOO_HIGH timeout, and waits for an ACCEPT message
    - A receiver getting TOO_HIGH cancels its TOO_LOW timer, sends ACCEPT, increases its load value, and sets an AWAITING_TASK timer; on timeout, it decreases its load value
    - On receiving ACCEPT, if still a sender, selects a task and transfers it
    - On expiration of the TOO_HIGH timer, raises its estimate of the system load and sends a CHANGE_AVERAGE message
  - Receiver-initiated
    - Broadcasts a TOO_LOW message, sets a TOO_LOW timer, and awaits a TOO_HIGH message
    - If a TOO_HIGH message is received, does as above
    - If TOO_LOW times out, decreases its estimate of the system load and broadcasts a CHANGE_AVERAGE message
- Selection Policy: various
- Information Policy: demand driven, adapts to the estimated load
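The threshold test and one TOO_HIGH/ACCEPT exchange can be sketched as follows. This is an illustration, not the authors' code: the message names follow the notes, but the margin of 1 around the estimated average and the default average estimate are assumed details.

```python
# Sketch of the SYM (above-average) transfer policy and one exchange.
class SymNode:
    def __init__(self, nid, load, avg_estimate=2):
        self.nid = nid
        self.load = load
        self.avg = avg_estimate      # this node's estimate of the system average

    def classify(self, margin=1):
        # transfer policy: two thresholds on either side of the estimated average
        if self.load > self.avg + margin:
            return "TOO_HIGH"        # would broadcast TOO_HIGH
        if self.load < self.avg - margin:
            return "TOO_LOW"         # would broadcast TOO_LOW
        return "OK"

def exchange(sender, receiver):
    """One TOO_HIGH/ACCEPT round: an overloaded sender transfers a task
    to an underloaded receiver that ACCEPTed."""
    if sender.classify() == "TOO_HIGH" and receiver.classify() == "TOO_LOW":
        receiver.load += 1           # receiver increases its load value on ACCEPT
        sender.load -= 1             # sender selects a task and transfers it
        return True
    return False
```

The timers (TOO_HIGH, TOO_LOW, AWAITING_TASK) and the CHANGE_AVERAGE adaptation are elided; only the threshold logic is shown.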
ADSYM: Adaptive Symmetric
The above-average algorithm does not keep track of busy and idle
processors, just the average load; broadcast or polling therefore
may not be efficient.
Keep three lists per processor:
- Overloaded (Senders)
- Underloaded (Receivers)
- OK
Initially all processors are
receivers.
- Transfer Policy: keep an upper threshold (UT) and a lower threshold (LT) on queue length; update on task arrivals and departures
- Selection Policy: the sender only considers new tasks; the receiver: various
- Location Policy:
  - Sender-initiated: when queue length > UT
    - (sender) polls the node at the head of its receiver list
    - (polled node) puts the sender at the head of its sender list and informs the sender of its current status
    - (sender) if the polled node replies "receiver", sends the task
    - otherwise, puts the polled node at the head of the appropriate list and polls the next receiver on the list (unless the list is empty, in which case it runs the task itself)
  - Receiver-initiated: when queue length < LT
    - (receiver) polls from its lists in the following order (up to pollLimit):
      - senders (head to tail)
      - OK (tail to head)
      - receivers (tail to head)
    - (polled node) if a sender, sends tasks and its status after the transfer
    - (polled node) if not a sender, puts the receiver at the head of its receiver list and sends its status
    - (receiver) updates the polled node's status
- Information Policy: demand driven
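The sender-initiated side can be sketched minimally. This is an assumption-laden illustration, not the paper's implementation: UT = LT = 1 matches the simulation parameters given later, and the list bookkeeping is simplified (stale entries are simply dropped rather than moved to the sender/OK lists).

```python
from collections import deque

# Simplified sketch of ADSYM sender-initiated polling.
class AdsymNode:
    def __init__(self, nid, ut=1, lt=1):
        self.nid, self.ut, self.lt = nid, ut, lt
        self.queue_len = 0

    def status(self):
        if self.queue_len > self.ut:
            return "sender"
        if self.queue_len < self.lt:
            return "receiver"
        return "ok"

def sender_poll(sender, receiver_list, nodes, poll_limit=5):
    """Sender-initiated location policy: poll the head of the receiver
    list; transfer one task to the first node that is still a receiver."""
    for _ in range(min(poll_limit, len(receiver_list))):
        target = nodes[receiver_list.popleft()]
        if target.status() == "receiver":
            target.queue_len += 1    # transfer the task
            sender.queue_len -= 1
            return target.nid
        # stale entry: the polled node is no longer a receiver; in the
        # full algorithm it moves to the appropriate list -- here we
        # just try the next candidate
    return None                      # list exhausted: run the task locally
```

Note how polling doubles as the information policy: each poll refreshes both nodes' views of each other's status.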
Discussion: ADSYM
- High loads: the sender polls potential receivers, eventually depleting its receiver list; polling traffic then stops until receiver-initiated polls resume
- Low loads: receiver-initiated polls fail to find work, but they update status at the polled nodes
ADSEND: Adaptive Sender
Only non-pre-emptive selection.
Uses the sender-initiated portion of the ADSYM algorithm.
Has an additional data structure, STATEVECTOR (one location for
each node), which records which list this node is on at every
other node.
When a sender polls, it additionally updates its STATEVECTOR
entry for the polled node to reflect the status it sent.
The polled node updates its STATEVECTOR entry for the sender's
list assignment.
Receiver algorithm:
- On changing to receiver, sends a message to all nodes which need to know this (based on the info in STATEVECTOR)
This avoids broadcast. Question: is this bad?
* Shivaratri, Krueger and Singhal, "Load Distribution in Locally
Distributed Systems", IEEE Computer, Vol. 25, No. 12, Dec. 1992,
pp. 33-44.
Assumptions and Parameters:
- Average service demand is 1 unit of time
- Arrival rates are independent exponential distributions
- Homogeneous load
- 40 identical nodes
- UT = LT = 1
- PollLimit = 5 (P(1-P)**i gives diminishing returns)
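The diminishing-returns remark can be made concrete: if each random poll independently finds a receiver with probability P, the first success occurs on poll i+1 with probability P(1-P)**i. A quick check (P = 0.5 is an assumed illustration value, not from the paper):

```python
# Geometric probabilities of the first successful poll.
p = 0.5                              # assumed per-poll success probability
probs = [p * (1 - p) ** i for i in range(10)]
covered = sum(probs[:5])             # chance of succeeding within 5 polls
# the tail collapses geometrically: covered == 1 - (1-p)**5
```

For this value of P a limit of 5 polls already captures about 97% of the probability mass, so further polling buys very little.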
Results: Figure 1
- Anything is better than no load distribution
- Receiver-initiated is better than sender-initiated under high load
Results: Figure 2
- Better than sender-initiated at high loads
- Still unstable at high loads
- Better than receiver-initiated at low loads
Results: Figure 3
- Adaptive slightly better (SYM is adaptive)
- ADSYM approaches ideal
- ADSYM similar to RECV under heavy loads, but better on light loads
Results: Figure 4
Offered system load of .85, but by only a subset of processors
(heterogeneous load):
- RECV very unstable (random probes unlikely to find work)
- SEND also unstable, can't get rid of load fast enough
- ADSYM very good
Faults and Failures
- Fault:
  - malfunction (due to design error, manufacturing flaw, physical damage, harsh environment, unanticipated operating conditions, etc.)
- Failure:
  - output not meeting specifications
  - Transient
  - Intermittent
  - Permanent
Mean Time to Failure (MTTF)
If p is the probability of failure, then the expected time to
failure is
sum(k=1 to infinity) k*p*(1-p)**(k-1) = 1/p
A 1 in a million chance of failure in a second gives a mean time
to failure of 11.6 days.
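The expectation above is the mean of a geometric distribution. A quick numeric check (the series is truncated, and 86400 is seconds per day):

```python
# Geometric failure model: per-trial failure probability p gives
# E[time to failure] = sum_{k>=1} k*p*(1-p)**(k-1) = 1/p.
def mttf(p):
    return 1.0 / p

# numeric sanity check of the series, using a larger p so the
# truncated sum converges quickly
p_check = 0.01
series = sum(k * p_check * (1 - p_check) ** (k - 1) for k in range(1, 5000))

# the notes' example: a 1-in-a-million chance of failure per second
mttf_days = mttf(1e-6) / 86400       # ~11.6 days, matching the figure above
```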
The estimated failure probability of an airframe in a
coast-to-coast flight is 10**(-9).
System Failures
Complex systems are at the mercy of component failures:
- If any single component can bring the system down equally, then the probability of no component failure is (1-p)**n with n components
- Fail-silent faults (no output)
- Byzantine faults (faulty output)
Another concern is whether communication delays can be bounded
(called synchronous in some circles) or not (asynchronous).
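The (1-p)**n claim from the list above can be checked numerically; the component count and per-component failure probability here are illustrative assumptions.

```python
# If any one of n components failing brings the whole system down,
# the probability that the system survives is (1 - p)**n.
def survival(p, n):
    return (1 - p) ** n

# e.g. 100 components, each failing with probability 0.001
prob = survival(0.001, 100)          # ~0.905: a ~9.5% chance of system failure
```

Even highly reliable components compound quickly: reliability falls roughly exponentially in the component count.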
Redundancy
- Information redundancy: parity, CRC, Hamming code
- Time redundancy (for intermittent and transient faults): time out and try again
- Physical redundancy (extra resources)
  - Active replication
  - Primary backup
Issues:
- Degree of replication
- Average and worst case in the absence of faults
- Average and worst case with faults
Active Replication
With only two copies, faults can be detected but not corrected.
TMR (Triple Modular Redundancy): uses three copies and votes on
the output (Figure 4-21).
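A TMR voter reduces to a majority function over the three replica outputs; a minimal sketch:

```python
# TMR voter: one faulty replica is outvoted by the two healthy ones.
def tmr_vote(a, b, c):
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None                      # all three disagree: fault not maskable
```

With two copies the same comparison could only report a mismatch; the third copy is what turns detection into correction.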
Degree of Replication
K fault tolerant: the system can survive K faults:
- If fail-silent, need K + 1 copies
- If Byzantine, need 2K + 1 copies and vote
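The two rules can be captured in a one-line helper (a sketch for illustration, not a standard API):

```python
# Copies needed to be K fault tolerant, per the two rules above.
def replicas_needed(k, byzantine=False):
    # fail-silent: any one surviving copy gives a correct answer -> k + 1
    # Byzantine: correct copies must always outvote faulty ones -> 2k + 1
    return 2 * k + 1 if byzantine else k + 1
```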
Atomic Broadcast Problem
Replicas need to process messages in the same order.
- Number messages globally (failure of the number server is a problem)
- Use logical clocks to timestamp (but receipt of a message does not tell you whether any messages were missed)
Question:
Do vector timestamps solve this?
Primary Backup
Figure 4-22
- When to switch
- The backup needs to know the primary's state
  - Can create it by mirroring computations
  - Can access it from disk (assuming all state is committed there); the disk has dual ports (can be mirrored)
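A toy sketch of the primary-backup scheme, assuming state is mirrored on every write and reads fail over when the primary is down; failure detection (e.g. heartbeats) is elided and `alive` is flipped by hand.

```python
# Minimal primary-backup sketch (illustrative, not a real protocol).
class Replica:
    def __init__(self):
        self.state = {}
        self.alive = True

def write(primary, backup, key, value):
    primary.state[key] = value
    backup.state[key] = value        # mirror to the backup before acking
    return "ack"

def read(primary, backup, key):
    node = primary if primary.alive else backup   # switch on primary failure
    return node.state.get(key)
```

The "when to switch" issue above is exactly what this sketch hides: deciding that the primary is really dead, rather than just slow, is the hard part.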
Agreement in Presence of Faults
Issues:
- Faults in processors, communications, or both
- Messages authenticated or not
- Communication delays bounded (synchronous) or not
- Faults fail-silent, omitted messages, or Byzantine
- Metrics:
  - Time
  - Message traffic
  - Storage
Two Army Problem
- Two processes must agree to undertake a coordinated action
- Neither process may proceed unilaterally (agreement is necessary)
- Communication is unreliable
It can be shown that agreement in the presence of an unreliable
channel is impossible: any finite protocol has some last message,
and since that message may be lost, its sender cannot depend on
it; removing it yields a shorter protocol, and repeating the
argument removes every message.
Byzantine Agreement
- N processes which must all agree; the initial value is broadcast by a source processor
- Communication is reliable, synchronous (all processes communicate in rounds), and authenticated
- Some processes may have Byzantine faults
- AGREEMENT: all nonfaulty processors must agree on the same value
- VALIDITY: if the source processor is nonfaulty, then all nonfaulty processors must agree upon the value sent by the source initially
Impossibility of Agreement with Three Processors
Let p0, p1, p2 be the three processors, and let p0 be the source.
The message is one bit.
- Case 1: p0 is not faulty; assume p2 is faulty.
  - Suppose p0 broadcasts initial value 1
  - p1 will rebroadcast 1 to p2
  - p2 (faulty) will broadcast 0 to p1
  - p1 has conflicting information and should choose p0's value
- Case 2: p0 is faulty.
  - Suppose p0 sends 1 to p1 and 0 to p2
  - p2 sends 0 to p1, while p1 sends 1 to p2
  - p1 again has conflicting info, just like case 1, and as in case 1 must choose the value sent by p0
  - p2, following the same algorithm as p1, must choose 0, so agreement is impossible
Figure: Impossibility of agreement with three processors
Pease showed that it is impossible to reach consensus if the
number of faulty processors, m, is greater than floor((n-1)/3),
and they showed an algorithm requiring m+1 rounds of message
exchanges.
Agreement with Four Processors and One Faulty
Example: Seven Processors, Two Faulty
Suppose P0 and P4 are faulty.
Case 1: P0 sends a different value to every processor.
The others quickly agree that the value is unknown (no majority,
even with P4 malicious).
Case 2: P0 sends 1 to P1, P2 and P3, and 0 to P4, P5 and P6.
- Let P4 transmit 1 to P1, P2 and P3, reinforcing their majority of 1, and send 0 to P5 and P6, forcing their majority to 0. No consensus: P4 is forcing different conclusions within two distinct groups (those who initially received 0 and those who received 1).
Lamport-Shostak-Pease Algorithm
Define the set of deciders: initially all processors but the source.
The source sends its value to the set of deciders.
For 3m+1 or more processors, execute m+1 rounds of message
passing as follows:
- Set your value to that received from your source
- Remove yourself from the decider set you received
- Send your value to the remaining set of deciders
- Await replies from each decider; send the majority (if one exists) to your source (if any)
Message complexity is O(n**m).
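The recursion above can be sketched as the classic OM(m) oral-messages procedure. This is an illustrative model, not the course's code: a faulty node is modeled here as sending conflicting bits based on recipient-id parity, which is just one possible adversary.

```python
from collections import Counter

def majority(values):
    """Strict majority value, or None when there is no clear winner."""
    top = Counter(values).most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None
    return top[0][0]

def om(m, commander, lieutenants, value, faulty):
    """Oral-messages algorithm OM(m): returns, for each lieutenant,
    the value it decides the commander sent. A faulty sender is
    modeled as sending conflicting bits (recipient id mod 2)."""
    sent = {lt: (lt % 2 if commander in faulty else value)
            for lt in lieutenants}
    if m == 0:
        return sent
    # each lieutenant relays the value it received via OM(m-1)
    relays = {j: om(m - 1, j, [x for x in lieutenants if x != j],
                    sent[j], faulty)
              for j in lieutenants}
    # each lieutenant votes over its direct value and the relayed ones
    return {i: majority([sent[i]] +
                        [relays[j][i] for j in lieutenants if j != i])
            for i in lieutenants}
```

With n = 4 and m = 1, the loyal lieutenants agree whether a lieutenant or the commander is the faulty one, matching the floor((n-1)/3) bound.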
Lamport-Shostak-Pease Algorithm: Example Case 2
Round 1: P0 sends 1 to P1, P2 and P3, and 0 to the others.
- Round 2: P1 sends 1 to deciders {P2..P6}
  - Round 3 for P1: majority to agree on what P1 received
    - P2 sends 1 to deciders - {P2}
    - P3 sends 1 to deciders - {P3}
    - P4 sends ? to deciders - {P4}
    - P5 sends 1 to deciders - {P5}
    - P6 sends 1 to deciders - {P6}
  - Round 3 for P1: deciders pick the majority of the values received. Since P4 is in the minority and the rest tell the truth, all non-faulty processors pick value 1 as that sent by P1 in round 2.
- Round 2: similarly for P2, P3, P5 and P6: all non-faulty processors will agree on the values received by the non-faulty processors
- Round 2: P4. Two cases.
  - P4 only lies a little and sends the same value to at least three processors: all non-faulty processors will agree on the majority value.
  - P4 lies a lot, sending no majority of any one value: the non-faulty processors agree that the value from P4 is unknown.
After all complete round 2, all non-faulty processors will have
the same value from every other processor.
Now each can pick the majority from this consistent set of values.
Interactive Consistency Problem
All non-faulty processors must agree on a vector of values, each
processor contributing one element of the vector.
VALIDITY: if Pi is non-faulty, then the i-th element of the
vector must be the initial value broadcast by Pi; otherwise it
doesn't matter what value is agreed upon.
Fischer proved that with asynchronous processors and unbounded
delays, no agreement is possible even if only one processor is
faulty (and even if it is fail-silent).
Example Figure 4-23: Original Algorithm
Step 1:
- P1 broadcasts "1" to the others
- P2 broadcasts "2" to the others
- P3 (faulty) broadcasts "x", "y", "z" to the others
- P4 broadcasts "4" to the others
Step 2:
- P1 forms vector (1,2,x,4)
- P2: (1,2,y,4)
- P3: (a,b,c,d) for P1, (e,f,g,h) for P2, (i,j,k,l) for P4
- P4: (1,2,z,4)
Step 3: all broadcast this vector
Step 4: each forms the majority in the i-th location (or ? if no
majority)
The book claims this results in agreement, but what if c = z,
g = x, k = y?
- P1: (1,2,z,4)
- P2: (1,2,x,4)
- P3: (-,-,x2,-) to P1, (-,-,y2,-) to P2, (-,-,z2,-) to P4
- P4: (1,2,y,4)
No agreement; go another round and send the vector, but what if
x2=x, y2=y and z2=z? (P3 keeps switching its vote to confuse the
others.)
Example Figure 4-23: Modified Algorithm
Solution: don't accept Pi's value about Pi more than once (force
consistency of Pi's value from Pi to each Pj). This in effect
forces c=x=x2, g=y=y2, k=z=z2.
Step 1:
- P1 broadcasts "1" to the others
- P2 broadcasts "2" to the others
- P3 broadcasts "x", "y", "z" to the others
- P4 broadcasts "4" to the others
Step 2: (blanks denote no value need be sent)
- P1 forms vector ( ,2,x,4)
- P2: (1, ,y,4)
- P3: (a,b, ,d) for P1, (e,f, ,h) for P2, (i,j, ,l) for P4
- P4: (1,2,z, )
Step 3: all broadcast this vector
Step 4: each forms the majority in the i-th location (or ? if no
majority), using the value Pi sent previously in the i-th
location of Pi's new vector
Two cases. If x, y and z are all different:
- P1: (1,2,?,4)
- P2: (1,2,?,4)
- P3: who cares
- P4: (1,2,?,4)
If x = y but not z (and the symmetrical cases where two agree):
- P1: (1,2,x,4)
- P2: (1,2,x,4)
- P4: (1,2,x,4)
If x=y=z, then we can't catch a consistent liar.
Notice this is four simultaneous runs of the Byzantine agreement
algorithm discussed earlier.
Copyright chris wild 1996.
For problems or questions regarding this web page, contact [Dr. Wild].
Last updated: October 01, 1996.