CS 771/871 Operating Systems
Lecture 3: Synchronization Issues
Claim:
- Synchronization is unnecessary if you do not share resources (and uninteresting otherwise).
- But without sharing of resources there is no distributed computing (at the very least the operating system must share something like the ready queue).
- Almost anything is a resource (any data value can be a resource).
- Therefore synchronization plays a more prominent role in distributed systems.
- Most centralized synchronization depends on shared memory (consistent local state).
- Information is scattered throughout the system (lack of global state)
- No common clock
- Avoid centralized solutions
  - time taken to collate local state
  - single point of failure
Question: When are centralized approaches appropriate?
How can the single point of failure be alleviated in a centralized design?
- Does anybody really care what time it is?
- Race conditions exist on uniprocessors also (e.g. concurrent compile and edit on the same machine).
- Crystal oscillator vibrating at a nearly constant frequency.
- Counter for the number of oscillations.
- Interrupt (clock tick) after some number of counts (set by the OS).
- Software keeps track of the number of clock ticks.
- Add this count to a known time.
- Crystal drift causes clock skew.
UNIX counts seconds from midnight UTC
January 1, 1970.
Is there a Unix timebomb in your future?
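The epoch convention is easy to check in code. A minimal Python sketch (the particular date is just an example):

```python
import datetime

# Unix time counts seconds since midnight UTC, January 1, 1970 (the epoch).
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
t = datetime.datetime(1996, 10, 30, tzinfo=datetime.timezone.utc)

# Seconds elapsed since the epoch for the chosen instant
unix_seconds = int((t - epoch).total_seconds())
print(unix_seconds)
```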
- Astronomical Time (GMT): based on the mean solar second, 1/86,400th of a solar day.
  Problem: Earth's rotation is slowing and is not constant.
- International Atomic Time (TAI): very stable; coordinated with astronomical time on Jan. 1, 1958 (9,192,631,770 transitions of the cesium-133 atom = 1 mean solar second on that date).
  Problem: the solar second is changing. Coordinated Universal Time (UTC) adds leap seconds, adjusting GMT to use stable atomic-clock seconds while still putting high noon at the right time of the solar day.
Official time is broadcast by radio (WWV among
others) and by satellite (GPS).
Radio accuracy: +-10 msec.
GPS: 100-300 nanoseconds; possibly better with differential GPS.
NOTE: the availability of GPS clocks solves
many problems brought up in the text.
References: GPS clock broadcast (local copy)
Internet Time
Based on the seminal work of Leslie Lamport:
Define the "happens before" (->) relationship:
- a -> b if a and b are events in the same process and a occurred before b
- a -> b if a is the sending of a message and b is the receipt of that same message
- -> is transitive
Events a and b are causally related if a -> b or b -> a;
otherwise they are concurrent, a || b.
QUESTION: what
are the event relationships below?
Let Ci be the clock for process Pi; use this clock to timestamp the events of Pi.
The sequence generated by Ci is monotonically increasing.
Clock condition: if a -> b then C(a) < C(b). Equivalently:
- C1: for any two events a and b in process Pi, if a happened before b then Ci(a) < Ci(b)
- C2: if a is the event of sending a message in process Pi and b is the event of receiving that message in process Pj, then Ci(a) < Cj(b)
Implemented by:
- IR1: Clock Ci is incremented between any two successive events.
- IR2: A message event a in Pi is assigned a timestamp tm = Ci(a). On receiving this message (event b), Pj sets its logical clock to max(Cj, tm + delta), where delta is a positive increment (usually 1).
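The rules IR1 and IR2 can be sketched as a small class. This is an illustrative sketch, not any particular system's API:

```python
class LamportClock:
    """Logical clock following Lamport's rules IR1 and IR2."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # IR1: increment the clock between any two successive local events
        self.time += 1
        return self.time

    def send(self):
        # IR2 (sender): the message carries the timestamp tm = Ci(a)
        return self.tick()

    def receive(self, tm, delta=1):
        # IR2 (receiver): set the clock to max(Cj, tm + delta)
        self.time = max(self.time, tm + delta)
        return self.time
```

With two processes, a send at C1 = 1 forces the receiver's clock to at least 2, which preserves condition C2.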
QUESTION: assign logical
clock values to events in above event
diagram.
Happens before is a partial ordering.
It can be turned into a total ordering by assigning an order to processes (e.g. breaking timestamp ties by process number).
Question: assign a totally ordered clock
to the event diagram
a -> b implies that C(a) < C(b), but the reverse is not true. So one cannot tell whether two events in different processes are causally related by looking at their clocks.
However the scheme proposed in ISIS can
resolve this.
Assign each process a vector of length n = the number of communicating processes, where the i-th entry contains that process's current understanding of the value of Pi's logical clock.
- IR1: increment your own entry after each event
- IR2: Pi assigns a message the timestamp tm = Ci(a) (the vector clock of Pi)
- IR3: on receiving a message with timestamp tm, the receiver Pj updates its vector clock:
  for all k, Cj[k] = max(Cj[k], tm[k])
Assertion: for-all i,j: Ci[i] >=
Cj[i].
QUESTION: What other assertions can be
made?
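The rules IR1-IR3 can be sketched as a toy class (the process count n is fixed up front, and indices are assumed to be assigned in advance):

```python
class VectorClock:
    """Vector clock for one of n communicating processes."""

    def __init__(self, n, i):
        self.clock = [0] * n   # entry k = our knowledge of Pk's clock
        self.i = i             # this process's own index

    def event(self):
        # IR1: increment your own entry after each event
        self.clock[self.i] += 1
        return list(self.clock)

    def send(self):
        # IR2: the message timestamp is the sender's current vector clock
        return self.event()

    def receive(self, tm):
        # IR3: component-wise max, then count the receive as an event
        self.clock = [max(c, t) for c, t in zip(self.clock, tm)]
        return self.event()
```

After any exchange, each process's own entry dominates every other process's view of it, which is the assertion above.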
If a logical clock deviates from real time (say UTC) by no more than a specified amount, it is a physical clock.
Making a logical clock a physical one in a distributed system is a problem, since time CANNOT be set backwards (that would violate monotonicity).
- Periodically (as determined by the maximum drift rate of the computer clock), synchronize all clocks to a time source (say GPS or WWV) from a TIME SERVER.
- If the local time is ahead of real time, then the incrementing of the clock must be slowed down until it is in sync.
- Measure delay times in the network to send real time HOW?
Time can be set by request from client (Cristian's algorithm)
or by request from server (Berkeley) which polls clients for
local time and sends messages to adjust.
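Cristian's algorithm fits in a few lines; `request_server_time` below is a placeholder for the actual call to the time server:

```python
import time

def cristian_estimate(request_server_time):
    """Estimate the current time from one request to a time server,
    compensating for network delay by halving the round-trip time."""
    t0 = time.monotonic()
    server_time = request_server_time()   # placeholder: query the TIME SERVER
    t1 = time.monotonic()
    round_trip = t1 - t0
    # Best guess: the server's reading is half a round trip old by now
    return server_time + round_trip / 2.0
```

If the local clock turns out to be ahead of the estimate, it is slowed rather than set backwards, preserving monotonicity.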
Problem: time server is centralized.
Distributed time: average all local times gathered at periodic
intervals with or without network delay compensation.
Perhaps best solution is to put GPS receivers at all sites.
- Assign each message a unique number
- Receiver remembers all message numbers received
- Receiver discards messages with numbers already seen
But what if the receiver crashes and loses its message number tables?
When can the message tables be purged?
- Assign a timestamp and a connection number (chosen by
sender).
- Receiver records most recent time stamp for that
connection
- message with a lower timestamp is discarded
- After time G = CurrentTime - (MaxLifeTime +
MaxClockSkew), discard entry
where MaxLifeTime is how long message can live in the
network
- Write this table to disk periodically
- After a crash, reload the table and update G, rejecting any message generated before the crash.
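The scheme above can be sketched as follows (class and parameter names are illustrative, not from any particular system):

```python
class DuplicateFilter:
    """Discard duplicate messages using per-connection timestamps.
    Entries older than G = now - (MaxLifeTime + MaxClockSkew) are purged."""

    def __init__(self, max_lifetime, max_clock_skew):
        self.horizon = max_lifetime + max_clock_skew
        self.latest = {}   # connection id -> most recent timestamp seen

    def accept(self, connection, timestamp, now):
        g = now - self.horizon
        # Purge entries no surviving duplicate in the network could still match
        self.latest = {c: t for c, t in self.latest.items() if t >= g}
        if timestamp < g:
            return False   # older than any message still alive in the network
        if timestamp <= self.latest.get(connection, g - 1):
            return False   # lower (or repeated) timestamp: discard as duplicate
        self.latest[connection] = timestamp
        return True
```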
- Why Cache?
- Distinguish read and write caches
- To write, all read caches must be invalidated, even if not in use.
- Unless readers have a lease time, old readers cannot trust cache consistency.
- Can renew leases if no one else has requested a write
lease.
- Server can break lease for priority writer.
- If client has crashed, time out by lease value (or
network partition).
Only one process can access a shared resource at a
time (e.g. printer, file, data record).
Can be implemented as critical region.
Question: what is the difference between a critical section and a resource lock?
Desirable Characteristics of
Mutual Exclusion algorithm:
- Freedom from deadlock
- Freedom from starvation
- Fairness
- Fault Tolerance
Centralized algorithm:
- Send request to enter
critical section to coordinator
- If no one already in, mark
in use and grant request
- If in use, queue request.
- When process leaves
critical region, send message to coordinator to release
- If processes queued, pick
one and grant request.
Centralized is not always
inappropriate.
A single shared resource is inherently centralized - co-locate
coordinator.
- Totally order all events.
- Build a message containing the critical region name, process number, and current time.
- Send it to all processes reliably.
Question: how can this be
determined?
- A process replies as follows:
  - If it is not in the region and does not want in, it replies OK
  - If it is in the region, it queues the request (no reply)
  - If it wants to enter, it compares the timestamp in the request to its own; if the request's is lower it sends OK, otherwise it queues the request
- The requesting process waits until everyone sends it an OK message
- When exiting, it dequeues all waiting processes and sends them OK
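The reply rules can be written as a pure function; ties in the totally ordered timestamps are broken by process number (a sketch, with illustrative state names):

```python
def ra_reply(state, own_ts, own_pid, req_ts, req_pid):
    """Decide how to answer a critical-region request.
    state is 'RELEASED', 'HELD', or 'WANTED'; (timestamp, pid) pairs
    give the total order on requests."""
    if state == 'RELEASED':
        return 'OK'          # not in the region and does not want in
    if state == 'HELD':
        return 'QUEUE'       # currently in the region: queue the request
    # WANTED: the lower totally ordered timestamp wins
    if (req_ts, req_pid) < (own_ts, own_pid):
        return 'OK'
    return 'QUEUE'
```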
How many messages?
What about failures? before, during and after
What about load (CPU and network)?
Is distributed better?
Distributed Mutual Exclusion Algorithm (Token Ring)
- Form LOGICAL ring
- circulate token around ring
- If token holder, can enter critical region if desired
else pass token on.
Is this fair?
what is max wait?
What if the token is lost?
What if a machine crashes?
Comparison
of Mutual Exclusion Algorithms
Algorithm   | Messages/critical region | Delay to enter | Problems
------------|--------------------------|----------------|-----------------------------
Centralized | 3                        | 2              | coordinator crash
Distributed | 2(n-1)                   | 2(n-1)         | process crash
Token Ring  | 1 to unbounded           | 0 to n-1       | lost token or process crash
Can also measure throughput, utilization, etc.
Electing new centralized coordinator given a known set of
processes.
Bully Algorithm:
- Process P notices the coordinator is no longer responding
- P sends an ELECTION message to all processes with higher numbers
- If no one responds, P elects itself coordinator and tells everyone
- If a higher-up answers, it takes over; P is done
- When an ELECTION message is received, acknowledge it and hold an election (if one is not already in progress)
- A rebooting process holds an election
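A synchronous toy version of the bully election (real implementations discover crashed processes via timeouts; here a known `alive` set stands in for that):

```python
def bully_election(pids, starter, alive):
    """One bully election started by `starter`: challenge all
    higher-numbered live processes; the highest live process wins."""
    higher = [p for p in pids if p > starter and p in alive]
    if not higher:
        return starter   # no response: elect yourself
    # A higher-up answered and takes over; it holds its own election
    return bully_election(pids, min(higher), alive)
```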
Ring Algorithm:
- Process P notices the coordinator is not responding
- P sends an ELECTION message to its successor containing its process number
- The message skips crashed (down) processes
- Running processes add their IDs to the message
- When the originator sees the message it started, it elects the highest-numbered process in the list and circulates the result
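The circulation can be sketched as one pass around the logical ring (the starter is assumed alive; `alive` again stands in for timeout-based failure detection):

```python
def ring_election(ring, starter, alive):
    """ELECTION message circulates once around the ring, skipping
    crashed processes; live processes append their ids; when it
    returns to the originator, the highest id becomes coordinator."""
    members = [starter]              # originator adds its own id
    n = len(ring)
    i = ring.index(starter)
    for step in range(1, n):
        p = ring[(i + step) % n]
        if p in alive:               # crashed processes are skipped
            members.append(p)
    return max(members)              # highest-numbered live process wins
```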
What if two processes notice the coordinator crash?
Is process number really what we want to use?
AKA transactions.
All-or-nothing semantics.
Stable storage (usually implemented as disk mirroring (RAID - Redundant Arrays of Independent Disks))
Atomic Action Primitives:
- BEGIN_TRANSACTION
- END_TRANSACTION
- ABORT_TRANSACTION
- READ (reversible)
- WRITE (reversible)
- other reversible operations
- Atomic: indivisible
- Consistent: no violation of system invariants
- Isolated: transactions do not interfere (critical regions?); not necessarily deterministic
- Durable: after commit, changes are permanent
Need to preserve even in face of nested transactions.
Permanence only applies to top level.
Conceptually transaction has copy of shared objects
Does this accomplish the above requirements?
Real copy is expensive
Question: Do we need to copy read-only
objects?
Writes copy only the blocks actually changed (called shadow blocks).
Can use Unix's i-node concept to mix private and public blocks (figure 3-18).
Commit has to atomically update public space.
Changes are made to public storage, but an audit log is kept of the old values.
The log can be used to roll back or roll forward (after a crash).
Question: How to handle access by other
processes?
Does rollforward violate all-or-nothing semantics?
Atomic commit in a distributed system requires all parties to
agree to commit at the same time (or not to commit).
- One process is coordinator
- coordinator logs intention to
commit
- sends other processes message to
prepare to commit
- Receiving processes log "ready to commit" and reply (if ready)
- when coordinator receives ready
from everybody, logs commit decision and sends commit
message
- Other processes send finished
message after committing.
- If any process is not ready or has
crashed, coordinator aborts transaction (logs and sends
message to all).
It is the log entry at end by
coordinator which is actual commit.
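The coordinator's side of the protocol can be sketched as follows; each participant is modeled as a callable returning its vote (an illustrative sketch, not a fault-tolerant implementation):

```python
def two_phase_commit(participants, log):
    """Coordinator side of two-phase commit. `participants` are callables
    returning True if ready to commit; `log` models stable storage."""
    log.append('intend-commit')          # coordinator logs intention
    votes = [p() for p in participants]  # "prepare to commit" round
    if all(votes):
        log.append('commit')             # this log entry IS the commit
        return 'COMMIT'                  # then send commit messages to all
    log.append('abort')                  # any not-ready/crashed vote aborts
    return 'ABORT'
```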
Question: What if a non-coordinator process crashes and never commits?
Is a transaction mutually exclusive?
Concurrency Control: Locks
- Read/write locks
- Granularity
- Two-phase locking
  - Growing phase: acquire locks before updating; release all if one is unavailable
  - Shrinking phase
Assertion: if all transactions use two-phase locking, then all schedules are serializable.
- Strict: shrinking is done after commit (can be part of END_TRANSACTION); avoids CASCADED aborts due to reading uncommitted data
- Deadlocks can be avoided by acquiring locks in some canonical order
- Can set a time limit on use of a resource to detect deadlock
- Can build a resource allocation graph and look for cycles
Basically, take no locks and detect conflicts only at commit time; if there is a conflict, abort before committing.
This is deadlock free (see below),
and efficient if the probability of simultaneous use is low.
- Assign, using logical clocks, a timestamp to each transaction at BEGIN_TRANSACTION.
  QUESTION: how do we synchronize clocks if we don't know which other processes access shared resources? Need to know.
- On commit, every resource is timestamped with either a read or a write commit time.
- When accessing a resource, check the resource timestamp against the transaction's. If the transaction's is older, then abort.
Timestamp Example
Let R be the timestamp of the committed access to resource R.
Let R' be the timestamp of an uncommitted access to R.
Let Ti be the timestamp of transaction i.
- If R < Ti, access is OK
- If R > Ti, abort the transaction
- If R' < Ti, wait till commit
- If R' > Ti, abort
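The four rules translate directly into code (parameter names are illustrative; `None` marks the absence of an uncommitted access):

```python
def timestamp_check(r_committed, r_uncommitted, ti):
    """Timestamp-ordering rules from the example above.
    r_committed: timestamp of the last committed access to R;
    r_uncommitted: timestamp of a pending access to R, or None;
    ti: the requesting transaction's timestamp."""
    if r_committed > ti:
        return 'ABORT'      # R was already touched by a younger transaction
    if r_uncommitted is not None:
        if r_uncommitted < ti:
            return 'WAIT'   # older uncommitted access: wait for its commit
        if r_uncommitted > ti:
            return 'ABORT'
    return 'OK'             # R < Ti: access is safe
```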
Potentially a serious problem in distributed systems, and more likely to occur because of the increased chances of sharing resources.
- Ignore (many systems do this)
- Detect (and recover); atomic actions permit aborted transactions instead of aborted processes
- Prevent (statically design away the problem)
- Avoid (allocate resources carefully) - impractical
Let coordinator keep global resource graph and detect cycles.
Each local processor sends its local resource graph either
- when changed
- periodically
- on demand
Unfortunately, delays in transmission can cause messages to the coordinator to arrive out of order, potentially resulting in a FALSE DEADLOCK.
Using logical clocks, the coordinator could request that all processes verify their latest message sent, thus forcing messages to arrive in logical order.
Basic idea is to send a message to processes holding
resources that you are waiting for and see if the message
eventually comes back to you (the cycle).
Messages contain three numbers:
- Process just blocked
- Process sending the message
- Process holding the resource
QUESTION: should this be the resource the sending process is blocked on?
Extend algorithm in book to handle multiple resources
held per process.
Break the cycle by aborting the process that initiated the message;
but what if two processes inquire and find the cycle at about the same time?
Or record the processes in the cycle and abort the lowest priority one.
90% of deadlocks involve only two processes.
- Request all needed resources at the same time
- Request in canonical order
- Assign each transaction a unique global timestamp and kill only younger processes that wait on an older one (AKA wait-die)
- Resources can be taken away by aborting a transaction with no side effects
- Again using timestamps, an older process takes resources away from a younger one (by aborting its transaction), but a younger process will wait for an older one (AKA wound-wait)
- Compare wait-die with wound-wait
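The two schemes differ only in what happens when the requester is older than the holder (smaller timestamp = older transaction):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger one is killed (dies)."""
    return 'WAIT' if requester_ts < holder_ts else 'DIE'

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester preempts (wounds) the holder;
    a younger one waits for the older to finish."""
    return 'WOUND' if requester_ts < holder_ts else 'WAIT'
```

In both schemes the older transaction always prevails eventually, so no cycle of transactions can all be waiting on each other.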
Critical region can be considered a resource which is locked
(by semaphore, monitor, or coordinator).
Critical regions can encapsulate several other resources (files, devices) which are needed to accomplish a mission. In this sense they help alleviate deadlock.
Consider a bank with two accounts which are kept on different
machines.
Let's say account A has $1000 and account B has $500.
Suppose the clock on machine containing A is 5 seconds faster
than the clock on the other machine.
Suppose there is a transfer of $100 from A to B.
Is the global state of funds in the bank correct?
Depends on when the state information is gathered. Suppose A decrements its account and sends a message to B at 11:10:05 (which is 11:10:00 at B). The message transfer time is 1 second, so at 11:10:01 local time for B the account is credited.
If gathered after transfer message sent from A to B but before
received, then the bank has lost $100 from its global state.
If one gathers local state from both machines and uses local clocks to synchronize at 11:10:03, A's state would be before the debit but B's would be after the credit; hence the global state shows $100 too much.
Copyright chris wild 1996.
For problems or questions regarding this web site, contact [Dr. Wild].
Last updated: October 30, 1996.