CS 771/871 Operating Systems
Lecture 3: Synchronization Issues
Claim:
- Synchronization is unnecessary if you do not share resources (and uninteresting otherwise).
- But without sharing of resources there is no distributed computing (at the very least the operating system must share something like the ready queue).
- Almost anything is a resource (any data value can be a resource).
- Therefore synchronization plays a more prominent role in distributed systems.
- Most centralized synchronization depends on shared memory (consistent local state).
- Information is scattered throughout the system (lack of global state)
- No common clock
- Avoid centralized solutions
  - time taken to collate local state
  - single point of failure
Question: When are centralized approaches appropriate?
How can the single point of failure be alleviated in a centralized design?
- Does anybody really care what time it is?
- Race conditions exist on uniprocessors also (e.g. concurrent compile and edit on the same machine).
- Crystal oscillator vibrating at a nearly constant frequency.
- Counter for the number of oscillations.
- Interrupt (clock tick) after some number of counts (set by the OS).
- Software keeps track of the number of clock ticks.
- Add this count to a known time.
- Crystal drift causes clock skew.
UNIX counts seconds from midnight UTC
January 1, 1970.
Is there a Unix timebomb in your future?
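The epoch convention is easy to check in code. A minimal Python sketch (the particular date is just an example):

```python
import datetime

# Unix time counts seconds since midnight UTC, January 1, 1970 (the epoch).
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
t = datetime.datetime(1996, 10, 30, tzinfo=datetime.timezone.utc)

# Seconds elapsed since the epoch for the chosen instant
unix_seconds = int((t - epoch).total_seconds())
print(unix_seconds)
```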
- Astronomical Time (GMT): based on the mean solar second, 1/86,400th of a solar day.
  Problem: Earth's rotation is slowing and is not constant.
- International Atomic Time (TAI): very stable; coordinated with astronomical time on Jan. 1, 1958 (9,192,631,770 transitions of the cesium-133 atom = 1 mean solar second on that date).
  Problem: the solar second is changing. Coordinated Universal Time (UTC) adds leap seconds, adjusting GMT to use stable atomic-clock seconds while still putting high noon at the right time of the solar day.
Official time is broadcast by radio (WWV among
others) and by satellite (GPS).
Radio accuracy: +-10 msec.
GPS: 100-300 nanoseconds; possibly better with differential GPS.
NOTE: the availability of GPS clocks solves
many problems brought up in the text.
References: GPS clock broadcast (local copy)
Internet Time
Based on the seminal work of Leslie Lamport:
Define the "happens before" (->) relationship:
- a -> b if a and b are events in the same process and a occurred before b
- a -> b if a is the sending of a message and b is the receipt of that same message
- -> is transitive
Events a and b are causally related if a -> b or b -> a;
otherwise they are concurrent, a || b.
QUESTION: what
are the event relationships below?
Let Ci be the clock for process Pi; use this clock to timestamp the events of Pi.
The sequence generated by Ci is monotonically increasing.
Clock condition: if a -> b then C(a) < C(b). Equivalently:
- C1: for any two events a and b in process Pi, if a happened before b then Ci(a) < Ci(b)
- C2: if a is the event of sending a message in process Pi and b is the event of receiving that message in process Pj, then Ci(a) < Cj(b)
Implemented by:
- IR1: Clock Ci is incremented between any two successive events.
- IR2: A message event a in Pi is assigned a timestamp tm = Ci(a). On receiving this message (event b), Pj sets its logical clock to max(Cj, tm + delta), where delta is a positive increment (usually 1).
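The rules IR1 and IR2 can be sketched as a small class. This is an illustrative sketch, not any particular system's API:

```python
class LamportClock:
    """Logical clock following Lamport's rules IR1 and IR2."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # IR1: increment the clock between any two successive local events
        self.time += 1
        return self.time

    def send(self):
        # IR2 (sender): the message carries the timestamp tm = Ci(a)
        return self.tick()

    def receive(self, tm, delta=1):
        # IR2 (receiver): set the clock to max(Cj, tm + delta)
        self.time = max(self.time, tm + delta)
        return self.time
```

With two processes, a send at C1 = 1 forces the receiver's clock to at least 2, which preserves condition C2.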
QUESTION: assign logical
clock values to events in above event
diagram.
Happens before is a partial ordering.
It can be turned into a total ordering by assigning an order to processes (e.g. breaking timestamp ties by process number).
Question: assign a totally ordered clock
to the event diagram
a -> b implies that C(a) < C(b), but the reverse is not true. So one cannot tell whether two events in different processes are causally related by looking at their clocks.
However the scheme proposed in ISIS can
resolve this.
Assign each process a vector of length n = the number of communicating processes, where the i-th entry contains that process's current understanding of the value of Pi's logical clock.
- IR1: increment your own entry after each event
- IR2: Pi assigns a message the timestamp tm = Ci(a) (the vector clock of Pi)
- IR3: on receiving a message with timestamp tm, the receiver Pj updates its vector clock:
  for all k, Cj[k] = max(Cj[k], tm[k])
Assertion: for-all i,j: Ci[i] >=
Cj[i].
QUESTION: What other assertions can be
made?
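The rules IR1-IR3 can be sketched as a toy class (the process count n is fixed up front, and indices are assumed to be assigned in advance):

```python
class VectorClock:
    """Vector clock for one of n communicating processes."""

    def __init__(self, n, i):
        self.clock = [0] * n   # entry k = our knowledge of Pk's clock
        self.i = i             # this process's own index

    def event(self):
        # IR1: increment your own entry after each event
        self.clock[self.i] += 1
        return list(self.clock)

    def send(self):
        # IR2: the message timestamp is the sender's current vector clock
        return self.event()

    def receive(self, tm):
        # IR3: component-wise max, then count the receive as an event
        self.clock = [max(c, t) for c, t in zip(self.clock, tm)]
        return self.event()
```

After any exchange, each process's own entry dominates every other process's view of it, which is the assertion above.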
If a logical clock deviates from real time (say UTC) by no more than a specified amount, it is a physical clock.
Making a logical clock a physical one in a distributed system is a problem, since time CANNOT be set backwards (that would violate monotonicity).
- Periodically (as determined by the maximum drift rate of the computer clock), synchronize all clocks to a time source (say GPS or WWV) from a TIME SERVER.
- If the local time is ahead of real time, then the incrementing of the clock must be slowed down until it is in sync.
- Measure delay times in the network to send real time HOW?
Time can be set by request from client (Cristian's algorithm)
or by request from server (Berkeley) which polls clients for
local time and sends messages to adjust.
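Cristian's algorithm fits in a few lines; `request_server_time` below is a placeholder for the actual call to the time server:

```python
import time

def cristian_estimate(request_server_time):
    """Estimate the current time from one request to a time server,
    compensating for network delay by halving the round-trip time."""
    t0 = time.monotonic()
    server_time = request_server_time()   # placeholder: query the TIME SERVER
    t1 = time.monotonic()
    round_trip = t1 - t0
    # Best guess: the server's reading is half a round trip old by now
    return server_time + round_trip / 2.0
```

If the local clock turns out to be ahead of the estimate, it is slowed rather than set backwards, preserving monotonicity.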
Problem: time server is centralized.
Distributed time: average all local times gathered at periodic
intervals with or without network delay compensation.
Perhaps best solution is to put GPS receivers at all sites.
- Assign each message a unique number
- Receiver remembers all message numbers received
- Receiver discards messages with numbers already seen
But what if the receiver crashes and loses its message number tables?
When can the message tables be purged?
- Assign a timestamp and a connection number (chosen by
sender).
- Receiver records most recent time stamp for that
connection
- message with a lower timestamp is discarded
- After time G = CurrentTime - (MaxLifeTime +
MaxClockSkew), discard entry
where MaxLifeTime is how long message can live in the
network
- Write this table to disk periodically
- After a crash, reload the table and update G, rejecting any message generated before the crash.
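The scheme above can be sketched as follows (class and parameter names are illustrative, not from any particular system):

```python
class DuplicateFilter:
    """Discard duplicate messages using per-connection timestamps.
    Entries older than G = now - (MaxLifeTime + MaxClockSkew) are purged."""

    def __init__(self, max_lifetime, max_clock_skew):
        self.horizon = max_lifetime + max_clock_skew
        self.latest = {}   # connection id -> most recent timestamp seen

    def accept(self, connection, timestamp, now):
        g = now - self.horizon
        # Purge entries no surviving duplicate in the network could still match
        self.latest = {c: t for c, t in self.latest.items() if t >= g}
        if timestamp < g:
            return False   # older than any message still alive in the network
        if timestamp <= self.latest.get(connection, g - 1):
            return False   # lower (or repeated) timestamp: discard as duplicate
        self.latest[connection] = timestamp
        return True
```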
- Why Cache?
- Distinguish read and write caches
- To write, all read caches must be invalidated, even if not in use.
- Unless readers have a lease time, old readers cannot trust cache consistency.
- Can renew leases if no one else has requested a write
lease.
- Server can break lease for priority writer.
- If client has crashed, time out by lease value (or
network partition).
Only one process can access a shared resource at a
time (e.g. printer, file, data record).
Can be implemented as critical region.
Question: what is the difference between a critical section and a resource lock?
Desirable Characteristics of
Mutual Exclusion algorithm:
- Freedom from deadlock
- Freedom from starvation
- Fairness
- Fault Tolerance
Centralized algorithm:
- Send request to enter
critical section to coordinator
- If no one already in, mark
in use and grant request
- If in use, queue request.
- When process leaves
critical region, send message to coordinator to release
- If processes queued, pick
one and grant request.
Centralized is not always
inappropriate.
A single shared resource is inherently centralized - co-locate
coordinator.
- Totally order all events.
- Build a message containing the critical region name, process number, and current time.
- Send it to all processes reliably.
Question: how can this be
determined?
- A process replies as follows:
  - If it is not in the region and does not want in, it replies OK
  - If it is in the region, it queues the request (no reply)
  - If it wants to enter, it compares the timestamp in the request to its own; if the request's is lower it sends OK, otherwise it queues the request
- The requesting process waits until everyone sends it an OK message
- When exiting, it dequeues all waiting processes and sends them OK
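The reply rules can be written as a pure function; ties in the totally ordered timestamps are broken by process number (a sketch, with illustrative state names):

```python
def ra_reply(state, own_ts, own_pid, req_ts, req_pid):
    """Decide how to answer a critical-region request.
    state is 'RELEASED', 'HELD', or 'WANTED'; (timestamp, pid) pairs
    give the total order on requests."""
    if state == 'RELEASED':
        return 'OK'          # not in the region and does not want in
    if state == 'HELD':
        return 'QUEUE'       # currently in the region: queue the request
    # WANTED: the lower totally ordered timestamp wins
    if (req_ts, req_pid) < (own_ts, own_pid):
        return 'OK'
    return 'QUEUE'
```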
How many messages?
What about failures? before, during and after
What about load (CPU and network)?
Is distributed better?
Distributed Mutual Exclusion Algorithm (Token Ring)
- Form LOGICAL ring
- circulate token around ring
- If token holder, can enter critical region if desired
else pass token on.
Is this fair?
what is max wait?
What if the token is lost?
What if a machine crashes?
Comparison
of Mutual Exclusion Algorithms
Algorithm   | Messages/critical region | Delay to enter | Problems
------------|--------------------------|----------------|-----------------------------
Centralized | 3                        | 2              | coordinator crash
Distributed | 2(n-1)                   | 2(n-1)         | process crash
Token Ring  | 1 to unbounded           | 0 to n-1       | lost token or process crash
Can also measure throughput, utilization, etc.
Electing new centralized coordinator given a known set of
processes.
Bully Algorithm:
- Process P notices the coordinator is no longer responding
- P sends an ELECTION message to all processes with higher numbers
- If no one responds, P elects itself coordinator and tells everyone
- If a higher-up answers, it takes over; P is done
- When an ELECTION message is received, acknowledge it and hold an election (if one is not already in progress)
- A rebooting process holds an election
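A synchronous toy version of the bully election (real implementations discover crashed processes via timeouts; here a known `alive` set stands in for that):

```python
def bully_election(pids, starter, alive):
    """One bully election started by `starter`: challenge all
    higher-numbered live processes; the highest live process wins."""
    higher = [p for p in pids if p > starter and p in alive]
    if not higher:
        return starter   # no response: elect yourself
    # A higher-up answered and takes over; it holds its own election
    return bully_election(pids, min(higher), alive)
```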
Ring Algorithm:
- Process P notices the coordinator is not responding
- P sends an ELECTION message to its successor containing its process number
- The message skips crashed (down) processes
- Running processes add their IDs to the message
- When the originator sees the message it started, it elects the highest-numbered process in the list and circulates the result
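The circulation can be sketched as one pass around the logical ring (the starter is assumed alive; `alive` again stands in for timeout-based failure detection):

```python
def ring_election(ring, starter, alive):
    """ELECTION message circulates once around the ring, skipping
    crashed processes; live processes append their ids; when it
    returns to the originator, the highest id becomes coordinator."""
    members = [starter]              # originator adds its own id
    n = len(ring)
    i = ring.index(starter)
    for step in range(1, n):
        p = ring[(i + step) % n]
        if p in alive:               # crashed processes are skipped
            members.append(p)
    return max(members)              # highest-numbered live process wins
```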
What if two processes notice the coordinator crash?
Is process number really what we want to use?
AKA transactions.
All-or-nothing semantics.
Stable storage (usually implemented as disk mirroring (RAID - Redundant Arrays of Independent Disks))
Atomic Action Primitives:
- BEGIN_TRANSACTION
- END_TRANSACTION
- ABORT_TRANSACTION
- READ (reversible)
- WRITE (reversible)
- other reversible operations
- Atomic: indivisible
- Consistent: no violation of system invariants
- Isolated: transactions do not interfere (critical regions?); not necessarily deterministic
- Durable: after commit, changes are permanent
Need to preserve even in face of nested transactions.
Permanence only applies to top level.
Conceptually transaction has copy of shared objects
Does this accomplish the above requirements?
Real copy is expensive
Question: Do we need to copy read-only
objects?
Writes copy only the blocks actually changed (called shadow blocks).
Can use Unix's i-node concept to mix private and public blocks (figure 3-18).
Commit has to atomically update public space.
Changes are made to public storage, but an audit log is kept of the old values.
The log can be used to roll back or roll forward (after a crash).
Question: How to handle access by other
processes?
Does rollforward violate all-or-nothing semantics?
Atomic commit in a distributed system requires all parties to
agree to commit at the same time (or not to commit).
- One process is coordinator
- coordinator logs intention to
commit
- sends other processes message to
prepare to commit
- Receiving processes log "ready to commit" and reply (if ready)
- when coordinator receives ready
from everybody, logs commit decision and sends commit
message
- Other processes send finished
message after committing.
- If any process is not ready or has
crashed, coordinator aborts transaction (logs and sends
message to all).
It is the log entry at end by
coordinator which is actual commit.
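The coordinator's side of the protocol can be sketched as follows; each participant is modeled as a callable returning its vote (an illustrative sketch, not a fault-tolerant implementation):

```python
def two_phase_commit(participants, log):
    """Coordinator side of two-phase commit. `participants` are callables
    returning True if ready to commit; `log` models stable storage."""
    log.append('intend-commit')          # coordinator logs intention
    votes = [p() for p in participants]  # "prepare to commit" round
    if all(votes):
        log.append('commit')             # this log entry IS the commit
        return 'COMMIT'                  # then send commit messages to all
    log.append('abort')                  # any not-ready/crashed vote aborts
    return 'ABORT'
```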
Question: What if a non-coordinator process crashes and never commits?
Is a transaction mutually exclusive?
Concurrency Control: Locks
- Read/write locks
- Granularity
- Two-phase locking
  - Growing phase: acquire locks before updating; release all if one is unavailable
  - Shrinking phase
Assertion: if all transactions use two-phase locking, then all schedules are serializable.
- Strict: shrinking is done after commit (can be part of END_TRANSACTION); avoids CASCADED aborts due to reading uncommitted data
- Deadlocks can be avoided by acquiring locks in some canonical order
- Can set a time limit on use of a resource to detect deadlock
- Can build a resource allocation graph and look for cycles
Basically, take no locks and detect conflicts only at commit time; if there is a conflict, abort before committing.
This is deadlock free (see below),
and efficient if the probability of simultaneous use is low.
- Assign, using logical clocks, a timestamp to each transaction at BEGIN_TRANSACTION.
  QUESTION: how do we synchronize clocks if we don't know which other processes access shared resources? Need to know.
- On commit, every resource is timestamped with either a read or a write commit time.
- When accessing a resource, check the resource timestamp against the transaction's. If the transaction's is older, then abort.
Timestamp Example
Let R be the timestamp of the committed access to resource R.
Let R' be the timestamp of an uncommitted access to R.
Let Ti be the timestamp of transaction i.
- If R < Ti, access is OK
- If R > Ti, abort the transaction
- If R' < Ti, wait till commit
- If R' > Ti, abort
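The four rules translate directly into code (parameter names are illustrative; `None` marks the absence of an uncommitted access):

```python
def timestamp_check(r_committed, r_uncommitted, ti):
    """Timestamp-ordering rules from the example above.
    r_committed: timestamp of the last committed access to R;
    r_uncommitted: timestamp of a pending access to R, or None;
    ti: the requesting transaction's timestamp."""
    if r_committed > ti:
        return 'ABORT'      # R was already touched by a younger transaction
    if r_uncommitted is not None:
        if r_uncommitted < ti:
            return 'WAIT'   # older uncommitted access: wait for its commit
        if r_uncommitted > ti:
            return 'ABORT'
    return 'OK'             # R < Ti: access is safe
```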
Potentially a serious problem in distributed systems, and more likely to occur because of the increased chances of sharing resources.
- Ignore (many systems do this)
- Detect (and recover); atomic actions permit aborted transactions instead of aborted processes
- Prevent (statically design away the problem)
- Avoid (allocate resources carefully) - impractical
Let coordinator keep global resource graph and detect cycles.
Each local processor sends its local resource graph either
- when changed
- periodically
- on demand
Unfortunately, delays in transmission can cause messages to the coordinator to arrive out of order, potentially resulting in a FALSE DEADLOCK.
Using logical clocks, the coordinator could request that all processes verify their latest message sent, thus forcing messages to arrive in logical order.
Basic idea is to send a message to processes holding
resources that you are waiting for and see if the message
eventually comes back to you (the cycle).
Messages contain three numbers:
- Process just blocked
- Process sending the message
- Process holding the resource
QUESTION: should this be the resource the sending process is blocked on?
Extend algorithm in book to handle multiple resources
held per process.
Break the cycle by aborting the process that initiated the message;
but what if two processes inquire and find the cycle at about the same time?
Or record the processes in the cycle and abort the lowest priority one.
90% of deadlocks involve only two processes.
- Request all needed resources at the same time
- Request in canonical order
- Assign each transaction a unique global timestamp and kill only younger processes that wait on an older one (AKA wait-die)
- Resources can be taken away by aborting a transaction with no side effects
- Again using timestamps, an older process takes resources away from a younger one (by aborting its transaction), but a younger process will wait for an older one (AKA wound-wait)
- Compare wait-die with wound-wait
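The two schemes differ only in what happens when the requester is older than the holder (smaller timestamp = older transaction):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger one is killed (dies)."""
    return 'WAIT' if requester_ts < holder_ts else 'DIE'

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester preempts (wounds) the holder;
    a younger one waits for the older to finish."""
    return 'WOUND' if requester_ts < holder_ts else 'WAIT'
```

In both schemes the older transaction always prevails eventually, so no cycle of transactions can all be waiting on each other.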
Critical region can be considered a resource which is locked
(by semaphore, monitor, or coordinator).
Critical regions can encapsulate several other resources (files, devices) which are needed to accomplish a mission. In this sense they help alleviate deadlock.
Consider a bank with two accounts which are kept on different
machines.
Let's say account A has $1000 and account B has $500.
Suppose the clock on machine containing A is 5 seconds faster
than the clock on the other machine.
Suppose there is a transfer of $100 from A to B.
Is the global state of funds in the bank correct?
Depends on when the state information is gathered. Suppose A decrements its account and sends a message to B at 11:10:05 (which is 11:10:00 at B). The message transfer time is 1 second, so at 11:10:01 local time for B the account is credited.
If gathered after transfer message sent from A to B but before
received, then the bank has lost $100 from its global state.
If one gathers local state from both machines and uses local clocks to synchronize at 11:10:03, A's state would be before the debit but B's would be after the credit; hence the global state shows $100 too much.
Copyright chris wild 1996.
For problems or questions regarding this web site, contact [Dr. Wild].
Last updated: October 30, 1996.