CS 771/871 Operating Systems



Lecture 3 : Synchronization Issues

Claim:


Distributed Synchronization

Question: When are centralized approaches appropriate?
How can the single point of failure be alleviated in a centralized approach?


Clock Synchronization


Computer Clocks

UNIX counts seconds from midnight UTC January 1, 1970.
Is there a Unix timebomb in your future?
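
One reading of the "timebomb" question, assuming the second count is held in a signed 32-bit integer (an assumption; the notes do not state the counter width), is that the counter runs out at 2^31 - 1 seconds. A small sketch of when that falls:

```python
from datetime import datetime, timedelta, timezone

# UNIX epoch: midnight UTC, January 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: seconds are stored in a signed 32-bit integer.
MAX_INT32 = 2**31 - 1

# Last instant representable before the counter overflows.
rollover = EPOCH + timedelta(seconds=MAX_INT32)
print(rollover.isoformat())  # 2038-01-19T03:14:07+00:00
```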


Official Time

  1. Astronomical Time (GMT): Based on the mean solar second,
    1/86400th of a mean solar day.
    Problem: Earth's rotation is slowing, and
    its rate is not constant.
  2. International Atomic Time (TAI): very stable. Coordinated with astronomical time on Jan. 1, 1958 (9,192,631,770 transitions of the cesium 133 atom = 1 mean solar second on that date).

Problem: the solar second is changing. Solution: leap seconds. UTC uses stable atomic clock seconds, with leap seconds inserted so that high noon still falls at the right time of the solar day.

Official time is broadcast by radio (WWV among others) and by satellite (GPS).
Radio accuracy: ±10 msec
GPS accuracy: 100-300 nanoseconds, maybe better with differential GPS

NOTE: the availability of GPS clocks solves many problems brought up in the text.

References: GPS clock broadcast (local copy)
Internet Time


Logical Clocks

Based on the seminal work of Leslie Lamport:
Define "happens before" (->) relationship:

Events a and b are causally related if a -> b or b -> a;
otherwise they are concurrent, a || b.

QUESTION: what are the event relationships below?

 

 


Logical Clocks 2

Let Ci be the clock for process Pi. Use this clock to timestamp events of Pi.
The sequence generated by Ci is monotonically increasing.

if a->b then C(a) < C(b).

Or

  1. C1: for any two events a and b in process Pi, if a happened before b then
    Ci(a) < Ci(b)

  2. C2: if a is the event of sending a message in process Pi and b is the event of receiving that message in process Pj, then
    Ci(a) < Cj(b)

Implemented by:

QUESTION: assign logical clock values to events in above event diagram.

Happens-before is a partial ordering.
It can be turned into a total ordering by assigning an order to the processes (used to break ties between equal timestamps).
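
Conditions C1 and C2 are commonly met with Lamport's two rules: increment the clock on every local event, and on receipt set the clock past the timestamp carried by the message. The sketch below is one minimal way to write that; the class and method names are hypothetical, not taken from the notes or the text.

```python
class LamportClock:
    """Logical clock for one process (hypothetical helper)."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        # Local event: advance the clock (satisfies C1).
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; its timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past the sender's timestamp (satisfies C2).
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self):
        # (time, pid) tuples compare lexicographically: a total order.
        return (self.time, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
a = p1.send()        # event a on P1: send a message
p2.tick()            # unrelated local event on P2
b = p2.receive(a)    # event b on P2: receive that message
print(a, b)          # 1 2
```

Using the process number as the second element of the stamp is one way of "assigning order to processes": two events with equal clock values are ordered by process number.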

Question: assign a totally ordered clock to the event diagram


Limitations of Logical Clocks

a->b implies that C(a) < C(b), but the reverse is not true. So we cannot tell whether two events in different processes are causally related by looking at their clocks.

However, the scheme proposed in ISIS can resolve this.
Each process keeps a vector of length n (the number of communicating processes), where the i-th entry contains that process's current understanding of the value of Pi's logical clock.

Assertion: for all i, j: Ci[i] >= Cj[i].

QUESTION: What other assertions can be made?
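
A minimal sketch of the vector scheme, assuming the usual update rules (increment your own entry on a local event; take the element-wise maximum on receipt, then count the receive). The function names are hypothetical. Unlike scalar logical clocks, comparing two vectors tells us whether the events are causally related or concurrent.

```python
def vc_tick(vc, i):
    # Process i records one more local event.
    out = list(vc)
    out[i] += 1
    return out

def vc_merge(local, received, i):
    # On receipt: element-wise max, then count the receive event itself.
    merged = [max(x, y) for x, y in zip(local, received)]
    merged[i] += 1
    return merged

def happened_before(u, v):
    # u -> v iff u <= v in every entry and the vectors differ.
    return u != v and all(x <= y for x, y in zip(u, v))

def concurrent(u, v):
    return u != v and not happened_before(u, v) and not happened_before(v, u)


zero = [0, 0, 0]
a = vc_tick(zero, 0)        # P0 sends a message:      [1, 0, 0]
c = vc_tick(zero, 2)        # P2 has a local event:    [0, 0, 1]
b = vc_merge(zero, a, 1)    # P1 receives P0's message: [1, 1, 0]
print(happened_before(a, b), concurrent(a, c))  # True True
```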


Relationships between Logical and Physical Clocks

If a logical clock deviates from real time (say UTC) by no more than a specified amount, it is a physical clock.

Making a logical clock a physical one in a distributed system is a problem because time CANNOT be set backwards (that would violate monotonicity).

Time can be set by request from client (Cristian's algorithm) or by request from server (Berkeley) which polls clients for local time and sends messages to adjust.
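
A minimal sketch of the client side of Cristian's algorithm, assuming the reply takes about half of the measured round trip (the usual simplification). `request_server_time` is a hypothetical callable standing in for the network request, and the `clock` parameter is only there so the example can be run with a fake clock.

```python
import time

def cristian_estimate(request_server_time, clock=time.monotonic):
    """Return (estimated server time now, measured round trip)."""
    t0 = clock()
    server_time = request_server_time()   # hypothetical network call
    t1 = clock()
    round_trip = t1 - t0
    # Assumption: the reply spent half of the round trip in transit.
    return server_time + round_trip / 2, round_trip


# Fake clock: request leaves at 10.0, reply arrives at 10.5.
ticks = iter([10.0, 10.5])
estimate, rtt = cristian_estimate(lambda: 100.0, clock=lambda: next(ticks))
print(estimate, rtt)  # 100.25 0.5
```

If the estimate turns out to be behind the local clock, the local clock would be slowed down gradually rather than stepped back, so that monotonicity is preserved.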

Problem: time server is centralized.

Distributed time: average all local times gathered at periodic intervals with or without network delay compensation.
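
A sketch of the averaging step, without network delay compensation. The machine names and the readings (in minutes) are made up for illustration; the function returns the correction each machine would apply to reach the average.

```python
def averaging_adjustments(local_times):
    """Map each machine to the correction that brings it to the average."""
    average = sum(local_times.values()) / len(local_times)
    return {name: average - t for name, t in local_times.items()}


# Hypothetical readings: 3:00, 3:25 and 2:50, expressed in minutes.
readings = {"server": 180, "a": 205, "b": 170}
adjustments = averaging_adjustments(readings)
print(adjustments)  # {'server': 5.0, 'a': -20.0, 'b': 15.0}
```

A negative correction (machine "a" here) cannot be applied by setting the clock back; the clock is run slow until it has lost the required amount.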

Perhaps best solution is to put GPS receivers at all sites.


At-Most-Once Message Delivery

But what if the receiver crashes and loses its message number tables?
When can the message tables be purged?


Clock Based Consistent Cache

 


Mutual Exclusion

Only one process can access a shared resource at a time (e.g. printer, file, data record).

Can be implemented as critical region.
Question: what is difference between critical section and resource lock?

Desirable Characteristics of Mutual Exclusion algorithm:


Centralized Mutual Exclusion Algorithm

Centralized algorithm:

Centralized is not always inappropriate.
A single shared resource is inherently centralized - co-locate coordinator.
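
A minimal sketch of a centralized coordinator, assuming the usual three messages per critical region entry: request, grant, and release. The class is hypothetical and ignores crashes; a queued requester simply gets no reply until the resource is free.

```python
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, pid):
        # Message 1: REQUEST. True means GRANT (message 2) is sent at once.
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)    # no reply yet: the requester blocks
        return False

    def release(self, pid):
        # Message 3: RELEASE. Returns the process granted next, if any.
        if pid != self.holder:
            raise ValueError("release from a process that is not the holder")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
first = c.request(1)      # granted immediately
second = c.request(2)     # queued behind process 1
granted = c.release(1)    # process 2 is granted on release
print(first, second, granted)  # True False 2
```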


Distributed Mutual Exclusion Algorithm (Ricart and Agrawala)

How many messages?
What about failures (before, during and after)?
What about load (CPU and network)?
Is distributed better?
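
To make the message count concrete, here is a sketch of the decision a process makes when a request arrives, assuming requests are ordered by (logical timestamp, process number) pairs. The state names and function names are hypothetical.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"

def should_reply_now(state, my_request, their_request):
    """Decide whether to answer an incoming request immediately.

    Requests are (timestamp, pid) tuples; my_request is None unless
    this process is itself waiting to enter the critical region.
    """
    if state == HELD:
        return False                         # defer until we leave the region
    if state == WANTED:
        return their_request < my_request    # lowest (timestamp, pid) wins
    return True                              # not interested: reply at once

def messages_per_entry(n):
    # n-1 requests out plus n-1 replies back.
    return 2 * (n - 1)


print(should_reply_now(WANTED, (3, 1), (5, 2)))  # False: our request is older
print(messages_per_entry(5))                     # 8
```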


Distributed Mutual Exclusion Algorithm (Token Ring)

Is this fair?
What is the maximum wait?
What if the token is lost?
What if a machine crashes?


Comparison of Mutual Exclusion Algorithms

 

Algorithm     Messages per critical region   Delay to enter (message times)   Problems
Centralized   3                              2                                coordinator crash
Distributed   2(n-1)                         2(n-1)                           process crash
Token Ring    1 to unbounded                 0 to n-1                         lost token or process crash

 Can also measure throughput, utilization, etc.


Election Algorithms

Electing new centralized coordinator given a known set of processes.

Bully Algorithm:
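
A sketch of the outcome of a bully election, assuming the standard rule that any higher-numbered process that answers takes the election over. This simulates who wins, not the actual message exchange; the function name and the process numbers are hypothetical.

```python
def bully_election(initiator, alive):
    """Return the coordinator chosen when `initiator` starts an election.

    `alive` is the set of process numbers currently responding.
    """
    current = initiator
    while True:
        higher = [p for p in alive if p > current]
        if not higher:
            return current       # nobody bigger answered: current wins
        # A higher process answers and takes over the election.
        current = min(higher)


# Processes 1..7 existed; coordinator 7 has crashed and 4 notices.
winner = bully_election(4, {1, 2, 3, 4, 5, 6})
print(winner)  # 6
```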


Ring Election Algorithm

What if two processes notice the coordinator crash?
Is process number really what we want to use?


Atomic Actions

AKA transactions.
All or Nothing semantics

Stable storage (usually implemented as disk mirroring; RAID = Redundant Array of Independent Disks)

AA Primitives

 


Properties of Transactions

  1. Atomic: indivisible

  2. Consistent: no violation of system invariants

  3. Isolated: Transactions do not interfere (critical regions?)
    not necessarily deterministic

  4. Durable: after commit, changes are permanent

Need to preserve even in face of nested transactions.
Permanence only applies to top level.


Private Workspace

Conceptually transaction has copy of shared objects
Does this accomplish the above requirements?

Real copy is expensive
Question: Do we need to copy read-only objects?

A write copies only the blocks actually changed (called shadow blocks).

Can use Unix's i-node concept to mix private and public blocks (figure 3-18).

Commit has to atomically update public space.


Writeahead Log

Changes are made to public storage, but an audit log is kept of the old values.

The log can be used to roll back or to roll forward (after a crash).

Question: How to handle access by other processes?
Does rollforward violate all-or-nothing semantics?
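
A minimal in-memory sketch of the log idea, assuming every key already exists in the store and ignoring stable storage. Each write records the old and new value before changing the public copy, so an abort can undo the writes in reverse order. The class name is hypothetical.

```python
class LoggedStore:
    def __init__(self, data):
        self.data = dict(data)
        self.log = []                 # entries: (key, old_value, new_value)

    def write(self, key, new_value):
        # Log first, then change the public copy.
        self.log.append((key, self.data[key], new_value))
        self.data[key] = new_value

    def rollback(self):
        # Undo in reverse order using the logged old values.
        while self.log:
            key, old, _new = self.log.pop()
            self.data[key] = old

    def commit(self):
        self.log.clear()


store = LoggedStore({"x": 0, "y": 0})
store.write("x", 1)
store.write("y", 2)
store.write("x", 4)
store.rollback()
print(store.data)  # {'x': 0, 'y': 0}
```

Because the log also holds the new values, the same entries could be replayed forward after a crash.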


Two Phase Commit

Atomic commit in a distributed system requires all parties to agree to commit at the same time (or not to commit).

The log entry written by the coordinator at the end is the actual commit.
Question: What if a non-coordinator process crashes and never commits?
Is a transaction mutually exclusive?
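
A sketch of the coordinator's side of two-phase commit, assuming no crashes or timeouts. Participants are modeled as hypothetical callables that return their vote, and a plain list stands in for the coordinator's stable log.

```python
def two_phase_commit(participants, log):
    """participants: mapping name -> callable returning True if prepared.
    log: list standing in for the coordinator's stable log."""
    log.append("prepare")
    # Phase 1: collect a vote from every participant.
    votes = {name: vote() for name, vote in participants.items()}
    decision = "commit" if all(votes.values()) else "abort"
    # Writing the decision to the log is the actual commit point.
    log.append(decision)
    # Phase 2 (not modeled): send the decision to every participant.
    return decision


log = []
outcome = two_phase_commit({"a": lambda: True, "b": lambda: True}, log)
print(outcome, log)  # commit ['prepare', 'commit']
```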

Concurrency Control: Locks


Optimistic Concurrency Control

Basically, do no concurrency control during the transaction and detect conflicts only at commit. If there is a conflict, abort instead of committing.

This is deadlock free (see below).

And efficient if probability of simultaneous use is low.
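
One way to detect conflicts at commit is to keep a version number per item and validate the versions a transaction read. The sketch below assumes that approach and assumes all keys already exist; the class and its method names are hypothetical.

```python
class VersionedStore:
    def __init__(self, data):
        # key -> (value, version)
        self.data = {k: (v, 0) for k, v in data.items()}

    def begin(self):
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        value, version = self.data[key]
        txn["reads"].setdefault(key, version)   # remember version first seen
        return txn["writes"].get(key, value)

    def write(self, txn, key, value):
        txn["writes"][key] = value              # buffered until commit

    def commit(self, txn):
        # Validate: abort if anything we read has changed since.
        for key, seen in txn["reads"].items():
            if self.data[key][1] != seen:
                return False
        for key, value in txn["writes"].items():
            self.data[key] = (value, self.data[key][1] + 1)
        return True


store = VersionedStore({"x": 10})
t1, t2 = store.begin(), store.begin()
store.write(t1, "x", store.read(t1, "x") + 1)
store.write(t2, "x", store.read(t2, "x") + 2)
ok1, ok2 = store.commit(t1), store.commit(t2)
print(ok1, ok2)  # True False
```

No transaction ever waits for another, which is why no deadlock can arise; the cost is that the loser (t2 here) must be rerun.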


Using Timestamps


Timestamp Example

Let R be the timestamp of committed access to resource R.

Let R' be the timestamp of uncommitted access to R

Let Ti be timestamp of transaction "i"


Deadlocks

Potentially a serious problem in distributed systems, and more likely to occur because of increased chances for sharing resources.

 


Centralized Deadlock Detection

Let coordinator keep global resource graph and detect cycles. Each local processor sends its local resource graph either

  1. when changed
  2. periodically
  3. on demand

Unfortunately, delays in transmission can cause messages to coordinator to arrive out of order potentially resulting in a FALSE DEADLOCK.

Using logical clocks, the coordinator could ask all processes to verify the latest message sent, thus forcing messages to arrive in logical order.
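
A sketch of the coordinator's cycle check, assuming the global graph is reduced to a wait-for graph (process to the set of processes it waits on). The example graphs are hypothetical: `stale` is what the coordinator sees if "B requests C's resource" arrives before "B releases the resource A wanted", and `actual` is the true state.

```python
def has_cycle(wait_for):
    """wait_for maps each process to the set of processes it waits on."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(p):
        colour[p] = GREY                    # p is on the current path
        for q in wait_for.get(p, ()):
            state = colour.get(q, WHITE)
            if state == GREY:
                return True                 # back edge: a cycle
            if state == WHITE and visit(q):
                return True
        colour[p] = BLACK                   # fully explored, no cycle via p
        return False

    return any(colour.get(p, WHITE) == WHITE and visit(p)
               for p in list(wait_for))


stale = {"A": {"B"}, "B": {"C"}, "C": {"A"}}    # out-of-order messages
actual = {"A": set(), "B": {"C"}, "C": {"A"}}   # true state
print(has_cycle(stale), has_cycle(actual))      # True False
```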


Distributed Deadlock Detection (Chandy-Misra-Haas)

Basic idea is to send a message to processes holding resources that you are waiting for and see if the message eventually comes back to you (the cycle).

Messages contain three numbers:

  1. Process just blocked

  2. Process sending message

  3. Process holding resource
    QUESTION: should this be resource sending process is blocked on?
    Extend algorithm in book to handle multiple resources held per process.
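
A sketch of the probe idea, simulated in one address space. Each probe is a (blocked, sender, receiver) triple matching the three numbers above; a receiver that is itself waiting forwards the probe to every process it waits on, which covers the multiple-resource case. The function name and the example graph are hypothetical.

```python
def probe_detects_deadlock(initiator, wait_for):
    """Report whether a probe started by `initiator` returns to it."""
    pending = [(initiator, initiator, q)
               for q in wait_for.get(initiator, ())]
    forwarded = set()
    while pending:
        blocked, sender, receiver = pending.pop()
        if receiver == blocked:
            return True                  # probe came back: cycle found
        if receiver in forwarded:
            continue                     # this process already forwarded it
        forwarded.add(receiver)
        # A waiting receiver passes the probe to everyone it waits on.
        pending.extend((blocked, receiver, q)
                       for q in wait_for.get(receiver, ()))
    return False


waits = {0: {1}, 1: {2}, 2: {3, 4}, 3: {0}}
print(probe_detects_deadlock(0, waits))  # True
```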

Break the cycle by aborting the process that initiated the message.
But what if two processes inquire and find the cycle at about the same time?

Alternatively, record the processes in the cycle and abort the lowest priority one.

90% of deadlocks involve only two processes.


Deadlock Prevention (by Design)


Locks vs Critical Regions

Critical region can be considered a resource which is locked (by semaphore, monitor, or coordinator).

A critical region can encapsulate several other resources (files, devices) which are needed to accomplish a mission. In this sense it helps alleviate deadlock.


Global State

Consider a bank with two accounts which are kept on different machines.
Let's say account A has $1000 and account B has $500.
Suppose the clock on machine containing A is 5 seconds faster than the clock on the other machine.
Suppose there is a transfer of $100 from A to B.
Is the global state of funds in the bank correct?

Depends on when the state information is gathered. Suppose A decrements its account and sends a message to B at 11:10:05 (which is 11:10:00 at B). The message transfer time is 1 second, so the account is credited at 11:10:01 local time for B.

If the state is gathered after the transfer message is sent from A to B but before it is received, then the bank has lost $100 from its global state.

If the local state is gathered from both machines, using the local clocks to synchronize at 11:10:03, A's state would be from before the debit but B's would be from after the credit; hence the global state shows $100 too much.
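
The two cases can be checked with a small timeline sketch. Time is measured in seconds after 11:10:00 on B's clock, which is treated here as "true" time (an assumption made only to have a common reference); the function names are hypothetical.

```python
A_CLOCK_OFFSET = 5   # A's clock runs 5 seconds ahead of B's

def balance_a(true_time):
    # A debits $100 at true time 0 (A's own clock reads 11:10:05).
    return 900 if true_time >= 0 else 1000

def balance_b(true_time):
    # The message takes 1 second, so B credits at true time 1.
    return 600 if true_time >= 1 else 500

# Case 1: both states gathered while the message is in transit.
in_transit_total = balance_a(0.5) + balance_b(0.5)

# Case 2: each machine reports its state when its own clock reads 11:10:03.
local = 3
by_local_clock_total = balance_a(local - A_CLOCK_OFFSET) + balance_b(local)

print(in_transit_total, by_local_clock_total)  # 1400 1600
```

The correct total is $1500 throughout; the first snapshot misses the $100 in transit and the second counts it twice.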

 


Copyright Chris Wild 1996.
For problems or questions regarding this web contact [Dr. Wild].
Last updated: October 30, 1996.