Who am I and what is my background in operating systems?
What are distributed systems?
A distributed system should:
This is in contrast to centralized computing (timesharing) or independent PCs.
But distributed systems have their disadvantages, chief among them the complexity and relative unavailability of software. Most of the problems of non-distributed systems still exist, and we have added many new problems to solve in order to achieve some of the perceived advantages; among them are network congestion, rerouting, and security.
Also, most distributed systems complicate the job of the user by forcing them to be aware of various aspects of the distributed system (don't you love URLs?).
Go over syllabus.
Since a distributed system is a collection of (usually geographically) distributed hardware made to look like a unified computing environment by clever software, we need to look at different hardware and software configurations.
Hardware configurations range from tightly coupled SIMD and shared-memory parallel architectures to loosely coupled, heterogeneous, independent but cooperating machines (e.g. the Internet).
Flynn's classification is based on the number of distinct instruction and data streams, ranging from SISD (Single Instruction Single Data stream: traditional single-CPU machines) to MIMD (Multiple Instruction Multiple Data streams: most of what we are concerned with). It also includes the common parallel architecture SIMD and the rather bizarre MISD possibility.
MIMD machines can be further divided into those which share memory (multiprocessors) and those which do not (multicomputers). These can be further divided into bus-based or switched, depending on the communications path from the CPUs to the memory. A common example of bus communication is cable TV (broadcast), while the telephone is a switched system (private).
Another distinction is between tightly coupled systems, with short delays, high bandwidth, and tight control between processors, and loosely coupled systems, with arbitrary delays, possibly low bandwidth, and more autonomous control.
With a shared memory accessible over the bus, it is possible to get memory coherence: the property whereby all CPUs access the same value at a particular memory location. The problem is contention for the bus and the shared memory. A common solution is cache memory kept local to each CPU, but this can lead to incoherent memory. One remedy is a write-through cache, in which every write to the cache also writes to memory; any write to memory invalidates that location in the other CPUs' caches, while reads can be done locally from the cache. Even so, there is a practical limit to the number of CPUs which can share memory (32-64).
NOTE: this problem is not limited to multiprocessors sharing main memory; it is also a concern with workstations connected on a LAN. There the common solution is to lock files, or records, at the file server.
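The write-through scheme above can be sketched in a few lines. This is a toy model, not a real protocol implementation: the Cpu/Memory classes and their methods are invented for illustration, with the bus and invalidation broadcast reduced to ordinary function calls.

```python
# Toy model of write-through caching with invalidation: each CPU has a
# private cache; a write goes through to shared memory and invalidates
# that address in every other CPU's cache, so later reads re-fetch.

class Memory:
    def __init__(self):
        self.cells = {}            # shared main memory: address -> value

class Cpu:
    def __init__(self, memory, peers):
        self.memory = memory       # shared memory on the bus
        self.peers = peers         # all CPUs, for invalidation broadcasts
        self.cache = {}            # private cache: address -> value

    def read(self, addr):
        # Reads are satisfied from the local cache when possible.
        if addr not in self.cache:
            self.cache[addr] = self.memory.cells.get(addr, 0)
        return self.cache[addr]

    def write(self, addr, value):
        # Write-through: update the cache AND shared memory ...
        self.cache[addr] = value
        self.memory.cells[addr] = value
        # ... and invalidate that location in every other cache.
        for cpu in self.peers:
            if cpu is not self:
                cpu.cache.pop(addr, None)

mem = Memory()
cpus = []
a = Cpu(mem, cpus)
b = Cpu(mem, cpus)
cpus.extend([a, b])

a.write(0x10, 7)
print(b.read(0x10))   # 7: any stale copy in b was invalidated
b.write(0x10, 9)
print(a.read(0x10))   # 9: a re-fetches from memory after invalidation
```

Note that every write still crosses the bus twice (memory update plus invalidation), which is exactly why contention caps the number of CPUs that can usefully share one bus.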
To get beyond the limitation imposed by contention on the shared bus, switches can be employed. Common switching networks are the crossbar and the omega network. Crossbar switches achieve a high degree of parallel interconnectivity by providing a path between every pair of communicating devices, with a switch at each intersection (N*N crosspoints). Omega networks use fewer switches, but with the possibility of contention (a hybrid between bus and crossbar). Figure 1-6 shows a 2x2 omega switch.
QUESTIONS
How many switches (sources of delay) between two devices in each network?
What about contention?
Is an n x n omega network the same as a crossbar switch?
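A quick calculation helps with these questions. The formulas below are the standard textbook counts for an n x n crossbar versus an n-input omega network built from 2x2 switches (n a power of two); the function names are mine.

```python
import math

# Switch counts: an n x n crossbar uses one crosspoint per
# (input, output) pair; an n-input omega network has log2(n) stages,
# each containing n/2 2x2 switches.

def crossbar_crosspoints(n):
    return n * n

def omega_switches(n):
    stages = int(math.log2(n))     # a message crosses one switch per stage
    return (n // 2) * stages

for n in (8, 64, 1024):
    print(n, crossbar_crosspoints(n), omega_switches(n))
# n=1024: 1,048,576 crosspoints vs 5,120 2x2 switches.
```

So the omega network is far cheaper in hardware, but a message crosses log2(n) switches instead of one crosspoint, and two messages can contend for the same internal switch; the crossbar is non-blocking. That asymmetry is the answer to why an n x n omega network is not equivalent to a crossbar.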
Because multicomputers do not need to share (main) memory, the access-speed and contention problems are much alleviated (but not eliminated). In fact, one can consider main memory to be a cache of the disk at the file server. The common bus architecture is the Ethernet LAN.
Tightly coupled multicomputers are frequently interconnected by a mesh or a hypercube network (see figure 1-8).
QUESTIONS
What is max delay in each network?
How much contention (parallelism)?
How many I/O ports (interconnect busses) needed?
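These questions also yield to a little arithmetic. The sketch below uses the standard results for a sqrt(n) x sqrt(n) mesh without wraparound and a hypercube of n = 2^d nodes; treat it as a starting point, not a complete answer (it ignores contention under real traffic).

```python
import math

# Diameter (worst-case hop count) and per-node link count.
# Mesh: corner-to-corner distance is 2*(sqrt(n)-1); interior nodes
# need 4 ports. Hypercube: diameter and ports are both log2(n).

def mesh_stats(n):
    side = int(math.isqrt(n))
    assert side * side == n, "n must be a perfect square"
    diameter = 2 * (side - 1)
    ports = 4
    return diameter, ports

def hypercube_stats(n):
    d = int(math.log2(n))
    assert 2 ** d == n, "n must be a power of two"
    return d, d

for n in (16, 64, 256):
    print(n, mesh_stats(n), hypercube_stats(n))
```

The hypercube's delay grows logarithmically while the mesh's grows as sqrt(n), but the hypercube pays for it with log2(n) I/O ports per node, which is why large hypercubes are expensive to build.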
The job of the operating system is to mold recalcitrant hardware into a beautiful virtual machine.
One distinction is based on the degree of autonomy between processors (tightly vs loosely coupled).
See the table in figure 1-12.
Another way to look at differences is to consider the traditional hierarchical structure of a centralized OS:
Now consider a network of resources consisting of
Now consider different placements of the InterProcess Communications (IPC) module within the traditional hierarchy.
If between 1 and 2, then file service can be provided remotely and transparently.
If between 2 and 3, then shared remote devices are supported transparently.
If between 3 and 4, then shared memory can be supported.
If integrated into 4, we have a true distributed OS.
NOTE: one can make access to remote resources appear transparent by adding software above the OS. This is particularly easy if the OS has a lightweight kernel and exports much of the management (like file management and I/O management) to user-level processes; these modules can then utilize the network to access remote resources.
One of the earliest attempts at a network OS was the National Software Works, undertaken in the middle 70's. It consisted of heterogeneous computers connected by the ARPANET. Implementation was entirely at the application level (reminiscent of the Internet with its web browsers, search engines, etc.), but it developed an IPC which provided common functionality on diverse systems. It dealt with addressing (naming) problems, which are still a big issue in distributed systems. This early attempt had performance problems.
Other early attempts were based on remote procedure calls (RPC) built on top of a centralized OS with network access (figure 2-6, Goscinski).
Consisted of the following steps:
Consider some of the design tradeoffs.
QUESTION: what other ways to solve?
QUESTION: what are the design issues?
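The classic RPC exchange (client stub marshals the arguments, the message crosses the network, the server stub unmarshals, calls the real procedure, and marshals the result back) can be sketched as below. All names here are invented for illustration, and the network is faked with a plain function call.

```python
import json

# Minimal RPC sketch: stubs hide marshalling and transport so the
# remote call looks like a local one to the caller.

def add(a, b):                     # the "remote" procedure itself
    return a + b

PROCEDURES = {"add": add}          # server-side dispatch table

def server_stub(message):
    # Unmarshal the request, call the named procedure, marshal the reply.
    request = json.loads(message)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result})

def client_stub(proc, *args):
    # Marshal the call, "transmit" it, unmarshal the reply.
    message = json.dumps({"proc": proc, "args": list(args)})
    reply = server_stub(message)   # stands in for send/receive on a network
    return json.loads(reply)["result"]

print(client_stub("add", 2, 3))    # looks exactly like a local call
```

The design issues surface as soon as the fake transport becomes a real one: what if the reply is lost (retransmit? at-most-once semantics?), how are arguments with pointers marshalled, and how does the client name and locate the server?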
The Newcastle Connection was an early attempt at developing a network OS based on the UNIX OS. It used an extension of the UNIX hierarchical file-naming structure to tie different systems together (loosely coupled).
Draw out the naming structure.
It replaced the library routines between the user and kernel levels; this intermediate level communicated with its counterparts on other machines using RPC. Because all processes are subjected to this intermediate processing for kernel requests, it slows everybody down.
There are no widely used, commercially available distributed OSs today, although there are many networked OSs.
So why study them? Because the design issues are important in general, and the trend is towards more distributed systems (LAPLINK, mobile computing). It is the future, and parts of it are already here.
One of the fundamental problems in distributed OS design is the lack of global state and up-to-date information.