Fall 2000: CS 771/871 Operating Systems
Lecture 1
Who am I and what is my background in
operating systems?
What is an operating system?
What about this old book?
Is the advanced study of operating systems dead?
What is an advanced operating system course about?
Go over syllabus.
What are distributed systems?
- Distributed System: a collection of
independent computers that appears to the users of
the system as a single computer (Tanenbaum).
A distributed system should:
- control network resource allocation to allow resources to be used in
the most effective way
- provide a convenient virtual machine
- hide the distribution of resources
- provide protection
- provide secure communication (Goscinski)
This is in contrast to centralized computing (timesharing) or
independent PCs.
- Why study them? Because of perceived advantages:
- better price/performance (PCs are
cheap, supercomputers are not)
- improved speed (the speed of
light is the limit; perform many operations in parallel)
- naturally supports distributed applications (banking)
- increased reliability (failures can
be isolated)
- incremental growth (keep old
machines, just add more)
- data sharing (computer-supported
cooperative work, or games)
- device sharing (printers, scanners, DVD writers are expensive)
- communications (support groups of
people working together)
- flexibility (match load to idle
machines more easily)
But distributed systems have their disadvantages, chief among
them the complexity and relative unavailability of software. Most
of the problems of non-distributed systems still exist, but we have
added many new problems to solve in order to achieve some of the
perceived advantages; among them are network congestion, rerouting, and
security.
Also, most distributed systems complicate the job of the user by
forcing users to be aware of various aspects of the distributed
system (don't you love URLs?).
Software Concepts
The job of the operating system is to mold recalcitrant
hardware into a beautiful virtual machine.
One distinction is based on the degree of autonomy between
processors (tightly vs loosely coupled).
Combinations of hardware and software
- Network OS: high degree of autonomy,
possibly different operating systems, few system-wide
resources (printers, network file system). Client-server
protocols. The user is aware of the distribution of resources.
RLOGIN, RSH, SETENV DISPLAY, and FTP
are examples of explicit user actions which reflect the lack
of transparency in the location of resources.
On the other hand, NFS makes the location of your files
transparent within the network complex.
- True Distributed OS: single system image
presented to user: virtual uniprocessor. Note how this
contrasts with traditional timesharing systems which make
a single CPU look like many virtual CPUs. Requires:
- Global interprocess communications
- global protection
- global process management
- transparent distributed file access
- Multiprocessor Timesharing OS: tightly
coupled; common ready queue; shared memory; the file system
is like the single-CPU version; possible specialization of
processors.
QUESTION: why must the scheduler run in a critical section?
QUESTION: What else must run in a critical section?
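The common-ready-queue question above can be illustrated with a small sketch. This is a user-level Python threading analogy, not actual kernel code: each "CPU" is a thread, and the dequeue must happen inside a critical section, or two CPUs could dispatch the same process.

```python
import threading

# Sketch: a shared ready queue on a tightly coupled multiprocessor.
# The process IDs and thread count are illustrative.
ready_queue = list(range(100))   # 100 hypothetical process IDs
queue_lock = threading.Lock()
dispatched = []                  # (cpu_id, pid) pairs actually run
dispatched_lock = threading.Lock()

def cpu_scheduler(cpu_id):
    while True:
        with queue_lock:         # critical section around the queue
            if not ready_queue:
                return
            pid = ready_queue.pop(0)
        with dispatched_lock:
            dispatched.append((cpu_id, pid))

threads = [threading.Thread(target=cpu_scheduler, args=(i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, every process is dispatched exactly once.
assert sorted(pid for _, pid in dispatched) == list(range(100))
```

Without `queue_lock`, the check of `ready_queue` and the `pop(0)` could interleave between threads, so two CPUs might pop the same entry or pop from an empty list.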
See table in figure 1.12
Another way to look at differences is to consider the
traditional hierarchical structure of a centralized OS:
1. File Management
2. I/O Device Management
3. Memory Management
4. Process Management
Now consider a network of resources consisting of
- file servers
- printers
- plotters
- scanners
- name servers
- personal computers or workstations
- processors
Now consider different placements of the InterProcess
Communications (IPC) module within the traditional hierarchy.
If between 1 and 2, then File Service can be provided remotely
and transparently.
If between 2 and 3, then shared remote devices are supported
transparently.
If between 3 and 4, then shared memory can be supported.
If integrated into 4, true distributed OS.
NOTE: one can make access to remote resources appear
transparent by adding software above the OS. This is particularly
easy if the OS has a lightweight kernel and exports many of the
management functions (like file management and I/O
management) to user-level processes; these modules can then use
the network to access remote resources.
One of the earliest attempts at a network OS was the National
Software Works, undertaken in the middle 70's.
It consisted of heterogeneous computers connected by the ARPANET.
The implementation was entirely at the application level (reminiscent
of the Internet and web browsers, search engines, etc.), but it developed
an IPC facility which provided common functionality on diverse systems.
It dealt with addressing (naming) problems, which are still a big
issue in distributed systems.
This early attempt had performance problems.
Other early attempts were based on remote procedure calls
(RPC) built on top of a centralized OS with network access
(figure 2-6 Goscinski).
It consisted of the following steps:
- The user process communicates a request, using the provided IPC,
to the local Remote Access System (RAS).
- The local RAS sends the request to an appropriate remote RAS.
- The remote RAS acknowledges to the local RAS.
- The local RAS transmits the acknowledgment to the user process with
information to set up a direct communication path to the remote
RAS.
- The user process sends the pertinent data to the remote RAS,
- which accesses the appropriate resource on the remote system,
- sends an acknowledgment to the user process,
- awaits completion of the request,
- and sends an acknowledgment back to the user process.
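The RAS exchange above can be sketched as a toy, in-process simulation. The class names `LocalRAS` and `RemoteRAS`, the dictionary of resources, and the ACK/DONE strings are our illustrative assumptions, not part of any real system; the real RASes exchanged messages over the network.

```python
class RemoteRAS:
    """Toy stand-in for the remote Remote Access System."""
    def __init__(self, resources):
        self.resources = resources          # name -> resource state

    def handle_request(self, request):
        # Acknowledge if we can serve this resource.
        return "ACK" if request in self.resources else "NAK"

    def handle_data(self, request, data):
        # Access the appropriate resource, await completion,
        # then acknowledge back to the user process.
        self.resources[request].append(data)
        return "DONE"

class LocalRAS:
    """Toy stand-in for the local Remote Access System."""
    def __init__(self, remotes):
        self.remotes = remotes              # host -> RemoteRAS

    def request(self, host, resource):
        remote = self.remotes[host]         # pick an appropriate remote RAS
        if remote.handle_request(resource) == "ACK":
            return remote                   # "direct path" for the user process
        return None

remote = RemoteRAS({"printer": []})
local = LocalRAS({"hostB": remote})

# User process: request, get a direct path, send data, await completion.
path = local.request("hostB", "printer")
status = path.handle_data("printer", "job-1")
assert status == "DONE"
assert remote.resources["printer"] == ["job-1"]
```

Note the design point the steps imply: after the initial handshake, the user process talks to the remote RAS directly, so the local RAS is not a bottleneck for the data transfer.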
Consider some of the design tradeoffs.
QUESTION: what other ways to solve?
QUESTION: what are the design issues?
The Newcastle Connection was an early attempt
at developing a network OS based on the UNIX OS. It used an
extension of the UNIX hierarchical file naming structure to tie
different systems together (loosely coupled).
Draw out the naming structure.
It replaced the library routines between the user and kernel levels;
this intermediate level communicated with other systems using RPC.
Because all processes are subjected to this intermediate processing
for kernel requests, everybody is slowed down.
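One way to sketch the naming structure: the Newcastle Connection extended the UNIX tree upward with a "superroot", so a path like /../hostB/usr/fred/data named /usr/fred/data on machine hostB, while ordinary paths stayed local. The parser below is our illustration of that idea, not the Newcastle code; the host names are made up.

```python
def parse_path(path, local_host="hostA"):
    """Split a Newcastle-style path into (host, local UNIX path)."""
    if path.startswith("/../"):
        rest = path[len("/../"):]
        host, _, local = rest.partition("/")
        return host, "/" + local
    return local_host, path      # ordinary local UNIX path

assert parse_path("/../hostB/usr/fred/data") == ("hostB", "/usr/fred/data")
assert parse_path("/etc/passwd") == ("hostA", "/etc/passwd")
```

Because the remote name is just another pathname, existing programs could name remote files without modification; the intermediate library layer decided whether to issue a local system call or an RPC.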
There are no widely used,
commercially available distributed OSs today, although there are
many network OSs.
So why study them? Because the design issues are
important in general and the trend is towards more distributed
systems (LapLink, mobile computing). It is the future, and parts
of it are already here.
Design Issues:
- Transparency: Hide underlying
distributed implementation. (easier to do at the user
level than the programming level). Different levels of
transparency:
- Location: location of resources unknown
- Migration: resources can move without changing
their names
- Replication: number of copies unknown (cache)
- Concurrency: share resources automatically and
unobtrusively
- Parallelism: programmer unaware of parallel
activity on his behalf
- Flexibility: An unresolved issue:
traditional vs. microkernel. A microkernel utilizes servers
which provide higher-level OS functions, so a set of OS
functions can be customized to the application. For example, different
file systems (DOS, UNIX, Mac) could be provided as services.
- Reliability: If probability of failure
of a single CPU is p, then the probability of n
CPUs failing at the same time is p**n. For
example, if probability of failure is .1 for a single
CPU, then the probability that 3 CPUs fail simultaneously
is .001. Replicated hardware is the key to the
reliability of mission critical systems.
In practice failures may not be independent, and in fact
if there are interdependencies between components, the
reliability may even decrease. (In the worst case it is
the probability that at least one component fails;
consider a pipeline architecture which requires all CPUs
to be working to get anything done. Then the probability
that all CPUs are working is (1-p)**n, which for
our example is .729.)
Availability is the fraction of time
the system is usable.
Reliability also includes data integrity and security
concerns (copies of key files increase availability but
may compromise the integrity of the data).
Fault tolerance is the ability to
provide service even in the face of system failures.
- Performance: various metrics, some end-user
oriented (response time), others resource oriented
(throughput, utilization). Raw numbers (speed of CPU or
network) are often misleading. Benchmarks are frequently
used to compare systems, but the makeup of a benchmark is
application-specific.
Communications is typically the bottleneck.
For parallelism to work, one must consider the appropriate
grain size.
- Scalability: From LANs to the Internet; from
home PCs to smart telephones. Scalability usually implies
eliminating all centralized resource handling.
Distributed algorithms are distinguished by:
- no machine has complete information about the system
state
- machines make decisions based on local
information
- failure of one machine will not cause system
failure
- no global clock is assumed
One of the fundamental problems in
distributed OSs is the lack of global state and up-to-date
information.
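The reliability arithmetic in the list above can be checked in a few lines (assuming, as the text does, that failures are independent):

```python
# p = per-CPU failure probability, n = number of CPUs.
p, n = 0.1, 3

# Replicated system: fails only if ALL n replicas fail.
p_all_fail = p ** n
assert abs(p_all_fail - 0.001) < 1e-12

# Pipeline: useful only if ALL n stages work.
p_all_work = (1 - p) ** n
assert abs(p_all_work - 0.729) < 1e-12
```

The contrast is the point: the same n CPUs give 0.001 failure probability when they replicate each other, but only 0.729 availability when every one of them must work.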
Chapter 2.
What is a Process?
What is the address space? memory space, set of
shared objects?
Are critical sections important in a distributed
system?
Are the mechanisms described in chapter 2
relevant?
Is non-determinism bad? Is it avoidable?
ASSIGNMENT:
- Create a website for your contributions to this course and
send me the URL for this site. On this site add the following:
- Your "classic" definition of what an
operating system is
- Define what an operating system of the future will
do
- Stake a claim in what area(s) you plan to develop
expertise
- Explain why auxiliary variables (section 2.7.3) are
needed. (This is your first homework assignment; e-mail your answer.)
Copyright Chris Wild 1996.
For problems or questions regarding this web site contact [Dr. Wild].
Last updated: August 29, 1996.