Distributed Systems
Motivations:
- Improved performance
- Resource sharing
- Reliability
- Communication
Can be based on tightly coupled or loosely coupled systems:
- Tightly coupled (Parallel Systems) -- processors share memory and possibly a clock; communication is through a back-plane or shared memory.
- Loosely coupled (Distributed Systems) -- collections of loosely coupled processors connected by a network.
Can be run by a network operating system (users know about the network) or by a distributed operating system (users generally need no knowledge of the network).
Network topologies
- Fully or partially connected (mesh, perhaps using high speed trunks)
- Hierarchical (tree, like the phone system)
- Star
- Ring -- most common on older IBM networks (Token Ring); at ODU we had FDDI (Fiber Distributed Data Interface) at one time, at 100 Mbit/s.
- Bus -- most common; Ethernet, as used here at ODU. Max 100 Mbit/s in theory.
The OS must solve several basic issues:
- Name resolution: how do processes address one another?
- Routing: what path do messages take through the network?
- Connection strategy: how do two processes send a sequence of messages?
- Contention: how are conflicting demands for service resolved?
Naming
Example: on the Internet, names are of the form machine.dept.inst.domain.country
(In the US, we think there are no other countries, so the country, .US, is usually omitted. We do this to let all people in all foreign countries know that they are really second class computer users.)
Since people here use different machines, we usually omit the machine name.
Names are resolved by the Domain Name System (DNS). (Originally each machine had a file giving the "real" addresses, but it was impossible to keep all copies current.) Each level has a name server, a process responsible for giving the real address.
To resolve the name "mach.cs.cmu.edu" (edu for educational institutions, org for organizations, com for companies), the local system knows the location of the edu server and sends it a request to find the location of cmu. A message is then sent to the server at cmu to ask about cs, and cs knows about mach. (In practice the full walk is often unnecessary, since servers keep a cache of recently requested addresses.)
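The walk above can be sketched as a toy resolver. It assumes a hypothetical in-memory tree of name servers instead of real DNS, which exchanges resource records over the network. The address 192.0.2.10 is a reserved documentation address, not mach's real one. As in the notes, the local system already knows the edu server. Each server answers only for its own children, and the cache shows why a repeated lookup needs no queries at all.

```python
# Hypothetical toy name-server tree (NOT real DNS): each dict is a "server"
# mapping a child label to the next server, or to a final address string.
KNOWN_TOP_LEVEL = {
    "edu": {                      # the edu server, already known locally
        "cmu": {                  # cmu's server
            "cs": {               # cs's server
                "mach": "192.0.2.10",   # made-up documentation address
            },
        },
    },
}

cache = {}  # recently resolved names -> addresses


def resolve(name):
    """Return (address, number_of_server_queries) for a dotted host name."""
    if name in cache:
        return cache[name], 0                 # answered from cache, no servers asked
    *rest, top = name.split(".")              # "mach.cs.cmu.edu" -> ["mach", "cs", "cmu"], "edu"
    server = KNOWN_TOP_LEVEL[top]             # KeyError if the top-level domain is unknown
    queries = 0
    for label in reversed(rest):              # ask edu about cmu, cmu about cs, cs about mach
        queries += 1
        if not isinstance(server, dict) or label not in server:
            raise KeyError(f"cannot resolve {name}")
        server = server[label]
    if not isinstance(server, str):
        raise KeyError(f"{name} names a domain, not a host")
    cache[name] = server
    return server, queries
```

The first call to resolve("mach.cs.cmu.edu") asks three servers (edu, cmu, cs). A second call is served from the cache.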
At the receiving site, the message must go to a particular process. Each receiving network process is usually identified by a unique port number, and the message is delivered to that process.
There is a similar set of related connection schemes:
- circuit switching -- (path fixed between two processes)
- message switching -- (path established for the duration of one message); each message must carry a destination address
- packet switching -- message sent as a sequence of packets:
  - virtual circuit: path established for the duration of a session.
  - dynamic routing (datagram): each packet must find its own way.
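Packet switching can be illustrated with a minimal sketch. The helpers are hypothetical and do not follow any real protocol's packet format. The message is cut into numbered packets. Under datagram routing each packet finds its own way, so the packets may arrive in any order. The receiver then uses the sequence numbers to rebuild the message.

```python
import random


def packetize(message: bytes, size: int):
    """Split a message into (sequence_number, payload) packets of at most `size` bytes."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]


def reassemble(packets):
    """Rebuild the message from packets that may have arrived out of order."""
    return b"".join(payload for _, payload in sorted(packets, key=lambda p: p[0]))


msg = b"hello, distributed world"
packets = packetize(msg, 5)
random.shuffle(packets)            # simulate packets taking different routes
assert reassemble(packets) == msg  # sequence numbers restore the original order
```

With a virtual circuit the packets would follow one path and normally arrive in order. The numbering matters most for datagrams.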
But this omits the routing problem: to get a message from here to there, the system must find a particular route. Here are some of the choices:
- fixed: when system brought up, path set up in advance via table
- split-traffic: 2+ fixed routes
- flooding: emergency broadcast or status query.
- random: with bias.
- hot potato: with bias.
- backward learning: congestion avoidance.
- distance-vector: global info sent locally (old IP method).
- link-state: local info sent globally (current IP method).
(If there are many hops between here and there, and the loads on those hops change radically, a virtual circuit does not respond to the load changes; dynamic routing can.)
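The last two methods in the list can be compared on a small made-up network. The router names and link costs below are hypothetical. In distance-vector routing, each router learns only its neighbors' distance tables, and the Bellman-Ford style update is repeated until nothing changes. In link-state routing, every router receives the whole map and runs Dijkstra's shortest-path algorithm itself. On a stable network both should agree.

```python
import heapq

# Hypothetical 4-router network with undirected link costs.
LINKS = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 5, ("C", "D"): 1}


def neighbors(links):
    """Build an adjacency map {router: {neighbor: cost}}."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def distance_vector(links):
    """Each router merges its neighbors' distance tables until no table changes.
    Only neighbors exchange tables: global info sent locally."""
    adj = neighbors(links)
    nodes = list(adj)
    dist = {u: {v: (0 if u == v else float("inf")) for v in nodes} for u in nodes}
    changed = True
    while changed:
        changed = False
        for u in nodes:
            for n, cost in adj[u].items():
                for dest, d in dist[n].items():
                    if cost + d < dist[u][dest]:   # route via neighbor n is shorter
                        dist[u][dest] = cost + d
                        changed = True
    return dist


def link_state(links, source):
    """The router knows the full topology (flooded link states) and runs Dijkstra.
    Each router's local info is sent globally."""
    adj = neighbors(links)
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                               # stale heap entry
        for v, cost in adj[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

In both methods, router A reaches C via B (cost 3) rather than over the direct cost-5 link. RIP is a well-known distance-vector protocol and OSPF a well-known link-state one.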
Contention
Network must be shared:
- CSMA/CD -- (carrier sense multiple access/collision detect) each sender waits until the network is idle, then sends. Two senders can still start at the same moment and collide, so each sender must detect the collision and resend its message. This is what Ethernet uses.
- Token -- a sender can only send when it holds the token: it waits for the token, holds it while sending its message, then passes the token to the next site. This is what FDDI and Token Ring/Token Bus use.
- Message slots -- each sender has a slot in which only it can send messages; it waits for its slot. The next network mechanism is ATM (Asynchronous Transfer Mode), which has slots. Used by AT&T. 155.52 Mbit/s.
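Under CSMA/CD, a sender that detects a collision must resend. Classic Ethernet does not resend immediately: it waits a random number of slot times, chosen by truncated binary exponential backoff (51.2 microseconds per slot at 10 Mbit/s). A minimal sketch of that rule follows. The two-station `contend` simulation is a hypothetical simplification that ignores timing and other stations.

```python
import random

MAX_ATTEMPTS = 16   # classic Ethernet gives up on a frame after 16 attempts


def backoff_slots(collisions: int, rng=random) -> int:
    """Truncated binary exponential backoff: after the n-th collision, wait k slot
    times, with k chosen uniformly from 0 .. 2**min(n, 10) - 1."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions; frame dropped")
    return rng.randint(0, 2 ** min(collisions, 10) - 1)


def contend(rng) -> int:
    """Two stations collide, then both back off. If they pick the same slot they
    collide again. Returns the number of collisions before one gets through."""
    collisions = 0
    while True:
        collisions += 1
        a = backoff_slots(collisions, rng)
        b = backoff_slots(collisions, rng)
        if a != b:
            return collisions
```

Doubling the range after each collision makes repeated clashes quickly unlikely. Capping the range at 2**10 slots bounds the wait.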
OSI 7-Layer Model versus TCP/IP model versus Actual Model
Networks (particularly wide area) may lose message packets. We have both reliable and unreliable protocols since for some applications, slight losses don't matter (audio/video), but others do (files).
TCP/IP -- Commonly used reliable network protocol (Transmission Control Protocol/Internet Protocol). IP gets message pieces from one place to the other; TCP keeps a copy of each sent piece until it is acknowledged, so lost pieces can be resent.
UDP/IP -- Commonly used unreliable protocol (User Datagram Protocol/Internet Protocol). If pieces are lost, who cares?
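The difference between a reliable and an unreliable protocol can be simulated. This sketch assumes a made-up channel that drops each packet with a fixed probability. `send_unreliable` sends each piece once, UDP style. `send_reliable` is plain stop-and-wait: it keeps a copy, resends until acknowledged, and numbers each piece so duplicates are harmless. That is far simpler than real TCP's sliding windows and adaptive timers.

```python
import random


def lossy_send(packet, loss_rate, rng):
    """Simulated unreliable network: returns the packet, or None if it was lost."""
    return None if rng.random() < loss_rate else packet


def send_unreliable(chunks, loss_rate, rng):
    """UDP-style: send each chunk once; whatever is lost stays lost."""
    received = []
    for chunk in chunks:
        packet = lossy_send(chunk, loss_rate, rng)
        if packet is not None:
            received.append(packet)
    return received


def send_reliable(chunks, loss_rate, rng, max_tries=50):
    """Stop-and-wait: resend each numbered chunk until its acknowledgement arrives."""
    received = {}
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            data = lossy_send((seq, chunk), loss_rate, rng)
            if data is not None:
                received[data[0]] = data[1]          # a resent duplicate just overwrites
                ack = lossy_send(seq, loss_rate, rng)  # the ACK can be lost too
                if ack is not None:
                    break                              # sender may now discard its copy
        else:
            raise TimeoutError(f"chunk {seq} never acknowledged")
    return [received[i] for i in range(len(chunks))]


rng = random.Random(1)
data = [f"piece{i}" for i in range(20)]
print(len(send_unreliable(data, 0.3, rng)))   # usually fewer than 20
print(send_reliable(data, 0.3, rng) == data)  # True
```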
Sockets -- Common programming interface for communications.
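Here is a minimal sockets example using Python's standard `socket` module. It sends one UDP datagram over the loopback interface (127.0.0.1), so it runs on a single machine. Binding to port 0 lets the OS pick a free port, which `getsockname()` then reports. The function name `udp_roundtrip` is just an illustrative choice.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one UDP datagram to ourselves over loopback and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
        receiver.settimeout(2.0)          # don't block forever if the datagram is lost
        port = receiver.getsockname()[1]  # the port the OS actually assigned
        # No connection set-up: each datagram carries its destination (host, port).
        sender.sendto(message, ("127.0.0.1", port))
        data, _addr = receiver.recvfrom(1024)
        return data
```

A TCP version would use SOCK_STREAM with listen/accept/connect to set up a connection before any data flows.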
(for more details, take CS 455)
Network Operating Systems
Typical services
- Remote login
  - telnet
  - OS sets up a network connection; the user logs into the remote machine normally.
- File transfer
  - ftp
  - OS sets up a network link. The remote site may require a regular login or may support "anonymous" connections for either sending or copying files.
  - get "filename" (or mget for multiple files)
  - put "filename"
  - bin (binary transfer mode)
  - hash (print # marks to show transfer progress)
  - cd
  - ls or dir
Distributed Operating Systems
Have mechanisms for
- Data migration -- should data be transferred from the remote site to the local site? A performance question.
- Computation migration -- the OS may support Remote Procedure Call (RPC). An RPC is intended to look like any other procedure call, but the OS transfers the parameters to the remote site (and may also determine which site handles the particular procedure), then gives the result back to the caller. See below.
- Process migration -- goals:
  - load balancing
  - computation speedup
  - hardware preference
  - software access
  - data access
3-Tier Systems
Cloud Computing
Remote Procedure Call
- Set up by OS
- Each site has RPC daemons (remember ps? Look for the RPC daemons; try ps aux | grep rpc)
- Each site has several ports. A network message carries a destination address, which the machine reads (oh, this one's for me). The machine then checks the port number (an integer) in the packet and gives the packet to that port, where a particular process is listening for packets. Port numbers for standard services may be predetermined (so the sender knows which port to send the message to on the remote machine). Alternatively, the server may have a matchmaker: the sender sends it a request asking for the port number of a particular service.
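RPC and the matchmaker can be sketched with Python's standard `xmlrpc` modules, which marshal parameters as XML over HTTP. This is not the ONC/Sun RPC that the rpc daemons implement. The `matchmaker` dictionary is a hypothetical stand-in. In ONC RPC, that role is played by the portmapper (rpcbind), which listens on the well-known port 111.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical matchmaker: service name -> port number. Here it is a local dict;
# a real matchmaker is itself a server reachable at a well-known port.
matchmaker = {}


def start_server(service_name):
    """Start an RPC server offering add(a, b) on a port chosen by the OS."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")
    matchmaker[service_name] = server.server_address[1]   # advertise the chosen port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(service_name, a, b):
    """Client side: ask the matchmaker for the port, then call add() like a local function."""
    port = matchmaker[service_name]
    with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
        return proxy.add(a, b)   # parameters are marshalled and sent to the server


server = start_server("adder")
result = call_add("adder", 2, 3)   # computed by the server thread, not the caller
server.shutdown()
server.server_close()
```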
Remote calls are more complex to deal with than local calls because of the network. Requests can be lost (and must then be resent) or received multiple times (the sender may have assumed the call was lost and resent it). So each call must be timestamped by the sender. The receiver keeps a history of calls it has seen, so it can check the timestamp and eliminate duplicates.
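Duplicate elimination on the receiving side can be sketched as follows, assuming each request carries a (sender, timestamp) pair that is unique per call. The `DedupReceiver` class and the `withdraw` handler are hypothetical. They show why re-executing a retransmitted call would be wrong for an operation with side effects. A real system would also prune the history, which grows without bound here.

```python
class DedupReceiver:
    """Executes each (sender, timestamp) request at most once; a retransmission
    gets the remembered result instead of running the procedure again."""

    def __init__(self, handler):
        self.handler = handler
        self.history = {}   # (sender, timestamp) -> result already produced

    def receive(self, sender, timestamp, args):
        key = (sender, timestamp)
        if key in self.history:              # duplicate: sender assumed the call was lost
            return self.history[key]
        result = self.handler(*args)
        self.history[key] = result
        return result


balance = {"amount": 100}


def withdraw(n):
    """Hypothetical procedure with a side effect."""
    balance["amount"] -= n
    return balance["amount"]


r = DedupReceiver(withdraw)
r.receive("siteA", 1001, (30,))   # executed: balance becomes 70
r.receive("siteA", 1001, (30,))   # retransmission: returns 70, no second withdrawal
```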
It is difficult to distinguish between lost messages, failed sites, and failed links.
Things often depend on timers of some type. Each site may send a periodic message saying "I am up." Or a site may send the message "Are you up?" to another site and wait a specified amount of time. If there is no reply and it has alternate routes, it might resend over an alternate route.
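The "I am up" scheme can be sketched as a heartbeat monitor. The `HeartbeatMonitor` class is hypothetical and takes an injectable clock, which keeps the example deterministic. It only reports sites as suspected. As noted above, silence could mean a failed site, a failed link, or lost messages.

```python
import time


class HeartbeatMonitor:
    """Suspects a site is down if no "I am up" message arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}   # site -> time of its most recent heartbeat

    def heartbeat(self, site):
        """Record an "I am up" message from `site`."""
        self.last_seen[site] = self.clock()

    def suspected(self):
        """Sites whose last heartbeat is older than the timeout, sorted by name."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)


fake_now = [0.0]                                   # a controllable clock for the demo
mon = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_now[0])
mon.heartbeat("siteA")
mon.heartbeat("siteB")
fake_now[0] = 2.0
mon.heartbeat("siteA")                             # only siteA reports in again
fake_now[0] = 4.0
print(mon.suspected())                             # ['siteB']
```

Choosing the timeout is the usual trade-off. If it is too short, slow sites are falsely suspected. If it is too long, real failures are noticed late.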
Copyright ©2014, G. Hill Price
Send comments to G. Hill Price