CS771/871 Operating Systems

Final Exam

NOTE: Exams are due by 12 noon on Thursday, December 12. Exams submitted electronically in an editable form (HTML, most word-processor formats, ASCII) will be graded and results posted by e-mail to you. I will verify exam submissions by e-mail as soon as they are received.

  1. Compare Write Through vs Write Once cache protocols (20 points).
    Minimizing access to memory is the major concern of any cache protocol. Why? Because memory access is slower than cache access, and memory access ties up the bus, preventing other processors (and I/O devices, if on the bus) from using it.
    Major problem with Write Through is the need to go to memory for every write (cache buys you nothing on writes).
    Major problem with Write Once is a processor switch when the cache is dirty. Every read/write by a non-owner process of a word marked DIRTY requires a bus transfer. In the worst case of two competing processes on different processors, every read or write of a word requires a bus transfer, so there is little advantage to a cache in this case.
    (Not recognizing that once a word is dirty only one cache may hold it, even for reads: -5.)

    Your comparison should include

    1. a definition of cost/benefit metrics which could be used in this evaluation.
      Let's calculate the average access time for a read (Rt) and a write (Wt).
      For Write Through: Rt = Pc1*Ct + (1-Pc1)*Mt,
      Wt = Mt
      where Pc1 is probability of word cached on this processor
      Ct is time to access cache and Mt is time to access memory
      For Write Once: Rt = Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt
      Wt = Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt
      where Pc2 is probability of word cached on this processor, Pd is probability that word is owned by another processor and Bt is the time to access over the bus from the other processor.
      NOTE Pc{1,2} will be different for these two protocols because once dirty in write once, only one cache can own a word even if all processors want to be readers.
      Total access time for Write Through: Pr*(Pc1*Ct+(1-Pc1)*Mt) + (1-Pr)*Mt
      where Pr is the probability of a read (vs. a write) access.
      For Write Once: Pr*(Pc2*Ct+Pd*Bt+(1-Pc2-Pd)*Mt) + (1-Pr)*(Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt), which is just Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt, since reads and writes have the same cost structure.
      Probability of read cache hit, write cache hit, Pd (3 each)

    2. How would you decide which protocol is better for a given application? (develop a formula using the metrics defined above. e.g. if X < Y then write through is better).
      You would need to measure or estimate the above probabilities for an application. (3 points) Then if
      Pr(Pc1*Ct+(1-Pc1)*Mt) + (1-Pr)Mt < Pr(Pc2*Ct+Pd*Bt+(1-Pc2-Pd)*Mt) + (1-Pr)(Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt)
      Write Through is better.
      Missing inequality 3 points.
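      The comparison above can be sketched numerically. This is an illustrative calculation only: the timings and probabilities below are made-up assumptions, not measurements, and the variable names follow the metrics defined above.

      ```python
      # Illustrative numbers only: timings in arbitrary units, probabilities assumed.
      Ct, Mt, Bt = 1.0, 10.0, 12.0   # cache access, memory access, bus transfer
      Pr = 0.7                        # probability an access is a read
      Pc1 = 0.9                       # P(word cached here), Write Through
      Pc2, Pd = 0.6, 0.2              # P(cached here), P(dirty elsewhere), Write Once

      # Write Through: reads may hit the cache; every write goes to memory.
      wt = Pr * (Pc1*Ct + (1 - Pc1)*Mt) + (1 - Pr) * Mt

      # Write Once: reads and writes have the same cost structure.
      wo = (Pr * (Pc2*Ct + Pd*Bt + (1 - Pc2 - Pd)*Mt)
            + (1 - Pr) * (Pc2*Ct + Pd*Bt + (1 - Pc2 - Pd)*Mt))

      print(f"Write Through {wt:.2f} vs Write Once {wo:.2f}")
      ```

      With these assumed numbers the Write Through total comes out lower, so Write Through would be the better choice; different probabilities (e.g. a higher Pc2, or Pd near zero) can flip the inequality.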

  2. Discuss the security of Amoeba capabilities. In particular, discuss the following questions: (18 points)
    We will not count breaking security by exhaustive search of a large space.

    1. Can a process generate a valid owner capability?
      NO: it would not match the check field stored at the server. (6 points)

    2. Can a process steal a valid owner capability?
      Yes, by sniffing the communications channel, memory, disks, or backup tapes. (6 points)

    3. Can a process modify a valid non-owner capability to become the owner?
      NO: reduced capabilities are related to the original through a function of the rights and the check field (stored at the server), which generates a new check field. From the new check field in the restricted capability, one cannot generate the original check field (the new check field authenticates the reduced capability). (6 points)
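    A toy model of this check-field scheme, to make the argument concrete. The hash here is a stand-in for Amoeba's actual one-way function, and all names and values are invented for illustration:

    ```python
    import hashlib

    def f(rights, check):
        # stand-in one-way function; real Amoeba uses its own cryptographic F
        return hashlib.sha256(f"{rights}:{check}".encode()).hexdigest()[:16]

    OWNER_RIGHTS = 0xFF
    server_check = "a3f91c0e"   # random secret, stored only at the server

    def make_restricted(new_rights):
        # the new check field authenticates the reduced capability
        return (new_rights, f(new_rights, server_check))

    def server_validates(rights, check):
        if rights == OWNER_RIGHTS:
            return check == server_check           # owner must present the secret
        return check == f(rights, server_check)    # recomputed at the server

    cap = make_restricted(0x03)
    assert server_validates(*cap)                       # genuine reduced capability
    assert not server_validates(OWNER_RIGHTS, cap[1])   # cannot promote to owner
    ```

    Because f is one-way, holding a restricted capability's check field gives no way back to server_check, which is exactly the answer to question 3 above.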

  3. In Amoeba, a process generates a 64 bit FLIP address at random. How safe is this approach? (20 points)
    Several issues. One is the probability that two processes will pick the same address. Second is that one process will try to impersonate a server. Third is that a crashed process picks a new FLIP address on restart, so the client is aware of the crash and can enforce AT-MOST-ONCE semantics. (Need at least one of these last two: 3 points for one, extra credit 3 for both.)
    Probability of duplication depends on the number of existing FLIP addresses (if there are 2**63 processes, the probability is very high).
    Assume there are P addresses already in use; then P/(2**64) is the probability of picking a duplicate address. (Need both P and 2**64; worth 17 points; no mention of P: -9.) As a sanity check, assume there are 2**16 = 65,536 processes on the network; then the probability is 1 in 2**48.
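    The sanity check above can be verified directly (assuming the addresses in use are distinct and the new 64-bit address is chosen uniformly at random):

    ```python
    # Probability of duplicating one of P addresses already in use,
    # with a uniformly random 64-bit pick.
    P = 2**16                  # assumed number of processes on the network
    p_dup = P / 2**64
    assert p_dup == 1 / 2**48  # matches the "1 in 2**48" figure in the text
    ```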

  4. A malicious process is trying to impersonate an existing server (whose put-port is known) by generating a 48 bit random number, putting it through the known one-way function, and seeing if the put-port is generated. If each number takes 1 msec to verify, how long will it take to guess the get-port address on the average? Best case? Worst case? (18 points)
    Best case: 1 millisecond.
    Worst case: 1000*60*60*24*365 = approx 3.15*10**10 milliseconds per year,
    divided into 2**48 = approx 2.8*10**14 milliseconds, gives approx 9,000 years (on the order of 10,000 years).
    Average (assuming guesses are not repeated): half the worst case, approx 4,500 years.
    (6 points each)
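    The arithmetic above checks out; here is the calculation, under the same assumptions (one guess per millisecond, 48-bit space, no repeated guesses):

    ```python
    # Brute-forcing a 48-bit port at 1 ms per trial.
    ms_per_year = 1000 * 60 * 60 * 24 * 365   # ~3.15 * 10**10 ms in a year
    trials = 2**48                            # ~2.81 * 10**14 possible numbers
    worst_years = trials / ms_per_year        # exhaust the whole space
    avg_years = trials / 2 / ms_per_year      # expected, with no repeated guesses
    print(round(worst_years), round(avg_years))
    ```

    This gives roughly 8,900 years worst case and 4,500 years on average.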

  5. Amoeba's fault tolerant group communications can survive the loss of up to k processors. Discuss the special cases when k=0 and when k=n (where n is the number of processors in the original group). Be sure to give the communications complexity in terms of the number of packets sent for normal and abnormal operations. (20 points)
    k=0 is the non-fault-tolerant case. The normal message load is discussed in the book as somewhat more than 2 packets per message (due to quiescent processors sending ACKs that are not piggybacked). There is no abnormal mode; or one could count missing messages, timeouts, and regroup as abnormal: if the sequencer is missing, there are a variable number of retries before it is declared dead; if a broadcast is missing, it will be retransmitted; a regroup takes 1 broadcast and n-1 replies, electing the old coordinator if it is alive, else delivery is not necessarily reliable. (7 points)
    k=n is not possible, since k+1 processors are needed to survive k failures. However, k=n-1 is possible. Here every processor acknowledges every message, and there is no need for a history buffer.
    (3 points)
    One originating message broadcast to the group, n-2 ACKs from the redundant processors to the coordinator, and 1 accept by the coordinator = n messages in normal mode. (5 points)
    In abnormal mode, some number of messages are lost until the failed coordinator or redundant processor is noticed (don't count these; there may be other ways to find a failed processor; the point is that any processor failure will be noticed upon the next broadcast message). An election algorithm is needed: the processor noticing the failure sends one "participate in regroup" message and awaits n-2 replies (assuming only one processor has failed); the highest surviving processor is elected; there is no need to request or resend messages, since all processors are up to date at all times. At least n-1 messages.
    (5 points)
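    The packet counts above for the k=n-1 case can be written down directly (assumptions: a single failure, and failure detection itself is not counted):

    ```python
    def normal_packets(n):
        # 1 broadcast + (n-2) ACKs from the redundant processors
        # + 1 accept from the coordinator
        return 1 + (n - 2) + 1        # = n

    def regroup_packets(n):
        # 1 "participate in regroup" message + (n-2) replies;
        # no state transfer, since every processor is up to date at all times
        return 1 + (n - 2)            # = n - 1

    assert normal_packets(8) == 8 and regroup_packets(8) == 7
    ```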

  6. Given the inherently stochastic nature of a distributed environment (due to, among other factors, network congestion, processor load, and autonomous actions by independent users), explore the value of the different memory consistency models (strict, sequential, causal, PRAM) as follows. For each model, name an application (or part of an application) which requires AT LEAST as strong a model to run correctly according to the expected behavior for that application (as defined by you, but which should be defensible).
    NOTE: there should be no explicitly programmed synchronization used (like semaphores, ie. should not resort to a weak consistency model to work correctly)
    NOTE: you may also argue that there does not exist any real applications which require that model (and why).
    Because of the somewhat speculative nature of this question, I made it a second extra-credit assignment worth 10 points.
    There was some misinterpretation of this question. A memory consistency model is something enforced by the hardware (possibly with operating system software support), and as such its granularity is at the individual word level. So, given that only a hardware unit of memory is under the consistency model contract, what do we have?
    Read-Modify-Write: since this is at least two memory operations, none of the consistency models will guarantee that there will not be an inconsistency, so explicit programmer synchronization is needed.
    Producer/Consumer (some processes write, others read), e.g. stock market reports and most broadcast information servers. Strict consistency would be desirable, unless it came at the cost of slowing down to the slowest consumer (the one furthest away on the network). What is probably more reasonable is a bound on the age of the information (since as a consumer I may not be looking for the information instantaneously anyway). Also, by the time I act on it, the information may have changed, unless I prohibit updates during the consuming phase. So getting the absolute latest value seems less important than getting a causally consistent view.
    Consider a producer and a consumer: if the consumer acts just before the producer produces new information, it gets the old value; if just afterwards, the new. But in the absence of any synchronization between them, the choice is arbitrary; if you can live with this arbitrariness, then you don't need strict consistency. Consider two consumers: which one goes first and gets the old value is arbitrary, so strict consistency is not needed there either.
    However, for two producers which are causally linked (sensors tied to a reaction), strict consistency would be useful. Although, given the difficulty of achieving it in a distributed system, if my life depended on it, I would need a better model.

  7. (Extra Credit) Using the LINDA tuple distributed memory system, design a data flow computation system (similar to that proposed in the wild/gupta dataflow architecture) for computing expressions containing the basic arithmetic binary operations (+,-,*,/). The data flow diagram in this case represents the syntax tree for binary expressions (eg. a+b, a - b/c, (x - y)*z/(m + n)). Your design should include

    1. the make up of the tuple(s) needed to implement a data flow driven computation (considering whether to represent the node, arc, or both in a tuple)

    2. Discuss the nature of the processors required. For example, would you propose special purpose or general purpose processors?

    3. Taking example(s) you consider suitable, give the set of LINDA operations which would load the tuple space with the data flow operations for that example(s).

    4. Give an algorithm which performs the data flow computation (in pseudocode, but using LINDA operations to access the tuple space).

    5. Discuss ways to partition the tuple space to achieve effective distribution of the memory.

    (15 points)
    Here is one of the better answers submitted.


    1. Two types of tuples are required:
      • Operand value tuples such as ("A", 2.5); the type signature is (string, float). Used for input or intermediate values.
      • Operation tuples as ("+","A","B","C"), the type signature is (string, string, string, string). This example represents A+B=C. The node name, the input arc names, and the output arc name are included in the computation tuple.
    2. I suggest an architecture with one general purpose processor that generates the computations according to the algorithm explained in part 4 below, and a group of special purpose processors that are capable of doing the following worker computation:

      // worker: grab any operation tuple, then its two operand values
      in(?op, ?name1, ?name2, ?result_name);
      in(name1, ?var1);           // blocks until the operand is available
      in(name2, ?var2);
      operation = decode(op);     // map "+", "-", "*", "/" to an operation
      R = var1 operation var2;
      out(result_name, R);        // may in turn enable another worker

    3. Example: D = (A+B)/A, with A = 1, B = 2
      The following Linda operations load the tuple space with the data flow operations for this example.

      out("A", 1);              // "A" is consumed twice (by "+" and "/"),
      out("A", 1);              // so two copies of its value tuple are loaded
      out("B", 2);
      out("+", "A", "B", "C");
      out("/", "C", "A", "D");

    4. 
      // syntax tree node structure
      typedef struct _node {
        string node_name;
        string out_name;    // name of link to parent, or result in case of root
        struct _node *left;
        struct _node *right;
      } node;


      Algorithm DataFlow(node *root)
      {
        // parses the syntax tree (assumed to be already built)
        // to load the tuple space; LoadTupleSpace returns
        // the number of binary operations
        n = LoadTupleSpace(root);

        // start n workers of the type described in part 2 above
        StartWorkers(n);
      }

      LoadTupleSpace(node *p)
      {
        // post-order walk; returns the number of operation tuples loaded
        n = 0;
        if (p->left != nil)
          n = n + LoadTupleSpace(p->left);
        if (p->right != nil)
          n = n + LoadTupleSpace(p->right);
        return n + Visit(p);
      }

      Visit(node *p)
      {
        if (p->left == nil && p->right == nil)
          return 0;   // leaf: its node name serves directly as an arc name;
                      // operand value tuples such as ("A", 1) are loaded separately
        out(p->node_name, p->left->out_name,
            p->right->out_name, p->out_name);
        return 1;     // one binary operation loaded
      }
      
      
    5. The tuple space can be split based on the type signature of the tuples. The operand value tuples can be hashed on the name field (first field). The operation tuples can be split based on the value of the operation field. With this latter scheme we can have more specialized processors that each do only one arithmetic operation; the algorithm given in part 4 above would have to be modified so that the computation generator generates the exact number of workers corresponding to the number of operations of each operation type.
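    The whole design can be exercised in a toy, single-process simulation. The tuple space here is just a Python list, and a sequential loop plays the role of the workers, firing any operation whose operand values are present; this is a sketch of the scheme above, not real Linda (in particular, `take` is a non-blocking stand-in for `in`).

    ```python
    from operator import add, sub, mul, truediv

    OPS = {"+": add, "-": sub, "*": mul, "/": truediv}
    tuple_space = []                     # the (multiset) tuple space

    def out(*t):
        tuple_space.append(tuple(t))

    def take(pred):
        # like Linda's in(), but non-blocking: None if no tuple matches
        for i, t in enumerate(tuple_space):
            if pred(t):
                return tuple_space.pop(i)
        return None

    # Load D = (A + B) / A with A = 1, B = 2 (two copies of "A": it is read twice).
    out("A", 1.0); out("A", 1.0); out("B", 2.0)
    out("+", "A", "B", "C")
    out("/", "C", "A", "D")

    # Worker loop: fire any operation tuple whose operand values are available.
    fired = True
    while fired:
        fired = False
        for t in [t for t in tuple_space if len(t) == 4]:
            op, n1, n2, res = t
            v1 = take(lambda x: len(x) == 2 and x[0] == n1)
            if v1 is None:
                continue                 # operand not ready yet
            v2 = take(lambda x: len(x) == 2 and x[0] == n2)
            if v2 is None:
                out(*v1)                 # put the operand back; try again later
                continue
            tuple_space.remove(t)        # consume the operation tuple
            out(res, OPS[op](v1[1], v2[1]))
            fired = True

    result = take(lambda x: x[0] == "D")
    print(result)                        # -> ('D', 3.0)
    ```

    Note how the data flow ordering falls out for free: the "/" operation cannot fire until the "+" operation has produced the ("C", 3.0) tuple.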