NOTE: Exams are due by 12 noon on Thursday, December 12. Exams submitted electronically in an editable form (HTML, most word processor formats, ASCII) will be graded and results posted by e-mail to you. I will verify exam submissions by e-mail as soon as they are received.
Compare Write Through vs Write Once cache protocols (20 points).
Minimizing access to memory is the major concern of any cache protocol. Why? Because memory access is slower than cache access, and memory access ties up the bus, preventing other processors (including any I/O devices on the bus) from using it.
The major problem with Write Through is the need to go to memory for every write (the cache buys you nothing on writes).
The major problem with Write Once is a processor switch when the cache is dirty. Every read or write by a non-owner process of a word marked DIRTY requires a bus transfer. In the worst case of two competing processes on different processors, every read or write of a word requires a bus transfer, so there is little advantage to a cache in this case. (Not recognizing the inability to have a word cached in more than one place once it is dirty, even for reads: -5.)
Your comparison should include a definition of cost/benefit metrics which could be used in this evaluation.
Let's calculate the average access time for a read (Rt) and a write (Wt).

For Write Through:
    Rt = Pc1*Ct + (1-Pc1)*Mt
    Wt = Mt
where Pc1 is the probability that the word is cached on this processor, Ct is the time to access the cache, and Mt is the time to access memory.

For Write Once:
    Rt = Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt
    Wt = Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt
where Pc2 is the probability that the word is cached on this processor, Pd is the probability that the word is owned by another processor, and Bt is the time to access it over the bus from the other processor.

NOTE: Pc1 and Pc2 will be different for these two protocols, because once a word is dirty in Write Once, only one cache can own it, even if all processors want to be readers.

Total access time for Write Through:
    Pr*(Pc1*Ct + (1-Pc1)*Mt) + (1-Pr)*Mt
where Pr is the probability of a read (vs a write) access.

For Write Once:
    Pr*(Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt) + (1-Pr)*(Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt)
(Probability of read cache hit, write cache hit, Pd: 3 points each.)
How would you decide which protocol is better for a given application? (Develop a formula using the metrics defined above, e.g. if X < Y then Write Through is better.)
You would need to measure or estimate the above probabilities for the application (3 points). Then if

    Pr*(Pc1*Ct + (1-Pc1)*Mt) + (1-Pr)*Mt  <  Pr*(Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt) + (1-Pr)*(Pc2*Ct + Pd*Bt + (1-Pc2-Pd)*Mt)

Write Through is better.
(Missing inequality: 3 points.)
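The decision rule above can be sketched in a few lines of code. The parameter values here are purely illustrative assumptions (cache 10 ns, bus transfer 50 ns, memory 100 ns, and made-up hit probabilities), not measurements; the two functions are direct transcriptions of the total-access-time formulas.

```python
# Sketch: plug estimated probabilities into the access-time formulas above.
# All numeric values below are illustrative assumptions.

def write_through_time(Pr, Pc1, Ct, Mt):
    """Average access time under Write Through."""
    read = Pc1 * Ct + (1 - Pc1) * Mt   # reads may hit the cache
    write = Mt                          # every write goes to memory
    return Pr * read + (1 - Pr) * write

def write_once_time(Pr, Pc2, Pd, Ct, Bt, Mt):
    """Average access time under Write Once (reads and writes cost the same)."""
    access = Pc2 * Ct + Pd * Bt + (1 - Pc2 - Pd) * Mt
    return Pr * access + (1 - Pr) * access

# Hypothetical numbers: cache 10 ns, bus transfer 50 ns, memory 100 ns.
wt = write_through_time(Pr=0.7, Pc1=0.9, Ct=10, Mt=100)
wo = write_once_time(Pr=0.7, Pc2=0.8, Pd=0.1, Ct=10, Bt=50, Mt=100)
print("Write Through better" if wt < wo else "Write Once better")
```

With these particular assumed numbers, Write Once wins because writes hit the cache; a write-heavy workload with high sharing (large Pd) pushes the comparison the other way.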
Discuss the security of Amoeba capabilities. In particular, discuss the following questions (18 points). We will not count breaking security by exhaustive search of a large space.
Can a process generate a valid owner capability?

NO. It would not match the check field at the server. (6 points)
Can a process steal a valid owner capability?

YES, by sniffing the communications channel, memory, disks, or backup tapes. (6 points)
Can a process modify a valid non-owner capability to become the owner?

NO. Reduced capabilities are related to the original through a one-way function of the rights and the check field (stored at the server), which generates a new check field. From the new check field in the restricted capability one cannot generate the original check field (the new check field authenticates the reduced capability). (6 points)
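A minimal sketch of the one-way derivation described above. SHA-256 stands in for Amoeba's one-way function, and the field sizes and layout here are illustrative, not Amoeba's actual wire format; the point is only that the server can recompute the restricted check field but the holder cannot invert it.

```python
# Sketch of why a restricted capability cannot be upgraded: the new check
# field is a one-way function of the rights mask and the original check
# field held at the server. SHA-256 stands in for the one-way function;
# the 6-byte field size is an illustrative assumption.
import hashlib

def restrict(check, rights_mask):
    """Derive the check field for a capability with reduced rights."""
    data = check + rights_mask.to_bytes(1, "big")
    return hashlib.sha256(data).digest()[:6]

owner_check = b"\x01\x02\x03\x04\x05\x06"   # secret, stored at the server
reader_check = restrict(owner_check, 0x01)  # check field for read-only rights

# The server verifies reader_check by recomputing restrict(); the holder
# of reader_check cannot invert the hash to recover owner_check.
assert restrict(owner_check, 0x01) == reader_check
```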
In Amoeba, a process generates a 64-bit FLIP address at random. How safe is this approach? (20 points)
There are several issues. One is the probability that two processes will pick the same address. Second is that one process will try to impersonate a server. Third is that a crashed process picks a new FLIP address, so the client is aware of the crash and can enforce AT MOST ONCE semantics. (Need at least one of these last two: 3 points for one, 3 extra credit for both.)

The probability of duplication depends on the number of existing FLIP addresses (if there are 2**63 processes, the probability is very high).
Assume there are P addresses already in use; then P/(2**64) is the probability of picking a duplicate address. (Need both P and 2**64, worth 17 points; no mention of P: -9.) As a sanity check, assume there are 2**16 = 65,536 processes on the network; then the probability is 1 in 2**48.
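The sanity check above, done exactly with rational arithmetic:

```python
# Duplication probability: with P addresses already in use out of 2**64,
# a fresh random address collides with probability P/2**64.
from fractions import Fraction

def collision_prob(P, bits=64):
    """Probability a fresh random address duplicates one of P existing ones."""
    return Fraction(P, 2**bits)

# 2**16 = 65,536 processes on the network:
p = collision_prob(2**16)
assert p == Fraction(1, 2**48)   # 1 in 2**48, as computed above
```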
A malicious process is trying to impersonate an existing server (whose put-port is known) by generating a 48-bit random number, putting it through the known one-way function, and seeing if the put-port is generated. If each number takes 1 msec to verify, how long will it take to guess the get-port address on average? Best case? Worst case? (18 points)
Best case: 1 millisecond.

Worst case: 2**48 = approx 2.8*10**14 milliseconds, divided by 1000*60*60*24*365 = approx 3.2*10**10 milliseconds per year, gives approx 10,000 years.

Average case: half the worst case, approx 5,000 years.

(6 points each)
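The arithmetic above, spelled out:

```python
# Brute-forcing a 48-bit port at 1 ms per candidate.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365   # ~3.15e10 ms per year

worst_ms = 2**48          # try every candidate before the last one hits
avg_ms = 2**48 / 2        # on average, success halfway through the space

print(worst_ms / MS_PER_YEAR)   # ~8,900 years ("approx 10,000")
print(avg_ms / MS_PER_YEAR)     # ~4,500 years
```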
Amoeba's fault-tolerant group communication can survive the loss of up to k processors. Discuss the special cases when k=0 and when k=n (where n is the number of processors in the original group). Be sure to give the communications complexity in terms of the number of packets sent for normal and abnormal operations. (20 points)
k=0 is the non-fault-tolerant case. Normal message load is discussed in the book as something more than 2 packets per message (due to quiescent processors sending ACKs that are not piggybacked). There is no abnormal mode; or one could count missed messages, timeouts, and regroup as abnormal. If the sequencer is missing, there are a variable number of retries before it is declared dead; if a broadcast is missed, it will be retransmitted; if regrouping, there is 1 broadcast and n-1 replies, electing the old coordinator if it is alive, else delivery is not necessarily reliable. (7 points)
k=n is not possible, since k+1 processors are needed to survive k failures. However, k=n-1 is possible. Here every processor acknowledges every message, and there is no need for a history buffer. (3 points)
One originating message broadcast to the group, n-2 ACKs from the redundant processors to the coordinator, and 1 accept by the coordinator = n messages in normal mode. (5 points)
In abnormal mode, some number of messages is lost until the failed coordinator or redundant processor is noticed (don't count this; there may be other ways to find a failed processor, and the point is that any processor failure will be noticed upon the next broadcast message). An election algorithm is needed. The processor noticing the failure sends one "participate in regroup" message and awaits n-2 replies (assuming only one processor has failed); the highest surviving processor is elected. There is no need to request or resend messages, since all processors are up to date at all times. At least n-1 messages. (5 points)
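The message counts for the k=n-1 case reduce to two small formulas (a sketch of the counting in the text, assuming a single failure during regroup):

```python
# Packet counts for Amoeba group communication with k = n-1,
# as counted in the answer above.

def normal_mode_messages(n):
    """1 broadcast + (n-2) ACKs + 1 coordinator accept = n messages."""
    return 1 + (n - 2) + 1

def regroup_messages(n):
    """1 "participate in regroup" + (n-2) replies = n-1 messages,
    assuming exactly one processor has failed."""
    return 1 + (n - 2)

assert normal_mode_messages(5) == 5
assert regroup_messages(5) == 4
```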
Given the inherently stochastic nature of a distributed environment (due to, among other factors, network congestion, processor load, and autonomous actions by independent users), explore the value of the different memory consistency models (strict, sequential, causal, PRAM) as follows. For each model, name an application (or part of an application) which requires AT LEAST as strong a model to run correctly according to the expected behavior for that application (as defined by you, but which should be defensible).

NOTE: there should be no explicitly programmed synchronization used (like semaphores; i.e., you should not resort to a weak consistency model to work correctly).

NOTE: you may also argue that no real application exists which requires that model (and why).

Because of the somewhat speculative nature of this question, I made it a second extra-credit assignment worth 10 points.
There was some misinterpretation of this question. A memory consistency model is something enforced by the hardware (possibly with operating system software support), and as such its granularity is at the individual word level. So, given that only a hardware unit of memory is under the consistency-model contract, what do we have?

Read-Modify-Write: since this is at least two memory operations, none of the consistency models will guarantee that there will not be an inconsistency, so explicit programmer synchronization is needed.
Producer/Consumer (some processes write, others read), e.g. stock market reports and most broadcast information servers. Strict consistency would be desirable, unless it came at the cost of slowing down to the slowest consumer (the one furthest away on the network). What is probably more reasonable is a bound on the age of the information (since as a consumer I may not be looking at the information instantaneously anyway). Also, by the time I act on it, the information may have changed, unless I prohibit updates during the consuming phase. So getting the absolute latest seems less important than getting a causally consistent view.
Consider a producer and a consumer: if the consumer acts just before the producer produces new information, it gets the old value; if afterwards, the new. But in the absence of any synchronization between them, the choice is arbitrary; if you can live with this arbitrariness, then you don't need strict consistency. Consider two consumers: which one goes first and gets the old value is arbitrary, so strict consistency is not needed there either.
However, for two producers which are causally linked (sensors tied to a reaction), strict consistency would be useful. Although, given the difficulty of achieving it in a distributed system, if my life depended on it I would need a better model.
(Extra Credit) Using the LINDA tuple distributed memory system, design a data flow computation system (similar to that proposed in the Wild/Gupta dataflow architecture) for computing expressions containing the basic arithmetic binary operations (+, -, *, /). The data flow diagram in this case represents the syntax tree for binary expressions (e.g. a+b, a - b/c, (x - y)*z/(m + n)). Your design should include
the makeup of the tuple(s) needed to implement a data-flow-driven computation (considering whether to represent the node, the arc, or both in a tuple);
a discussion of the nature of the processors required (for example, would you propose special-purpose or general-purpose processors?);
taking example(s) you consider suitable, the set of LINDA operations which would load the tuple space with the data flow operations for those example(s);
an algorithm which performs the data flow computation (in pseudocode, but using LINDA operations to access the tuple space);
a discussion of ways to partition the tuple space to achieve effective distribution of the memory.
Worker loop (run repeatedly by each processor):

    in(?op, ?name1, ?name2, ?result_name);
    in(name1, ?var1);
    in(name2, ?var2);
    operation = decode(op);
    R = var1 operation var2;
    out(result_name, R);

Loading the tuple space for C = A+B, D = C/A (note that "A" is output twice, because each in() consumes one copy and A is an operand of both operations):

    out("A", 1);
    out("A", 1);
    out("B", 2);
    out("+", "A", "B", "C");
    out("/", "C", "A", "D");
    // Syntax tree node structure
    typedef struct _node {
        string node_name;
        string out_name;    // name of link to parent, or result in case of root
        struct _node *left;
        struct _node *right;
    } node;

    Algorithm DataFlow(node *root)
    {
        // Parses the syntax tree (assumed to be already built)
        // to load the tuple space, and returns the number of
        // binary operations.
        n = LoadTupleSpace(root);
        // Start n workers of the type described in part 2 above.
        StartWorkers(n);
    }

    LoadTupleSpace(node *p)
    {
        if (p->left != nil)
            LoadTupleSpace(p->left);
        if (p->right != nil)
            LoadTupleSpace(p->right);
        Visit(p);
    }

    Visit(node *p)
    {
        if (p->left == nil and p->right == nil)
            // leaf: publish the operand value under its name
            out(p->out_name, p->node_name);
        else
            // internal node: publish the operation tuple
            out(p->node_name, p->left->out_name,
                p->right->out_name, p->out_name);
    }
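A runnable sketch of the worker loop, with a toy in-process tuple space standing in for LINDA (the list-based `space`, `out`, and `in_` below are illustrative stand-ins, not LINDA's API; real LINDA's in() blocks until a match exists, whereas this sequential version relies on operations being loaded in dependency order). It evaluates the example C = A+B, D = C/A.

```python
# Toy tuple space with multiset semantics: in_() removes one matching copy.
space = []

def out(*t):
    space.append(tuple(t))

def in_(pattern):
    """Remove and return the first tuple matching the pattern.
    None in the pattern plays the role of LINDA's ?formal wildcard."""
    for t in space:
        if len(t) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, t)):
            space.remove(t)
            return t
    return None

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def worker():
    """One pass of the worker loop from the answer above."""
    op, n1, n2, result_name = in_((None, None, None, None))
    _, v1 = in_((n1, None))
    _, v2 = in_((n2, None))
    out(result_name, OPS[op](v1, v2))

# Load the tuple space exactly as in the example ("A" output twice).
out("A", 1); out("A", 1); out("B", 2)
out("+", "A", "B", "C")
out("/", "C", "A", "D")

worker()                   # computes C = A + B = 3
worker()                   # computes D = C / A = 3.0
d = in_(("D", None))
print(d)                   # ("D", 3.0)
```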