where p is the probability of a page fault per instruction.
Page fault service time is mostly disk time:
    latency   4 milliseconds
    seek     11 milliseconds
so say 15 milliseconds (which is 15,000,000 nanoseconds)
Software processing (with careful coding) say 100 nanosecs. Thus:
e.m.a.t. = ( 1 - p ) * ( 100 ) + p * ( 15,000,000 ) (in nanosecs)
= 100 - 100p + 15000000p
= 100 + 14,999,900p
To have no more than a 10% degradation (110 nanosecs emat)
110 > 100 + 15,000,000p (these numbers are only approx. so forget the small stuff)
or
1/1,500,000 > p
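The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names `emat_ns` and `max_fault_rate` are made up for illustration, and the defaults are the 100 nanosecond and 15,000,000 nanosecond figures used above. It keeps the "small stuff" (the 100p term), so the bound comes out very slightly tighter than 1/1,500,000.

```python
def emat_ns(p, mem_ns=100, fault_ns=15_000_000):
    """Effective memory access time in nanoseconds for fault probability p."""
    return (1 - p) * mem_ns + p * fault_ns


def max_fault_rate(slowdown, mem_ns=100, fault_ns=15_000_000):
    """Largest p for which emat stays within (1 + slowdown) * mem_ns.

    Solves (1 + slowdown) * mem_ns = mem_ns + (fault_ns - mem_ns) * p for p.
    """
    return slowdown * mem_ns / (fault_ns - mem_ns)


if __name__ == "__main__":
    p = max_fault_rate(0.10)
    print("max p for 10% degradation:", p)
    print("i.e. about one fault per", round(1 / p), "references")
    print("emat at that p:", emat_ns(p), "ns")
```

Roughly one fault per million and a half references is the most that can be tolerated, which is why the fault rate has to be kept so low.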
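Hmmmm. What about changing hardware to add 1 bit to each frame (a "dirty" bit) that is originally 0 (when the page is first loaded), but is changed (by hw) to 1 with any write to that frame? A victim whose bit is still 0 has not been modified, so it need not be written back to disk.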
How to select victim?
To analyze memory reference behavior (for design and understanding), take a trace of a "typical" executing program to get its address references. Translate the addresses to page numbers. Generate the sequence of page number changes, since two consecutive references to the same page will not generate a page fault
(sorta true -- when would it not be true?).
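A small sketch of that translation, assuming the trace is just a list of integer addresses and the page size is given. The function name `page_reference_string` and the sample addresses are invented for illustration.

```python
def page_reference_string(addresses, page_size):
    """Turn an address trace into a sequence of page number changes.

    Each address maps to page number address // page_size; a reference
    to the same page as the previous reference is dropped.
    """
    pages = []
    for addr in addresses:
        page = addr // page_size
        if not pages or pages[-1] != page:
            pages.append(page)
    return pages


if __name__ == "__main__":
    trace = [0, 4, 100, 104, 8, 300]   # made-up addresses
    print(page_reference_string(trace, page_size=100))
```

With a page size of 100 the six addresses fall on pages 0, 0, 1, 1, 0, 3, so the sequence of changes is 0 1 0 3.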
Consider the sequence:
1 2 3 4 1 2 5 1 2 3 4 5
For FIFO (each column shows the frames after that reference, oldest page on top; * marks a fault)
if 1 frame then 12 faults.
2 frames
ref: 1 2 3 4 1 2 5 1 2 3 4 5
     * * * * * * * * * * * *   still 12 faults
     1 1 2 3 4 1 2 5 1 2 3 4
       2 3 4 1 2 5 1 2 3 4 5
3 frames
ref: 1 2 3 4 1 2 5 1 2 3 4 5
     * * * * * * *     * *     9 faults
     1 1 1 2 3 4 1 1 1 2 5 5
       2 2 3 4 1 2 2 2 5 3 3
         3 4 1 2 5 5 5 3 4 4
4 frames
ref: 1 2 3 4 1 2 5 1 2 3 4 5
     * * * *     * * * * * *   10 faults
     1 1 1 1 1 1 2 3 4 5 1 2
       2 2 2 2 2 3 4 5 1 2 3
         3 3 3 3 4 5 1 2 3 4
           4 4 4 5 1 2 3 4 5
(would be good to redo above example for LRU)
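One way to redo the example is to let a short program count the faults. The sketch below is not from the original notes: `fifo_faults` and `lru_faults` are illustrative names, and LRU here is exact LRU (tracked with an ordered dictionary), not a hardware approximation.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames = deque()                  # oldest page at the left
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()      # evict the page loaded first
            frames.append(page)
    return faults


def lru_faults(refs, nframes):
    """Count page faults under exact LRU replacement."""
    frames = OrderedDict()            # least recently used page first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)  # a hit makes the page most recent
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)  # evict least recently used
            frames[page] = True
    return faults


if __name__ == "__main__":
    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    for n in range(1, 5):
        print(n, "frames: FIFO", fifo_faults(refs, n), "LRU", lru_faults(refs, n))
```

For this sequence FIFO goes from 9 faults with 3 frames up to 10 with 4, while LRU gives 10 faults with 3 frames and 8 with 4.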
How to implement:
Approximations for LRU -- Reference bit per page, set by HW with each reference. Periodically (say every 100 millisecs) shift each page's reference bit into the high-order end of, say, an 8-bit string kept per page (the oldest bit falls off the low-order end). Then clear the reference bit. Thus:
11111111 means this page ref at least once each of last 8 periods.
10000000 means?
00101011 means?
If these bit strings are treated as integers, then smallest are (approx.) LRU.
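The shifting scheme can be sketched as below. The names `age` and `approx_lru_victim`, and the use of dictionaries keyed by page, are assumptions made for illustration; real hardware and kernels keep these bits in page table entries.

```python
def age(history, ref_bits, width=8):
    """One periodic tick: shift each reference bit into its page's history.

    history  -- dict: page -> integer holding the last `width` periods
    ref_bits -- dict: page -> 0 or 1, set by "hardware" during the period
    """
    top = 1 << (width - 1)
    for page in history:
        bit = top if ref_bits.get(page) else 0
        history[page] = (history[page] >> 1) | bit   # newest bit is high-order
        ref_bits[page] = 0                           # clear for next period


def approx_lru_victim(history):
    """Page with the smallest history value is (approximately) LRU."""
    return min(history, key=history.get)


if __name__ == "__main__":
    history = {"A": 0, "B": 0, "C": 0}
    for period in range(8):
        # A is referenced every period, B only in the last, C only in the first.
        ref_bits = {"A": 1, "B": int(period == 7), "C": int(period == 0)}
        age(history, ref_bits)
    for page, bits in history.items():
        print(page, format(bits, "08b"))
    print("victim:", approx_lru_victim(history))
```

Ties are broken arbitrarily here (whichever page `min` meets first); the notes do not say how ties should be handled.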
Second Chance Algorithm -- Treat memory as a circular list of frames.
Search (circularly) for the first frame whose reference bit is not set, clearing each set bit as the search passes it. Worst case, the search makes a complete cycle and chooses the frame it started with.
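A minimal sketch of the search, assuming the reference bits are held in a plain list indexed by frame number and that a "hand" remembers where the last search stopped. The name `second_chance_victim` and the returned (victim, new hand) pair are illustrative choices.

```python
def second_chance_victim(ref_bits, hand):
    """Pick a victim frame, clearing reference bits as the search passes.

    ref_bits -- list of 0/1 reference bits, one per frame (modified in place)
    hand     -- frame index at which to start searching
    Returns (victim frame, next starting position for the hand).
    """
    n = len(ref_bits)
    while True:
        if ref_bits[hand] == 0:
            return hand, (hand + 1) % n
        ref_bits[hand] = 0            # give this frame its second chance
        hand = (hand + 1) % n


if __name__ == "__main__":
    bits = [1, 1, 1]
    print(second_chance_victim(bits, hand=1), bits)
```

With every bit set, the search clears all of them, comes full circle, and picks the frame it started on, which is the worst case described above.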
Minimum number of frames -- If one instruction can reference three pages (for example), what would happen if the process is only allocated 2 frames? (On OS/360, MVC, a 6 byte instruction, can require 6 pages: the instruction can be on a page boundary, and each address may be also.)
How to Allocate?
CPU utilization is better with global allocation. Process speed is more repeatable with local allocation. With local, what should be the size of the frame allocation? (number of available frames / number of processes?)
Working Sets -- (Not a precise definition) The working set of a process is the minimal set of pages that must be in memory to keep page faults low. (Also defined as the set of different pages referenced in the last delta T of time.)
If WSS(i) is size of working set for process i, then D (demand) = WSS(1) + WSS(2) + ... + WSS(n), where n is number of active processes.
If D << real memory, start a new process.
If D >> real memory, thrashing will occur. Swap a process out, to release its frames.
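To make the working set and D concrete, here is a rough sketch. It measures "time" by counting references (an assumption, since the notes say delta T of time), and the names `working_set` and `total_demand` are invented for illustration.

```python
def working_set(refs, t, delta):
    """Distinct pages among the last `delta` references, ending at index t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])


def total_demand(ref_strings, t, delta):
    """D = sum of WSS(i) over all active processes at time t."""
    return sum(len(working_set(refs, t, delta)) for refs in ref_strings)


if __name__ == "__main__":
    p1 = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    p2 = [7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]   # made-up second process
    print(working_set(p1, t=8, delta=4))
    print("D =", total_demand([p1, p2], t=8, delta=4))
```

Comparing D with the number of real frames then gives the decision above: start another process if D is much smaller, swap one out if D is much larger.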
Prepaging: if the WS is known, bring all of it into main memory before the process runs. (How could OS determine WS?)
Array reference order
Consider:
VAR a: ARRAY[ 1..10000, 1..10000 ] OF char;
FOR i := 1 TO 10000 DO
    FOR j := 1 TO 10000 DO
        a[i,j] := ' ';
vs
        a[j,i] := ' ';
what's the difference? (row major vs column major order)
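The difference shows up if we count how often the loop moves to a different page. The sketch below assumes row major storage, one byte per element, and a scaled-down 64 by 64 array with 64-byte pages so it runs instantly; `page_switches` is an illustrative name.

```python
def page_switches(rows, cols, page_size, row_wise):
    """Count how many times consecutive accesses land on a different page.

    The array is assumed stored in row major order, one byte per element.
    row_wise=True  walks a[i][j] with j varying fastest (follows storage order);
    row_wise=False walks a[i][j] with i varying fastest (strides across rows).
    """
    switches = 0
    last_page = None
    outer_n, inner_n = (rows, cols) if row_wise else (cols, rows)
    for outer in range(outer_n):
        for inner in range(inner_n):
            i, j = (outer, inner) if row_wise else (inner, outer)
            page = (i * cols + j) // page_size
            if page != last_page:
                switches += 1
                last_page = page
    return switches


if __name__ == "__main__":
    print("with storage order:   ", page_switches(64, 64, 64, row_wise=True))
    print("against storage order:", page_switches(64, 64, 64, row_wise=False))
```

Walking with the storage order touches each page once in turn; walking against it changes page on every single access, and each change is a potential page fault if the process has few frames.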
Best page size
Inverted page tables are used by some IBM machines.
Note that page tables are big if many processes are active. If size of virtual memory >> size of physical memory, could save a lot of space by keeping only one page table for the entire system, with one entry per frame of real memory. Then a memory reference works like this:
                                                      +---+
+---+  log.                                           |   |
|CPU| ----> (pid, p, d) ----------------> (i, d) ---> |   |
+---+  addr     |                          ^          |   |
                |      +-------+ \         |          |   |
                |      |       |  |        |          |   |
                +----> |       |   > i ----+          |   |
      (search          |       |  |                   |   |
       page tbl        | pid,p | /                    |   |
       for             |       |                      |   |
       <pid,p>)        |       |                      |   |
                       +-------+                      |   |
                       page table                     |   |
                                                      +---+
If page not in memory, must then search process's real page table to find out where (on disk) it is. To speed things up, this is done in associative registers (no real search of page table).
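The lookup in the diagram can be sketched as follows, assuming the inverted table is a list with one (pid, page) entry per frame and that it is searched linearly. The function name `translate` and the sample table are invented for illustration; a real machine would not search linearly.

```python
def translate(inverted_table, pid, page, offset, page_size):
    """Map logical address (pid, p, d) to a physical address via (i, d).

    inverted_table[i] holds the (pid, page) pair currently in frame i,
    or None if the frame is free. Returns None if the page is not in
    memory (the caller would then go to the process's own page table).
    """
    for frame, entry in enumerate(inverted_table):
        if entry == (pid, page):
            return frame * page_size + offset
    return None


if __name__ == "__main__":
    table = [(7, 0), (3, 2), None, (7, 5)]   # made-up contents, 4 frames
    print(translate(table, pid=7, page=5, offset=12, page_size=1024))
    print(translate(table, pid=3, page=9, offset=0, page_size=1024))
```

The table has as many entries as there are frames, no matter how many processes exist or how large their virtual address spaces are.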
VM & MFT: Years ago, IBM reported significant speed-ups if users would run one of their non-virtual memory systems (MFT) on top of their (then) new VM (for virtual machine) virtual memory operating system. (Customers do not like to throw away old code that works.) This seemed unlikely because MFT ran on VM as a regular user task (no special privileges). When a regular user task requested an operating system service, it requested it from MFT, which then "tried" to do it. This would cause a protection violation (since to VM, a mere user task was trying to do an OS thing). But then VM would examine what the user task was trying to do. If the user task was MFT, then VM would perform the service and return to MFT, which would then return to the original user task. Thus almost all interrupts had to be handled twice: first MFT would try, then VM would actually do it. An obvious doubling of effort.
Point is, even with the significant overhead of two OSs, user tasks would run noticeably faster. How could this be?
Answer: Basic advantage of virtual memory: more active processes, so CPU is less likely to be idle while waiting on a few processes (under the old MFT) to complete, for example, an I/O transfer.
Copyright ©2005, G. Hill Price