Memory Management

How should memory be managed? Historically, memory has been the key resource, second only to the CPU. An effective solution requires both hardware and software support. Key issues: speed and flexibility.

Recall:

           +---------+
          +---------+ \
         +---------+ \|
        +---------+ \|+
       +---------+ \|+      e.g. memory.c
       | Source   \|+
       | Program  |+
       +----------+
            |
            v
       +----------+                       \
       | Compiler |                       |
       |   or     |         e.g.  gcc     >  Compile time
       | Assembler|                       |
       +----------+                       /
            |
            v
       +----------+                           By and large, UNIX does
       |  Object  |         e.g. memory.o     not use file types, but
       |  Module  |                           systems software (like
       +----------+                           loaders) rely on "magic
             |                                 numbers", i.e., a systems
+---------+  |  +---------+               \    routine will pick a particular
| Compiler|  |  |  User   |+--------+     |    integer value at the beginning
| Object  |  |  | Object  ||other   |     |    of a file (to identify it as
| Library |  |  | Library ||libs, X |     |    an object file, for example).
+---------+  |  +---------+|windows |     |    Another systems program will
    |        v        |    +--------+     |    check the magic number to see
    |   +----------+  |       |           |    if things make sense.
     -> |  Link    |<-        |           |
        |  Editor  |<---------            > Load time
        +----------+                      |
+---------+  |                            |
| System  |  |                            |
| Library |  |                            |
+---------+  V                            |
     |  +----------+                      |
     |  | Loader   |                      /
      ->|          |
        +----------+
             |
             v         +------------+     \
        +----------+   | Dynamically|     |
        | binary   |   | Loaded     |     |
        | image in |   | Binaries   |     >  Execution time
        | mem ready|   +------------+     |
        | to exec. |        |             |
        +----------+<-------              /
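
The magic-number check mentioned in the note beside the diagram can be sketched in a few lines. This is only an illustration: it assumes the ELF format used by modern Linux object files (whose first four bytes are 0x7F followed by "ELF"), and the function name looks_like_elf is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def looks_like_elf(header: bytes) -> bool:
    """Return True if the leading bytes carry the ELF magic number."""
    return header[:4] == ELF_MAGIC


# A loader-like tool would read the first bytes of the file and check them
# before trusting the rest of the contents.
sample_object = b"\x7fELF\x02\x01\x01\x00"
sample_script = b"#!/bin/sh\n"
```

Here looks_like_elf(sample_object) is true and looks_like_elf(sample_script) is false; for a real file, pass in the result of open(path, "rb").read(4).
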
Much of the history of memory management consists of attempts to cope with the problem of too little memory. Traditionally (i.e., a long time ago), a process could not start execution until its complete binary image was available in main memory.

Alternatives:

These ideas are old. I thought they had gone away forever, except on old military machines. Many came back with the advent of PCs.

Memory Allocation Schemes

Many of these are no longer used, but the ideas built on one another. Where we are now is easier to understand if you see the evolution of the techniques. At least I think so. (Besides, I put up with it. You should have to, at least some.) In most of what is discussed, you can see an evolution from one idea to the next.

Single Partitions:

        +-------------+ 0
        |             |
        |   OS        |
        |             |     OS usually loaded in low memory.
        +-------------+
        |             |
        |  Static     |
        |             |
        |  User       |
        |             |
        |  Space      |
        |             |
        +-------------+
        |  Dynamic    |
        |  User       |
        |  Space      |
        +-------------+
        |/////////////|
        |/////////////|
        |/////////////|
        |/////////////|
        |/////////////|
        |/////////////|
        |/////////////|
        |/////////////|
        +-------------+ Max Mem.

   
        +-------------+ 0
        |             |
        |   Static    |
        |   OS        |     OS usually loaded in low memory.
        |   Space     |
        +-------------+
        |   Dynamic   |
        |   OS Space  |  |
        +-------------+  |  Can grow as needed
        |/////////////|  V
        |/////////////|
        |/////////////|
        |/////////////|
        +-------------+
        |  Dynamic    |
        |  User       |  ^
        |  Space      |  |  Can grow as needed
        +-------------+  |
        |  Static     |
        |  User       |
        |  Space      |
        +-------------+ Max Mem.
With the invention of a Relocation Register:
                       +----------+             +---+
                       |       xxx| Rel. Reg.   |   |
                       +----------+             |   | physical
        +---+ logical       |   physical        |   |
        |CPU|-------------> + ----------------->|   | memory
        +---+ address           address         |   |
                                                |   |
                                                |   |
                                                +---+
     or base & limit registers:

             limit+------+     +----------+              +---+
              reg |      |     |       xxx| Rel. Reg.    |   |
                  +------+     +----------+              |   | physical
        +---+ logical | yes          |   physical        |   |
        |CPU|------>  < -----------> + ----------------->|   | memory
        +---+ address |              address             |   |
                      | no                               |   |
                      V                                  |   | 
                    error                                +---+
we have more flexibility. This simplifies system software by delaying the BINDING TIME of physical addresses until run time.
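
The base & limit check in the diagram above can be sketched as follows. This is a minimal model, not a description of any particular machine; the names translate_base_limit and AddressError and the register values are made up for illustration.

```python
class AddressError(Exception):
    """Raised when a logical address falls outside the process's partition."""


def translate_base_limit(logical: int, relocation: int, limit: int) -> int:
    """Map a logical address to a physical one using limit and relocation registers."""
    # Compare against the limit register first; out of range means "error".
    if not 0 <= logical < limit:
        raise AddressError(f"logical address {logical} is outside limit {limit}")
    # In range: add the relocation register to form the physical address.
    return logical + relocation


# Hypothetical register contents for one process.
RELOCATION = 14000
LIMIT = 500
physical = translate_base_limit(100, RELOCATION, LIMIT)
```

The program itself only ever sees addresses 0 to 499; moving the process just means loading a different value into the relocation register.
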

Multiple Partitions

 
        +-------------+ 0
        |   OS        |
        |   Space     |
        +-------------+        Suppose P1 completes.  Then what?
        |   P1        |
        |             |          Scan job queue for next process to
        |             |          initiate.
        |             |            First fit
        +-------------+            Best fit
        |   P2        |            Worst fit
        |             |
        |             |         Reassignment of memory (after process
        +-------------+         complete) usually results in external
        |   P3        |         fragmentation.
        +-------------+
        |/////////////|
        +-------------+ Max. mem
Fragmentation problems can be addressed with compaction (if it's worth the overhead). Compaction causes problems with I/O and DMA, since a transfer in progress targets physical addresses that compaction would move.
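
The three placement policies named in the diagram can be compared with a small sketch. It assumes the usual reading, in which the policy picks a free hole for a waiting process; the function name choose_hole, the (start, size) representation, and the sample holes are all illustrative assumptions.

```python
def choose_hole(holes, request, policy="first"):
    """Return the index of the hole to allocate from, or None if nothing fits.

    holes is a list of (start, size) tuples; request is the size needed.
    """
    # Keep only the holes large enough to satisfy the request.
    candidates = [(i, size) for i, (_, size) in enumerate(holes) if size >= request]
    if not candidates:
        return None
    if policy == "first":   # first hole that is big enough
        return candidates[0][0]
    if policy == "best":    # smallest hole that is big enough
        return min(candidates, key=lambda c: c[1])[0]
    if policy == "worst":   # largest hole available
        return max(candidates, key=lambda c: c[1])[0]
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical free list: (start address, size).
holes = [(100, 500), (800, 200), (1200, 300), (2000, 600)]
picks = {p: choose_hole(holes, 212, p) for p in ("first", "best", "worst")}
```

For a request of 212 units, first fit takes the 500-unit hole, best fit the 300-unit hole, and worst fit the 600-unit hole. Whichever is chosen, the leftover piece becomes a new, smaller hole, which is where external fragmentation comes from.
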

Paging

                                     address       +----+
                 logical   ----------------        |    |
                 address  /                V       |    |
       +---+      +---+---+          +---+---+     |    |  p:  offset into
       |CPU|------| p | d |          | f | d |---->|    |      page table
       +---+      +---+---+          +---+---+     |    |  d:  offset into
                    |                  ^           |    |      page
                    |       +---+      |           |    |
                    |     / |   |      |           |    |
                    |  p <  |   |      |           |    |
                    |     \ |   |      |           |    |
                     -----> | f | -----            +----+
                            |   |                  frames
                            |   |
                            +---+
                         page table
Advantage: No external fragmentation. The compiler need have no knowledge that what it treats as, say, a 32-bit address is split by the hardware into a page table offset and a page offset. Paging does not change the code the compiler generates.

Disadvantages: Size of page tables. Internal fragmentation.
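
The split of a logical address into (p, d) and the page table lookup shown in the diagram might look like this in outline. The 4 KB page size, the function name translate_paged, and the page table contents are assumptions chosen for the example.

```python
PAGE_SIZE = 4096   # assumed page size; must be a power of two
OFFSET_BITS = 12   # log2(PAGE_SIZE)


def translate_paged(logical: int, page_table: list) -> int:
    """Translate a logical address using a single-level page table."""
    p = logical >> OFFSET_BITS       # page number: offset into the page table
    d = logical & (PAGE_SIZE - 1)    # offset into the page
    f = page_table[p]                # frame number stored in the page table
    return (f << OFFSET_BITS) | d    # physical address is (f, d)


# Hypothetical page table: page 0 is in frame 5, page 1 in frame 2, page 2 in frame 7.
page_table = [5, 2, 7]
physical = translate_paged(1 * PAGE_SIZE + 20, page_table)
```

Byte 20 of page 1 ends up at byte 20 of frame 2. The offset d passes through untouched, which is why the compiler never needs to know about the split.
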

HW support (TLB, etc.): With the page table held in memory, each reference costs two memory references (remember, relative to CPU speeds, accesses to memory are slow). To avoid this, use a small associative memory (also called associative registers or a translation look-aside buffer).

Typically this gives about a 10% time increase over unmapped memory references.

There are too many pages to keep the entire page table in associative memory, so only the actively referenced entries are kept.

Example:

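Suppose (as in text) a memory access takes 100 nanoseconds and a search of associative memory takes 10 nanoseconds. A hit costs 10 + 100 = 110 ns; a miss costs 10 + 100 + 100 = 210 ns (search, page table access, then the reference itself). If the hit rate is 0.9, then
         effective access time = 0.9 * 110 + 0.1 * 210
                               = 120 ns
(if the hit rate were 0.98, then 112 ns, not very different from 110)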
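
The same arithmetic as a small function, so other hit rates are easy to try. The name effective_access_time is made up here; the default timings are the 100 ns and 10 ns figures from the example above.

```python
def effective_access_time(hit_rate: float, mem_ns: float = 100.0, tlb_ns: float = 10.0) -> float:
    """Average time per memory reference with an associative memory (TLB)."""
    hit_cost = tlb_ns + mem_ns        # search, then the reference itself
    miss_cost = tlb_ns + 2 * mem_ns   # search, page table access, then the reference
    return hit_rate * hit_cost + (1 - hit_rate) * miss_cost


# Rounded to one decimal place to hide floating-point noise.
eat_90 = round(effective_access_time(0.90), 1)
eat_98 = round(effective_access_time(0.98), 1)
```
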

Motorola 68030 has a 22-entry Translation Look-aside Buffer;
Intel 80486 has 32 registers and claims a hit rate of 0.98.

This scheme supports some memory protection and shared memory.

The primary use of shared memory is to keep only one copy of executable code resident; several different processes can all run the same copy. Each has its own program counter. This also requires a compiler that generates reentrant code: each process has its own memory locations for all local variables.

Segmentation

Programs consist of meaningful, logically related units of memory; pages (as above) are entirely artificial. Say, ftn1, ftn2, memblock1, memblock2, ..., main. (The memblocks may be arrays or the set of local variables associated with a particular procedure.)

Each address can be of form ( segid, offset ) = ( s, d )

                             / +-----+-----+
                            |  |     |     |
                -------> s <   |     |     |
               |            |  |     |     |
     +---+     |             \ |limit| base|   +----+
     |CPU|--> (s,d)            |     |     |   |    |
     +---+       |             |     |     |   |    |
                 |             +-----+-----+   |    |
                 |               /      \      |    |
                 |              / yes    \     |    |
                  -----------> <  ------> + -->|    |
                               |               |    |
                               |no             |    |
                               v               |    |
                             trap              +----+
This can easily support memory sharing.

Advantage: If routine never called, never loaded.

Problem: Fragmentation (since segments are of different sizes)
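
The segment table lookup in the diagram above can be sketched like so. The names translate_segmented and SegmentTrap, and the table contents, are assumptions for illustration only.

```python
class SegmentTrap(Exception):
    """Raised when an offset is outside its segment (the "trap" in the diagram)."""


def translate_segmented(s: int, d: int, segment_table: list) -> int:
    """Translate (s, d) using a table of (limit, base) entries."""
    limit, base = segment_table[s]   # s selects the segment table entry
    if not 0 <= d < limit:           # offset must lie within the segment
        raise SegmentTrap(f"offset {d} is outside segment {s} (limit {limit})")
    return base + d                  # physical address


# Hypothetical segment table: (limit, base) per segment.
segment_table = [(1000, 1400), (400, 6300)]
physical = translate_segmented(1, 53, segment_table)
```

Sharing falls out naturally: two processes share a segment when their segment tables hold the same base for it.
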

Paged Segmentation

Can combine segmentation and paging by taking the d above and splitting it into (p,d) as in paging.

     +---+   +---+---+---+
     |CPU|-->| s | p | d |
     +---+   +---+---+---+
               |   |
               |    ------------------------+----
               |                            V    |                    +---+
               |             ------------> >=    |                    |   |
               |            |               |    |                    |   |
               |        +------+----+       V    |                    |   |
               |       /|      |    |     trap   |                    |   |
               V s+r  | |      |    |            |    / +---+         |   |
               + ---- < |      |    |            |    | |   |         |   |
               ^      | |      |    |            |    | |   |         |   |
               |       \|length|ptbr|-----   t   V p+t< |   |         |   |
               | r      |      |    |     -----> +    | |   |  +-+-+  |   |
               |        |      |    |                 \ | f |- |f|d|->|   |
             STBR       +------+----+                   |   |  +-+-+  |   |
                        Segment table                   +---+         |   |
                                                                      |   |
                                                                      |   |
                                                                      +---+
Based on Multics.
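
A rough model of the translation in the diagram, assuming each segment table entry holds a length and its own page table (standing in for the ptbr pointer). The 1 KB page size, the names translate_paged_segment and SegmentLengthTrap, and the table contents are illustrative and are not taken from Multics.

```python
PAGE_SIZE = 1024  # assumed page size for the example


class SegmentLengthTrap(Exception):
    """Raised when the offset is at or beyond the segment length."""


def translate_paged_segment(s: int, offset: int, segment_table: list) -> int:
    """Translate (s, offset), where each segment has its own page table."""
    length, seg_page_table = segment_table[s]   # entry found via STBR + s
    if not 0 <= offset < length:                # the ">=" check: trap if too large
        raise SegmentLengthTrap(f"offset {offset} is outside segment {s}")
    p, d = divmod(offset, PAGE_SIZE)            # split the offset into (p, d)
    f = seg_page_table[p]                       # entry found via ptbr + p
    return f * PAGE_SIZE + d                    # physical address is (f, d)


# Hypothetical segment 0: 2500 bytes long, its three pages in frames 7, 3, and 9.
segment_table = [(2500, [7, 3, 9])]
physical = translate_paged_segment(0, 1030, segment_table)
```

Offset 1030 is byte 6 of page 1 of the segment, so it lands at byte 6 of frame 3. Segments no longer need contiguous memory, so external fragmentation goes away.
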

(Finally) Virtual Memory

Motivations:

  1. More, more, more!
  2. Much memory is never referenced on a single execution of a program
  3. Locality: programs have strong tendency to repeatedly reference a collection of "close" memory locations, at least over the short run.

Virtual memory gives programs the appearance of access to a much larger physical memory than really exists. Performance depends on:

Pages

        +---+             +---+-+           +---+
      0 | A |           0 | 4 |x|         0 |   |
      1 | B |           1 |   | |         1 |   |
      2 | C |           2 | 6 |x|         2 |   |           -----------
      3 | D |           3 |   | |         3 |   |         /             \
      4 | E |           4 |   | |         4 | A |       |\              /|
      5 | . |           5 | 9 |x|         5 |   |       |  ------------  |
      6 | . |           6 |   | |         6 | C |       |  A      E      |
                                                        |                |
      .   .             .   .             .   .         |    BC          |
                                                        |                |
      n | . |           n |   | |         n |   |        \         D    /
        +---+             +---+-+           +---+          ------------
     Logical view      page    ^            main           backing store
     of memory         table   |            memory            (disk)
     (program view)           in mem        \                           /
                              flag            ------------v-------------
                                                  Physical Memory
How do memory references work?
  1. Handle address resolution just like paging, but
  2. If the required page is not in main memory, treat it like an I/O interrupt (this event is called a page fault). What if there are no free frames? Then select a victim, move the victim's frame to backing store to free space, and fetch the process's page from disk. More later.
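
The two steps above can be modeled with a toy pager. Since victim selection is deferred to later, this sketch simply assumes FIFO (evict the page that has been resident longest); the class name DemandPager and the reference string are made up for illustration, and no real disk I/O happens.

```python
from collections import deque


class DemandPager:
    """Toy demand pager: the page table holds entries only for resident pages."""

    def __init__(self, num_frames: int):
        self.free_frames = list(range(num_frames))
        self.page_table = {}      # page -> frame; presence plays the "in mem" flag
        self.resident = deque()   # resident pages, oldest first (FIFO assumption)
        self.faults = 0

    def access(self, page: int) -> int:
        """Return the frame holding the page, faulting it in if necessary."""
        if page in self.page_table:               # resident: ordinary paging
            return self.page_table[page]
        self.faults += 1                          # page fault
        if self.free_frames:
            frame = self.free_frames.pop(0)       # use a free frame if one exists
        else:
            victim = self.resident.popleft()      # select victim
            frame = self.page_table.pop(victim)   # victim goes to backing store
        self.page_table[page] = frame             # fetch the page into the frame
        self.resident.append(page)
        return frame


pager = DemandPager(num_frames=2)
for page in [0, 1, 0, 2, 1]:
    pager.access(page)
```

With two frames, this reference string faults three times: on the first touch of pages 0 and 1, and again when page 2 forces page 0 out.
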

As always, this isn't easy. It depends in part on the architecture of the machine. In particular, on some machines, "partially executed" instructions are a problem. Most instructions cause no problems: they can be re-executed in their entirety. But some do. Examples:

  1. Autodecrement (or increment). Common addressing mode to support looping. E.g., on the PDP-11, MOV (R2)+,-(R3) copies the word that R2 points to into the location that R3 points to, with R2 incremented by 2 after its use and R3 decremented by 2 before its use (by 2 since the PDP-11 is byte addressable and a word is two bytes). The registers must be reset if the instruction does not complete.

  2. IBM 360 (and its derivatives) has a "move character" (MVC) instruction that moves up to 256 bytes. The operands can straddle page boundaries. Also, source and destination may overlap.

Must be able to undo effect of partially completed instruction.

Overhead of Demand Paging

When page fault occurs:

  1. trap to OS
  2. Save registers and state
  3. Determine interrupt cause (assume page fault here)
  4. Check for legal address
  5. Determine location of page on disk
  6. Select "victim" if no free frames.
  7. Determine location of victim on disk.
  8. Issue write request for victim.
    a. wait on queue for device
    b. wait seek & latency time
    c. transfer
      (This assumes a local disk. If the disk is remote, must build a TCP/IP packet, wait for a turn on the Ethernet, decode the TCP/IP packet at the server (when the server gets around to it; it may be busy with something else), then do a, b, c, and send a message back when complete.)
  9. Allocate CPU to another process (do context switch)
  10. Interrupt from disk controller (or Ethernet controller)
  11. Issue read request for missing page (a, b, c just like step 8; also, if the server is remote, go through TCP/IP and Ethernet and another server).
  12. Allocate CPU to another process (another context switch)
  13. Interrupt from disk or Ethernet controller
  14. Update page table
  15. Reschedule process (place on Ready Queue)
  16. Restore registers, etc.
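
To get a feel for what these sixteen steps cost, here is a back-of-the-envelope sketch using the standard weighted average of the no-fault and fault cases. The 100 ns memory access is the figure from the paging example earlier; the 8 ms fault service time and the fault rate are purely illustrative assumptions, not measurements, and the function name is made up.

```python
def demand_paging_access_time(fault_rate: float,
                              mem_ns: float = 100.0,
                              fault_service_ns: float = 8_000_000.0) -> float:
    """Average time per reference when a fraction fault_rate of references fault."""
    # No fault: one ordinary memory access. Fault: the whole service sequence.
    return (1 - fault_rate) * mem_ns + fault_rate * fault_service_ns


# Assumed rate: one fault per 1,000 references.
average_ns = demand_paging_access_time(0.001)
```

Under these assumed numbers the average reference takes roughly 8,100 ns, about 80 times slower than with no faults at all, which is why the fault rate has to be kept very low.
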



Copyright ©2004, G. Hill Price
Send comments to G. Hill Price