Monday, April 14, 2014

SICP - Storage Allocation and Garbage Collection

Last lecture in the series, talk about garbage collection. Apparently LISP uses a mark and sweep algorithm. Or maybe a related algorithm developed by Minsky in the sixties. Toward the end of the lecture Sussman talks about the halting theorem, basically proving that there is not procedure that can generally check whether or not a given procedure is safe to run. Interesting stuff.




Storage Allocation and Garbage Collection

Godel invented a system to encode complicated expressions as numbers

Objects are represented by numbers then
(CONS X Y) => 2^X * 3^Y
(CAR X) => # of factors of 2 in X
(CDR X) => # of factors of 3 in X

very reasonable scheme until numbers get ridiculously large

CONS structure arranged like boxes, little pigeon holes that we can put a value in. Physical memory is a linear stack of fixed sized boxes. Each box has an address.

In order to impose our tree structure on this linear memory, we have to store the address of another memory block.  Our list structure can be implemented by placing the cars and cdrs in pairs of memory blocks, with each block containing either a value or the address of another memory block.

When we use ASSIGN in the register machine, it is these memory addresses that we are using.

(assign a (car (fetch b)))
====>
(assign a (vector-ref (fetch the-cars)
                      (fetch b)))


(assign a (cdr (fetch b)))
====>
(assign a (vector-ref (fetch the-cdrs)
                      (fetch b)))

We have to have some way of getting the next memory block for the next CONS. One algorithm is the free list.  

The free list is all the free memory in the system.  This is a linked list that stores a series of memory addresses.

lol Sussman starts going on about what would effectively be unlimited memory, seems to think 10^14 (about 100 trillion) would be about infinite. He seemed to think it was doable... soon enough I suppose.

In the mean time, memory is expensive and should be used efficiently.  We end up with garbage in memory.

LISP is garbage collected, and uses a method to check if a piece of memory can be referenced by anything ever again.

Items in registers are pointing to items in the list structure memory. The machine cannot access list structure memory unless it is connected by back to the registers.

If we start in the registers and trace out where we can get in memory, everything else is garbage and can be freed.

LISP uses the Sweep and Mark algorithm for gc.  There is another algorithm that does not require us of the stack because it does backtracking.

The Mark phase walks the tree to "mark" any memory that is still accessible from the registers.  Once all the used memory is marked, unmarked memory is swept and freed.

We may not want to look at all of our memory every time.  Another gc algorithm just looks at the useful parts of memory.  This was developed by Minsky et al. Minsky's algorithm copies all useful data into auxiliary memory using depth first search. Once it is does walking memory, it copies the whole block from auxiliary memory back into memory

As of 1986 it is the fastest garbage collector. Interleaving the operation allows garbage collection to take place in real time.

Garbage collectors have to be small because a complicated gc will be impossible to debug.

Going to end the series with a look at things that are impossible to compute.

We may want to know if there exists an infinite loop in a program.

Lets imagine that we have a mathematical function s that tests if procedure p is safe to apply to argument a:

s[p, a] = {true if (p a) converge to a value without error
           false if (p a) loops forever or makes an error }
 
can we write a procedure that computes the value of this function? Suppose we have a procedure called SAFE? that computes the value of s.

Given that we have SAFE?, lets define diag1

(DEFINE DIAG1
(LAMBDA (P)
(IF (SAFE? P P)
(INF)
3)))

(DEFINE INF
(LAMBDA ()
((LAMBDA (X) (X X))
(LAMBDA (X) (X X)))))

What happens if we apply DIAG1 to DIAG1. Is it safe? We don't know. If it is SAFE? then we go into an infinite loop, which isn't safe. But if it isn't safe it is 3 which is safe... so there is a contradiction.

(DEFINE DIAG2
(LAMBDA (P)
(IF (SAFE? P P)
(OTHER-THAN (P P))
FALSE)))
(DEFINE OTHER-THAN
(LAMBDA (X)
(IF (EQ? X 'A)
'B
'A)))

So suppose we want to execute (DIAG2 DIAG2). It always reduces to (OTHER-THAN DIAG2 DIAG2)... Proves the halting theorem.

Sussman thinks that Top-down design does a poor job of getting to what people really want... Uses some electrical engineering examples


No comments:

Post a Comment