Friday, February 14, 2014

SICP - Streams Part 1

This was an interesting lecture.  More and more I find myself at a loss to come up with a good post-lecture summary for the top of these blogs, probably because like any programmer I hate needless repetition and everything about the video that matters is alread in my notes. So here ya go:



Streams Part 1

As a result of assignment and state, a variable no longer specifies a value, but a locations where a value resides, and that value can change. We have to think about time.

Pairs are no longer just CARs and CDRs, but they have identity. They are objects and we have to think about sharing.

Our programming language is no longer just about mathematics but is more mechanistic.

We got into this mess because we were looking to build modular systems that chunk apart naturally. Example when we wanted to separate random numbers from Monte Carlo, and separate that from Cisaro method for Pi.

When we looked at code for electrical systems, different components are objects and their state is stored in wires. In our programs we want to mirror reality.

Maybe the reason that building these systems is difficult is because we have the wrong view of reality.
If we look at the electrical system as representing a signal processing system, our components act as a filter that transforms an electrical signal. This way of thinking is called stream processing.

Compare two procedures:
Image we have a tree of integers
   ___
  /   \__
 /\   /  \
1 /\ 13  /\
 2  7  12  94

(DEFINE (SUM-ODD-SQUARES TREE)
(IF (LEAF-NODE? TREE)
(IF (ODD? TREE)
(SQUARE TREE)
0)
(+ (SUM-ODD-SQUARES
(LEFT-BRANCH TREE))
  (SUM-ODD-SQUARES
(RIGHT-BRANCH TREE)))))

Here we are recursively traversing the tree, adding the squares of odd numbers.

Assume we want to find the kth Fibonacci number and find the odds:

(DEFINE (ODD-FIBS N)
(DEFINE (NEXT K)
(IF (> K N)
'()
(LET ((F (FIB K)))
(IF (OFF? F)
(CONS F (NEXT (1+ K)))
(NEXT (1+ K))))))
(NEXT 1))

This is how we've been writing iterative loops.

First procedure enumerates the leaves of a tree, we filter them to see which ones are odd, we map them to square, then accumulate them with addition.

Second program we enumerate the interval from 1 to n, we map the Fibonacci number, filter for oddness, and accumulate with CONS creating a list.

This commonality is not obvious when we look at the programs. We see that in the sum odd squares program, the enumerator and accumulator are not in one place. We don't have our hands in these thing explicitly because we don't have an appropriate language.

We need to define the signals between the signal processing nodes, we call them streams.

(CONS-STREAM X Y) - Constructor for streams
(HEAD S)
(TAIL S)
George's contract:
For any X and Y
(HEAD (CONS-STREAM X Y)) -> X
(TAIL (CONS-STREAM X Y)) -> Y

The exact same axioms for CONS, CAR, and CDR

THE-EMPTY-STREAM - which is just like the empty list. For now we will pretend that streams are just another word for lists.

(DEFINE (MAP-STREAM PROC S)
(IF (EMPTY-STREAM? S)
THE-EMPTY-STREAM
(CONS-STREAM
(PROC (HEAD S))
(MAP-STREAM PROC (TAIL S)))))

(DEFINE (FILTER PRED S)
(COND
((EMPTY-STREAM? S) THE-EMPTY-STREAM)
((PRED (HEAD S))
(CONS-STREAM (HEAD S)
             (FILTER PRED
         (TAIL S))))
(ELSE (FILTER PRED (TAIL S)))))

(DEFINE (ACCUMULATE COMBINER INIT-VAL S)
(IF (EMPTY-STREAM? S)
INIT-VAL
(COMBINER (HEAD S)
         (ACCUMULATE COMBINER
             INIT-VAL
 (TAIL S)))))
 
(DEFINE (ENUMERATE-TREE TREE)
(IF (LEAF-NODE? TREE)
   (CONS-STREAM TREE
THE-EMPTY-STREAM)
(APPEND-STREAMS
(ENUMERATE-TREE
(LEFT-BRANCH TREE))
(ENUMERATE-TREE
(RIGHT-BRANCH TREE)))))

(DEFINE (APPEND-STREAMS S1 S2)
(IF (EMPTY-STREAM? S1)
S2
(CONS-STREAM
(HEAD S1)
(APPEND-STREAMS (TAIL S1)
               S2))))

DEFINE (ENUM-INTERVAL LOW HIGH)
(IF (> LOW HIGH)
THE-EMPTY-STREAM
(CONS-STREAM
LOW
(ENUM-INTERVAL (1+ LOW) HIGH))))

This is the beginning of a language for describing streams.

(DEFINE (SUM-ODD-SQUARES TREE)
(ACCUMULATE
+
0
(MAP
SQUARE
(FILTER ODD
(ENUMERATE-TREE TREE)))))

Here we are enumerating the tree, filtering for oddness, mapping those to the square function, and accumulating the results using addition starting from zero.

(DEFINE (ODD-FIBS N)
(ACCUMULATE
CONS
'()
(FILTER
ODD
(MAP FIB (ENUM-INTERVAL 1 N)))))

Here we enumerate the interval from one to n, map fib along that interval, filter for odd, and accumulate with CONS starting from empty list.

This stream processing structure gives us conventional interfaces, allowing us to glue things together.  Using map, filters, and accumulators is a very general framework.

Going to look at two, more complicated examples. Assume we have a stream of streams...
{{1 2 3 ...}, {10 20 30 ...} ... {}, {}}

(DEFINE (FLATTEN ST-OF-ST)
(ACCUMULATE
APPEND-STREAMS
THE-EMPTY-STREAM
ST-OF-ST))

This flattens out the stream of streams into one stream.

(DEFINE (FLATMAP F S)
(FLATTEN (MAP F S)))

We'll use this to solve a common problem:

Given N, find all pairs 0 < J < I <= N
Such that I + J is Prime

N = 6 I J I + J
2 1 3
3 2 5
4 1 5

(FLATMAP
(LAMBDA (I)
(MAP
(LAMBDA (J) (LIST I J))
(ENUM-INTERVAL 1 (-1+ I))))
(ENUM-INTERVAL 1 N))

For generate the pairs using enum-intervals.
Now we add the filter:

(FILTER
(LAMBDA (P)
(PRIME? (+ (CAR P) (CADR P))))
(FLATMAP
(LAMBDA (I)
(MAP
(LAMBDA (J) (LIST I J))
(ENUM-INTERVAL 1 (-1+ I))))
(ENUM-INTERVAL 1 N)))

Then we add the final map to create the list of I, J, and I + J:

(DEFINE (PRIME-SUM-PAIRS N)
(MAP
(LAMBDA (P)
(LIST (CAR P)
 (CADR P)
 (+ (CAR P) (CADR P))))
(FILTER
(LAMBDA (P)
(PRIME? (+ (CAR P) (CADR P))))
(FLATMAP
(LAMBDA (I)
(MAP
(LAMBDA (J) (LIST I J))
(ENUM-INTERVAL 1 (-1+ I))))
(ENUM-INTERVAL 1 N)))))

Nested loops start looking like compositions of flatmaps of flatmaps of flatmaps...
We can introduce syntactic sugar using COLLECT

(DEFINE (PRIME-SUM-PAIRS N)
(COLLECT
(LIST I J (+ I J))
((I (ENUM-INTERVAL 1 N))
(J (ENUM-INTERVAL 1 (-1+ I))))
(PRIME? (+ I J))))

Collect the result as i runs through the interval from 1 to n and as J runs through the interval from 1 to I-1 such that I + J is prime.

The "Eight Queens" problem?: Lay out eight queens on a ches board so that noone is attacking anyone else.

George needs to create representations for a board and a position.

(SAFE? <row> <column> <REST-OF-POSITIONS>) - Given that there are queens in the rest of positions, is it safe to put a queen on the position defined by this row and column.

Backtracking is a usual approach to the problem. We create a tree of the possible positions, if we make it to the bottom of the tree we have a solution. "Backtracking Search" ... not unlike depth first search algorithm.  This algorithm is complicated by a concern for time.

Imagine we don't care about time, and have a tree down to k-1 columns.  To extend this tree, we just need to worry about adding level k and filtering for positions that are safe.

(DEFINE (QUEENS SIZE)
(DEFINE (FILL-COLS K)
(IF
(= K 0)
(SINGLETON EMPTY-BOARD)
(COLLECT ... )))
(FILL-COLS SIZE))

(COLLECT
(ADJOIN-POSITION TRY-ROW
K
REST-QUEENS)
((REST-QUEENS (FILL-COLS (-1+ K)))
(TRY-ROW (ENUM-INTERVAL 1 SIZE)))
(SAFE? TRY-ROW K REST-QUEENS))
In the collect, we want to find all ways to put down queens in the first k-1 columns, and then find all ways of trying a rows, then collect together all the position collections. This gives us all possible solutions. We have a stream and each element is a solution. This resulted because we abandoned the idea that this is a time process with state. Mullet brings up the question of calculation time and the fact that the second method that finds all solutions would be slower than the tree search that only finds the first solution...

Abelson talks about the catch with this elegant style of programing with streams.
;;; find the second prime between
;;; 10,000 and 1,000,000

(HEAD
(TAIL (FILTER
PRIME?
(ENUM-INTERVAL 10000 1000000))))

We would need to enumerate the interval, filter for prime, and take the second element. Problem is we are using an enormous amount of resources to solve a simple problem. If we arrange things the right way, however, we can achieve the same efficiency as non-stream programs.

Instead of thinking of the stream as being completely processed by each stream procedure, think of it as a string being pulled through a succession of tubes. We tug on the end and each procedure operates on a bit of the stream and we get a new result at the end.

This is the key to why streams are NOT lists. We know how to treat procedures as objects.

(CONS-STREAM X Y)
Abbreviation:
(CONS X (DELAY Y))
(HEAD S) -> (CAR S)
(TAIL S) -> (FORCE (CDR S))

DELAY takes an expression and produces a promise to compute a result. So a stream is made up of x and a promise to compute y.  TAIL takes the promise and computes it.


(DELAY <exp>) ;;; promise
Abbreviation for (LAMBDA () <exp>)

(FORCE <promise>) = (promise)




Delay coupled the apparent order of events in our program from the actual order of events in the computer.

It would be very inefficient to recompute each element of a stream, since it would look like (TAIL (TAIL (TAIL...

So DELAY is actually MEMO-PROC, which stores the value of a promise once it is computed, so that subsequent calls do not need to recompute the result.

(DEFINE (MEMO-PROC PROC)
(LET ((ALREADY-RUN? NIL) (RESULT NIL))
(LAMBDA ()
(IF (NOT ALREADY-RUN?)
(SEQUENCE
(SET! RESULT (PROC))
(SET! ALREADY-RUN? (NOT NIL))
RESULT)
RESULT))))

Hack called "memo-ization". Streams should run with same efficiency as other programs.

Any amount of side-effects are going to mess up everything (thus no substitution model).

Explaining MEMO-PROC:
MEMO-PROC gets a procedure. MEMO-PROC creates a new procedure. This new procedure has two environment variables RESULT and ALREADY-RUN? that are initialized to NIL. The first time we call this procedure, it sees that ALREADY-RUN? is NIL, so it runs PROC and saves the result in RESULT. Next time MEMO-PROC is called, it sees that it has already run and spits out RESULT without executing PROC again.

List structure of stream:

[ | ]---(PROMISE)
 |
 1

We don't actually have the whole list, but a procedure (DELAY) that will generate the rest of the list.

[ | ]---(xxx)
 |        |
 1       [ | ]---(PROMISE)
          |
 2
 
The rest of the list is only created as we call it into being with FORCE.

No comments:

Post a Comment