Wednesday, February 12, 2014

SICP - Assignment, State, and Side Effects

This lecture is done by Jay Sussman and he really seems to dislike assignment. It muddles up all this very pretty functional code.  It's a bit of an anachronism that he talks about Fortran, BASIC and Pascal, though the setting is half the charm here. Mullet about made my head explode again, oy vey.  Overall a very valuable look at some fundamental programming concepts like variable scope and environments. He didn't talk about side effects directly but it did make me think of a word... what was it [navigates to Google...] idempotence, thank you Wikipedia. 


I ran the counter example in the Scheme interpreter:



Here are todays notes:

Assignment, State, and Side-effects

Functional programs encode mathematical truthes

(DEFINE (FACT N)
(COND ((* N 1) 1)
(ELSE (* N (FACT (- N 1))))))

Processes evolved by such programs can be understood by substitution.
Methods may be distinguished by the choice of truths expressed.

(SET! <var> <value>)

When we introduce assignment, order of evaluation matters. With all the programs we have written so far, order of operation doesn't matter.

(DEFINE COUNT 1)
(DEFINE (DEMO X)
(SET! COUNT (1+ COUNT))
(+ X COUNT))
=> (DEMO 3)
5
=> (DEMO 3)
6

The same expression leads to two different answers, therefore DEMO is not a function in the mathematical sense. This kills the substitution model, because substitution model is static, describes unchanging truths, not changing or dynamic truth.  

We look again at the factorial function, using the old iterative process: (Functional version)

(DEFINE (FACT N)
(DEFINE (ITER M I)
(COND ((> I N) M)
(ELSE (ITER (* I M)(+ I 1)))))
(ITER 1 1))

Same algorithm with assignments. (Imperative version)
(DEFINE (FACT N)
(LET ((I 1)(M 1))
(DEFINE (LOOP)
(COND ((> I N) M)
(ELSE
(SET! M (* I M))
(SET! I (+ I 1))
(LOOP))))
(LOOP)))

If the SET! statements are reversed, this can introduce a bug into our program. Sussman really doesn't like this because he keeps calling it "ugly"...

You can't DEFINE the same variable twice, though the computer may not catch. Same rule applies to LET its just a smaller scope.

(LET ((var1 e1)(var2 e2))
e3)
==
((LAMBDA (var1 var2)
e3)
e1
e2)

Internally this is what happens when you call LET. Ugh Mr. Mullet is at it again... shoot me now.

We say that a variable, v, is "bound in an expression", E, if the meaning of E is unchanged by the uniform replacement of a variable, w, not occurring in E, for every occurrence of v in E.

Bound Variables (I've had display issues with funky fonts so I'm not going to try to display the upside down A for [FOR EVERY] or the backwards E for [THERE EXISTS]...
[FOR EVERY] x [THERE EXISTS] y P(x y)
variables x and y are bound, because it is not dependent upon the letters we use. If we use w instead of x the meaning doesn't change.

Quantifier is what bounds variables ([FOR EVERY], [THERE EXISTS], integral, etc...)
Can substitute Lambda expression λ

This gives us modularity, it shouldn't matter what names we use when building our programs.

Unbound variable
(λ (x) (* x y))

In this case, y is not bound. If y = 3 and z = 4, changing y with z changes the result of the expression.

We say that a variable v is "free in an expression", E, if the meaning of E is changed by the uniform replacement of a variable, w, no occurring in E, for every occurrence of v in E.

I the above example, y is a free variable.

(λ (y) ((λ (x) (* x y)) 3))
the asterisk is a free variable. If we change them all with the plus symbol, it changes the meaning of the expression.

Talking now about variable scope.

If x is a bound variable in E then there is a lambda expression where it is bound. We call the list of formal parameters of the lambda expression the "bound variable list" and we say that the lambda expression "binds" the variables "declared" in its bound variable list.  In addition, those parts of the expression where a variable has a value defined by the lambda expression which binds it is called the "scope" of the variable.

Lambda expression is the only way to get a bound variable. Define can ultimately go away and we can use just lambda expressions.

Now we need to make a new model for computation, one that represents names as referring to places, because if we change something, it needs to have a place where it is stored. "Environments"  - a way of doing substitutions virtually. An environment is a function or a table, made up of frames. These frames are chained together with parent links.  

In this new model of computation, procedures are two parts. One of these two parts points to the environment. The rules for evaluating these procedures are:

Rule 1: A procedure object is applied to a set of arguments by constructing a frame, binding the formal parameters of the procedure to the actual arguments of the call, and then evaluating the body of the procedure in the context of the new environment constructed.  The new frame has as its enclosing environment the environment part of the procedure object being applied.

Rule 2: A lambda expression is evaluated relative to a given environment as follows: a new procedure object is formed, combining the text (code) of the lambda-expression with a pointer to the environment of evaluation.

example of why we use assignment (since it makes are beautiful, clean mathematical code so messy...)
(DEFINE MAKE-COUNTER
(λ (N)
(λ ()
(SET! N (1+ N))
N)))

GLOBAL environment includes many things we are familiar with: +:, -:, CAR, etc...

Our procedure object points to this environment, DEFINE adds MAKE-COUNTER to this environment.  

(DEFINE C1 (MAKE-COUNTER 0))

First we construct a frame where N=0, and a pointer to the parent environment (GLOBAL). Now we evaluate the body of the procedure. The body is a procedure, so we create a procedure object, which points to the environment we created where N=0. C1 is created in the GLOBAL environment by DEFINE.

(DEFINE C2 (MAKE-COUNTER 10))

The construction of this procedure object is nearly identical to C1. What is worth noting is that there are two N variables, living in isolated environments. 


Easier to just drop a screenshot in here than try to reproduce this diagram in text.


What we have are two computational objects, with their own local state. In there world there are many "objects" we may want to think about that have internal variables that are largely decoupled.

Actions and Identity
We say that an action, A, had an effect on an object, X, (or equivalently, that X was changed by A) if some property, P, which was true of X before A became false of X after A. We say that two objects, X and Y, are the same if any action which as an effect on X has the same effect on Y.

;;; Cesaro's method for estimating Pi:
;;; Prob(gcd(n1,n2)=1) = 6/(pi*pi)

(DEFINE (ESTIMATE-PI N)
(SQRT (/ 6 (MONTE-CARLO N CESARO))))

(DEFINE (CESARO)
(= (GCD (RAND) (RAND)) 1))

(DEFINE (MONTE-CARLO TRIALS EXPERIMENTS)
(DEFINE (ITER REMAINING PASSED)
(COND   ((= REMAINING 0)
(/ PASSED TRIALS))
((EXPERIMENT)
(ITER (-1+ REMAINING)
  (1+ PASSED)))
(ELSE
(ITER (-1+ REMAINING)
PASSED))))
(ITER TRIALS 0))

(DEFINE RAND
(LET ((X RANDOM-INIT))
(LAMBDA ()
(SET! X (RAND-UPDATE X))
X)))

If we define this without assignment, the procedure is monolithic and there is no way to separate Cesaro's experiment from the general monte carlo method. Because Cesaro needs two random numbers, the random number generator is leaking into the Cesaro procedure, which in turn is leaking into the monte carlo procedure. This means that this can't be decomposed any further.

"The more things change, the more they stay the same."



No comments:

Post a Comment