Tuesday, March 27, 2018

MIT 6.042 - Mathematics for Computer Science (Incomplete)

Back, oh, probably in September 2017 or so, a couple of MOOC buddies and I decided we wanted to tackle 6.042 together.  For my part, I felt that having a better understanding of the underlying principles of proofs would be a huge boon when tackling advanced CS courses that lean on them heavily.  Thinking back to 6.006 in particular, when going through CLRS (which is very proof heavy), I think I would have benefited from better fundamentals in this area.  So, as is my habit, I'm keeping a log of my experience here;  part notes, part stream of consciousness reflection.

We're mostly following the 2015 version of the course; however, I really enjoy the 2010 lectures by Tom Leighton, so I'm double dipping a bit there.  The most natural way to divide up the coursework is by "week", corresponding to each of the 12 psets.  We decided on Overleaf for document creation and collaboration, as we agreed it would be best to suck it up and learn a bit of LaTeX if we wanted to get the full "genuine" experience with the pset "submissions".

Index for 2015 course: link
Discrete Math playlists that were helpful:






Week 1: Proofs, Well Ordering, and Logical Formulas


This week covers the first three chapters of the textbook.  The first chapter introduces some terminology, like proposition, predicate, and axiom.  One of the early concepts that took a minute to wrap my head around was the idea of an implication.  An implication is an if-then type proposition, denoted with a double arrow symbol:  A ⇒ B (if A then B, or A implies B).  This statement is true when either B is true OR A is false; the only combination that makes it false is A = T and B = F.

Two propositions A and B are equivalent (A ⇔ B) if the implication is bidirectional, i.e. A ⇒ B and B ⇒ A.  For this to evaluate to true, A and B must both be true or both be false.

The following truth table summarizes some of the basic logical primitives, with implication and equivalence, followed by xor (⊕), and (∧), or (∨), negation (¬), and contrapositive implication.  Notice that an implication A ⇒ B is equivalent to its contrapositive ¬B ⇒ ¬A:


A | B | A ⇒ B | B ⇒ A | A ⇔ B | A ⊕ B | A ∧ B | A ∨ B | ¬B | ¬A | ¬B ⇒ ¬A | ¬A ⇒ ¬B
T | T |   T   |   T   |   T   |   F   |   T   |   T   | F  | F  |    T    |    T
T | F |   F   |   T   |   F   |   T   |   F   |   T   | T  | F  |    F    |    T
F | T |   T   |   F   |   F   |   T   |   F   |   T   | F  | T  |    T    |    F
F | F |   T   |   T   |   T   |   F   |   F   |   F   | T  | T  |    T    |    T
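Out of habit I sanity-checked that table with a few lines of Python (just itertools over the truth values; the helper name is my own):

from itertools import product

def implies(a, b):
    # A => B is false only in the single case A = True, B = False
    return (not a) or b

cols = ["A", "B", "A=>B", "B=>A", "A<=>B", "A xor B", "A and B", "A or B", "notB=>notA"]
print("\t".join(cols))
for a, b in product([True, False], repeat=2):
    row = [a, b, implies(a, b), implies(b, a), a == b, a != b, a and b, a or b,
           implies(not b, not a)]
    print("\t".join("T" if v else "F" for v in row))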

The Tom Leighton lectures are an excellent resource to get a feel for proof by contradiction and proof by induction, but unfortunately he doesn't cover proofs by the Well Ordering Principle (which is on the homework). The induction examples, in particular, really helped me start to grasp how to approach these proofs.  Even so, when it came down to buckling down and doing the homework, I still had to find some extra help.

PSet 1, Problem 1 asks us to prove that log_4(6) is irrational.  My first step was to look up the various log transformation rules, which I found on RapidTables.com: Logarithm Rules.  I did a search on proving the rationality of a log and found several walkthroughs on YouTube, but this one in particular gave me a good idea on how to proceed (even if it was pretty rough): Prove log6 Base 3 is Irrational.  I ended up doing a lot of unnecessary work to reduce my problem (log_4 of 6) to that one (log_3 of 6), but came up with a correct, if clunky, proof.  When I was discussing it with my study group I realized a much more direct and elegant proof which one of them ran with and fleshed out.  =)
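For the record, the more direct argument goes something like this (my reconstruction from memory, not the official solution), roughly as we'd write it in LaTeX for the pset:

% Sketch: log_4(6) is irrational (proof by contradiction)
Suppose $\log_4 6$ were rational, say $\log_4 6 = a/b$ for positive integers $a$, $b$.
Then $4^{a/b} = 6$, and raising both sides to the $b$ gives
\[ 4^a = 6^b \quad\Longleftrightarrow\quad 2^{2a} = 2^b \cdot 3^b. \]
The left side has no factor of $3$, but the right side has at least one (since $b \ge 1$),
contradicting unique prime factorization.  So $\log_4 6$ is irrational. $\blacksquare$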

Problem 2 presented an inequality, which ultimately came down to proving that an exponential is greater than a polynomial for some nonnegative integers.  A proof by induction that n^2 < 2^n by Eddie Woo [link] was fantastic and gave me the correct intuition, however the assignment wanted the proof to use the Well Ordering Principle (WOP), so I couldn't use quite the same approach.  Another video explaining the equivalence of WOP and induction [link] also gave me some assistance in bridging the gap, and ultimately I came up with a proof.
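What finally got me unstuck was the general WOP proof template (paraphrasing the text), which goes roughly like this:

  1. To prove ∀n ∈ ℕ. P(n), define the set of counterexamples, C ::= { n ∈ ℕ | ¬P(n) }.
  2. Assume for contradiction that C is nonempty; by Well Ordering, C has a least element m.
  3. Reach a contradiction, typically by showing that P(m) actually holds, or by finding an even smaller counterexample in C.
  4. Conclude that C must be empty, i.e. P(n) holds for all n.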

Problem 3 was pretty easy (just creating some basic truth tables and propositional logic), but Problem 4 I eventually had to give up on.  I just didn't have the right context and intuition to make any progress, and I got tired of spinning my wheels.  I spent hours rolling it around in my head, looking up lectures on how carry look-ahead adders, carry-save adders, and carry-skip adders worked... but the whole "add1" module bit, and building a half adder out of that... did not compute.  So I mostly skipped it, which means I'd probably get about 60% credit on pset 1... doh.  :/



Week 2: Quantifiers and Predicate Logic, Sets


A predicate is a proposition with variables.  One very simple predicate is something like this:

P(x,y) ::= [ x = y ]

This predicate would be true when x and y are the same, and false when they aren't.

Quantifiers are the upside down A (∀) and the backward E (∃), which mean "for all" and "there exists" respectively.  These can be useful for reasoning about an entire domain.

∀ is also called the "universal quantifier", and is true when every element of the domain satisfies a predicate.  Take the statement "All children love candy".  We can restate this using predicate logic with the "for all" quantifier:

L(c) ::= [c loves candy]
∀ c ∈ Children. L(c)

This can also be thought of as an "AND" across all values in the domain, so it's like saying:

L(Alice) ∧  L(Sally) ... ∧ L([last kid])

The other quantifier, ∃, (also called the "existential quantifier") is true when at least one value in the domain satisfies the predicate.  So if we said "Someone cheated on the test", we could state that as:

C(s) ::= [s cheated on the test]
∃ s ∈ Students. C(s) 

This quantifier is analogous to a series of "OR" across all elements in the domain:

C(Alice) ∨  C(Sally) ... ∨ C([last student])

Evaluating these quantifiers depends on the domain of discourse.  A predicate that is true for the natural numbers may be false for the real numbers, or the complex numbers. Quantifiers can also be combined, so ∀x∃y would be "for all x, there exists a y..." and ∃x∀y would be "there exists an x such that all y..."
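The AND/OR intuition maps directly onto Python's all() and any(), which is a handy way to sanity check small cases (the toy domain and names below are made up for illustration):

# Toy domains (names made up for illustration)
children = ["Alice", "Sally", "Bob"]
loves_candy = {"Alice": True, "Sally": True, "Bob": True}
cheated = {"Alice": False, "Sally": True, "Bob": False}

# ∀ c ∈ Children. L(c)  -- an AND over every element of the domain
print(all(loves_candy[c] for c in children))    # True

# ∃ s ∈ Students. C(s)  -- an OR over every element of the domain
print(any(cheated[s] for s in children))        # True

# Quantifier order matters: ∀x ∃y. x + y = 4  vs  ∃y ∀x. x + y = 4, over {0..4}
domain = range(5)
print(all(any(x + y == 4 for y in domain) for x in domain))  # True: for each x, y = 4 - x works
print(any(all(x + y == 4 for x in domain) for y in domain))  # False: no single y works for every x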

In order for a predicate formula to be valid, it must be true for all domains (and all interpretations of its predicates).  The example they use is as follows:

∀z [Q(z) ∧ P(z)]  ⇒  ∀x [Q(x)] ∧ ∀y [P(y)]

One aspect of this that threw me off was the idea that x, y, and z are all assumed to be a part of the same domain.  This example also introduces the idea of Universal Generalization, which says that if we can prove P(c), then we can deduce ∀x.P(x), with some caveats (c cannot appear in P).

The other big idea for this week is an introduction to "sets".  A set is a collection of mathematical objects, with the set itself being treated as a single object.  Well known sets include the real numbers (ℝ), the complex numbers (ℂ), the rational numbers (ℚ), the integers (ℤ), and the natural numbers (ℕ).  A set could also be something as simple as just the values 1 and 2: {1, 2}, or the empty set, which has nothing in it and is denoted with Ø.

From a strictly mathematical standpoint, there is no restriction that a set contain only a single "type"... so a set could contain a string, a number, a boolean value, etc.  There is also no inherent ordering, and an element is either in the set or not, so no notion of having an element in the set more than once.

Membership is denoted using the epsilon symbol, read as "is in" or "member of".  So we would write "x is a member of the set of Real numbers" like this:

x ∈  ℝ

One set can also be a subset or superset of another set.  So if we wanted to say that A is a subset of B, and B is a superset of C, we could write these two statements:

A ⊆ B
B ⊇ C

The subset and superset symbols are analogous to the greater than and less than symbols.  The version with the bar underneath allows for the sets to be the same (basically greater/less than or equal), whereas without the bar, one set is a strict superset or subset.  Both the sub/superset symbols and the membership symbol can be negated with a slash through them (i.e. ∉, ⊄, ⊅).

Finally, the union (∪) and intersection (∩) operations create new sets from two or more sets.  A union creates a set with all the elements of the constituent sets combined, and an intersection creates a set with only those elements that are contained in both sets.  A union of the set {1, 2, 3} and {3, 4, 5} would create the set {1, 2, 3, 4, 5}, while the intersection would create the single-element set {3}. The union of a set with a superset (say A and B above) produces the superset, while intersecting the same two sets produces the subset:

A ⊆ B
A ∪ B = B
A ∩ B = A
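These identities are easy to poke at with Python's built-in sets (arbitrary example values):

A = {1, 2}
B = {1, 2, 3, 4, 5}            # here A ⊆ B

print(A.issubset(B))           # True:  A ⊆ B
print(A | B == B)              # True:  A ∪ B = B
print(A & B == A)              # True:  A ∩ B = A

print({1, 2, 3} | {3, 4, 5})   # union:        {1, 2, 3, 4, 5}
print({1, 2, 3} & {3, 4, 5})   # intersection: {3}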

A set can also be defined using a condition, denoted with the pipe | and read as "such that".  So the set of all even numbers could be read as "x is a member of the integers such that x is even" and written as:

E(x) ::= [x is even]
{x ∈  ℤ | E(x) }

The last concept covered is the power set, which is the set of all the subsets of a set.  So the power set of the set {1, 2, 3} is { {1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1}, {2}, {3}, {} }.  It can be written formally as

POW(A) ::= {B | B ⊆ A}
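Set-builder notation also translates almost literally into a Python comprehension, and the power set can be built with itertools (a quick sketch of my own, not course code):

from itertools import chain, combinations

# {x ∈ ℤ | E(x)}, restricted to a finite slice of the integers
evens = {x for x in range(-10, 11) if x % 2 == 0}
print(evens)

def power_set(A):
    """All subsets of A (returned as a set of frozensets)."""
    items = list(A)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

print(len(power_set({1, 2, 3})))   # 8, i.e. 2^3 subsets, including {} and {1, 2, 3}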

I started to work on pset 2, but got frustrated and gave up because I didn't have the context to understand what the questions were even looking for, and with no solutions I couldn't even find out what the canonical answer was, so I said "screw it" and decided to focus on the resources that made sense for me, i.e. the lectures, readings, and my notes.



Week 3: Binary Relations, Induction


Binary Relation on Wikipedia, and a number of videos on YouTube [1][2][3], were helpful for understanding relations, and particularly in understanding the cardinality terms (sur-, in-, and bi-jective).

A binary relation associates elements of one set, called a domain, with elements of another set, called the codomain (definition from 2015).  Meyer's lecture looks at binary relations much the way you would think of them in a database context, with sets of students, subjects, and advisors all related in different ways (students register for subject, advisor advises student, advisor teaches subject, etc.).   These relations, in Meyer's examples as well as most of what I found on YouTube, are often depicted using blobs to denote the domains, with arrows mapping elements of the domain to the codomain.



The "Image" of a relation is the elements of the codomain being mapped to.  So the image of R(A) would be {1}, the image of R(D) is {3, 4},  the image of R({A,B}) is {1,2}, etc.  The image of the whole relation is also called the "range", and is all the elements in the codomain being mapped to, in the above example this is {1, 2, 3, 4}.  Another way to think of the image is as the endpoints of arrows starting in X.  Using set builder notation, the image can be described as:

{y ∈ Y |  ∃x ∈ X. x R y}   
(Set of y in Y such that there exists an x in X participating in the relationship x R y)

The inverse relationship basically turns the arrows around.  Flipping the role of the domain and codomain, the inverse relation (call it "R^") maps the elements of Y to the elements of X.  R^(1) is {A}, R^(2) is {B, C}, etc. Since the "arrows" can be thought of as a graph of the relation, the inverse takes this graph and reverses the direction of all the edges.

A function is a special case of a relation that associates every element of the domain with at most one element from the codomain.  So in the image above, the relation R is not a function because D has two arrows out.  A total relation relates every element in the domain to one or more elements in the codomain.  The above relation is not total, because E has no arrows going out.  A total function is a relation in which every element of the domain has exactly one arrow coming out.

Relations can also be described based on the properties of how they map to the codomain.  A surjection is a relation in which all the points in the codomain have at least one arrow coming in.  The above graph is not a surjection because 5 is not pointed to.  A surjection is also called an "onto" mapping, i.e. Set A maps onto Set B when every element of Set B has one or more arrows pointing to it.  An injection is a relation in which every element in the codomain has at most one arrow coming in. The above relation is not an injection because 2 has two incoming arrows.

Finally, a bijection is a relation in which each element in the domain maps to exactly one element in the codomain, i.e. a total function that is both an injection and a surjection.  If a relation is bijective, it implies that the two sets in the relation are the same size.
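Encoding the example relation as a set of pairs makes all of these definitions concrete (this is my own encoding of the blob diagram, not anything from the course):

# X = {A..E}, Y = {1..5}, with R written out as a set of (x, y) arrow pairs
X = {"A", "B", "C", "D", "E"}
Y = {1, 2, 3, 4, 5}
R = {("A", 1), ("B", 2), ("C", 2), ("D", 3), ("D", 4)}

def image(rel, S):
    """Image of the set S under rel: endpoints of arrows that start in S."""
    return {y for (x, y) in rel if x in S}

R_inv = {(y, x) for (x, y) in R}    # inverse relation: just flip the arrows

is_function   = all(len(image(R, {x})) <= 1 for x in X)       # at most one arrow out of each x
is_total      = all(len(image(R, {x})) >= 1 for x in X)       # at least one arrow out of each x
is_surjection = all(len(image(R_inv, {y})) >= 1 for y in Y)   # every y has an arrow in
is_injection  = all(len(image(R_inv, {y})) <= 1 for y in Y)   # at most one arrow into each y

print(image(R, {"A"}), image(R, {"D"}), image(R, X))          # {1}  {3, 4}  {1, 2, 3, 4}
print(is_function, is_total, is_surjection, is_injection)     # False False False False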

Some shorthand for these relations include:

A bij B ::= bijection from A to B, iff |A| = |B|
A surj B ::= surjective function from A to B, iff |A| ≥ |B|
A inj B ::= total injection from A to B, iff |A| ≤ |B|


The other "big idea" for this week is induction.  I'm going to say, here, that I liked the Tom Leighton lectures on induction (2010, Lecture 1 and Lecture 2) a lot better than Albert Meyer's on the subject. Both use the examples of tiling with L shaped tiles, coloring horses, making n cents with 3 and 5 cent stacks, and scoring the unstacking game.

Meyer's coverage of equivalence of WOP, Induction, and Strong Induction was pretty hand wavey, like he just couldn't really be bothered with it.  Ugh, and then he spends what feels like forever justifying their pedagogical choices -_____-

The induction rule says that if a property P holds in a base case (generally n = 0), and if assuming P(n) for an arbitrary n ≥ 0 lets us prove P(n + 1), then P holds for every nonnegative integer.  That is:

P(0),   ∀n ∈ ℕ. P(n) ⇒ P(n + 1)
─────────────────────────────────
∀m ∈ ℕ. P(m)

The analogy of "dominoes" is used to illustrate how, for each value of n, the truth of P(n) is used to prove P(n + 1), just as each domino knocks over its neighbor.  And as with dominoes, if there is a gap anywhere in the chain of implications, the induction fails.  The bogus proof about horse colors demonstrates this problem: the implication for n = 1, i.e. P(1) ⇒ P(2), fails because with only two horses the two subsets of size 1 don't overlap, and it's exactly that overlap that the argument for larger set sizes relies on.

Inductive proofs generally use the following format:

Proof: (by induction on n)
The induction hypothesis, P(n), is: ...
Base case (n = 0): ...
Inductive Step: Assume P(n) where n ≥ 0, and prove P(n + 1): ...
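To make the template concrete, here's a standard warm-up example (not from the pset), written out roughly the way we'd format it for the LaTeX write-ups:

% Warm-up claim: 1 + 2 + ... + n = n(n+1)/2 for all n in N
\textbf{Proof:} (by induction on $n$)

The induction hypothesis, $P(n)$, is: $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$.

\textbf{Base case} ($n = 0$): the empty sum is $0$ and $\frac{0(0+1)}{2} = 0$, so $P(0)$ holds.

\textbf{Inductive step:} assume $P(n)$ for some $n \ge 0$, and prove $P(n+1)$:
\[ 1 + 2 + \cdots + n + (n+1) \;=\; \frac{n(n+1)}{2} + (n+1) \;=\; \frac{(n+1)(n+2)}{2}, \]
which is exactly $P(n+1)$.  So $P(n) \Rightarrow P(n+1)$ for all $n \ge 0$, and by induction $P(n)$ holds for all $n \in \mathbb{N}$. $\blacksquare$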



Week 4: State Machines, Recursive Definition, Infinite Sets


State Machines


State machines are used to model step by step processes.  Both the 2010 and 2015 editions of the course use the fountain puzzle from Die Hard 3 to illustrate how states and transitions can be defined for a state machine. Neither makes for terribly compelling watching, but I don't want to beat the "Tom is the only lecturer worth a shit" dead horse any more.

The Die Hard example lends itself nicely to a toy state machine example.  The state is represented as a tuple, (b, l) where b (the big jug) is in the range [0,5] and l is in the range [0,3], i.e. the "big" and "little" jugs respectively. We can then define a series of transitions representing various actions we can take with the jugs:


  1. (b, l) ⇒ (b, 3)  fill the little jug
  2. (b, l) ⇒ (5, l)   fill the big jug
  3. (b, l) ⇒ (b, 0)  empty the little jug
  4. (b, l) ⇒ (0, l)  empty the big jug
  5. (b, l) ⇒ (0, b + l)  pour b into l, when (b + l) ≤ 3
  6. (b, l) ⇒ (b + l - 3, 3)  pour b into l, when (b + l) > 3
  7. (b, l) ⇒ (b + l, 0) pour l into b, when (b + l) ≤ 5
  8. (b, l) ⇒ (5, l + b - 5)  pour l into b, when (b + l) > 5

So, to get exactly 4 gallons in the big jug, we can use the following transitions:

(0,0) ⇒ (5, 0) [2]  ⇒ (2, 3) [6] ⇒ (2, 0) [3] ⇒ (0,2) [5] ⇒ (5,2) [2] ⇒ (4,3) [6]
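Since the transitions are completely mechanical, it's easy to let a few lines of Python brute-force the state graph with a BFS (my own toy sketch, not part of the course):

from collections import deque

BIG, LITTLE = 5, 3

def moves(state):
    """All states reachable from (b, l) in one transition."""
    b, l = state
    yield (b, LITTLE)                      # fill the little jug
    yield (BIG, l)                         # fill the big jug
    yield (b, 0)                           # empty the little jug
    yield (0, l)                           # empty the big jug
    pour = min(b, LITTLE - l)              # pour big into little
    yield (b - pour, l + pour)
    pour = min(l, BIG - b)                 # pour little into big
    yield (b + pour, l - pour)

def solve(goal=4):
    """Breadth-first search from (0, 0) for a state with `goal` gallons in the big jug."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state[0] == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

print(solve())   # e.g. [(0, 0), (5, 0), (2, 3), (2, 0), (0, 2), (5, 2), (4, 3)]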

They both use the example of a diagonally moving robot on an infinite grid to illustrate the concept of a preserved invariant.  A preserved invariant is a property of the state machine that holds true for every reachable state.  In the case of the diagonal robot, every state transition changes the sum of the coordinates by ±2 or 0, which means that for a robot starting at the origin, the sum of the coordinates will always be even.

Another example of a preserved invariant occurs in the variation on the Die Hard puzzle when the jugs are 9 and 3 gallons.  In this case, the amount of water in either jug is always divisible by 3 (thus the puzzle is unsolvable and Bruce dies).  The Invariant Principle basically adapts the concept of induction to state machines.

Two concepts that are important in determining the correctness of a program are partial correctness and termination.  Partial correctness says that the results, if any, of a program are correct; it is "partial" in the sense that some programs may not produce any result at all.  So if a program is partially correct and can also be shown to terminate, then it is correct.

Derived variables are values that are calculated from the state.  The total water in both jugs, or the sum of the diagonal robot's coordinates, are examples of derived variables.  A derived variable can be useful in proving properties about an algorithm.  In the fast exponentiation example, the derived variable z can be shown to be strictly decreasing, with the algorithm terminating when z becomes zero.
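For reference, here's roughly what that fast exponentiation looks like in Python, with z as the derived variable that strictly decreases until the loop terminates (my sketch of the idea, not the exact program from the text):

def fast_exp(a, n):
    """Compute a**n by repeated squaring.

    Invariant: y * x**z == a**n at the top of every loop iteration.
    Derived variable: z is a nonnegative integer that strictly decreases,
    so the loop must terminate; when z == 0 the invariant gives y == a**n.
    """
    x, y, z = a, 1, n
    while z > 0:
        if z % 2 == 1:
            y = y * x
        x = x * x
        z = z // 2
    return y

print(fast_exp(3, 13), 3 ** 13)   # both print 1594323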



Recursive Definition (Recursive Data Types and Functions)


Recursive Data is defined by building it up from simpler versions of the same data type. This is done by defining a base case and a constructor case.  An example of the base case might be the empty set, the empty string, some function f(0), whatever the case may be.  The base case will represent the simplest possible valid representation of the data type.  This base case then forms the foundation of larger constructs that are formed using the constructor case.

Structural Induction is a proof method that is analogous to regular induction, and is used to prove properties about a recursive data structure.

To prove that P(x) holds ∀ x ∈ R, it is necessary to prove the base case and the constructor case:

Base case:  P(b) for each base case b ∈ R
Constructor: P(c(x)) for each constructor, c, assuming induction hypothesis P(x)

Example:

Prove E ⊆ Even by structural induction on x ∈ E, with induction hypothesis P(x) ::= "x is even"
Base case: 0 ∈ E and 0 is even, so P(0) holds
Constructors:   c1(n) ::= n + 2, c2(n) ::= -n; if n is even then n + 2 and -n are also even, so P(n) implies P(c1(n)) and P(c2(n))

The book and lectures both cover the matching brackets problem.
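A quick recursive recognizer following the shape of that definition (base case: the empty string matches; constructor: if s and t match then [ s ] t matches) might look like this; my own sketch, not course code:

def is_matched(w):
    """True if w is a matched bracket string per the recursive definition:
    the empty string matches, and if s and t match then '[' + s + ']' + t matches."""
    if w == "":
        return True                      # base case
    if w[0] != "[":
        return False                     # every nonempty match starts with '['
    depth = 0
    for i, ch in enumerate(w):
        if ch not in "[]":
            return False
        depth += 1 if ch == "[" else -1
        if depth == 0:
            # w splits as '[' + s + ']' + t at position i
            s, t = w[1:i], w[i + 1:]
            return is_matched(s) and is_matched(t)
    return False                         # the leading '[' is never closed

print(is_matched("[[]][]"), is_matched("[]]["))   # True False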

Recursive functions are defined in terms of base cases and constructors.  For a recursive function f, the value f(b) is given directly for each base case b, and f(c(x)) is defined in terms of x and f(x) for each constructor c.

Ambiguous recursive definitions occur when the same element can be constructed in more than one way, so a recursive function defined over that data type can end up assigning multiple, conflicting outputs to a given input.



***********************


I'm Done.  I can't make myself get excited enough about this material to continue.  The 2010 lectures were great, Tom Leighton is a joy to watch.  The Albert Meyer lectures are horrible.  Most of the probability stuff is covered in depth in 6.008.  I just can't do it anymore...


***********************

Week 5: GCDs, Congruences, Euler's Theorem
Week 6: RSA Encryption, Digraphs
Week 7: DAGs and Scheduling, Partial Orders
Week 8: Coloring, Trees, Stable Matching
Week 9: Sums and Products, Asymptotics
Week 10: Bijections, Repetitions, Pigeonholing
Week 11: Discrete and Conditional Probability
Week 12: Independence, Random Variables, Expectation
Bonus Week: Markov & Chebyshev Bounds, Sampling, Pagerank
