Friday, May 22, 2015

Berkeley CS61B - Data structures

I hadn't ventured into the open courseware realm for a while, and I've wanted to get a bit of formal education on data structures and algorithms for a while, so I finally took the plunge a couple weeks ago and started 61B.  For quick reference, here are the relevant locations:

Course Website
Course Lectures
Big O Cheatsheet
Wikipeadia articles on data structures.

Lectures:


Summary of data structures and their better implementations.  Lecture slide 40 gives a good overview of these:

  • Lists
    • Doubly Linked lists
    • Array Lists
  • Maps/Sets
    • B-tree (2-3 tree)
    • Red-Black tree
    • Trie
  • Graphs
  • Priority Queue
    • Heap Ordered Tree
    • Balanced Tree
  • Disjoint Sets
    • Weighted Quick Union
    • WQU with Path Compression

The course also covered a number of sorting and graph traversal algorithms:

  • Sorting
    • Selection Sort (not very efficient)
    • Insertion Sort (really good on mostly sorted data)
    • Heap Sort
    • Merge Sort
    • Quick Sort 
    • Radix Sort (LSD and MSD)
  • Graphs
    • Traversal
      • Depth First Search
      • Breadth First Search
    • Minimum spanning tree
      • Kruskal's
      • Prim's
    • Shortest Paths - Dijkstra's
    • Single Path - A*


Homework 1 (NBody Sim):

This was a pretty fun little homework project.  I couldn't stand the idea of playing around with using javac and java from the terminal, so I busted out Eclipse and used that.  Once I got everything imported and the classpath squared away, I was off to the races.  Spent about 2-3 hours on it, and ultimately was able to get the base program working more or less correctly:


It's a nifty simulator for gravitational interaction.  The math looks daunting at first blush but it actually breaks down quite nicely, for example, these methods for calculating pairwise force and the x component of that:

    public double calcPairwiseForce(Planet p){
        double m1, m2, r2, G;
        m1 = this.mass;
        m2 = p.mass;
        r2 = Math.pow(calcDistance(p),2);
        G = 6.67E-11; 
        return G * m1 * m2 / r2;
    }
    
    public double calcPairwiseForceX(Planet p){
        double dx, r, F;
        dx = p.x - this.x;
        r = calcDistance(p);
        F = calcPairwiseForce(p);
        return F * dx / r;
    }

The homebrew unit testing was easy enough to use once I figured out I could create a class with a main() method that called all the other main methods and just use that as my entry point for a debug configuration.

The one funny bug I ran into was the result of not thinking about my loops carefully enough.  The culprit was this guy:

    public void setNetForce(Planet[] planets){
        xNetForce = 0;
        yNetForce = 0;
        for(Planet planet: planets){
            if(planet.x == this.x && planet.y == this.y){
                return;
            } else {
                xNetForce += calcPairwiseForceX(planet);
                yNetForce += calcPairwiseForceY(planet);
            }                       
        }       
    }

See the problem? Took me a while, but I figured out that by returning when a planet matched itself, it was ignoring any celestial bodies that came after it.  This led to the peculiar behavior of Venus happily orbiting the sun, and everyone else floating off into oblivion without a care.  This happened because Venus was last, and the Sun was second to last in the "planets" array, so Mercury, Mars, and Earth hit themselves and stopped checking before they ever got to the sun.  Had the sun been first this bug would have been much more subtle and likely uncaught (because they would have just been ignoring other planets and the effect on their orbit would have been unnoticable to the naked eye.)

The fix was quite elegant:

    public void setNetForce(Planet[] planets){
        xNetForce = 0;
        yNetForce = 0;
        for(Planet planet: planets){
            if(planet.x == this.x && planet.y == this.y){
                //do nothing, don't return, because then none of the
                // other planets are considered.... oops
            } else {
                xNetForce += calcPairwiseForceX(planet);
                yNetForce += calcPairwiseForceY(planet);
            }                       
        }       
    }

My Homework 1 code is on GitHub.

Homework 2 (Bitwise):

This one certainly wasn't as sexy as homework one, since it just involves creating addition and multiplication functions for a calculator using bitwise operations.  However, it was surprisingly tricky, with a nice payoff.  This tiny snippet of code made me feel like a damn genius:

    public int add(int x, int y) {
        // YOUR CODE HERE
        int mask = x ^ y;
        int carry = (x & y) << 1;  
        
        if(carry != 0)
            mask = add(mask, carry);
        
        return mask;
    }

That is 100% original (well at least I didn't copy it from anywhere, it's probaby been done before lol), and I loved the elegance of it.  It's probably the first time I was able to use recursion effectively in an unguided setting.

The multiplication I feel like I cheated a little bit, since all it does is loop over the addition function.  I found an equally elegant bitwise multiplication implementation on StackOverflow, but I didn't use it.  It felt like plagiarism lol.

I fiddled around for quite a while trying to get the "StaffCalculator" class to work right (you are supposed to black box unit test it), and I finally gave up.  Probably would have worked fine as-is from the command line, I just wasn't inclined to deal with it.  My Homework 2 on Github.

Homework 3 (OOP & Inheritance):

Part 1 of this one is pretty straight forward, just copy the code into Eclipse, run it, and see what the output looks like.

Part 2 got a little more interesting.  Quite a lot of recursion, and I decided I wanted to fully unit test the code I wrote.  This proved SUPER useful, which just makes me want to do that much more unit testing in my actual projects.  I love all those green checkmarks lol:


Part 3 was pretty straightforward, a lot of copy and paste from part two for most of the plumbing.  The most novel part was implementing "apply".  I ended up using a different approach, because it is necessary to keep the list sorted and certain "applied function" could change the order (like something that divides 100 by the number in the list... or something that just returns a random number).

Part 4 was also pretty vanilla.  Really the biggest payoff by this point was getting the hang of adding unit tests (by the end I had 38).  There wasn't anything particularly exciting about creating comparators.  Useful, but not terribly sexy lol.

My Homework 3 on Github.

Homework 4 (Guitar Synth):

Well hw4 was actually surprisingly fun.  At first I was skeptical, working through the first few parts dealing with abstract types and the ring buffer.  But then I got to build a synthesizer, and THAT was very cool.  The ArrayRingBuffer build up in the earlier parts plays a key role in implementing the Karplus Algorithm for synthesizing a guitar sound.

I did run into one funny bug (well it wasn't funny at the time, it was making me crazy!)  where every time I called pluck(), the frequency increased... it got higher and higher until around pluck() 100 when it would loop back around.  It was super weird and making me crazy.  I had println() statements all over the place, I made the last and first variables public so I could print them out... It was very strange because it seemed like every time the array emptied out and was filled again (which happens in pluck()), first and last would get a little more out of sync (since the array was always basically full, they should have always been about 99 elements apart).  But the gap would close, which was having the effect of making the array smaller (and thus the frequency higher).  Finally I tracked it down to the enqueue method:

  public void enqueue(double x) {
      if(isFull())
          throw new RuntimeException("Buffer is full!");
      
      //this piece of code was breaking it!
      if(!this.isEmpty())
          last = advance(last);
      
      rb[last] = x;
      fillCount += 1;
  }

It was an easy enough fix:

  public void enqueue(double x) {
      if(isFull())
          throw new RuntimeException("Buffer is full!");
      
      rb[last] = x;
      last = advance(last);
      
      fillCount += 1;
  }

I didn't run into any real headaches after that.  I implemented the "Guitar Hero" extra, as well as the Harp modification.  My Homework 4 on Github.

[NOTE]
My four year old thought this would be a terribly fun game to play, so to keep her off of my laptop, I put it on a spare laptop we used for the kids.  This proved to be an interesting exercise in java packaging.  First off, no, you can't just copy over the ONE class file, all the class files for the dependencies have to go as well.  Second attempt, this doesn't mean just the class files in the bin directory... this includes the library class files as well (StdLib).  And you can't just drop the jar in, you need the class path.  Ultimately, I got the job done.

From the bin directory:
jar -xf stdlib-package.jar  #extracted the princeton libs
jar -cvf GuitarHero.jar hw4/*.class synthesizer/*.class edu/princeton/cs/introcs/*.class #creates jar

On target machine:
java -cp GuitarHero.jar hw4.GuitarHero

Kids thought it was great for a few minutes, then lost interest. Figures...


Homework 5 (Generics):

Generics in Java aren't particularly sexy, but this was a useful exercise even if it wasn't as exciting as some of the other homeworks.  This homework has you implementing a generic list, a genaric map, and a generic array list that implements the AbstractList java class.  Once again, writing unit tests helped me discover and ferret out a few bugs that might otherwise have slipped through unnoticed.
My Homework 5 on Github.


Homework 6 (BSTMap and Asymptotics):

Another one that wasn't terribly exciting but still an excellent learning exercise.  I was surprised that "remove" was considered optional, since it's the trickiest part of getting a BST to work right... though ultimately it wasn't that bad, I'll admit I had to google how to delete from a BST, but once I had the understood the concept, the implementation wasn't too bad (just promote the min from the right tree... or the max from the left tree).  Took the tests from homework 5 for the ULLMap and did a quick find/replace, which immediately gave me a headstart on a test suite (which ferreted out a couple bugs... gotta check for those nulls lol).  My Homework 6 on Github.

Homework 7 (Hashing):

This homework was pretty huge... all the others I was able to finish in a night but this one took me three.  It's divided into 7 parts.  The first couple parts are really basic, introducing the concept of .hashCode() and having you write a couple (and taking a moment to post a kitten picture lol).

Part 3 was interesting.  It dealt with optimizing a recursive Fibonacci method using memoization rather than reimplementing it as an iterative method.  I used the StopWatch class to do some speed tests and was pretty amazed by the results:


Part 4 got us back into some binary, diddling bits and such.  Pretty short and sweet.
Part 4B, however, was much more involved.  This has us creating a Username which is basically just 3 alphanumberic chars, and hashing it.  The "Optional Challenge" of using regex for the "validation" actually made it easier.  The UsernameBank will store a list of Usernames, and suggest an available one.  For both Username and UsernameBank, I wrote some rather dodgy, but still useful, unit tests to test distribution and performance: for UsernameBank.suffestUsername, I filled the UsernameBank up about 85% of the way and the tried generating some available usernames.  generally it took about a dozen or so tried, so it was pretty performant.  The other weird test was on Username, testing that the default constructor produced consistently random usernames.  Terrible as a "unit" test but it did help me tweak the way characters were generated to get an even distribution.

Part 5 was creating the MyHashMap class.  This was another brutal one.  I think the coolest part on this one was overriding toString so that it printed out the layout of the internals:


Part 6 involved hashing the peices and board for the Checkers game that constitutes Project 0;  I ended up going the ubernerd route and played around with bitwise operations.  My hashCode() override for the board was way over the top more complicated than necessary, but hey that's part of the learning process.  I must have done something right though, because the goal was to have hash collisions fewer than 10% of the time, and I was able to generate 10000 random boards with 0 collisions... whoo, go me!

Finally, part 7, the map race.  The implementation here wasn't too bad, though I had to keep bumping up the heap size on my JVM due to various heap overflow and GC problems.  Once I knocked it up to 6gb and dropped the top run down to 20 million, I was able to get relatively consistent results.  To get the random numbers I used the Fisher-Yates "inside out" algorithm to initiallize an array of unique random ints, which I fed into the maps.  Also ran with the native Java implementations a couple times... MyMapHash definitely needs some work, oy vey (these are all in ms):

Put
1 Million5 Million10 Million20 Million
TreeMap179797362492454452
149188032175443374
160784021892145435
142785331922344948
156292692125147292
Java Treemap91580151626838391
95581061638838145
HashMap106559931054495846
91550251003943676
8234933899920768
93545381003846042
89357001048222079
Java HashMap1071709291325025
1021685297326218

Get
1 Million5 Million10 Million20 Million
TreeMap110281092086449888
101473121773836533
101068521764142253
106573252135440173
103676341797042924
Java Treemap79666711516032479
79469961519932948
HashMap11576421173762
10078413392926
9661313885417
12170412864782
10077114084100
Java HashMap363099372392
364409471562

Remove
1 Million5 Million10 Million20 Million
TreeMap137599822604668794
121691732167443175
123282572116952259
129186302240249406
124791842117653180
Java Treemap76968311506834300
76966321540533167
HashMap261216154189863
231190437976539
192145638359708
272226338626978
1812208338910218
Java HashMap6658213283420
6668212592695

As always, my Homework 7 code is on GitHub;

Homework 8 (Sorting):

The last homework covers implementing QuickSort and MergeSort.  This was a much lighter homework than the last, but I enjoyed it.  There is already a test suite in place and pretty descriptive specs in the comments.  MergeSort was particularly interesting...

One of the methods they have us build is "makeQueueOfQueues"... I built it an it worked, but I couldn't figure out what in the hell we needed it for.  I didn't think I was going to need it for merge sort because I planned on doing that recursively.  Come to the spec and it says "don't write a helper", and I mergeSort didn't exist on the queue itself... but I figured out if every item in the queue went into it's own one-item queue (using makeQueueOfQueues), then I could just loop through and successively merge the individual queues together (the merge method put them in order).  So in the end, it was a pretty niftly solution:

    public void mergeSort(String sortFeature){
        if(size < 2)
            return;
        
        CatenableQueue<CatenableQueue<User>> qq = makeQueueOfQueues();
        
        while(qq.size() > 1){
            CatenableQueue<User> q1 = qq.dequeue();
            CatenableQueue<User> q2 = qq.dequeue();
            qq.enqueue(mergeTwoQueues(sortFeature, q1, q2));    
        }
        
        userQueue = qq.dequeue();  
    }

I was curious about the performance characteristics, so I raced the two in much the same way the maps were raced in Homework 7.  I was suprised that MergeSort won about 80% of the time:


The takeaway there is probably that my QuickSort routines need work... oh well ¯\_(ツ)_/¯
Homework 8 code on GitHub.  I'll work the projects and put them in another post.

No comments:

Post a Comment