Sunday, December 6, 2015

Berkeley CS61B - Data structures (Projects)

Back in May I worked through the Data Structures class through UC Berkeley.  I got the homeworks finished, but I put off working on the projects right away.  In August I got projects 0 and 1 finished.
In late October I finally got around to finishing Project 2.  Finally, in early December I finished Project 3 (I got sidetracked working through MIT's Intro to Algorithms class... and doing like, a billion HackerRank challenges lol)



Project 0 (Checkers61B)


The first project is a checkers game, with a bit of a twist.  Bomb pieces "explode" and take out the diagonally adjacent pieces when they capture a piece, unless that piece is a shield.  This was another project that I tried to emphasize unit testing, and I ended up with 48 unit tests.  The only "extra" I implemented was to highlight the currently selected square.  Overall it was a fun little exercise, but there wasn't anything really earth shattering about it. My code is on github: Project 0




Project 1 (NGorgnet)


This project was significantly more involved than the checkers project.  The Ngordnet program loads a list of hyponyms and word counts and lets you create a number of interesting plots against them.  It uses a basic text based UI.  Code on Github: Project 1.  There were probably plenty of interesting lessons that I learned while coding this up, but it's been over a month since I finished and honestly I can't remember, lol.  Oh well, here are some screenshots:










Project 2 (Gitlet)

This project implements a pared down version of git in Java (git is written in C if I'm not mistaken).  This is a BIG project (relative to everything else we've done in this class, anyway).  The assignment write up includes a pretty good narrative explanation of what's expected, but the tests I found on github where absolutely crucial in allowing me to really get a good idea for what the baseline behavior is supposed to do.  

It took me a bit to figure out the particular language used to describe the tests.  I'll deconstruct one so you get the basic gist (unintentional pun noted...). Funny thing is, if I'd read the section at the very end of the document (which mostly in Chinese), it explains the notation.


Over the course of converting all the test cases from the above format into JUnit tests, I built up some pretty robust helpers based on the three tests included in the skeleton code.  The test case above looks like this in my test code:

    @Test
    public void irebase_ontoPast(){

        //Arrange        
        gitlet("init"); //com0        
        createFile("foo", "hi");        
        createFile("eternal", "I never change");        
        gitlet("add", "foo");        
        gitlet("add", "eternal");        
        gitlet("commit", "say hi");        
        gitlet("branch", "dev");        
        createFile("foo", "hello");        
        createFile("yo", "bar");        
        gitlet("add", "foo");        
        gitlet("add", "bar");        
        gitlet("commit", "say hello");        
        createFile("foo", "hawayu");        
        createFile("baz", "good morning");        
        gitlet("add", "foo");        
        gitlet("add", "baz");        
        gitlet("commit", "good morning");

        //Act        
        String[] result3 = gitletErr("i-rebase", "dev");

        //Assert        
        assertEquals("Already up-to-date.", result3[0]);        
        assertEquals("Already up-to-date.", result3[1]);    
    }

This project had a lot of interesting facets.  FileIO is big, obviously.  I created a "Commit" class that stores the essential info about a given commit (timestamp, id, list of files), which has to be persisted, so I got to play around a bit with object serialization (or rather, I found code on stack overflow to do it lol).  I took a peek in the .git source folder to get some ideas on how to structure the repo, so all my commits live in an "objects" subfolder, and the branches in "refs/heads" with a HEAD file pointing to the current branch.

Also peeked at the .git source code, but it was mostly greek to me.  I did, however, notice they were using the command pattern, so I decided to implement that in my code.  The pluralsight course on Design Patterns has a module on the command pattern, and that was a useful starting off point.  Ultimately proved a very beneficial move, as it let me easily separate each command (which is the whole point of the pattern lol).  Factories proved very useful too, especially when it came to isolating the file system interactions.

One parting find... turns out that File.lastModified() only gives you 1 second resolution on linux machines ~:( ... spend several hours chasing down this pesky bug that was preventing one of the commands from working right because it kept thinking two files had the same timestamp.... I ended up adding a content check to compensate.  As usual, my code can be found on Github: Project 2.

Funny enough, turns out that toward the end of November I started getting a lot of traffic on this Github repo, even had a fork, which is pretty rare for me.  Made me curious, so I looked into it a bit and turns out that the Fall 2015 61B class has pretty much the same project due on December 9.  Of course, Hilfinger changed up some of the requirements, so I'm not sure how useful the Spring 2015 version of Gitlet is going to be...

Traffic on Github repo as deadline looms...


Project 3 (Fun with tries)


The final project was a lot simpler (especially since I wasn't inclined to go after the Gold points which were probably a lot harder than the base project lol).  The implementation of Trie was pretty straight forward, and the AlphabetSort version didn't require much tweaking to get it to work as specified.

AutoComplete was interesting.  I followed the suggested implementation and used a Ternary Search Trie to make it work.  I used a couple PriorityQueues for the nodes that were being searched.  The skeleton code included GUI and timer test classes.  For the timing tests, I got the following results (I have no idea if these are any good since I have no basis of comparison, but here you go).  These were on my Levono Z570 laptop running Ubuntu 14.04:

  • Constructor with cities.txt:  320.00 ms
  • Constructor with baby-names.txt:  14.33 ms
  • Constructor with wiktionary.txt:  5.33 ms
  • topMatches()
    • k = 5, random 2-letter queries, cities.txt: 81,195 calls/sec
    • k = 10, random 6-letter queries, cities.txt: 166,661 calls/sec
    • k = 10, random 3 letter queries, baby-names.txt: 177,611 calls/sec
    • k = 10, random 1 letter queries, baby-names.txt: 128,136 calls/sec
    • k = 10, random 0 letter queries, baby-names.txt: 6,564,671 calls/sec
    • k = 3, random 2 letter queries, baby-names.txt: 487,955 calls/sec

I tweaked the timer class a bit to account for my project file setup (just made the filenames String constants with full file paths).  Running the GUI just required setting the full path to the desired text file as a program argument under run config.  When you select the text you want, it will open up your browser and search Google for it.  All my code for this project is on Github:  Project 3.

AutocompleteGUI with cities.txt loaded


No comments:

Post a Comment