Saturday, March 7, 2015

Fun with ciphers and generated text with JavaScript

Many moons ago, on a day I had very little else to do, I had this silly notion to create a Google Spreadsheet that would generate a line of text "mad-lib" style.  I'd already created a "team name" generator that basically just picked an adjective and a noun and put them together, which made for some rather amusing results, such as "The Pensive Tangerines", "The Drunk Stumpies", and "The Kindly Rippers".



Being a bit of a cad, I thought it would be funny to do one of the form of "He <put> his <adjective> <thing> into her <adjective> <thing>"... I know, very seventh grade, but I've gotten a good many laughs from the result.  Recently I decided to revisit this idea and create a quickie (no pun intended) JavaScript app that would do the same thing.  Because the subject matter didn't live up to the level of professionalism I normally try to maintain here, I put the result on its own blog.   As inappropriate as the language choice was, the coding itself was quite fun, so I thought I would pick a more G-rated subject and reproduce the exercise ;)

Generating a random sentence


So rather than my puerile attempt before, we'll walk through "The <quick> <brown> <fox> <jumped over> the <lazy> <dog>"... this should be fun, eh?  To make things extra interesting, we're going to cipher our word lists.  This isn't true, secure encryption, just a bit of obfuscation, so someone who hits F12 and looks at our code doesn't ruin the fun by spying all our words.  So, our grocery list:

  • Need 6 word arrays, one for each field in the subject string
  • Need a method to randomly fetch a string from an array
  • Need functions to cipher and decipher text

The word arrays are the easy part:

var word1 = ["quick", "fast", "slow", "spritely", "overjoyed", "hyperactive", "idiotic"];
var word2 = ["brown", "purple", "portly", "fat", "skinny", "mottled", "bug eyed"];
var word3 = ["fox", "kangaroo", "spider monkey", "circus midget", "robot"];
var word4 = ["jumped over", "poked", "made a sandwich for", "smacked down", "made faces at"];
var word5 = ["lazy", "angry", "amused", "ugly", "thoroughly annoyed", "depressed"];
var word6 = ["dog", "baby", "iguana", "martian", "bunny", "Gary Busey"];

With an unlimited amount of time I'm sure we could get pretty creative here, but this will do for now.  So the question is: how do we pull a word out of an array at random?  This basically boils down to writing a "randBetween" function and using it to generate a random index value.  A quick search got me to (naturally) a Stack Overflow post, which got me to this code:

function randBetween(min, max) { return Math.floor( Math.random() * (max - min) ) + min; }

I actually used the bitwise OR ( a | 0 ) when I did it the first time, but I think there are good arguments against doing it that way (it MAY be faster but it is less clear, and for what we are doing I like clarity over performance).  Now that we have randBetween, we can write a simple function to get a random word from our array:

function getRandomWord (array) { return array[randBetween(0, array.length)]; }

Because our randBetween function uses floor, the value of "max" will never be returned... so the actual values we get back will always fall between min and max-1.  This means we get an index back that lies between 0 and array.length - 1, which works out perfectly, since the max index is always one less than the length.  Now we should be able to build a working sentence generator:

function generate() { return "The " + 
    getRandomWord(word1) + " " + 
    getRandomWord(word2) + " " + 
    getRandomWord(word3) + " " + 
    getRandomWord(word4) + " the " + 
    getRandomWord(word5) + " " + 
    getRandomWord(word6); }
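If you want to try it without wiring up a page, here is a minimal, self-contained sketch that pulls the pieces above into one script you can paste into a browser console or run with Node.  One assumption to flag: randBetween here scales by (max - min), so it also behaves for a nonzero min; for our use (min is always 0) that makes no difference.  Since the output is random, I haven't listed any expected sentences.

```javascript
var word1 = ["quick", "fast", "slow", "spritely", "overjoyed", "hyperactive", "idiotic"];
var word2 = ["brown", "purple", "portly", "fat", "skinny", "mottled", "bug eyed"];
var word3 = ["fox", "kangaroo", "spider monkey", "circus midget", "robot"];
var word4 = ["jumped over", "poked", "made a sandwich for", "smacked down", "made faces at"];
var word5 = ["lazy", "angry", "amused", "ugly", "thoroughly annoyed", "depressed"];
var word6 = ["dog", "baby", "iguana", "martian", "bunny", "Gary Busey"];

// Returns an integer in [min, max): max itself is never returned.
function randBetween(min, max) {
    return Math.floor(Math.random() * (max - min)) + min;
}

// array.length is used as the exclusive upper bound, so the index is always valid.
function getRandomWord(array) {
    return array[randBetween(0, array.length)];
}

function generate() {
    return "The " +
        getRandomWord(word1) + " " +
        getRandomWord(word2) + " " +
        getRandomWord(word3) + " " +
        getRandomWord(word4) + " the " +
        getRandomWord(word5) + " " +
        getRandomWord(word6);
}

// Print a handful of sentences; every run gives a different batch.
for (var i = 0; i < 5; i++) {
    console.log(generate());
}
```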

I was surprised at how funny the results were:


Creating the cipher and decipher functions


At this point it functions perfectly well; however, because our words are all stored in arrays of plain text, it's trivial to open up the source and see all our words.  "So what?" you ask... good question. I have no idea why I cared but I wanted to try to obfuscate the content.  Nothing terribly robust, just enough to dissuade casual curiosity.  So I figured a simple cipher would suffice.  In classic style, I wasn't satisfied doing a simple Caesar cipher... oh no, I had to be a tad bit fancier than that.  So each letter of a word is shifted by an amount based on its index in the word, and words in an array are ciphered based on their position in the array.  But I'm getting ahead of myself here... first we have to cipher a single letter.

Before we can cipher the letter, there are a couple things we need to know:
  • How do we convert characters to numbers?
  • What range of characters are we going to use?
Converting to and from character codes is simple enough.  "a".charCodeAt(0) will give you the character code of "a" (97), and String.fromCharCode(97) will give you "a".  To get a handle on the range of characters we want to use, we'll look at a table of ASCII values. I limited my first implementation to just lowercase letters, which caused wonky behavior when I hit spaces and periods, so here we'll widen the range a bit.  We'll start at 32 (space) and go up to 126 (tilde).  This gives us 95 characters to work with.
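To make those numbers concrete, here is a small sketch that converts in both directions and builds the full set of characters the cipher will work with.  The constant names (RANGE_START, RANGE_END, RANGE_SIZE) are just labels I made up for illustration; they don't appear in the rest of the code.

```javascript
// Bounds of the printable ASCII range used by the cipher.
var RANGE_START = 32;   // space
var RANGE_END = 126;    // tilde
var RANGE_SIZE = RANGE_END - RANGE_START + 1;

console.log("a".charCodeAt(0));        // 97
console.log(String.fromCharCode(97));  // "a"
console.log(" ".charCodeAt(0));        // 32
console.log("~".charCodeAt(0));        // 126

// Build every character in the range, in order, from space to tilde.
var alphabet = "";
for (var code = RANGE_START; code <= RANGE_END; code++) {
    alphabet += String.fromCharCode(code);
}

console.log(RANGE_SIZE);       // 95
console.log(alphabet.length);  // 95
```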

So, we'll build this up a little at a time.  The first thing we need to do is take a character into our function.  For starters, we'll just convert it to and from a character code:

function cipherLetter(a) { 
    return String.fromCharCode( a.charCodeAt(0) );
}

Now let's try passing in an index to increment the character code of our letter:

function cipherLetter(a, index) {
    return String.fromCharCode( a.charCodeAt(0) + index );
}

Now, we want to keep the character codes within the range [32, 126], wrapping shifted characters from the end back to the beginning (so tilde shifted 1 would return a space).  We'll look at just the logic for the indexing:

 a.charCodeAt(0) + index             //need to wrap this around
(a.charCodeAt(0) + index - 32)       //start from a 0 base
(a.charCodeAt(0) + index - 32) % 95  //mod length of range. gives relative index of the new character
32 + (a.charCodeAt(0) + index - 32) % 95 //return to a base of 32, starting from space

So the final function should look like this:

function cipherLetter(a, index) {
    return String.fromCharCode( 32 + (a.charCodeAt(0) + index - 32) % 95 );
}
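A few worked cases help confirm the wrap-around does what we want.  This sketch repeats cipherLetter so it runs on its own; the inputs are just values I picked to hit the interesting edges.

```javascript
function cipherLetter(a, index) {
    return String.fromCharCode( 32 + (a.charCodeAt(0) + index - 32) % 95 );
}

// JSON.stringify is only here so the space is visible in the console.
console.log(JSON.stringify(cipherLetter("~", 1)));  // " "  (126 wraps around to 32)
console.log(cipherLetter("a", 1));                  // "b"  (no wrap needed)
console.log(cipherLetter("y", 11));                 // "%"  (121 + 11 = 132, which wraps to 37)
console.log(cipherLetter("a", 95));                 // "a"  (one full trip around the range)
```

One caveat worth knowing: this assumes the index is never negative.  JavaScript's % keeps the sign of the left operand, so a negative shift can land below 32; for example, cipherLetter(" ", -1) yields char code 31.  Everything in this post only shifts forward, so it doesn't bite us here.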

Now that we have the letter shifting logic, doing words and arrays is a simple matter of looping:

function cipherWord(word, index) {
    var cword = "";
    for(var i = 0; i < word.length; i++) {
        cword += cipherLetter(word.charAt(i), i + index);
    }
    return cword;
}

function cipherArray(array, index) {
    var carray = [];
    for(var i = 0; i < array.length; i++) {
        carray.push(cipherWord(array[i], i + index));
    }
    return carray;
}

Now we can test it out:

console.log(cipherArray(word1, 1));

gives us 

["rwlgp", "hdwy", "vpt}", "wuxp|nv%", "t|lzsy%qq", "n!xn|lo"w&u", "plry up"]

To get the decipher, we just work in reverse, starting with deciphering a letter.  This time, instead of adding the index, we want to subtract the index. Rather than working from the beginning of our range (32), we need to base our index on the end of our range (126).  

126 - a.charCodeAt(0)                        //figure out how far base letter is from end
126 - a.charCodeAt(0) + index                //increase that by the index
(126 - a.charCodeAt(0) + index) % 95         //wrap that around to get the final index
126 - ((126 - a.charCodeAt(0) + index) % 95) //subtract from end to get the final letter code

Now we insert the deciphering algorithm into our function:

function decipherLetter(a, index) {
    return String.fromCharCode( 126 - ((126 - a.charCodeAt(0) + index) % 95) );
}
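Before building on top of it, it's worth checking that decipherLetter really undoes cipherLetter for every character in the range.  The sketch below brute-forces that; roundTripsCleanly is a hypothetical helper name, and the upper index of 200 is an arbitrary choice that covers a couple of full wraps.

```javascript
function cipherLetter(a, index) {
    return String.fromCharCode( 32 + (a.charCodeAt(0) + index - 32) % 95 );
}

function decipherLetter(a, index) {
    return String.fromCharCode( 126 - ((126 - a.charCodeAt(0) + index) % 95) );
}

// Returns true if every character in [32, 126] survives cipher + decipher
// for every shift from 0 up to maxIndex.
function roundTripsCleanly(maxIndex) {
    for (var code = 32; code <= 126; code++) {
        var ch = String.fromCharCode(code);
        for (var index = 0; index <= maxIndex; index++) {
            if (decipherLetter(cipherLetter(ch, index), index) !== ch) {
                return false;
            }
        }
    }
    return true;
}

console.log(decipherLetter(" ", 1));   // "~"  (space shifted back 1 wraps to tilde)
console.log(roundTripsCleanly(200));   // true
```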

The other two functions are nearly identical to their cipher counterparts:

function decipherWord(cword, index) {
    var word = "";
    for(var i = 0; i < cword.length; i++) {
        word += decipherLetter(cword.charAt(i), i + index);
    }
    return word;
}

function decipherArray(carray, index) {
    var array = [];
    for(var i = 0; i < carray.length; i++) {
        array.push(decipherWord(carray[i], i + index));
    }
    return array;
}

Sure enough, if we run these through the paces:

var encoded = cipherArray(word1, 1);
console.log(encoded);
console.log(decipherArray(encoded, 1));

We get the expected result:


Better obfuscation with base64 encoding


While writing about the cipher methods I used above, I got to thinking about other ways of obfuscating the text, and it occurred to me that one could probably use base64 encoding to disguise the word lists, and as it turns out, it's WAY simpler that way.  Here are the two methods you could use:

function obfuscate(array) { return window.btoa(array.toString()); }
function clarify(string) { return window.atob(string).split(","); }

Basically what is happening here is that the array is stringified, which turns it into a comma separated list, which is then encoded to base64. So:

["quick", "fast", "slow", "spritely", "overjoyed", "hyperactive", "idiotic"]

becomes

"cXVpY2ssZmFzdCxzbG93LHNwcml0ZWx5LG92ZXJqb3llZCxoeXBlcmFjdGl2ZSxpZGlvdGlj"

The clarify() function neatly reverses the process, turning the base64 string into a comma separated string of plain text, which is then split into an array on the commas.  Easy peasy. I got so caught up writing my cipher that I forgot the actual problem.  This is actually a much more effective and elegant solution to my problem (hiding my content).  But writing the cipher and decipher methods was fun anyway, so no harm no foul lol.
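Here is a round trip you can run to see it for yourself.  A couple of assumptions in this sketch: encode and decode are small wrappers I added so the same code runs in a browser (where btoa and atob are globals) and in Node (falling back to Buffer if those globals are missing); in a page you could call window.btoa and window.atob directly as shown above.

```javascript
// Wrappers (hypothetical names) so the sketch runs in a browser or in Node.
function encode(s) {
    return typeof btoa === "function" ? btoa(s) : Buffer.from(s, "latin1").toString("base64");
}
function decode(s) {
    return typeof atob === "function" ? atob(s) : Buffer.from(s, "base64").toString("latin1");
}

function obfuscate(array) { return encode(array.toString()); }
function clarify(string) { return decode(string).split(","); }

var word1 = ["quick", "fast", "slow", "spritely", "overjoyed", "hyperactive", "idiotic"];

var hidden = obfuscate(word1);
console.log(hidden);           // the base64 string
console.log(clarify(hidden));  // the original seven words

// Caveat: the comma is the delimiter, so a word containing one comes back split.
console.log(clarify(obfuscate(["one, two", "three"])).length);  // 3
```

Two limits to keep in mind: a word containing a comma gets split in two on the way back, and btoa only accepts characters in the Latin-1 range, so anything fancier would need to be encoded first.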

The live demo and concluding thoughts


I decided it would be fun to wrap this all up with a JSFiddle demo.  A couple of observations I had while messing with it:
  • In the cipher version, the string delimiter (comma) is in the same range as the values, so any character that ciphers to a comma is going to break deciphering.  I was able to work around it by joining on a control character outside the range (char code 31), which makes the string look like one solid line.
  • The ciphered string also messes with you if you try to retrieve it with [element].innerHTML.  This was fixed by switching to "value" for the text areas.
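The delimiter problem in the first bullet is easy to reproduce.  In this sketch, cipherToString and decipherFromString are hypothetical helpers (not the code from the fiddle) that join and split on whatever delimiter you hand them, and the word "a+b" is a contrived input chosen because "+" shifted by 1 becomes a comma.

```javascript
function cipherLetter(a, index) {
    return String.fromCharCode( 32 + (a.charCodeAt(0) + index - 32) % 95 );
}
function decipherLetter(a, index) {
    return String.fromCharCode( 126 - ((126 - a.charCodeAt(0) + index) % 95) );
}
function cipherWord(word, index) {
    var cword = "";
    for (var i = 0; i < word.length; i++) { cword += cipherLetter(word.charAt(i), i + index); }
    return cword;
}
function decipherWord(cword, index) {
    var word = "";
    for (var i = 0; i < cword.length; i++) { word += decipherLetter(cword.charAt(i), i + index); }
    return word;
}

// Char code 31 sits just below the ciphered range [32, 126],
// so a ciphered letter can never collide with it.
var DELIMITER = String.fromCharCode(31);

// Hypothetical helpers: cipher an array into one delimited string, and back.
function cipherToString(array, index, delimiter) {
    var out = [];
    for (var i = 0; i < array.length; i++) { out.push(cipherWord(array[i], i + index)); }
    return out.join(delimiter);
}
function decipherFromString(string, index, delimiter) {
    var pieces = string.split(delimiter);
    var out = [];
    for (var i = 0; i < pieces.length; i++) { out.push(decipherWord(pieces[i], i + index)); }
    return out;
}

var words = ["a+b", "fox"];

// With a comma, the ciphered "a+b" contains the delimiter and splits in the wrong place.
console.log(decipherFromString(cipherToString(words, 0, ","), 0, ",").length);  // 3

// With char code 31, the two words come back intact.
console.log(decipherFromString(cipherToString(words, 0, DELIMITER), 0, DELIMITER));  // ["a+b", "fox"]
```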
Really just that much more evidence that base64 encoding is the better strategy.  With no further ado, for your fiddling pleasure:


