Saturday, July 16, 2016

Java Multithreading in Google AppEngine

I'm currently working on a security app using Google AppScript.  One bit of functionality the product owner wanted implemented was checking ip addresses against a blacklist.  The original idea was for a self-generated blacklist, but before I started implementing that, I wanted to see if there were any blacklists out in the wild that I could leverage.  While there are many blacklists out there (a few of the big ones are on the Wikipedia page "Comparison of DNS blacklists"), almost without exception they use a specially formatted DNS query to return results.  This is a huge dealbreaker for a JavaScript app, as there is no way to do a DNS query in AppScript (yeah, node.js does it, but alas, not using node, so...).  So I made the decision to stand up my own Java REST endpoint in AppEngine.

I hadn't touched Java proper in months, so I figured there would be a bit of reacclimatization to be done.  Honestly, though, the biggest hurdle I faced was just getting the initial project template running without Eclipse and/or Maven complaining about everything.  I got the dreaded "jface null pointer exception" drama, which had something to do with misconfiguration of the deployment descriptor or dynamic webpages.  I must have monkeyed around with that for a couple hours before I started digging around for a better starting template (I tried the "hello endpoints" template from Google's documentation, and the Maven endpoints-skeleton archetype, with no luck).  Finally I fired up the "Hello World" example from the Udacity "Developing Scalable Apps in Java" course I took a year ago, and that one worked without a problem.  So I was finally ready to start. Ugh...



Querying the blacklists: naive approach


Now that I had a working REST service that I could hit from AppScript, it was time to make it do something useful.  StackOverflow had led me to a bash script on Github (blcheck) that included 100+ blacklist domains.  Querying a DNSBL is done by reversing the domain or ip being checked, and appending the blacklist domain.  This is then given to a program like dig on a *nix system.  So checking 127.0.0.2 (a common test ip for these blacklists) against the SORBS blacklist would look like this:
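As a quick sketch of how that hostname is assembled (the class and helper below are my own invention for illustration, not part of blcheck):

```java
// Hypothetical helper: build the hostname for a DNSBL lookup by
// reversing the IPv4 octets and appending the blacklist's domain.
public class DnsblQuery {
    static String queryHost(String ip, String blDomain) {
        String[] octets = ip.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = octets.length - 1; i >= 0; i--) {
            sb.append(octets[i]).append('.');
        }
        return sb.append(blDomain).toString();
    }

    public static void main(String[] args) {
        // On a *nix box this is the name you'd hand to dig:
        //   dig 2.0.0.127.dnsbl.sorbs.net
        System.out.println(queryHost("127.0.0.2", "dnsbl.sorbs.net"));
    }
}
```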


Each of the responses has a certain meaning.  While they all seem to follow the 127.x.y.z  format for return values, what x, y, and z mean seem to differ from blacklist to blacklist.  So for my purposes, I'm just checking whether there is a hit or not.
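If I ever wanted to be stricter than "did the name resolve at all", I could verify the answer really is a 127.x.y.z address, since that's the range DNSBLs use for positive listings.  A quick sketch (an assumption of mine, not what my service currently does):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HitCheck {
    // Sketch: count an answer as a blacklist hit only if it falls in
    // 127.0.0.0/8, the range DNSBLs use for positive listings.
    static boolean isBlacklistAnswer(InetAddress addr) {
        byte[] b = addr.getAddress();
        return b.length == 4 && (b[0] & 0xFF) == 127;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Literal ip strings resolve locally, so no DNS traffic happens here.
        System.out.println(isBlacklistAnswer(InetAddress.getByName("127.0.0.2")));
        System.out.println(isBlacklistAnswer(InetAddress.getByName("93.184.216.34")));
    }
}
```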

In Java, it's possible to do the same kind of query using InetAddress.getByName(host) where host is a String.  The way these blacklists work, if the value being queried is not in their list, they return NXDOMAIN, which is basically the technical way of saying "Not Found", and unsurprisingly, getByName will throw an UnknownHostException when this occurs.  In my code, all I'm interested in is a rough count of how many positive blacklist hits a given ip address gets, so I just increment a counter on a good response, round-file UnknownHostExceptions, log any other exceptions (got some kind of Socket exception once), and return the total hit count.

public int getHitCount(String[] bls){
    int count = 0;
    for(String bl : bls){           
        // revip is the ip under test with its octets reversed, e.g. "2.0.0.127"
        String host = revip + "." + bl;
        if(isHit(host))
            count++;
    }
    return count;
}

public boolean isHit(String host) {
    long start =  System.currentTimeMillis();
    boolean isHit = false;
    
    try {
        InetAddress result = InetAddress.getByName(host);           
        System.out.println(result.toString());
        isHit = true;
    } catch (UnknownHostException e) {
        // Gulp
    } catch (Exception e){
        System.out.println(e.getMessage());
    }
    long duration =  System.currentTimeMillis() - start;
    System.out.println("Query of " + host + " took " + duration);
    return isHit;
}

On my first attempt, I queried all the blacklists sequentially, and even though they are fast, doing a hundred of them adds up pretty quick.  I was typically seeing 30-50 second response times from my REST service.  Obviously if I have to do a couple thousand of these in a reasonably short time, this is not going to work at all.  In the screenshot below, the test is using the "production" blacklist, which it cycles through, in order, one at a time:



I made a cheesy graphic to illustrate what is happening:



Basically, we make a query, wait for a response, query again, wait again, ad nauseam, until we are through the list.  

My initial solution was to just make as many calls as possible from AppScript to make up for the long response time.  I did this using triggers.  Using some clever cache magic, I was able to divide up the work of resolving a long list of ips among successive trigger calls.  Then it was just a matter of spamming the trigger (set it to call every minute).  

The problem with this approach is that there is a 6 hour total trigger execution time limit (for premium accounts, consumers get even less).  So it didn't matter if I could theoretically finish the work in time with a bunch of triggers, because after six hours of work they would all just quit.  At 50 seconds a pop, 2000 calls were going to take 27+ hours to finish.  Even with the 15-20 second responses I was typically seeing after shortening the list of blacklists, it just wasn't going to work (15 seconds was about the best I usually did, and 2000 of those was still over eight hours).  So I knew I had to make the REST service faster.

Yikes! Too slow...

My first idea to make the web service faster was just to trim the fat on my "production" black list.  I think this helped get me from the 30-50 second range you see in the logs above to the 15-20 second range I was seeing later.  Obviously this helped but it wasn't enough.



The asynchronous approach: Futures<>


Now, for the longest time I didn't think you could do multithreading in Java AppEngine apps. I didn't think the JRE whitelist included java.util.concurrent, so I figured I was stuck with task queues.  And I really had no idea how task queues were going to help me.  But as I was looking around the web for a solution, I found this article about doing async UrlFetch calls in AppEngine.  His example was super simple:  create a list of Futures, then iterate through them calling .get() on each.  While the author didn't seem to like the approach for anything other than a simple illustration, I thought it could work in my application.  

Now, in the example, the author is calling the fetchAsync() method on the URLFetchService instance.  The InetAddress class doesn't have an analogous method, so I was going to have to try something else, but I figured since that example used Futures, maybe I could just implement something from scratch with Futures.  I didn't know anything about Java Futures, so I googled it and came across a tutorial, java.util.concurrent.Future Basics.  This gave me the basic building blocks, but I wasn't sure if all the things they were using would work in AppEngine.  

So another search and I found the final bit of the puzzle, some sample code on Github that demonstrated using appengine.api.ThreadManager to get a ThreadFactory and instantiate a thread pool.

The getHitCount() method will now closely resemble the example async UrlFetch code.  First it iterates through all the urls in the blacklist, attaching the reversed ip address and passing it to the asynchronous version of isHit, which returns a Future<Boolean>.  It's important to note that these calls all return immediately; they do not wait for the result.  Once all the urls in the blacklist are processed, we iterate through the list of futures and process the results.  

public int getHitCount(String[] bls){
    
    int count = 0;
    List<Future<Boolean>> futes = new ArrayList<Future<Boolean>>();
    //get everybody started
    for(String bl : bls){           
        String host = revip + "." + bl;
        futes.add(isHitAsync(host));
    }
    
    //rack up the results
    for(Future<Boolean> result : futes){
        try {
            if(result.get())
                count++;
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
        
    return count;
}


The isHit method doesn't have to change; however, we do need to wrap it in a Future, which is done in a method called isHitAsync.  The Future is returned immediately, while the result can be retrieved with the .get() method on that Future, which will return the value from isHit():

public Future<Boolean> isHitAsync(final String host){       
    ExecutorService pool = getPool();       
    return pool.submit(new Callable<Boolean>() {            
        public Boolean call() throws Exception {
            return isHit(host);         
        }       
    });
}


The getPool() method is what returns the instance of the thread pool.  It really should be a singleton in its own class, but for now this works (on my first attempt I was instantiating a new pool on every call, which quickly broke the 50 thread maximum).  I haven't tested this theory, but since the request is being processed on one thread, I think I can get away with up to 49 additional threads.  So it creates a fixed thread pool with up to 49 threads.

private ExecutorService pool = null;
private ExecutorService getPool(){
    
    if(this.pool == null){
    
        ThreadFactory factory = null;
        try{ 
            factory = ThreadManager.currentRequestThreadFactory();
        } catch (Exception ex) {
            // ThreadManager isn't available (e.g. running locally); fall back below
        }           
        
        if(factory != null) {
            pool = Executors.newFixedThreadPool(49, factory);
        } else {
            pool = Executors.newFixedThreadPool(49);
        }
    }
    
    return pool;
}
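If I ever do move the pool into its own class, a race-free version might look something like this sketch (untested on AppEngine; one caveat is that threads from currentRequestThreadFactory() can't outlive the request, so the pool has to stay request-scoped rather than JVM-wide):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Sketch: a synchronized version of getPool(), so two threads racing on
// the null check can't create two pools.  One instance per request.
public class PoolProvider {
    private final ThreadFactory factory;
    private ExecutorService pool;

    // Pass in ThreadManager.currentRequestThreadFactory() on AppEngine,
    // or null elsewhere to use the default factory.
    public PoolProvider(ThreadFactory factory) {
        this.factory = factory;
    }

    public synchronized ExecutorService getPool() {
        if (pool == null) {
            // 49 worker threads + the request thread = the 50-thread maximum
            pool = (factory != null)
                    ? Executors.newFixedThreadPool(49, factory)
                    : Executors.newFixedThreadPool(49);
        }
        return pool;
    }
}
```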


So now rather than executing one at a time, up to 49 threads are working simultaneously, so much of the waiting is now done in parallel.  As quicker requests finish, a thread is freed to work on another job on the queue.  Now the long queries that used to block every other operation are isolated to one separate thread, and simply finish after everything else is done:



And just for kicks another cheesy diagram:


It shouldn't really matter what order you call get() on the list of Futures.  If they happened to be sorted in ascending order of completion time, each successive call would only have to block for the marginal amount of time left before that task finishes (so if item 1 took 1 ms, item 2 took 2 ms, and so on, each call to get() would only block for about 1 ms).  If the order is reversed, the first get() blocks for the full n ms while the slowest task finishes, and all the remaining calls return essentially immediately.  
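A quick standalone sketch (plain JVM, not AppEngine code) makes this easy to convince yourself of: submit a few tasks that sleep for different lengths, drain the futures in submission order, and the total drain time is roughly the longest single task, not the sum:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureOrderDemo {

    // Submit three sleeps (slowest first), then drain the futures in
    // submission order and return how long the drain took.
    static long drainMillis() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Future<Long>> futes = new ArrayList<Future<Long>>();
        for (final long ms : new long[]{300, 200, 100}) {
            futes.add(pool.submit(new Callable<Long>() {
                public Long call() throws Exception {
                    Thread.sleep(ms);
                    return ms;
                }
            }));
        }
        long start = System.currentTimeMillis();
        for (Future<Long> f : futes) {
            f.get(); // blocks only until this particular task finishes
        }
        pool.shutdown();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        // Roughly 300 ms (the longest task), not 600 ms (the sum)
        System.out.println("Drained in " + drainMillis() + " ms");
    }
}
```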

My AppEngine blacklist checker app is available on Github: blacklist-checker.  I know, terribly original name.  I might have to think of something more interesting in the future.  For now it does its job (much more quickly), and that is what really counts.
