When I asked Gino (alias Roberto Baldi mostmobbed) for a software solution to the “halting problem”, he told me it “should not be so difficult”!
“In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.” (from Wikipedia)
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist; but as Gino says, Turing was a pessimist…
Why am I scratching my head over the halting problem? Because I need to put power at BugsVoice users’ fingertips. In particular, I need to allow users’ JavaScript code to run on my server, and knowing whether these scripts end would be comforting.
If you have already heard about XSS, you probably know that “third party” code execution, authorized or not, can be a nightmare even on the client… can you imagine how ugly “pirate” code could be on your server?
Going into more detail, BugsVoice is a service that receives a “bug” from a customer’s server, processes the request locally, serves friendly feedback to the user and stores the bug in its database.
JavaScript (JS from here on) server-side execution is involved in the “processing” phase.
We supply a pre-filled and certified set of “rules” for processing bugs, but we also allow customers to create their own rules.
JS gives you the power to inspect the error, understand what happened and give your customer something better than a “500 server error”, in order to reassure them and recover a situation where your application’s credibility is going down. An interesting read about error recovery and error feedback is the book “Defensive Design for the Web” from 37signals.
The complete BugsVoice process consists mainly of the following steps:
1) on the customer’s server side, an error page that catches the exception, collects as much information as possible (logged-in user, time, server status, database status, memory etc.) and redirects the user to our BugsVoice server (see how to configure an error trapping page on the BugsVoice blog for more details).
2) our server, reading the user preferences, retrieves the error template. Each template is fully dynamic and customizable; it introduces some “variables” that can be filled in from the error that occurred. Then our server creates two JS objects: the “bug” object, filled with the collected error data, and the “template” object, filled from the layout skeleton.
3) the JS rules are executed to fill “template” from “bug” or to reject the request.
4) the layout is rendered to the user using the “template” and “bug” objects. The bug is stored on our server.
5) the user feedback is collected and stored.
6) a “thank you” page is displayed to the user.
Then there is error management, but this is interesting “only” for BugsVoice users, not for this post. Here are some error pages from BugsVoice:
So every user can create their own rules in order to inspect, for instance, the received bug’s stacktrace, trying to discover whether a database problem occurred, or whether there is a problem with the latest version of some browser.
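To give an idea of what such a rule might look like, here is a minimal sketch. The field names (`bug.stacktrace`, `bug.code`, `template.errorMessage`) and the sample data are assumptions made up for illustration, not the real BugsVoice object model. The rule itself is plain string inspection, with no loops and no function definitions.

```javascript
// Hypothetical stand-ins for the objects the server would provide to a rule.
var bug = {
  code: 500,
  stacktrace: "java.sql.SQLException: Connection refused\n\tat com.example.OrderDao.load(OrderDao.java:42)"
};
var template = { errorMessage: "" };

// The rule itself: look for a telltale string in the stacktrace and pick a message.
if (bug.stacktrace.indexOf("SQLException") >= 0) {
  template.errorMessage = "Our database is having a bad moment. Please try again in a few minutes.";
} else if (bug.code == 404) {
  template.errorMessage = "The page you asked for does not exist.";
} else {
  template.errorMessage = "Something went wrong on our side. We have been notified.";
}

console.log(template.errorMessage);
```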
Coming back to rules execution:
during step 3) we get the rules from the user configuration and execute them on our server. We use the Java SE 6 scripting features, which supply an ECMAScript engine to run the rules. A scripting engine instance is isolated from the JVM environment and you must declare the resources (libraries) you want to make available in the execution context.
Before executing the rules, we feed the context with the “bug” and “template” objects. Then we run the rules… (drum roll!).
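The idea of a context that only contains what you put into it can be sketched as follows. This is not the Java SE 6 scripting API that BugsVoice uses: it swaps in Node.js's built-in `vm` module as a stand-in, and the object contents are invented for illustration. Node's own documentation warns that `vm` is not a security mechanism on its own, so take this as an illustration of the concept only.

```javascript
var vm = require("vm");

// Hypothetical objects standing in for the ones the server builds in step 2.
var bug = { code: 404, stacktrace: "" };
var template = { errorMessage: "" };

// A customer rule, stored as plain source text.
var ruleSource = 'if (bug.code == 404) template.errorMessage = "Page missing";';

// Only the names placed in this object are visible to the rule as globals.
vm.runInNewContext(ruleSource, { bug: bug, template: template });
console.log(template.errorMessage); // the rule wrote into the host's object

// Host globals such as `process` or `require` were never declared in the
// context, so a rule cannot reach them by name.
var visibility = vm.runInNewContext("typeof process + ',' + typeof require", {});
console.log(visibility);
```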
A basic (and friendly) rule example:
if (bug.code==404) errorPage.errorMessage="Page missing: you get this error because of...";
Of course this code is safe, but what happens if an evil user composes a pleasant rule like
while(true);
or
function snake(s){
return "s"+snake(s);
}
snake(":-<");
… or even worse?
Sadly Turing beats Gino 1-0, and there is no general solution to the question “does this rule end?”.
The only possible solution is to narrow the scope of the problem by introducing some fences.
One solution is to set up an external observer, using multi-threading and watchdogs to kill processes after a while, but a better solution is to avoid infinite loops in the first place.
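The watchdog approach can be sketched like this. Again, this is a stand-in rather than the Java implementation: it relies on the `timeout` option of Node.js's `vm` module, which interrupts a script that runs longer than the given number of milliseconds. The helper name `runWithWatchdog` and the 100 ms budget are assumptions chosen for the example.

```javascript
var vm = require("vm");

// Runs a rule with a time budget (timeoutMs is an assumed, tunable value).
function runWithWatchdog(ruleSource, sandbox, timeoutMs) {
  try {
    vm.runInNewContext(ruleSource, sandbox, { timeout: timeoutMs });
    return { ok: true };
  } catch (err) {
    // Reached when the rule throws or when the time budget runs out.
    return { ok: false, reason: String(err && err.message) };
  }
}

var friendly = runWithWatchdog(
  'if (bug.code == 404) template.errorMessage = "Page missing";',
  { bug: { code: 404 }, template: {} },
  100
);
var evil = runWithWatchdog("while(true);", {}, 100);

console.log(friendly.ok, evil.ok, evil.reason);
```

The watchdog stops the damage but still burns the whole time budget on every bad rule, which is why rejecting such rules before they are ever saved is more attractive.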
Rules in our context are used mainly for discovering string patterns in the error stacktrace and for building better feedback; we do not need to iterate or create complex functions, so we can reduce the set of allowed JS statements without losing “power”.
Luckily JS has a limited set of statements for iteration and recursion; so if we are able to “kill” bad intentions by forbidding dangerous statements like “while”, “for” or function definitions, we can run rules with confidence.
This way we reduce the complex halting problem to a (fairly) easy problem similar to HTML sanitization, where you must remove disallowed tags (see XSS war: a Java HTML sanitizer).
Actually, identifying “while” or “for” statements in complex code is not as easy as finding the “while” string. The find/replace approach is too rough; here we need a more accurate solution that understands the difference between
while (true);
and
var dummy = "while(true)";
which is obvious to us but not to a string searcher…
You must use something to analyze the code token by token.
ANTLR 3 supplies all you need for tokenizing, parsing and walking your code. You need a JS grammar and then ANTLR will build all the rest. We used the ES3 grammar from Xebic Research (BSD license), based on the original work of Patrick Hulsmeyer, which fits our needs perfectly.
With the ES3 grammar we built a parser, lexer and walker that analyze the rule’s code, intercepting every forbidden statement so that dangerous scripts are not accepted (at least I hope so). Only the rules that pass the test are saved on the system and made available to the script engine.
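To show why looking at tokens beats a plain string search, here is a tiny hand-rolled scanner. It is not the ANTLR-generated lexer and walker described above, only a rough stand-in: the function name `findForbidden` and the keyword list are assumptions, and the sketch ignores things a real grammar handles (regular expression literals, `eval`, and so on). It skips string literals and comments, then reports any forbidden keyword that appears as a real token.

```javascript
// Keywords this sketch treats as forbidden (an assumed list based on the post).
var FORBIDDEN = ["while", "for", "do", "function"];

function findForbidden(source) {
  var found = [];
  var i = 0;
  while (i < source.length) {
    var ch = source.charAt(i);
    var next = source.charAt(i + 1);
    if (ch === '"' || ch === "'") {
      // String literal: skip to the matching quote, honouring backslash escapes.
      var quote = ch;
      i++;
      while (i < source.length && source.charAt(i) !== quote) {
        if (source.charAt(i) === "\\") i++;
        i++;
      }
      i++; // step over the closing quote
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line.
      while (i < source.length && source.charAt(i) !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip to the closing marker (or to the end if unterminated).
      var end = source.indexOf("*/", i + 2);
      i = end < 0 ? source.length : end + 2;
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Identifier or keyword: read the whole word, then check it.
      var start = i;
      while (i < source.length && /[A-Za-z0-9_$]/.test(source.charAt(i))) i++;
      var word = source.substring(start, i);
      if (FORBIDDEN.indexOf(word) >= 0) found.push(word);
    } else {
      i++;
    }
  }
  return found;
}

console.log(findForbidden("while (true);"));
console.log(findForbidden('var dummy = "while(true)";'));
```

A rule would be accepted only when the returned list is empty. The check is deliberately conservative: a harmless property called `for` would also be rejected, which seems an acceptable price here.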
OK, I confess: the post’s title is a little misleading. There is no way to solve the halting problem, at least not without cheating!