Saturday, December 11, 2010

The abstract JVM

What is the sign of the value one gets by multiplying 87457 with -37239? This is immediately clear. We need not actually perform the multiplication. Instead, we know that "plus times minus yields minus". We are abstracting away from the concrete numbers and reformulate the question in another algebra with only 4 possible sentences (plus times plus, plus times minus, minus times plus and minus times minus). In other words we perform an abstract interpretation.

What is the sign of 23-187? Again, we need not compute the numeric result to answer the question. We know that when we subtract a number from a smaller number that the result will be negative. For some reason, it is (sometimes, but consider 498374635243648 and 498374635246348) easier for our brain to compare numbers than to subtract them actually. This is especially interesting when one knows that contemporary computers think differently. For them, subtracting numbers is easy. Hence, if one needs to compare two numbers, one subtracts the second from the first and when the sign is negative, then the first number was smaller.

A similar method of abstract interpretation could be employed in our quest for knowledge about certain properties of native functions (i.e. java methods). The problems, solution strategies and (epistemo)logical basics have been detailed in previous posts. Here is a summary:

  1. Native functions cause problems through null values, exceptions and side effects if and when this is not correctly declared.
  2. The compiler should do its best to verify the truth of the assertions made by the programmer.
  3. It will often be only possible to prove that the assertion is false. There are cases where even this will be impossible. In the case of null values we must insist on a positive proof that the function cannot return null.
I envision an abstract JVM, that can interpret the bytecode of a java method and infer evidence about the methods behavior that is interesting for us.
There are quite some tools that do similar things, though abstract interpretation is mostly called static analysis. For example FindBugs, a tool to find bugs in java programs, or soot, a Java optimization framework. The latter even has a (not at all documented) purity check. Unfortunately, it insists on "whole program analysis", i.e. one must name a class with a main-method. (I can imagine that this is because of the problem with interface methods - one can say nothing at all about an interface method per se. Yet, in a whole program analysis it is perhaps possible to enumerate all classes that may possibly occur, and then one also has the set of all classes that implement a given interface and can inspect the concrete methods.)

The FindBugs tool uses the Byte Code Engineering Library (BCEL), and I think this makes also a good choice for our purposes.

Our abstract JVM would operate on an abstract machine state, that contains abstract local variables and the abstract stack. Like in the multiplication example above, we of course abstract from concrete values. The only interesting thing for us are the types of references, their visibility and whether it is possible that they are null. The abstract interpretation of a method starts out with an empty stack and the set of the argument local variables including this with their declared types. All arguments are visible (of course) and not null (anticipating the ability to rule out null values anywhere in the Frege realm).
The method might start with a aload_0 instruction. In a real JVM this pushes the value of this on top of the stack. In our abstract JVM we push a copy of abstract local variable 0 on the stack, which is the triple (type of this, visible, not null). The next one could be a getfield #1. Here our abstract machine must lookup the type of the field #1 and must replace the top of the stack with the triple (type of field, visible, possibly null) . And so on. Eventually, there will be some return instruction, and in case it's areturn and we have a possible null value on the stack, we note this. We know also when in the course of interpretation there was a putfield instruction and a visible target object was on the stack.
By the way: we see also the return type, and when it is U extends X, we cannot accept declarations that say it's just a X. For we must insist on knowing the most specific type of all objects that are injected into Frege, lest our abstract interpretation is worthless. Consider, the byte code of X.foo might be pure, when the byte code of U.foo is not. No "inheritance", no "subtyping" there, sorry, no way to give us an X for an U. It's not my fault that the Java language pays no attention to such matters except for checked exceptions. In a proper OO language, it should be impossible to overwrite a method that returns t objects with one that returns t objects or null, in general, the overriding method must not have more freedom than the overridden one.

Back to the abstract JVM: I think, this is also how byte code verifiers work: they keep track of the types of stack values and check - among other things - if the method calls are valid. This gives me some hope that the abstract JVM could be reasonably fast. In contrast, the tools/frameworks mentioned above seem to need enormous amounts of memory and cpu time. We'll see how it turns out.

Of course, our interpretation being abstract means that we will have false positives. Consider:
public X foo() { return false?null:this; }
It doesn't matter that in reality this function never returns null. Unless the Java compiler optimizes the conditional expression a bit so that we can't see it in the byte code, our abstract machine will think that the method indeed may return null.
However, if the null-tracking turns out to be trustworthy, why not employ this knowledge in cases like
public X foo(Y bar) {
     if (bar == null) this.x = 42; 
     return this; }
Under the assumption that bar is not null, the method could then be classified as pure.
I am really curious how frequently the abstract JVM will assure us that a method is null-safe.

It is clear that the correctness of our results depend on the level of "abstractness" we employ. The problem leads naturally to a program that starts out with a very high level of abstraction, but can be developed further and further to take more and more knowledge into account. One example would be to feed back what we know about possible null values so that we sometimes do not need to follow a ifnull byte code. Or take the getfield example from above. Isn't it a bit too defensive to assume that a field is possibly null? What if the field is final and all constructors of the class initialize it with a non null value?
The nice thing is that every less coarse abstraction potentially has a very positive feedback. For, assume that on a certain abstraction level  we concluded that some method deep in the java runtime system can return null. This could be the reason for alleging  many innocent functions that use the former one in the calculation of their result to also return null. Thus, on a slightly finer abstraction level we could prove null-safety for many functions that do not themselves exhibit the detail that is now taken into account.
The downside is of course, that computing resources needed for abstract interpretation will increase with every decrease in abstractness, perhaps so much so that one must finally decide to abstract certain hard to compute details away again and leave it at that. Experience will show.

Next task is to think about a sound "safety signature", a data structure where we can record the behavior of a method once it is computed (the alternative being to interpret anew each repeated function call). Problem is that the behavior can depend on the methods arguments. Take the example above, foo is pure under the assumption that its argument is not null. If it is null, then this is modified. But this matters only if this is visible, i.e. a method that does new X(...).foo(null) is innocent.

No comments:

Post a Comment

Comments will be censored by me as I see fit, most likely if they contain insults or propaganda for ideologies I do not like. Comments that are on topic will not be censored. If I leave a comment uncensored this does not imply that I agree with the opinions expressed therein.