zondag 6 juli 2014

Moar JIT progress

So, it seems I haven't blogged in 3 weeks - or in other words, far too long. It seems time to blog again. Obviously, timotimo++ has helpfully blogged my and other's progress in the meantime. But to recap, since my last blog the following abilities have been added to the JIT compiler:

  • Conditionals and looping
  • Fast argument list access
  • Floating point and integer arithmetic
  • Reading and writing lexicals, and accessing 'world values'.
  • Fast reading and writing of object attributes
  • Improved logging and bytecode dumping.
  • Specialization guards and deoptimisation
The last of these points was done just this week, and the problem that caused it and the solution it involves are relevant to what I want to discuss today, namely invocation.

The basic idea of speculative optimization - that is, what spesh does - is to assume that if all objects in the variable $foo have been of class Foobar before, they'll continue to be FooBar in the future. If that is true, it is often possible to generate optimized code, because if you know the type of an object you typically know its layout too. Sometimes this assumption doesn't hold, and
then the interpreter must undo the optimization - basically, return the state of the interpreter to where it would've been if no optimization had taken place at all.

All the necessary calculations have already been done by the time spesh hands the code graph over to the JIT compiler, so compiling the guards ought to be simple (and it is). However, an important assumption broke because of it. The MoarVM term for a piece of executable code is a 'frame', and the JIT compiler compiles whole frames at a time. Sometimes frames can be inlined to create bigger frames, but the resulting code always represents a single new frame. So when I wrote the code responsible for entering JIT-ted code from the interpreter, I assumed that the JIT-ted code represented an entire frame, at the end which the interpreter should return control to its caller.

During deoptimization, however, the interpreter jumps from optimized, type-specific code, to safe, unoptimized 'duck-typing' code. And so it must jump out of the JIT into the interpreter, because the JIT only deals with the optimized code. However, when doing so, the JIT 'driver' code assumed that control had reached the end of the frame and it ought to return to the caller frame. But the frame hadn't completed yet, so where the caller had expected a return value there was none.

The solution was - of course - to make the return from the current frame optional. But in true perl style, there is more than one way to do that. My current solution is to rely on the return value of the JIT code. Another solution is to return control to the caller frame - which is, after all, just a bit of pointer updating, and encapsulated in a function call, too - from the JIT code itself. Either choice is good, but they have their drawbacks, too. Obviously, having the driver do it means that you might return inappropriately (as in the bug), and having the JIT code might mean that you'd forget it when it is appropriate. (Also, it makes the JIT code bigger). Moreover, the invoked frame might be the toplevel frame in which case we shouldn't return to the interpreter at all - the program has completed, is finished, done. So this has to be communicated to the interpreter somehow if the JIT-code is considered responsible for returning to the frame itself.

The issues surrounding a JIT-to-interpreter call are much the same. Because MoarVM doesn't 'nest runloops', the JIT code must actually return to the interpreter to execute the called code. Afterwards the interpreter must return control back to the JIT code. Obviously, the JIT-ted frame hasn't completed when we return to the interpreter during a callout, so it can't return to its caller for the same reason. What is more, when calling out to the interpreter, the caller (which is JIT code) must store a return address somewhere, so the JIT driver knows where to continue executing after the callee returns.

I think by now it is too late to try and spare you from the boring details, but the summary of it is this: who or what should be responsible for returning control from the JIT-frame to the caller frame is ultimately an issue of API design, specifically with regards to the 'meaning' of the return value of the JIT code. If the 'driver' is responsible, the return value must indicate whether the JIT code has 'finished'. If the JIT code is responsible, the return value must indicate whether the whole program has finished, instead. I'm strongly leaning towards the first of these, as the question 'is my own frame finished' seems a more 'local' answer than 'is the entire program finished'.

With that said, what can you expect of me the coming week? With object access and specialization guards complete, the next step is indeed calling to interpreted code from the JIT, which I have started yesterday. I should also get at argument passing, object creation, decontainerization, 'special conditionals', and many other features of MoarVM. The goal is to find 'compilation blockers', i.e., operations which can't be compiled yet but are common, and work through them to support ever greater segments of compiled code.

In the long run, there are other interesting things I want to do. As I mentioned a few posts earlier, I'd like to evolve the 'Jit Graph' - which is a linked list, for now - into a 'real' graph, ultimately to compile better bytecode. An important part of that is determining for any point in the code which variables are 'live' and used, and which are not. This will allow us to generate code to load important variables - e.g., the pointer input arguments buffer - temporarily in a register so that further instructions won't have to load it again. It will also allow us to avoid storing a computed value in a local if we know that it will be overwritten in the next instruction anyway (i.e., is temporary). Because copy-instructions are both very frequent and potentially very costly (because they access memory), eliminating them as best as possible should result in great speed advantages. Ultimately, this will also allow us to move more logic out of the architecture-specific parts and into the generic graph-manipulating parts, which should make the architecture-dependent parts simpler. I won't promise all this will be done in a single summer, but I do hope to be able to start with it.




Geen opmerkingen:

Een reactie plaatsen