2016年3月21日 星期一

Ignition - The Interpreter in V8 Javascript Engine

Mentioned in my previous article, V8 started to use a new interpreter, Ignition, to replace baseline compiler on some low-memory devices. Let's check it out!

Interpreters are notorious for its performance bottleneck, and they're also proved to be slower than most of the JIT runtimes. I've explained some of the bottlenecks of interpreters and background knowledge in one of my blog post(section 2 and 3). So I think folks in Chromium team must spend lots of efforts on Ignition before bringing it rivaling performance with the old JIT solution.

There has already a document written by Chromium team about Ignition. To summarize its content, Ignition uses a streaming execution flow which executes one byte code handler after another. That is, a byte code handler function would recursively invoke the dispatcher at the end of its function body. Of course, tail call optimization is required to avoid smashing the stack. The following figure illustrates overview of the execution flow.
So let's start our code journey. Everything starts from (src/interpreter/interpreter.cc) Interpreter::Initialize(). We encounter a strange macro immediately: GENERATE_CODE(Name, ...).
#define GENERATE_CODE(Name, ...)                                        \
  {                                                                     \
    InterpreterAssembler assembler(isolate_, &zone, Bytecode::k##Name); \
    Do##Name(&assembler);                                               \
    Handle<Code> code = assembler.GenerateCode();                       \
    dispatch_table_[Bytecodes::ToByte(Bytecode::k##Name)] = *code;      \
    TraceCodegen(code);                                                 \
    LOG_CODE_EVENT(isolate_,                                            \
                   CodeCreateEvent(Logger::BYTECODE_HANDLER_TAG,        \
                                   AbstractCode::cast(*code), #Name));  \
  }
  BYTECODE_LIST(GENERATE_CODE)
#undef GENERATE_CODE

It is undefined right away, so we can inferred that it is only used by BYTECODE_LIST. As its name, BYTECODE_LIST declares all of the byte codes. And GENERATE_CODE here just help it defining their handlers and some dispatch information. The Do##Name(&assembler) statement seems to has something to do with byte code handlers, let's grep one of them and see what's going on.
// Mov <src> <dst>
//
// Stores the value of register <src> to register <dst>.
void Interpreter::DoMov(InterpreterAssembler* assembler) {
  Node* src_index = __ BytecodeOperandReg(0);
  Node* src_value = __ LoadRegister(src_index);
  Node* dst_index = __ BytecodeOperandReg(1);
  __ StoreRegister(src_value, dst_index);
  __ Dispatch();
}

This is the byte code handler for mov. It looks just like a normal byte code handler, right? Then have a closer look: Do##Name(&assembler) is a function call instead of a callback object we usually seen in many interpreters. In another word, Do##Name(&assembler) acts more like a routine responds for configuring byte code actions. After scrolling to the top of interpreter.cc, you can find that the inconspicuous "__" above turns out to be pretty important.
#define __ assembler->

InterpreterAssembler is respond for encapsulating a small group of assembly lines as a byte code handler, and also, abstracting the underlying platform to provide a uniform assembly code emitting interface here. What's more, we can witness more power on this kind of design in comparison with normal byte-code-handler-function fashions.

Next, we'll focus on the InterpreterAssembler::Dispatch() function. It will be delegated to several routines.
void InterpreterAssembler::Dispatch() {
  DispatchTo(Advance(Bytecodes::Size(bytecode_)));
}

void InterpreterAssembler::DispatchTo(Node* new_bytecode_offset) {
  Node* target_bytecode = Load(
      MachineType::Uint8(), BytecodeArrayTaggedPointer(), new_bytecode_offset);
  if (kPointerSize == 8) {
    target_bytecode = ChangeUint32ToUint64(target_bytecode);
  }

  // TODO(rmcilroy): Create a code target dispatch table to avoid conversion
  // from code object on every dispatch.
  Node* target_code_object =
      Load(MachineType::Pointer(), DispatchTableRawPointer(),
           WordShl(target_bytecode, IntPtrConstant(kPointerSizeLog2)));

  DispatchToBytecodeHandler(target_code_object, new_bytecode_offset);
}

void InterpreterAssembler::DispatchToBytecodeHandler(Node* handler,
                                                     Node* bytecode_offset) {
  if (FLAG_trace_ignition) {
    TraceBytecode(Runtime::kInterpreterTraceBytecodeExit);
  }

  InterpreterDispatchDescriptor descriptor(isolate());
  Node* args[] = {GetAccumulator(),          RegisterFileRawPointer(),
                  bytecode_offset,           BytecodeArrayTaggedPointer(),
                  DispatchTableRawPointer(), GetContext()};
  TailCall(descriptor, handler, args, 0);
}

The Advance() call in Dispatch() is clearly to be stepping to next byte code, and Load() call in DispatchTo() is responds for loading byte code object. Everything looks normal, except the TailCall() at the bottom of DispatchToBytecodeHandler().

As mentioned in the above sections, Ignition invokes byte code handlers
one after another in conjunction with dispatchers. Another key factor is that tail call optimization is required for this concatenation approach or stack overflow may occur. TailCall() in the above snippet reside in CodeStubAssembler class, which is also the parent class of InterpreterAssembler, it will then delegate the flow to RawMachineAssembler::TailCallN().
Node* RawMachineAssembler::TailCallN(CallDescriptor* desc, Node* function,
                                     Node** args) {
  int param_count =
      static_cast<int>(desc->GetMachineSignature()->parameter_count());
  int input_count = param_count + 1;
  Node** buffer = zone()->NewArray<Node*>(input_count);
  int index = 0;
  buffer[index++] = function;
  for (int i = 0; i < param_count; i++) {
    buffer[index++] = args[i];
  }
  Node* tail_call = MakeNode(common()->TailCall(desc), input_count, buffer);
  NodeProperties::MergeControlToEnd(graph(), common(), tail_call);
  schedule()->AddTailCall(CurrentBlock(), tail_call);
  current_block_ = nullptr;
  return tail_call;
}
There are many V8's abstraction types like Node and CallDescriptor in the above code, let's just not dig into those objects first. What we can infer is that a Node object seems to represent a piece of execution code and TailCallN() creates a special "tail call node" that is appended at the bottom of current execution block.

This approach sounds reasonable, and it also appeal the pros we mentioned before on the adoption of InterpreterAssembler rather than byte code handler functions. If we use byte code handler functions as handlers and also want to have the same cascading execution fashion, tail call optimization would all be depending on compiler. I'm not judging that modern compilers can't do this job well, but we want more explicit controls on the behavior of our interpreter. In contrast, the InterpreterAssembler abstracts handlers into code pieces which can be freely arranged in any place in the execution flow. So getting the same effect of tail call optimization is just a piece of code(cake)

Last but not the least, we want to know more about the Node class. It turn our that Node is just an interface representing a node in a graph. There are lots of implementation in the entire V8 codebase. Like AstGraph, BytecodeGraph and RawMachineAssembler in our case. 

There are two common traits in both Ignition and Android ART's interpreter mentioned in my previous article

  1. They call byte code handlers and dispatchers(byte code decoders) one after another without returning to the parent function or using loop statement to iterate though byte codes. 
  2. They directly emit assembly code as the main execution core for both performance and design concern(like reaching tail call optimization). 

In my opinion, both of the traits above represent the latest modern interpreter design. We can use them in many projects as long as they have similar interpreting execution flow. 

沒有留言:

張貼留言