Wednesday, March 9, 2016

JIT In Android ART

Interestingly, although Android claims that its latest runtime (ART) uses Ahead-Of-Time (AOT) compilation, a jit folder was quietly added to art/compiler in AOSP around the early days of Marshmallow.

The fact is that installation takes a long time on some devices running ART; Facebook, for example, sometimes takes 2 minutes to install. Perhaps that is why Android wants to move back to JIT.

Projects usually pair an interpreter with the JIT engine. The code or bytecode is interpreted first while profile information is collected: how often each method is executed (how "hot" the method is) and, for dynamically typed languages, type information. Once a method is hot enough, the execution engine uses the JIT compiler to compile it into native code, and every later invocation of that method is delegated to the compiled native code. Dalvik VM's JIT engine is an example.
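
A minimal C++ sketch of the interpreter-plus-counter scheme described above. The names (`Method`, `Interpreter`, `hot_threshold`) are hypothetical and not taken from Dalvik or ART, and a `std::function` stands in for both the bytecode and the compiled code, so "compiling" here only installs a new entry point.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical method record: `body` stands in for bytecode the interpreter
// would execute, `compiled` for native code produced by the JIT.
struct Method {
  std::string name;
  std::function<int64_t(int64_t)> body;
  uint32_t invocation_count = 0;             // profile: how "hot" the method is
  std::function<int64_t(int64_t)> compiled;  // empty until JIT-compiled
};

class Interpreter {
 public:
  explicit Interpreter(uint32_t hot_threshold) : hot_threshold_(hot_threshold) {
    assert(hot_threshold_ > 0);
  }

  int64_t Invoke(Method& m, int64_t arg) {
    if (m.compiled) {
      return m.compiled(arg);  // already hot: run the native version
    }
    if (++m.invocation_count >= hot_threshold_) {
      Compile(m);              // threshold reached: hand the method to the JIT
      return m.compiled(arg);
    }
    ++interpreted_calls_;
    return m.body(arg);        // still cold: interpret
  }

  uint32_t interpreted_calls() const { return interpreted_calls_; }
  uint32_t compilations() const { return compilations_; }

 private:
  void Compile(Method& m) {
    // A real JIT would emit machine code here; the sketch reuses the body.
    m.compiled = m.body;
    ++compilations_;
  }

  uint32_t hot_threshold_;
  uint32_t interpreted_calls_ = 0;
  uint32_t compilations_ = 0;
};
```

With a threshold of 3, the first two calls to a method are interpreted, the third triggers compilation, and every later call goes straight to the compiled entry.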

Some projects, however, don't use an interpreter at all. Instead, they use an extremely fast compiler to compile each method just before it first executes, and a separate optimizing compiler to recompile the "hot" methods more aggressively. Google's V8 JavaScript engine is an example.
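
For contrast, a sketch of the compile-first approach: no interpreter, a cheap baseline compile before a function's first run, and an optimizing recompile once it is hot. `TieredEngine`, `Tier` and `opt_threshold` are hypothetical names, and both "compilers" just reuse the source lambda; this illustrates the control flow only, not how V8 is actually structured.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

enum class Tier { kNone, kBaseline, kOptimized };

// Hypothetical function record for a compile-only engine (no interpreter).
struct Function {
  std::string name;
  std::function<int64_t(int64_t)> source;  // stands in for the source/AST
  Tier tier = Tier::kNone;
  std::function<int64_t(int64_t)> code;    // currently installed machine code
  uint32_t call_count = 0;
};

class TieredEngine {
 public:
  explicit TieredEngine(uint32_t opt_threshold) : opt_threshold_(opt_threshold) {
    assert(opt_threshold_ > 0);
  }

  int64_t Call(Function& f, int64_t arg) {
    if (f.tier == Tier::kNone) {
      BaselineCompile(f);  // first execution: compile quickly, never interpret
    }
    if (f.tier == Tier::kBaseline && ++f.call_count >= opt_threshold_) {
      OptimizingCompile(f);  // hot: recompile with the slower, better compiler
    }
    return f.code(arg);
  }

  uint32_t baseline_compiles() const { return baseline_compiles_; }
  uint32_t optimizing_compiles() const { return optimizing_compiles_; }

 private:
  void BaselineCompile(Function& f) {
    f.code = f.source;  // placeholder for fast, unoptimized code generation
    f.tier = Tier::kBaseline;
    ++baseline_compiles_;
  }

  void OptimizingCompile(Function& f) {
    f.code = f.source;  // placeholder for slow, optimized code generation
    f.tier = Tier::kOptimized;
    ++optimizing_compiles_;
  }

  uint32_t opt_threshold_;
  uint32_t baseline_compiles_ = 0;
  uint32_t optimizing_compiles_ = 0;
};
```

Every function pays for one baseline compile up front, which is the price of skipping the interpreter; only functions that reach `opt_threshold` calls pay for the optimizing compile.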

The latter approach is usually faster, but to my surprise, the new ART JIT adopts the former: the interpreter-plus-JIT combination.

The great journey into ART's JIT starts at art/runtime/jit/jit_instrumentation.cc. Instrumentation in ART acts as a listener for various interpreter and compilation events, e.g. method invocations, branches and OSR (On-Stack Replacement). JitInstrumentationCache::CreateThreadPool() adds the JitInstrumentationListener instance to the runtime's set of instrumentation listeners.

JitInstrumentationListener listens for three events: method entry (JitInstrumentationListener::MethodEntered()), branches (JitInstrumentationListener::Branch()) and virtual or interface method invocations (JitInstrumentationListener::InvokeVirtualOrInterface()). The compilation triggers, calls to instrumentation_cache_->AddSamples(...), live in the method-entry and branch callbacks.
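
The listener wiring can be sketched as a plain observer pattern. The classes below (`MethodRef`, `Instrumentation`, `SampleCache`, `JitListenerSketch`) are simplified, hypothetical stand-ins that only borrow the callback names from the text; the real ART signatures take more parameters. Counting only backward branches (negative offsets, i.e. loops) is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a method handle.
struct MethodRef {
  std::string name;
};

// Observer interface with the three callbacks named in the text.
class InstrumentationListener {
 public:
  virtual ~InstrumentationListener() = default;
  virtual void MethodEntered(MethodRef& method) = 0;
  virtual void Branch(MethodRef& method, int32_t dex_pc_offset) = 0;
  virtual void InvokeVirtualOrInterface(MethodRef& caller, MethodRef& callee) = 0;
};

// The runtime side: keeps registered listeners and fans events out to them.
class Instrumentation {
 public:
  void AddListener(InstrumentationListener* listener) {
    assert(listener != nullptr);
    listeners_.push_back(listener);
  }
  void OnMethodEntered(MethodRef& m) {
    for (auto* l : listeners_) l->MethodEntered(m);
  }
  void OnBranch(MethodRef& m, int32_t offset) {
    for (auto* l : listeners_) l->Branch(m, offset);
  }
  void OnInvokeVirtualOrInterface(MethodRef& caller, MethodRef& callee) {
    for (auto* l : listeners_) l->InvokeVirtualOrInterface(caller, callee);
  }

 private:
  std::vector<InstrumentationListener*> listeners_;
};

// Hypothetical sample cache; a real one would keep per-method counters.
class SampleCache {
 public:
  void AddSamples(MethodRef& /*method*/, uint32_t count) { total_ += count; }
  uint32_t total() const { return total_; }

 private:
  uint32_t total_ = 0;
};

// The JIT's listener: only method entry and branches feed AddSamples().
class JitListenerSketch : public InstrumentationListener {
 public:
  explicit JitListenerSketch(SampleCache* cache) : cache_(cache) {}

  void MethodEntered(MethodRef& m) override { cache_->AddSamples(m, 1); }

  void Branch(MethodRef& m, int32_t dex_pc_offset) override {
    if (dex_pc_offset < 0) cache_->AddSamples(m, 1);  // assumption: loops only
  }

  void InvokeVirtualOrInterface(MethodRef&, MethodRef&) override {
    ++invoke_events_;  // recorded, but not a compilation trigger in this sketch
  }

  uint32_t invoke_events() const { return invoke_events_; }

 private:
  SampleCache* cache_;
  uint32_t invoke_events_ = 0;
};
```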

JitInstrumentationCache::AddSamples() shows that ART's JIT uses a slightly modified counter approach to profile execution. Usually, a JIT simply sets one counter threshold and triggers a compilation task once the counter exceeds it. Here, however, there seem to be THREE thresholds: warm_method_threshold_, hot_method_threshold_ and osr_method_threshold_, which makes for a JIT with more levels. The values are passed in as JVM arguments (JVM is an interface, not a single unique implementation; ART is one of the implementations), but I can't find those arguments at this time. From the way the code is arranged, though, we can infer that warm_method_threshold < hot_method_threshold < osr_method_threshold. I'm also wondering how osr_method_threshold works.
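
A sketch of a per-method counter with three thresholds, encoding the inferred ordering warm < hot < osr as a constructor check. What ART actually does at each level is not established here: `JitAction::kWarm` is a hypothetical placeholder, and mapping the hot and OSR levels to a normal compile and an OSR compile is an assumption based only on the threshold names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// kWarm is a placeholder: what ART does at the warm level is not covered here.
enum class JitAction { kWarm, kCompile, kCompileOsr };

// Hypothetical per-method hotness counter with three ordered thresholds.
class ThresholdCounter {
 public:
  ThresholdCounter(uint32_t warm, uint32_t hot, uint32_t osr)
      : warm_(warm), hot_(hot), osr_(osr) {
    assert(warm_ < hot_ && hot_ < osr_);  // the ordering inferred from the code
  }

  // Adds samples and returns every threshold crossed by this call, lowest first.
  std::vector<JitAction> AddSamples(uint32_t count) {
    const uint32_t before = counter_;
    counter_ += count;
    std::vector<JitAction> crossed;
    if (Crossed(before, warm_)) crossed.push_back(JitAction::kWarm);
    if (Crossed(before, hot_)) crossed.push_back(JitAction::kCompile);
    if (Crossed(before, osr_)) crossed.push_back(JitAction::kCompileOsr);
    return crossed;
  }

 private:
  bool Crossed(uint32_t before, uint32_t threshold) const {
    return before < threshold && counter_ >= threshold;
  }

  uint32_t warm_, hot_, osr_;
  uint32_t counter_ = 0;
};
```

A single large batch of samples can cross several thresholds at once, so the sketch reports all of them rather than only the highest; each threshold fires at most once per method.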

If one of the thresholds is reached, a JitCompileTask is scheduled. The flow that follows is quite interesting: Jit::CompileMethod() is invoked, but Jit::Compile() is actually just a stub for jit_compile_method(). What's special about jit_compile_method()? It's a C symbol loaded from the dynamic library libart-compiler.so. libart-compiler.so itself is nothing special; its source files live right next to the ones mentioned above. I think modularization is the main reason for this somewhat ad-hoc approach.
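
The stub-plus-shared-library arrangement follows the standard POSIX `dlopen`/`dlsym` pattern, sketched below. `CompilerLoader` and the `CompileFn` signature are hypothetical; the real `jit_compile_method()` takes different arguments, so this only shows how a symbol from a library such as libart-compiler.so could be resolved at run time and called through a function pointer.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <cstdio>

// Hypothetical signature: the real jit_compile_method() takes different arguments.
using CompileFn = bool (*)(void* compiler_handle, void* method);

// Resolves a compile entry point from a shared library at run time and
// exposes it through a thin stub.
class CompilerLoader {
 public:
  CompilerLoader() = default;
  CompilerLoader(const CompilerLoader&) = delete;
  CompilerLoader& operator=(const CompilerLoader&) = delete;
  ~CompilerLoader() {
    if (handle_ != nullptr) dlclose(handle_);
  }

  // Returns true if the library was opened and the symbol was found.
  bool Load(const char* library, const char* symbol) {
    assert(library != nullptr && symbol != nullptr && handle_ == nullptr);
    handle_ = dlopen(library, RTLD_NOW);
    if (handle_ == nullptr) {
      std::fprintf(stderr, "dlopen failed: %s\n", LastError());
      return false;
    }
    compile_ = reinterpret_cast<CompileFn>(dlsym(handle_, symbol));
    if (compile_ == nullptr) {
      std::fprintf(stderr, "dlsym failed: %s\n", LastError());
      dlclose(handle_);
      handle_ = nullptr;
      return false;
    }
    return true;
  }

  // The stub: it just forwards to whatever the library exported.
  bool CompileMethod(void* compiler_handle, void* method) const {
    return compile_ != nullptr && compile_(compiler_handle, method);
  }

 private:
  static const char* LastError() {
    const char* error = dlerror();  // may be null if no error was recorded
    return error != nullptr ? error : "unknown error";
  }

  void* handle_ = nullptr;
  CompileFn compile_ = nullptr;
};
```

The caller only ever sees `CompileMethod()`, so the compiler can live in a separately built library and be swapped or left unloaded. On glibc versions before 2.34, `dlopen` lives in libdl, so link with `-ldl`.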

After entering jit_compile_method(), OptimizingCompiler::TryCompile() is called. A few months ago, TryCompile() had two compilation levels: CompileBaseline and CompileOptimized. Now those levels have been replaced by a neater, single-level approach:
  // Try compiling a method and return the code generator used for
  // compiling it.
  // This method:
  // 1) Builds the graph. Returns null if it failed to build it.
  // 2) Transforms the graph to SSA. Returns null if it failed.
  // 3) Runs optimizations on the graph, including register allocator.
  // 4) Generates code with the `code_allocator` provided.
  CodeGenerator* TryCompile(ArenaAllocator* arena,
                            CodeVectorAllocator* code_allocator,
                            const DexFile::CodeItem* code_item,
                            uint32_t access_flags,
                            InvokeType invoke_type,
                            uint16_t class_def_idx,
                            uint32_t method_idx,
                            jobject class_loader,
                            const DexFile& dex_file,
                            Handle<mirror::DexCache> dex_cache,
                            bool osr) const;
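
The four steps in the comment can be mirrored by a toy pipeline in which every stage may bail out with a null result. `Graph`, `CodeGeneratorSketch`, `TryCompileSketch` and the string "bytecode" are hypothetical simplifications; the real TryCompile() builds an HGraph, runs many passes and allocates registers. Treating an `"unsupported"` instruction as an SSA failure is an invented rule used only to exercise the failure path.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for an HGraph and a CodeGenerator.
struct Graph {
  std::vector<std::string> instructions;
  bool in_ssa = false;
};

struct CodeGeneratorSketch {
  std::vector<std::string> machine_code;
};

// 1) Build the graph; returns null if there is nothing to build.
std::unique_ptr<Graph> BuildGraph(const std::vector<std::string>& bytecode) {
  if (bytecode.empty()) return nullptr;
  auto graph = std::make_unique<Graph>();
  graph->instructions = bytecode;
  return graph;
}

// 2) Transform to SSA. A real pass renames values and inserts phis; here we only
// flag the graph, and an "unsupported" instruction is an invented failure case.
bool TransformToSsa(Graph& graph) {
  const auto& ins = graph.instructions;
  if (std::find(ins.begin(), ins.end(), "unsupported") != ins.end()) return false;
  graph.in_ssa = true;
  return true;
}

// 3) Optimize. Placeholder pass: delete "nop" instructions.
void RunOptimizations(Graph& graph) {
  auto& ins = graph.instructions;
  ins.erase(std::remove(ins.begin(), ins.end(), "nop"), ins.end());
}

// 4) Generate code. A real backend would emit bytes into a code allocator.
std::unique_ptr<CodeGeneratorSketch> GenerateCode(const Graph& graph) {
  assert(graph.in_ssa);
  auto codegen = std::make_unique<CodeGeneratorSketch>();
  for (const auto& insn : graph.instructions) {
    codegen->machine_code.push_back("native:" + insn);
  }
  return codegen;
}

// Mirrors the contract in the comment: null on any failed stage.
std::unique_ptr<CodeGeneratorSketch> TryCompileSketch(
    const std::vector<std::string>& bytecode) {
  std::unique_ptr<Graph> graph = BuildGraph(bytecode);
  if (graph == nullptr) return nullptr;
  if (!TransformToSsa(*graph)) return nullptr;
  RunOptimizations(*graph);
  return GenerateCode(*graph);
}
```

A null return tells the caller that the method could not be compiled, so it can simply keep running in the interpreter.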

The comments explain almost everything. The graph is an instance of the HGraph class, on which various compiler optimizations are easy to perform. ART's JIT uses a method-based compiler, in contrast to the old Dalvik VM JIT, which used a trace-based compiler and switched to method-based compilation only while the device was charging.
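
To illustrate why a graph IR like HGraph makes optimizations easy, here is a toy IR in which each instruction points directly at the instructions defining its inputs, plus a constant-folding pass. The types (`Instruction`, `BasicBlock`, `Op`) and the helper `Append` are hypothetical and far simpler than HGraph; a method-based compiler would build a graph of many such blocks covering the whole method, whereas a trace-based JIT compiles only a recorded hot path.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

enum class Op { kConstant, kAdd, kMul, kReturn };

// Hypothetical IR node: inputs point straight at the defining instructions.
struct Instruction {
  Op op;
  int64_t value = 0;                 // meaningful only for kConstant
  std::vector<Instruction*> inputs;  // use -> definition edges
};

// One basic block; a method-based compiler would build a graph of many.
struct BasicBlock {
  std::vector<std::unique_ptr<Instruction>> instructions;
};

Instruction* Append(BasicBlock& block, Op op, int64_t value,
                    std::vector<Instruction*> inputs) {
  block.instructions.push_back(
      std::make_unique<Instruction>(Instruction{op, value, std::move(inputs)}));
  return block.instructions.back().get();
}

// Turns Add/Mul nodes with two constant inputs into constants, in place, so
// later users see the folded value within the same pass.
int ConstantFold(BasicBlock& block) {
  int folded = 0;
  for (auto& insn : block.instructions) {
    const bool arithmetic = insn->op == Op::kAdd || insn->op == Op::kMul;
    if (!arithmetic || insn->inputs.size() != 2) continue;
    const Instruction* lhs = insn->inputs[0];
    const Instruction* rhs = insn->inputs[1];
    if (lhs->op != Op::kConstant || rhs->op != Op::kConstant) continue;
    insn->value = insn->op == Op::kAdd ? lhs->value + rhs->value
                                       : lhs->value * rhs->value;
    insn->op = Op::kConstant;
    insn->inputs.clear();
    ++folded;
  }
  return folded;
}
```

Because uses refer to their definitions, folding `2 + 3` into a constant in place lets a following multiplication by 3 fold in the same pass, yielding 15.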

In summary, JIT in ART doesn't seem to use any special techniques, so in my opinion the key to performance lies in the interpreter. I will spend some time researching that part.
