5th Int. Conf. on Mathematics and Informatics, September 2-4, 2015, Târgu Mureş, Romania

Truffle – A Self-optimizing Language Implementation Framework

Hanspeter Mössenböck, Matthias Grimmer
Johannes Kepler University Linz, Austria
{hanspeter.moessenboeck,matthias.grimmer}@jku.at

We present Truffle [10], a framework for building efficient programming language implementations based on tree rewriting and profile-driven specialization of the executed program. Source code is first transformed into an abstract syntax tree (AST), which is then interpreted. During interpretation, run-time information about the execution is collected and the AST is rewritten so that it specializes to the observed profile (e.g., to the observed types, values, and invocations). When the AST has reached a stable state it is dynamically compiled to efficient machine code. If specializations turn out to fail at run time, the machine code is deoptimized and execution falls back to the interpreter. Truffle is particularly useful for implementing dynamically typed languages such as JavaScript or Ruby, where type information must be collected at run time, but it also creates new optimization potential for statically typed languages. Furthermore, it supports seamless interoperability between different programming languages and even allows a memory-safe implementation of otherwise unsafe low-level languages such as C. We explain the concepts of Truffle and sketch the implementation of an AST interpreter with profile-driven specializations and dynamic compilation. We also briefly explain Truffle's potential for interoperability and memory safety.

1 Motivation

Writing a compiler, especially one that produces highly optimized machine code, is considered a non-trivial task. Therefore, many language implementations are based on interpreters, which are easier to write and often better suited for dynamically typed languages such as JavaScript, Python, or Ruby. A common approach is to transform the source program into some intermediate representation such as an abstract syntax tree (AST), which is then interpreted.

Interpreters, however, are slow. They are usually written in a high-level language such as Java and hardly perform any optimizations. For dynamically typed languages, primitive types such as int or float are often boxed, i.e., operands of these types are wrapped into objects that are inefficient to use in computations. In many cases, run-time profiling can determine that a computation such as a + b always deals with operands of a specific type (e.g., int) and could therefore be handled much more efficiently if the operation were treated as an integer addition. The idea of Truffle is to replace the subtree for the addition in such a case with a specialized subtree that handles an int addition without boxing, which speeds up interpretation (and later also the compiled code). In other cases, run-time feedback may also show that certain operands have always been specific constants; replacing these operands with constants yields additional speedup. In other words, the Truffle AST is specialized during execution to the observed profile.

When an AST has reached a stable state and has been executed frequently enough, it is dynamically compiled to machine code, applying state-of-the-art optimizations. Thus, the most frequently executed parts of the program run efficiently in optimized machine code, while the less frequently executed parts continue to run in the interpreter.
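To make the rewriting idea concrete, the following Java sketch shows how a generic addition node could observe its operand types, replace itself in the tree with an int-specialized node, and revert to the generic node when its assumption is later violated. The sketch is only an illustration of the principle; the class names are invented and do not refer to the actual Truffle API.

// Minimal sketch of a self-optimizing AST node (invented names, not the real Truffle API).
abstract class Node {
    Node parent;

    // Replace this node in its parent by 'replacement' and return the replacement.
    <T extends Node> T replace(T replacement) {
        replacement.parent = parent;
        if (parent != null) {
            parent.replaceChild(this, replacement);
        }
        return replacement;
    }

    abstract void replaceChild(Node oldChild, Node newChild);

    abstract Object execute();   // evaluate the subtree rooted at this node
}

abstract class BinaryNode extends Node {
    Node left, right;

    BinaryNode(Node left, Node right) {
        this.left = left;
        this.right = right;
        left.parent = this;
        right.parent = this;
    }

    @Override
    void replaceChild(Node oldChild, Node newChild) {
        if (left == oldChild) left = newChild; else right = newChild;
    }
}

// Generic addition: handles arbitrary (boxed) operand types, but rewrites itself
// to IntAddNode as soon as it observes two int operands.
class GenericAddNode extends BinaryNode {
    GenericAddNode(Node left, Node right) { super(left, right); }

    static Object genericAdd(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() + ((Number) b).doubleValue();
        }
        return a.toString() + b.toString();   // simplified: everything else concatenates as strings
    }

    @Override
    Object execute() {
        Object a = left.execute();
        Object b = right.execute();
        if (a instanceof Integer && b instanceof Integer) {
            replace(new IntAddNode(left, right));   // specialize: future executions use IntAddNode
            return (Integer) a + (Integer) b;
        }
        return genericAdd(a, b);
    }
}

// Specialized addition: assumes int operands and avoids the generic dispatch in the hot path.
// If the assumption fails, it rewrites itself back to the generic node.
class IntAddNode extends BinaryNode {
    IntAddNode(Node left, Node right) { super(left, right); }

    @Override
    Object execute() {
        Object a = left.execute();
        Object b = right.execute();
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b;       // plain int addition
        }
        replace(new GenericAddNode(left, right));   // assumption violated: revert to the generic node
        return GenericAddNode.genericAdd(a, b);
    }
}

In Truffle itself the node base class and the replacement machinery are provided by the framework, and specialized nodes typically also offer execute variants that return unboxed primitive values; the sketch only illustrates the rewriting principle.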
If one of the assumptions on which a specialization was based turns out not to hold at run time (e.g., if an addition specialized for integer operands suddenly has to deal with floating-point operands), the corresponding part of the machine code is deoptimized, i.e., it is reverted to an unspecialized AST that is executed in the interpreter again. The possibility to deoptimize allows Truffle to apply specializations optimistically and aggressively, because it can always fall back to the unspecialized case if an assumption turns out to be wrong.

The Truffle project was initiated at the Johannes Kepler University Linz and is now an official research project at Oracle Labs. It is freely available under the OpenJDK license [7]. Several other universities and research sites are contributing to it.

2 Truffle Concepts

Truffle is a self-optimizing interpreter framework with dynamic compilation that is based on rewritable abstract syntax trees. It is implemented in Java and runs on a modified version of the Java HotSpot VM, using its services such as garbage collection and deoptimization. The nodes of a Truffle AST are represented by classes derived from a common base class Node, which provides rewritability. Every node class has an execute() method that is responsible for evaluating the AST rooted at this node. A node for a general operation (e.g., a + operation accepting operands that are integers, floating-point numbers, complex numbers, or even strings) can be dynamically replaced with a specialized node that works only on integer operands, say, and is therefore faster.

A language implementer can declare any number of specializations using a domain-specific language [5] that specifies the operand types and other conditions under which these specializations can be applied. The specializations are compiled to Java source code that becomes part of the Truffle interpreter, which uses them to rewrite parts of the AST whenever specializations are applicable.

When the AST of a function or a method has reached a stable state and has been executed frequently enough, it is dynamically compiled to machine code using the Graal compiler [7], an optimizing compiler written in Java that can be invoked from Truffle. During compilation, the execute() methods of the affected AST nodes are inlined into their callers, creating a single piece of code to which optimizations such as constant folding, common subexpression elimination, or escape analysis can be applied seamlessly. This is a form of partial evaluation that compiles an AST for the specific situation under which it has been used so far.

All specializations are guarded by run-time checks, which make sure that the assumptions under which the specializations were applied still hold. If one of these guards fails at run time, the specialization is taken back and the affected code is reverted to an unspecialized AST that is further executed in the interpreter. This is possible because the HotSpot VM features deoptimization and the side-by-side execution of compiled and interpreted code.
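To give a flavor of how specializations are declared, the following sketch is modeled on the DSL described in [5]. The annotation names follow that paper, but the base class ExpressionNode (which would declare the execute methods that the DSL implements), the package names, and other details are assumptions that may differ between Truffle versions.

// Sketch in the style of the Truffle DSL [5]; ExpressionNode is an assumed language-specific
// base class (not shown), and annotation details may differ between Truffle versions.
import com.oracle.truffle.api.dsl.Fallback;
import com.oracle.truffle.api.dsl.NodeChild;
import com.oracle.truffle.api.dsl.NodeChildren;
import com.oracle.truffle.api.dsl.Specialization;

@NodeChildren({@NodeChild("leftNode"), @NodeChild("rightNode")})
abstract class AddNode extends ExpressionNode {

    // Applied as long as both operands are ints; on overflow this specialization is discarded.
    @Specialization(rewriteOn = ArithmeticException.class)
    protected int addInt(int left, int right) {
        return Math.addExact(left, right);
    }

    // Chosen when the int specialization no longer applies (e.g., floating-point operands).
    @Specialization
    protected double addDouble(double left, double right) {
        return left + right;
    }

    // String concatenation, as in many dynamic languages.
    @Specialization
    protected String addString(String left, String right) {
        return left + right;
    }

    // Executed when no specialization matches.
    @Fallback
    protected Object addGeneric(Object left, Object right) {
        throw new IllegalArgumentException("unsupported operand types for +");
    }
}

From such a declaration, an annotation processor generates the concrete node class together with the rewriting and guard logic, so the language implementer never writes the rewriting code by hand.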
3 Using Truffle for Interoperability

Truffle's approach of rewriting an AST during execution can be used to support seamless interoperability between different languages that are implemented under Truffle. Although all AST nodes are derived from a common base class, different languages have specific node classes whose execute() methods perform language-specific operations. By combining AST nodes of different languages we can achieve transparent interoperability.

Truffle allows an object x that was implemented in a source language S to be used in code written in a host language H. If the object is accessed there (e.g., x.f or x[i]), this access is written in the syntax of the host language H. In the AST of language H, however, the foreign access is represented in a language-independent way, namely by nodes that send messages to the accessed object, e.g., to read or write a property or to invoke a method. Upon the first execution of such a message in the interpreter, the message gets resolved, i.e., the receiver of the message returns an AST snippet representing the S-specific operations for performing this access in the source language S. This snippet then replaces the message node in the AST of the host program. Further executions of this access will not send a message but rather execute the S-specific nodes for the access operations.

When the AST containing the foreign access gets compiled, inlining of the execute() methods creates a single piece of code that represents both the operations of the host language and of the source language. Compiler optimizations can thus work across language borders. In particular, it is possible to inline methods that were written in a different language than the host language. Truffle's interoperability mechanism works without any glue code between languages and does not require a common object model to which all languages are mapped. Furthermore, it is not restricted to a specific pair of languages but works for all languages that are implemented under Truffle. The feasibility of this approach has been shown for Ruby and C [2] as well as for JavaScript and C [4].

4 Using Truffle for Memory Safety

Low-level languages like C allow operations such as pointer arithmetic that can compromise memory safety, because they allow pointer values to reference memory outside of objects. Furthermore, manual deallocation of objects may lead to dangling pointers or memory leaks. In order to overcome these problems, we have built a memory-safe implementation of C on top of Truffle [3].

The idea is to allocate all objects on the Java heap, where they are automatically garbage-collected. A C pointer is represented as a reference to a Java object plus an offset, which is initially 0. Member accesses via this pointer as well as pointer arithmetic just adjust the offset, and the runtime system ensures that it never exceeds the bounds of the referenced object. Metadata stored with the Java object allows us to map a pointer offset to a specific member of the object and to do type checking, even after the pointer has been cast to some other type.

When an object is accessed via a C pointer, the access is represented by an AST that includes checks whether the access is safe. At run time, this AST is specialized according to the observed profile and is finally compiled to machine code, whereupon many of the checks can be optimized away. In order to detect whether the pointer is later set to an object of a different type, a guard is inserted into the code. If the guard fails, the machine code is deoptimized and execution falls back to the unspecialized AST.

To prevent dangling pointers, we mark the Java object that is referenced by a C pointer as deallocated as soon as a free operation has been performed on the C pointer. When accessing the object, we check whether the access refers to a deallocated object and report an error in that case. The Java object is automatically reclaimed by the HotSpot garbage collector when it is no longer referenced by any C pointer.
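The following sketch illustrates this pointer representation as a pair of a Java object and a byte offset, with liveness and bounds checks on every access. It is only an illustration; the class and method names are invented and are not taken from the actual implementation [3].

import java.nio.ByteBuffer;

// Invented sketch of the "Java object + offset" pointer representation described above.
final class ManagedObject {
    final byte[] data;        // the object's storage, allocated on the Java heap
    boolean freed = false;    // set by free(); later accesses report an error

    ManagedObject(int sizeInBytes) {
        this.data = new byte[sizeInBytes];
    }
}

final class ManagedPointer {
    final ManagedObject target;   // the Java object this C pointer refers to
    final int offset;             // byte offset within the object, initially 0

    ManagedPointer(ManagedObject target, int offset) {
        this.target = target;
        this.offset = offset;
    }

    // p + n (in bytes): pointer arithmetic only changes the offset, never the target object.
    ManagedPointer add(int bytes) {
        return new ManagedPointer(target, offset + bytes);
    }

    // *(int *)p: every access first checks liveness and bounds, then reads from the Java array.
    int readInt() {
        if (target.freed) {
            throw new IllegalStateException("access to deallocated object");
        }
        if (offset < 0 || offset + Integer.BYTES > target.data.length) {
            throw new IndexOutOfBoundsException("pointer outside object bounds");
        }
        return ByteBuffer.wrap(target.data).getInt(offset);
    }

    // free(p): mark the object as deallocated; the Java GC reclaims it once unreachable.
    void free() {
        target.freed = true;
    }
}

In the actual system such checks are AST nodes that are specialized and compiled like any other Truffle nodes, which is what allows many of them to be optimized away.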
The extra operations for ensuring memory safety cause some overhead, but part of it can be optimized away during JIT compilation. Our safe C implementation on top of Truffle is only 15% slower than code produced by GCC at the highest optimization level.

5 Evaluation

Truffle has been used to implement a number of languages such as JavaScript [9], Python [11], Ruby [8], Smalltalk, and R [1]. In all these cases, the performance of programs executed with Truffle is clearly superior to that of pure interpreters and can even compete with compiled industry-standard implementations of these languages. In our presentation we will show performance numbers for JavaScript and R implemented under Truffle.

Currently, our JavaScript implementation under Truffle is on average five times faster than Nashorn, Oracle's reference implementation of JavaScript, and about 20% slower on average than Google's highly optimized V8 JavaScript engine, when evaluated with the Google Octane benchmark suite [6]. This demonstrates that Truffle's self-optimizing AST interpreter, combined with a highly optimizing just-in-time compiler (Graal), can achieve competitive performance while allowing developers to build new language implementations with rather modest effort.

Acknowledgements

Truffle was designed and implemented by members of our institute as well as by researchers at Oracle Labs, in particular Christian Humer, Lukas Stadler, Andreas Wöß, Christian Wimmer, and Thomas Würthinger. We would like to thank them for their support and for their feedback on this presentation. We are also grateful for the continuous funding of the project by Oracle Labs.

References

[1] FastR, 2013. URL https://github.com/allr/fastr/
[2] M. Grimmer, C. Seaton, T. Würthinger, H. Mössenböck: Dynamically Composing Languages in a Modular Way: Supporting C Extensions for Dynamic Languages. Intl. Conf. on Modularity (Modularity'15), March 16-19, 2015, Fort Collins, Colorado, USA, pp. 1-13.
[3] M. Grimmer, R. Schatz, C. Seaton, T. Würthinger, H. Mössenböck: Memory-safe Execution of C on a Java VM. Submitted to the Workshop on Programming Languages and Analysis for Security (PLAS'15), July 6, 2015, Prague, Czech Republic.
[4] M. Grimmer, T. Würthinger, A. Wöß, H. Mössenböck: An Efficient Approach for Accessing C Data Structures from JavaScript. Intl. Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS'14), July 28, 2014, Uppsala, Sweden, pp. 1-4.
[5] C. Humer, C. Wimmer, C. Wirth, A. Wöß, T. Würthinger: A Domain-Specific Language for Building Self-Optimizing AST Interpreters. Intl. Conf. on Generative Programming: Concepts and Experiences (GPCE'14), Sept. 15-16, 2014, Västerås, Sweden, pp. 123-132.
[6] Octane JavaScript benchmarks. URL https://developers.google.com/octane/
[7] OpenJDK. Graal project, 2015. URL http://openjdk.java.net/projects/graal/
[8] C. Seaton, M. Van De Vanter, M. Haupt: Debugging at Full Speed. Workshop on Dynamic Languages and Applications (Dyla'14), June 8, 2014, Edinburgh, UK, pp. 1-13.
[9] A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer, H. Mössenböck: An Object Storage Model for the Truffle Language Implementation Framework. Intl. Conf. on Principles and Practice of Programming in Java (PPPJ'14), September 23-26, 2014, Cracow, Poland, pp. 133-144.
[10] T. Würthinger, C. Wimmer, A. Wöß, L. Stadler, G. Duboscq, C. Humer, G. Richards, D. Simon, M. Wolczko: One VM to Rule Them All. Proceedings of the Onward! Conference, ACM Press, 2013. doi: 10.1145/2509578.2509581.
[11] ZipPy, 2013. URL https://bitbucket.org/ssllab/zippy/