1、外文资料原文 1 中文 6000 字 Encoding the Java Virtual Machines Instruction Set 1 Introduction The development of programs that parse and analyze Java Bytecode 9 has a long history and new programs are still developed 2,3,4,7,13. When developing such tools, however, a lot of effort is spent to develop a parse
2、r for the bytecode and for (re-) developing standard control- and data-flow analyses which calculate, e.g., the control-flow graph or the data-dependency graph. To reduce these efforts, we have developed a specification language (OPAL SPL) for encoding the instructions of stack-based intermediate la
3、nguages. The idea is thatonce the instruction set is completely specified using OPAL SPLgenerating both bytecode parsers and standard analyses is much easier than their manual development. To support this goal, OPAL SPL supports the specification of both the format of bytecode instructions and the e
4、ffect on the stack and registers these instructions have when executed. An alternative use of an OPAL SPL specification is as input to a generic parser or to generic analyses as illustrated by Fig. 1 Though the language was designed with Java Bytecode specifically in mind and is used to encode the c
5、omplete instruction set of the Java Virtual Machine (JVM) , we have striven for a Java-independent specification language. In particular, OPAL SPL focuses on specifying the instruction set rather than the complete class file format, not only because the formers structure is much more regular than th
6、e latters,but also because a specification of the instruction set promises to be most beneficial. Given the primary focus of OPAL SPLgenerating parsers and facilitating basic analyseswe explicitly designed the language such that it is possible to group related instructions. This makes specifications
7、 more concise and allows analyses to treat similar instructions in nearly the same way. For example, the JVMs iload 5 instruction, which loads the integer value stored in register #5, is a special case of the generic iload instruction where the instructions operand is 5. We also designed OPAL SPL in
8、 such a way that specifications do not prescribe how a framework represents or processes information; i.e., OPAL SPL is representation agnostic. The next section describes the specification language. In Section3we reason about the languages design by discussing the specification of selected JVM inst
9、ructions. In Section4the validation of specifications is discussed. The evaluation of the approach is presented in Section5. The paper ends with a discussion of related work and a conclusion. 外文资料原文 2 2 Specifying Bytecode Instructions The language for specifying bytecode instructions (OPAL SPL) was
10、 primarily designed to enable a concise specification of the JVMs instruction set. OPAL SPL supports the specification of both an instructions format and its effect on the stack and local variables (registers)when the instruction is executed. It is thus possible to specify which kind of values are p
11、opped from and pushed onto the stack as well as which local variables are read or written. Given a specification of the complete instruction set the information required by standard control- and data-flow analyses is then available. However, OPAL SPL is not particularly tied to Java as it abstracts
12、from the particularities of the JVM Specification. For example, the JVMs type system is part of an OPAL SPL specification rather than an integral part of the OPAL SPL language itself. Next, we first give an overview of the language before we discuss its semantics. 2.1 Syntax The OPAL Specification L
13、anguage (OPAL SPL) is an XML-based language. Its grammar is depicted in Fig.2using an EBNF-like format. Non-terminals are written in capital letters (INSTRUCTIONS, TYPES, etc.), the names of XML-elements are written in small letters (types, stack, etc.) and the names of XML-attributes start with (ty
14、pe, var, etc.). We refer to the content of an XML-element using symbols that start with/ (/VALUEEXPRESSION, /EXPECTEDVALUE, etc.). is used to specify nesting of elements. ( ),?,+,*,| have the usual semantics. For example,exceptionsspecifies that the XML-elementexceptionshas one or moreexceptionchild
15、 elements that always have the attributetype. 2.2 Semantics Format Specification Each specification written in OPAL SPL consists of four major parts (line 1 in Fig.2). The first part(types, lines 23) specifies the type system that is used by the underlying virtual machine. The second part (exception
16、s, line 4) declares the exceptions that may be thrown when instructions are executed. The third part (functions, line 5) declares the functions that are used in instruction specifications. The fourth part is the specification of the instructions themselves (lines 612), each of which may resort to th
17、e declared functions to access information not simply stored along with the instruction. For example,invoke instructions do not store the signature and declaring class of the called methods. Instead, a reference to an entry in the so-called constant pool is stored. Only this constant pool entry has
18、all information about the method. To obtain, e.g., the return type of the called method, an abstract function TYPE method refreturn type(method ref) is declared that takes a reference to the entry as input and returns the methods return type. Using abstract function declarations, we abstractin the s
19、pecification of the instructionsfrom the concrete representation of such information by the enclosing by tecode toolkit. 外文资料原文 3 The specification of an instruction consists of up to four parts: the instructions format (lines 78), a description of the effect the instruction has on the stack when ex
20、ecuted (lines 910), a descriptions of the registers it affects upon execution (lines 1112), and information about the exceptions that may be thrown during execution (end of line 6). An instructions format is specified by sequences which describe how an instruction is stored. Theu1, u2andu4elements (
21、line 8) of each format sequence specify that the current value is an unsigned integer value with 1, 2 and 4 bytes, respectively. Similarly, thei1, i2 andi4 elements (line 8) are used to specify that the current value is a (1, 2 or 4 byte) signed integer value. The values can be bound to variables us
22、ing thevarat tribute and can be given a second semantics using thetype attribute. For example,is a twobyte signed integer value that is bound to the variable value and has type short with respect to the instruction sets type system. Additionally, it is possible to specify expected values (line 8). T
23、his enables the selection of the format sequence to be used for reading in the instruction. E.g., 171. specifies that this sequence matches if the value of the first byte is 171. A sequences list element is used to specify that a variable number of values need to be read. The concrete number of elem
24、ents is determined by the count attribute. The attributes value is an expression that can use values that were previously assigned to a variable. The sequence elements implicit and implicit type are used to bind implicit value and type information to variables that can later on be used in type or va
25、lue expressions(line 7, 10 and 11). To make it possible to aggregate related bytecode instructions to one logical instruction, several format sequences can be defined. The effect on the stack is determined by the number and type of stack operands that are popped (line 9) and pushed (line 10). If mul
26、tiple stack layouts are specified, the effect on the stack is determined by the firstbefore-executionstack layout that matches; i.e., to determine the effect on the stack a data-flow analysis is necessary. Unique Prefix Rule One constraint placed upon specifications written in OPAL SPL is that a for
27、mat sequence can be identified unambiguously by only parsing a prefix of the instruction; no lookahead is necessary. In other words, if each format sequence is considered a production and eachu1, u2, etc. is considered a terminal, then OPAL SPL requires the format sequences to constitute an LR(0) gr
28、ammar This unique prefix rule is checked automatically (cf. Sec.4); furthermore, this rule facilitates generating fast parsers from the specification, e.g., using nestedswitchstatements. Type System OPAL SPL does not have a hard-coded type hierarchy. Instead, each specification written in SPL contains a description of the type system used by the bytecode language being described. The only restriction is that all types have to be arranged in a single, strict hierarchy. The Java Virtual Machine Specification 9s type hierarchy is shown in Fig.3(1). It captures all