On the Challenges in Extracting Metrics from Java Bytecode

A/Prof. Jean-Guy Schneider
[email protected]
Where it all began…
[Figure: Gini Coefficient of Synthetic Fields (y-axis) plotted against RSN (x-axis). Labelled points: 4.6: 0.3484, 4.7: 0.4664, 12.5: 0.5820, 12.6: 0.6443, 12.7: 0.6973]
SCIENCE | TECHNOLOGY | INNOVATION
Overview
- Java Class File
    - Information that can be extracted
    - … and some “misconceptions”
- Nested Class Extraction
    - Preliminaries
    - Problems and Consequences
    - Nested Class Graph
- A few “surprises”
    - Motivating example revisited
- Lessons learnt

Quick Overview of Class File Format…
There are 10 basic sections to the Java Class File structure:
1. Magic Number: 0xCAFEBABE
2. Version of Class File Format: the minor and major versions of the class file
3. Constant Pool: pool of constants for the class
4. Access Flags
5. Name of the class
6. Name of the super class
7. Interfaces: any interfaces the class implements
8. Fields: fields the class defines
9. Methods: methods the class defines
10. Attributes: attributes of the class (e.g., the name of the source file, enclosing method etc.)
☞ Many references to the Constant Pool in sections 5 to 10

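The first two sections can be read with nothing more than a DataInputStream. The following is a minimal sketch, not part of the original slides; the class name ClassFileHeader is made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Reads sections 1 and 2 of a class file: magic number and version. */
class ClassFileHeader {
    final int minor;
    final int major;

    ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();                 // section 1: u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file: magic = 0x" + Integer.toHexString(magic));
        }
        int minor = data.readUnsignedShort();       // section 2: u2 minor_version
        int major = data.readUnsignedShort();       //            u2 major_version
        return new ClassFileHeader(minor, major);
    }

    public static void main(String[] args) throws IOException {
        // Read this class's own bytecode from the class path.
        String resource = ClassFileHeader.class.getSimpleName() + ".class";
        try (InputStream in = ClassFileHeader.class.getResourceAsStream(resource)) {
            if (in == null) throw new IOException("class file not found: " + resource);
            ClassFileHeader header = read(in);
            System.out.println("class file version " + header.major + "." + header.minor);
        }
    }
}
```

Everything after the version (sections 3 to 10) requires walking the constant pool first, because names are stored as indices into it.
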
Analysis of Java Bytecode
What we can extract from Java Bytecode:
- Classes and Interfaces
- Visibility (and other access modifiers)
- Derivations (inheritance, interface implementations)
- Fields, Methods + Method Signatures, Exceptions
- Instructions
- Type Dependencies
- Local Variables
- Nested Classes, Enclosing Classes/Methods

Data Sources used…
- Qualitas Corpus: http://qualitascorpus.com/
    - Ant: 22 versions (1.1 – 1.8.4)
    - Freecol: 31 versions (0.3.0 – 0.10.7)
- Helix Data Set: http://www.ict.swin.edu.au/research/projects/helix/
    - KoLmafia: 101 versions (4.0 – 14.1)
    - Xalan: 13 versions (1.0.0 – 2.7.1)
- … and a few hand-crafted examples

When is a Java Type enclosed in another type?
- No nested types in Java 1.0
- General assumption:
    - From Java 1.1: ‘$’ indicates nesting of a type
- For example:
    - com/ice/tar/TarInputStream is top-level
    - com/ice/tar/TarInputStream$EntryAdapter is nested
- Hence:
    - Test for ‘$’ in type name will suffice…
☞ ‘$’ is a valid character for any Java identifier, including class and interface names!

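The point above can be checked with a few lines of reflection. This is an illustrative sketch only; DollarHeuristic and Host are made-up names, and getEnclosingClass() is used as a stand-in for "what the class file's own attributes say".

```java
/** A legal top-level class whose name contains '$'. */
class Foo$Bar { }

class Host {
    /** A genuine member class; its binary name also contains '$'. */
    static class Inner { }
}

class DollarHeuristic {
    /** The naive test: a '$' in the type name is taken to mean "nested". */
    static boolean looksNested(String binaryName) {
        return binaryName.indexOf('$') >= 0;
    }

    /** What reflection reports, based on the attributes stored in the class file. */
    static boolean isNested(Class<?> type) {
        return type.getEnclosingClass() != null;
    }

    public static void main(String[] args) {
        Class<?>[] samples = { Foo$Bar.class, Host.Inner.class, Host.class };
        for (Class<?> c : samples) {
            System.out.println(c.getName()
                    + ": heuristic=" + looksNested(c.getName())
                    + ", reflection=" + isNested(c));
        }
    }
}
```

For the top-level class Foo$Bar the two answers disagree: the name test says "nested", while reflection finds no enclosing class.
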
Example 1
class Foo$Bar {
    private int a = 23;
    public int mFoo$Bar (int x) {
        int $y = x + a;
        int $z = 3;
        return $y + $z;
    }
}
☞ Perfectly well-formed class…
☞ But: Bytecode analysis indicates that method mFoo$Bar does not have any local variables!

Example 2
public class Foo {
    private int x = 1;
    protected class Bar {
        int y = 2;
        public int mBar() {
            return new Object()
                { public int z = x + 3; }.z;
        }
    }
}
☞ Top-level class Foo, nested class Bar with a nested anonymous class that extends Object
☞ Generated Bytecode names: Foo, Foo$Bar, Foo$Bar$1

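The nesting of Example 2 can be cross-checked at run time with the standard reflection API. The sketch below is an illustration, not the tooling used for the talk; Example2Reflection is a made-up name, Foo is declared without `public` so that the snippet fits in one file, and the anonymous class is looked up under the name Foo$Bar$1, which assumes the compiler follows the naming shown above.

```java
import java.lang.reflect.Method;

class Foo {
    private int x = 1;
    protected class Bar {
        int y = 2;
        public int mBar() {
            return new Object() { public int z = x + 3; }.z;
        }
    }
}

class Example2Reflection {
    /** Loads the anonymous class, assuming the compiler named it <Bar's binary name>$1. */
    static Class<?> anonymousClass() throws ClassNotFoundException {
        return Class.forName(Foo.Bar.class.getName() + "$1");
    }

    static String describe(Class<?> c) {
        Class<?> enclosing = c.getEnclosingClass();
        Method method = c.getEnclosingMethod();
        return c.getName()
                + " enclosedIn=" + (enclosing == null ? "-" : enclosing.getName())
                + " method=" + (method == null ? "-" : method.getName())
                + " member=" + c.isMemberClass()
                + " local=" + c.isLocalClass()
                + " anonymous=" + c.isAnonymousClass();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(describe(Foo.class));
        System.out.println(describe(Foo.Bar.class));
        System.out.println(describe(anonymousClass()));
    }
}
```

Note that Class.isLocalClass() returns false for anonymous classes, so the reflection API uses "local" in a narrower sense than some bytecode tools do.
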
Example 2 – Bytecode…
[ public ] Foo$Bar -> (top-level: false)
super: java/lang/Object
implements: [ ]
Enclosed in: Foo
Member: true
Local: false
Anonymous: false
#Methods 3 - #Fields 2
attributes: SourceFile InnerClasses
InnerClasses: [ Foo$Bar, Foo, Bar, [ protected ] ]
InnerClasses: [ Foo$Bar$1, #0, #0, [ ] ]
(Layout of each InnerClasses entry: Inner Name, Outer Name, Simple Name, Access Flags)
Mismatch!!

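The InnerClasses entries shown above can be pulled out of a class file with a small hand-written reader. The following is a sketch under simplifying assumptions (no validation beyond the magic number, attribute lengths assumed to fit in an int); InnerClassesReader and Entry are illustrative names, and a null field stands for constant-pool index #0.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class InnerClassesReader {
    /** One InnerClasses entry; a null name stands for constant-pool index #0. */
    static final class Entry {
        final String innerName, outerName, simpleName;
        final int accessFlags;
        Entry(String innerName, String outerName, String simpleName, int accessFlags) {
            this.innerName = innerName; this.outerName = outerName;
            this.simpleName = simpleName; this.accessFlags = accessFlags;
        }
        @Override public String toString() {
            return "[ " + innerName + ", " + outerName + ", " + simpleName
                    + ", 0x" + Integer.toHexString(accessFlags) + " ]";
        }
    }

    static List<Entry> read(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        if (d.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        skip(d, 4);                                        // minor and major version
        int poolCount = d.readUnsignedShort();
        Object[] pool = new Object[poolCount];             // String = Utf8, Integer = Class name index
        for (int i = 1; i < poolCount; i++) {
            int tag = d.readUnsignedByte();
            switch (tag) {
                case 1: pool[i] = d.readUTF(); break;                  // Utf8
                case 7: pool[i] = d.readUnsignedShort(); break;        // Class
                case 8: case 16: case 19: case 20: skip(d, 2); break;  // String, MethodType, Module, Package
                case 15: skip(d, 3); break;                            // MethodHandle
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(d, 4); break;                                 // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                case 5: case 6: skip(d, 8); i++; break;                // Long and Double occupy two slots
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        skip(d, 6);                                        // access flags, this class, super class
        skip(d, 2 * d.readUnsignedShort());                // interfaces
        skipMembers(d);                                    // fields
        skipMembers(d);                                    // methods
        List<Entry> entries = new ArrayList<>();
        for (int a = d.readUnsignedShort(); a > 0; a--) {  // class-level attributes
            String attributeName = (String) pool[d.readUnsignedShort()];
            int length = d.readInt();
            if (!"InnerClasses".equals(attributeName)) { skip(d, length); continue; }
            for (int n = d.readUnsignedShort(); n > 0; n--) {
                String inner = className(pool, d.readUnsignedShort());
                String outer = className(pool, d.readUnsignedShort());
                int simpleIndex = d.readUnsignedShort();
                String simple = simpleIndex == 0 ? null : (String) pool[simpleIndex];
                entries.add(new Entry(inner, outer, simple, d.readUnsignedShort()));
            }
        }
        return entries;
    }

    private static String className(Object[] pool, int classIndex) {
        return classIndex == 0 ? null : (String) pool[(Integer) pool[classIndex]];
    }

    private static void skipMembers(DataInputStream d) throws IOException {
        for (int n = d.readUnsignedShort(); n > 0; n--) {
            skip(d, 6);                                    // access flags, name, descriptor
            for (int a = d.readUnsignedShort(); a > 0; a--) {
                skip(d, 2);                                // attribute name index
                skip(d, d.readInt());                      // attribute body
            }
        }
    }

    private static void skip(DataInputStream d, int bytes) throws IOException {
        d.readFully(new byte[bytes]);
    }

    public static void main(String[] args) throws IOException {
        String resource = InnerClassesReader.class.getSimpleName() + ".class";
        try (InputStream in = InnerClassesReader.class.getResourceAsStream(resource)) {
            if (in == null) throw new IOException("class file not found: " + resource);
            for (Entry e : read(in)) System.out.println("InnerClasses: " + e);
        }
    }
}
```

Run on its own class file, the reader should list at least its nested Entry class. The access flags printed here are the ones stored in the InnerClasses entry, which are separate from the access flags in section 4 of the class file.
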
Example 2 (cont.)
[ ] Foo$Bar$1 -> (top-level: false)
super: java/lang/Object
implements: [ ]
Enclosed in: Foo$Bar -> mBar:()I
Member: false
Local: true
Anonymous: true
#Methods 1 - #Fields 2
attributes: SourceFile InnerClasses EnclosingMethod
EnclosingMethodAttribute: Foo$Bar -> mBar:()I
InnerClasses: [ Foo$Bar, Foo, Bar, [ protected ] ]
InnerClasses: [ Foo$Bar$1, #0, #0, [ ] ]
☞ BTW: which Foo$Bar???

Observation… and why it can be wrong…
- A nested class with bytecode name X$..$Y$Z is enclosed in class X$..$Y
- Not always applicable ☹ (up to 6% error rate)
- Counter examples (from Freecol 0.5.0):
    net/sf/freecol/client/control/InGameController$2
    is enclosed in
    net/sf/freecol/client/control/InGameController$1

    net/sf/freecol/client/gui/panel/DeclarationDialog$5
    is enclosed in
    net/sf/freecol/client/gui/panel/DeclarationDialog$SignaturePanel
- Another example (from KoLmafia 4.0):
    net/sourceforge/kolmafia/KoLFrame$2
    is enclosed in
    net/sourceforge/kolmafia/KoLFrame$ItemManagePanel$VerifyButtonPanel

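The name-based heuristic and a way to quantify how often it is wrong can be sketched as follows. NameHeuristic and its methods are made-up names; the sample map contains only the three counter examples listed above plus the two classes of Example 2, so the rate it prints says nothing about the corpus-wide figure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NameHeuristic {
    /** Guesses the enclosing class by cutting the name at the last '$'; null if there is none. */
    static String guessEnclosing(String name) {
        int cut = name.lastIndexOf('$');
        return cut < 0 ? null : name.substring(0, cut);
    }

    /** Fraction of nested classes whose guessed enclosing class differs from the recorded one. */
    static double errorRate(Map<String, String> actualEnclosing) {
        if (actualEnclosing.isEmpty()) return 0.0;
        int wrong = 0;
        for (Map.Entry<String, String> e : actualEnclosing.entrySet()) {
            if (!e.getValue().equals(guessEnclosing(e.getKey()))) wrong++;
        }
        return (double) wrong / actualEnclosing.size();
    }

    /** Nested class -> enclosing class, taken from the examples in the slides. */
    static Map<String, String> sample() {
        Map<String, String> actual = new LinkedHashMap<>();
        actual.put("Foo$Bar", "Foo");
        actual.put("Foo$Bar$1", "Foo$Bar");
        actual.put("net/sf/freecol/client/control/InGameController$2",
                   "net/sf/freecol/client/control/InGameController$1");
        actual.put("net/sf/freecol/client/gui/panel/DeclarationDialog$5",
                   "net/sf/freecol/client/gui/panel/DeclarationDialog$SignaturePanel");
        actual.put("net/sourceforge/kolmafia/KoLFrame$2",
                   "net/sourceforge/kolmafia/KoLFrame$ItemManagePanel$VerifyButtonPanel");
        return actual;
    }

    public static void main(String[] args) {
        System.out.println("error rate on hand-picked sample: " + errorRate(sample()));
    }
}
```

Reporting such a rate against ground truth from the InnerClasses and EnclosingMethod attributes is one way to make the cost of the heuristic explicit.
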
Nested Classes Graph
starting from a top-level node, go through all inner class structures:
    exclude any 'self-defining' nested classes, that is, ones with Inner
    Name having the same string value as the current class C
    if the Outer Name of a nested class structure is zero
        -> a method M in the current class C is the defining scope for the
           nested local class
        -> record Inner Name as one of the directly nested classes of C
           (i.e. class Inner Name is directly enclosed by C)
    if the Outer Name is equal to the current class' name
        -> record Inner Name as one of the directly nested classes of C as it
           is a (static or non-static) member class of C
proceed recursively with all recorded nested class names

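The traversal described above can be sketched in a few lines. This is an illustration under assumptions: NestedClassGraph and Entry are made-up names, a null outerName stands for "Outer Name is zero", each class is visited only once (a guard that is not on the slide), and the InnerClasses entry assumed for Foo itself is not shown in the dumps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NestedClassGraph {
    /** The two InnerClasses fields the traversal needs; a null outerName stands for #0. */
    static final class Entry {
        final String innerName, outerName;
        Entry(String innerName, String outerName) {
            this.innerName = innerName;
            this.outerName = outerName;
        }
    }

    /** Maps every class reachable from topLevel to its directly nested classes. */
    static Map<String, Set<String>> build(String topLevel, Map<String, List<Entry>> innerClassesOf) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        visit(topLevel, innerClassesOf, graph);
        return graph;
    }

    private static void visit(String current, Map<String, List<Entry>> innerClassesOf,
                              Map<String, Set<String>> graph) {
        if (graph.containsKey(current)) return;         // assumption: visit each class once
        Set<String> nested = new LinkedHashSet<>();
        graph.put(current, nested);
        List<Entry> entries = innerClassesOf.get(current);
        if (entries == null) return;                    // no InnerClasses information available
        for (Entry e : entries) {
            if (e.innerName.equals(current)) continue;  // 'self-defining' entry
            boolean definedInMethod = e.outerName == null;
            boolean memberOfCurrent = current.equals(e.outerName);
            if (definedInMethod || memberOfCurrent) nested.add(e.innerName);
        }
        for (String name : nested) visit(name, innerClassesOf, graph);
    }

    /** InnerClasses entries per class file for Example 2 (the entry for Foo is assumed). */
    static Map<String, List<Entry>> example2() {
        Map<String, List<Entry>> m = new HashMap<>();
        m.put("Foo", Arrays.asList(new Entry("Foo$Bar", "Foo")));
        m.put("Foo$Bar", Arrays.asList(new Entry("Foo$Bar", "Foo"), new Entry("Foo$Bar$1", null)));
        m.put("Foo$Bar$1", Arrays.asList(new Entry("Foo$Bar", "Foo"), new Entry("Foo$Bar$1", null)));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(build("Foo", example2()));
    }
}
```

For Example 2 the graph records Foo$Bar as directly nested in Foo and Foo$Bar$1 as directly nested in Foo$Bar, without looking at the '$' in any name.
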
Example 2 – missing Information
[ ] Foo$Bar$1 -> (top-level: false)
super: java/lang/Object
implements: [ ]
Enclosed in: ??? -> ???
Member: false
Local: false
Anonymous: true
#Methods 1 - #Fields 2
attributes: SourceFile InnerClasses
InnerClasses: [ Foo$Bar, Foo, Bar, [ protected ] ]
InnerClasses: [ Foo$Bar$1, #0, #0, [ ] ]
☞ Without an EnclosingMethod attribute, the defining “context” of a non-member nested class cannot be uniquely determined!
☞ In case of missing InnerClasses information, the enclosing class of a “hidden” nested class cannot always be uniquely determined, either…

Java Classes Categorization
What is often published:
- Top-level (package level) classes
- Nested classes
    - Member-level classes
    - “Inner Classes”
    - Local classes (have a simple name)
    - Anonymous classes (no simple name)
☞ Anonymous classes are expressions – they can be used anywhere where an expression is allowed.
☞ Are often used to initialize member instances

Java Classes Categorization (cont.)
Corrected classification:
- Top-level (package level) classes
- Nested classes
    - Member-level classes
    - Anonymous classes
    - Local classes – can be either named or anonymous
☞ “Locality” of anonymous classes can only be safely determined if EnclosingMethod attribute is present!
☞ Classification is not orthogonal!

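Because the categories overlap, a classifier is more naturally written as returning a set of kinds than a single kind. The sketch below is one reading of the dumps shown for Example 2, not the author's tool; NestedKind, Kind and the parameter names are made up, null stands for constant-pool index #0 or a missing attribute, and "local" here means "an enclosing method is recorded".

```java
import java.util.EnumSet;
import java.util.Set;

class NestedKind {
    enum Kind { TOP_LEVEL, MEMBER, LOCAL, ANONYMOUS }

    /**
     * @param hasOwnEntry     the class file has an InnerClasses entry describing the class itself
     * @param outerName       Outer Name of that entry, null for #0
     * @param simpleName      Simple Name of that entry, null for #0
     * @param enclosingMethod method named by the EnclosingMethod attribute, null if none is recorded
     */
    static Set<Kind> classify(boolean hasOwnEntry, String outerName,
                              String simpleName, String enclosingMethod) {
        if (!hasOwnEntry) {
            // Without an entry for itself, a "hidden" nested class looks like a top-level one
            // unless an enclosing method happens to be recorded.
            return enclosingMethod == null ? EnumSet.of(Kind.TOP_LEVEL) : EnumSet.of(Kind.LOCAL);
        }
        EnumSet<Kind> kinds = EnumSet.noneOf(Kind.class);
        if (outerName != null) kinds.add(Kind.MEMBER);
        if (simpleName == null) kinds.add(Kind.ANONYMOUS);
        if (enclosingMethod != null) kinds.add(Kind.LOCAL);
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println("Foo$Bar:   " + classify(true, "Foo", "Bar", null));
        System.out.println("Foo$Bar$1: " + classify(true, null, null, "mBar:()I"));
        System.out.println("Foo$Bar$1 without EnclosingMethod: " + classify(true, null, null, null));
    }
}
```

The last call mirrors the "missing information" dump: the class is still recognisably anonymous, but its locality can no longer be established.
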
Let’s have a look at some results…
[Figure: Ant – Gini. x-axis: RSN; y-axis: Gini - Percentage; series: GiniSyntheticFields, %TopLevel, GiniSyntheticFields(Zero)]
[Figure: Ant – Types of Classes. x-axis: Age (days since release 1); y-axis: Frequency; series: #Classes, #Nested, #TopLevel, #SyntheticNested]
[Figure: Freecol – Types of Classes. x-axis: Age; y-axis: Frequency; series: #Nested, #NestedAnonymous, #NestedMember, #NestedLocal]
[Figure: KoLmafia – Types of Classes. x-axis: RSN; y-axis: Frequency; series: #Classes, #Nested, #TopLevel, #SyntheticNested]

Lessons Learnt
- Java Bytecode is a “rich” source of information
    - Needs to be treated with “care”
    - Go back to the specifications to find “correct” information
- Beware of what Java compilers do
    - Missing information
    - Wrongly generated information (not according to specs):
        - Outer Name for anonymous classes
        - Local classes used in multiple methods
- Beware of (incomplete) heuristics
    - E.g., name of enclosing class
- Beware of premature interpretations
    ☞ use “whole” picture

Lessons Learnt (cont.)
- Compiler:
    - If current compiler generates Bytecode in a particular way
      ☞ Do not assume that is the case for all compilers!
- Testing:
    - use the tool as one of the case studies
    - have a LARGE corpus of case studies
      ☞ Special cases may only appear in certain systems!
☞ Many publications in empirical SE do not quantify “error rates” when heuristics are used…

[Figure: KoLmafia – Adding Ratio Nested Classes. x-axis: RSN; y-axis: Gini / Percentage; series: GiniSyntheticFields, %TopLevel, GiniSyntheticFields(Zero)]