Poster SPLLift

SPLLIFT: Transparent and Efficient Reuse of IFDS-based Static Program Analyses
Eric Bodden (TU Darmstadt), Claus Brabrand (ITU Copenhagen), Marcio Ribeiro, Tarsis Toledo, Paulo Borba (UFPE Brazil) and Mira Mezini (TU Darmstadt)
To appear at PLDI '13!

Motivation
‣ Product lines allow highly customizable products as specializations of a common platform; success stories exist in the car manufacturing and mobile devices industries
‣ Software product lines (SPLs) use conditional compilation (e.g. via #ifdef) to define many products as variations of a common code base
‣ Problem: traditional static analyses can only be applied to preprocessed software products
‣ But even small product lines can induce thousands of products: different combinations of features cause a combinatorial explosion
‣ Result: existing approaches do not scale; SPLs are currently effectively unanalyzable

Methodology
‣ Our approach SPLLIFT instead analyzes the entire product line at once, including all possible combinations
‣ This is achieved by combining flow functions for the case where an ifdef is disabled with the one for the case where it is enabled
‣ The result is a lifted flow function

Fig. 3: Exploded super graph for the single example product from Figure 1b; main method shown on left-hand side, foo shown to the right
[Figure: nodes 0, x, y (main) and 0, p (foo) at the statements int x = secret(); int y = 0; y = foo(x); return p; print(y); edge kinds: normal, call, return, call-to-return; legend: control-flow edge, data-flow edge, violating information flow]

Fig. 4: Lifting of different flow functions in SPLLIFT
[Figure: enabled-case flow function f^F (edges labeled F) ∨ disabled-case flow function f^¬F (edges labeled ¬F) = lifted flow function f^LIFT, drawn over the facts 0, a and b]
Shown: lifting of a flow function that generates the data-flow fact b after the statement if a is valid before the statement
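
The lifting of a flow function can be sketched in a few lines of Java. This is a minimal sketch under stated assumptions, not the SPLLIFT implementation: it considers a single feature F, encodes a constraint as a two-bit mask over the configurations "F disabled" and "F enabled", and writes a flow function as a set of edges between facts. The class name, the edge notation and the concrete edge sets (enabled case generates b from a, disabled case is the identity) are illustrative assumptions.

```java
import java.util.*;

public class LiftedFlowSketch {
    // Constraints over one feature F as bitmasks of satisfying configurations:
    // bit 0 = configuration with F disabled, bit 1 = configuration with F enabled.
    static final int F = 0b10, NOT_F = 0b01, TRUE = 0b11;

    // A flow function written as a set of edges "source->target" (assumed notation).
    static Set<String> edges(String... e) {
        return new TreeSet<>(Arrays.asList(e));
    }

    // Edges of the enabled case are labeled F, edges of the disabled case ¬F.
    // An edge present in both cases gets the disjunction of both labels.
    static Map<String, Integer> lift(Set<String> enabled, Set<String> disabled) {
        Map<String, Integer> lifted = new TreeMap<>();
        for (String e : enabled) lifted.merge(e, F, (a, b) -> a | b);
        for (String e : disabled) lifted.merge(e, NOT_F, (a, b) -> a | b);
        return lifted;
    }

    public static void main(String[] args) {
        // Assumed enabled case: b is generated after the statement if a holds before it.
        Set<String> enabled = edges("0->0", "a->a", "a->b");
        // Assumed disabled case: the statement is absent, so facts pass through unchanged.
        Set<String> disabled = edges("0->0", "a->a", "b->b");
        System.out.println(lift(enabled, disabled));
    }
}
```

In this sketch, edges that exist in both cases end up labeled true (F ∨ ¬F), the generating edge a->b is labeled F, and the pass-through edge b->b is labeled ¬F.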
void main() {
    int x = password();
    int y = 0;
    #ifdef F
    x = 0;
    #endif
    #ifdef G
    y = foo(x);
    #endif
    print(y);
}
int foo(int p) {
    #ifdef H
    p = 0;
    #endif
    return p;
}

traditional approach: to decide whether the password may leak to the print statement, one must analyze all 2^|{F,G,H}| = 8 possible products

Products and verdicts (✓: no leak, X: leak). Each product is the code above with the disabled #ifdef blocks removed.
Product    Enabled in main()      Enabled in foo()    Verdict
{}         (nothing)              (nothing)           ✓
{F}        x = 0;                 (nothing)           ✓
{G}        y = foo(x);            (nothing)           X
{H}        (nothing)              p = 0;              ✓
{F,G}      x = 0; y = foo(x);     (nothing)           ✓
{F,H}      x = 0;                 p = 0;              ✓
{G,H}      y = foo(x);            p = 0;              ✓
{F,G,H}    x = 0; y = foo(x);     p = 0;              ✓
product {G} indeed contains a leak

But how about intra-procedural flows through branching statements? We conduct our analysis on the Jimple intermediate representation, a three-address code, for which we need to distinguish unconditional branches through throw e and goto l, from conditional branches of the form if (p) goto l.
Figure 4b shows how we lift flow functions for unconditional branches labeled with a feature annotation F. If the throw or goto statement is enabled, data flow only exists towards the nodes of the branch target (the primed nodes in Figure 4b). Both edges within this figure are assumed to be labeled with F. If the statement is disabled, data flows only along the fall-through branch, as the branch does not actually execute. Again we use the identity function in this case. Just as before, the lifted flow function f^LIFT is obtained through a disjunction of both functions.
For conditional branches of the form if (p) goto l, the lifted flow function is basically a combination of the cases for the unconditional branch and the “normal data flow”, which models the case in which the branch falls through. We show the respective case in Figure 4c. The disabled case ¬F is handled just as for unconditional branches; if a branch is disabled, it just falls through, no matter whether it is conditional or not.

B. Inter-procedural flow functions
The call and return flow functions model inter-procedural control flows caused by call and return statements. The general idea behind modeling those functions is the same as in the intra-procedural case, however this time we need to consider a different flow function for the disabled case, i.e., when ¬F holds. Remember that a call flow function leads from a call site to its callee, and a return flow function from a return statement to just after the call site. Using the identity function to model such situations would be incorrect: If we were to use the identity function then this would propagate information from the call site to the callee (respectively vice versa) although under ¬F the call (or return) never occurs. We must hence use the kill-all function to model this situation, as shown in Figure 4d (middle column). This function kills all data-flow facts, modeling that no flow between call site and callee occurs if the invoke statement is disabled.

C. Edges between 0 nodes
It is an interesting question to ask whether we should conditionalize edges between 0 nodes. As explained earlier, in plain IFDS/IDE, 0 nodes are always connected, unconditionally. We decided for the design shown in Figure 4 where edges between 0 nodes are in fact conditionalized by feature annotations just as any other edges.

SPLLIFT applied to example SPL
Fig. 5: SPLLIFT as it is applied to the entire example product line of Figure 1a; an edge labeled with feature constraint C represents the function λx. x ∧ C. Constraints at nodes represent initial values (at the top) or intermediate results of the constraint computation.
[Figure: exploded super graph of the whole product line with the annotated statements [F] x = 0;, [G] y = foo(x); and [H] p = 0;; edges carry constraints such as ¬F, G, ¬G and ¬H; intermediate constraints ¬F ∧ G and ¬F ∧ G ∧ ¬H; legend: control-flow edge, data-flow edge, conditional data-flow edge, violating information flow]
Result: ¬F ∧ G ∧ ¬H
SPLLIFT determines in a single pass that the password can only leak if G is enabled but F and H are disabled.
This is consistent with the result determined by the traditional analysis.
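
The traditional, product-by-product check of the running example is small enough to replay by brute force. The sketch below is a hand-written taint simulation of the example, not an IFDS analysis and not part of SPLLIFT; the class and method names are hypothetical. It enumerates all 8 configurations of F, G and H and compares each verdict with the constraint ¬F ∧ G ∧ ¬H.

```java
public class ProductEnumerationSketch {
    // foo: the H-guarded statement "p = 0;" overwrites the secret before it is returned.
    static boolean foo(boolean pSecret, boolean h) {
        if (h) pSecret = false;
        return pSecret;
    }

    // Returns true if the product selected by (f, g, h) prints the password.
    static boolean leaks(boolean f, boolean g, boolean h) {
        boolean xSecret = true;            // int x = password();
        boolean ySecret = false;           // int y = 0;
        if (f) xSecret = false;            // F-guarded: x = 0;
        if (g) ySecret = foo(xSecret, h);  // G-guarded: y = foo(x);
        return ySecret;                    // print(y);
    }

    // Counts the leaking products among all 2^3 = 8 configurations.
    static int countLeakingProducts() {
        int count = 0;
        for (int c = 0; c < 8; c++) {
            if (leaks((c & 1) != 0, (c & 2) != 0, (c & 4) != 0)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int c = 0; c < 8; c++) {
            boolean f = (c & 1) != 0, g = (c & 2) != 0, h = (c & 4) != 0;
            boolean expected = !f && g && !h;  // the constraint ¬F ∧ G ∧ ¬H
            if (leaks(f, g, h) != expected) throw new AssertionError("mismatch");
            System.out.println("F=" + f + " G=" + g + " H=" + h + " leaks=" + leaks(f, g, h));
        }
        System.out.println(countLeakingProducts() + " of 8 products leak");
    }
}
```

The loop grows exponentially with the number of features, which is exactly the cost that a single lifted pass avoids.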
exclusive-OR group to a [...] pairwise mutual exclusion of the group members.”
We use this form of translation to obtain a Boolean feature-model constraint from the graphical feature models defined for our benchmark subjects.
Empirical Evaluation
‣ SPLLIFT shows remarkable performance
‣ In some cases the traditional approach would have taken days or years, while SPLLIFT only takes minutes to compute.
‣ Hence SPLLIFT solved a problem to which previously no scalable solution existed

TABLE II: Performance comparison of SPLLIFT over A2; values in gray show coarse estimates
(comparison of SPLLIFT with the traditional approach “A2”)
                                          Possible Types        Reaching Definitions   Uninitialized Variables
Benchmark    Configurations   Soot/CG     SPLLIFT   A2          SPLLIFT   A2           SPLLIFT   A2
             (valid)
BerkeleyDB   unknown          7m33s       24s       years       12m04s    years        10m18s    years
GPL          1,872            4m35s       42s       9h03m39s    8m48s     days         7m09s     days
Lampiro      4                1m52s       4s        13s         42s       3m30s        1m25s     3m09s
MM08         26               2m57s       3s        2m06s       59s       24m29s       2m13s     27m39s

TABLE III: Performance impact of feature model on SPLLIFT. Values in gray show the average time of the A2 analysis. This number can be seen as lower bound for any feature-aware analysis.
Benchmark    Feature Model   Possible Types   Reaching Definitions   Uninitialized Variables
BerkeleyDB   regarded        24s              12m04s                 10m18s
             ignored         23s              11m35s                 10m47s
             average A2      21s              9m35s                  7m12s
GPL          regarded        42s              8m48s                  7m09s
             ignored         18s              8m21s                  7m29s
             average A2      17s              7m31s                  6m42s
Lampiro      regarded        4s               42s                    1m25s
             ignored         4s               48s                    1m13s
             average A2      3s               42s                    49s
MM08         regarded        3s               59s                    2m13s
             ignored         2s               45s                    1m49s
             average A2      2s               31s                    1m37s

V. IMPLEMENTATION WITH SOOT AND CIDE
We have implemented SPLLIFT based on the Soot program analysis and transformation framework [13], the Colored Integrated Development Environment (CIDE) [14] and the JavaBDD¹ library. We have implemented an IDE solver [10] in Soot that works directly on Soot’s intermediate representation “Jimple”. Jimple is a three-address code representation of Java programs that is particularly simple to analyze. Jimple statements are never nested, and all control-flow constructs are reduced to simple conditional and unconditional branches. Soot can produce Jimple code from Java source code or bytecode, and compile Jimple back into bytecode or into other intermediate representations.
To be able to actually parse software product lines, we used CIDE, an extension of the Eclipse IDE [15]. In CIDE, software product lines are expressed as plain Java programs. This makes them comparatively easy to parse: there are no explicit compiler directives such as #ifdef that a parser would need to handle. Instead, code variations are expressed by marking code fragments with different colors. Each color is associated with a feature name. As a result, every CIDE SPL is also a valid Java program. CIDE forbids so-called “undisciplined” annotations, i.e., enforces that users mark code regions that span entire statements, members or classes. Previous research has shown that this is typically not a practical limitation [18]. Figure 6 shows our running example program with the appropriately marked features in CIDE.
Fig. 6: Example program in the Colored IDE (CIDE)
One performance critical aspect of our implementation is what data structures we use to implement the feature constraints that SPLLIFT tracks. After some initial experiments with a hand-written data structure representing constraints in Disjunctive Normal Form, we switched to an implementation based on Binary Decision Diagrams (BDDs) [19], using JavaBDD. BDDs have the advantage that they are relatively compact, and are normalized, which allows us to easily detect and exploit contradictions. The size of a BDD can heavily depend on its variable ordering. In our case, because we did not perceive the BDD operations to be a bottleneck, we just pick one ordering and leave the search for an optimal ordering to future work.
¹ JavaBDD website: http://javabdd.sourceforge.net/

Current Limitations
Our current implementation is not as sensitive to feature annotations as it could be. This is mostly due to the fact that IFDS/IDE requires the presence of a call graph, and currently we compute this call graph without taking feature annotations into account. While we follow call-graph edges in a feature sensitive fashion (as defined by our call flow function), the feature-insensitive call graph is also used to compute points-to sets. This precludes us from precisely handling situations such as the following:

Implementation
‣ Based on CIDE, Soot, JavaBDD, and our IFDS implementation Heros
‣ Available as open source at: http://bodden.de/spllift/
http://sse.ec-spride.de/

Scale of the problem
‣ Scale of the problem is immense: as an example, we investigated the usage of #ifdef constructs in the OpenSSL crypto library
‣ OpenSSL contains 1874 #ifdefs with 391 different labels
‣ This yields 2^391 ≈ 5·10^117 combinations!
‣ As comparison: The observable universe has only about 10^80 atoms.
[Figure: labels of #ifdefs used in OpenSSL]
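
The estimate of 2^391 ≈ 5·10^117 combinations for 391 #ifdef labels can be checked with exact integer arithmetic. This is a small sketch for verifying the arithmetic only; the class and method names are hypothetical, and it assumes, as the poster does, that all labels can be toggled independently.

```java
import java.math.BigInteger;

public class ConfigurationCount {
    // Number of combinations of n independent on/off labels: 2^n.
    static BigInteger combinations(int labels) {
        return BigInteger.valueOf(2).pow(labels);
    }

    public static void main(String[] args) {
        BigInteger n = combinations(391);
        String digits = n.toString();
        // A number with d decimal digits and leading digit k is roughly k * 10^(d-1).
        System.out.println("decimal digits: " + digits.length());
        System.out.println("leading digit: " + digits.charAt(0));
        // Compare against the rough atom count of the observable universe, 10^80.
        System.out.println("exceeds 10^80: " + (n.compareTo(BigInteger.TEN.pow(80)) > 0));
    }
}
```

A 118-digit number with leading digit 5 is about 5·10^117, which matches the figure quoted above.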