Poster SPLLift
Transcription
Poster SPLLift
LIFT SPL - Transparent and Efficient Reuse of IFDS-based Static Program Analyses Eric Bodden (TU Darmstadt), Claus Braband (ITU Copenhagen), Marcio Ribeiro, Tarsis Toledo, Paulo Borba (UFPE Brazil) and y• x• 0• Mira Mezini (TU Darmstadt) control-flow edge T int x = secret(); normal o data-flow edge a y• x• ppear 0• violating information flow a t PLDI call int y = 0; normal Motivation Methodology ’13! y• • x• 0 ‣ Product lines allow highly customizable products as specializations of a common platform; success stories exist in the car manufacturing and mobile devices industries 0• p• 0• p• instead analyzes the entire product line at once, ‣ Our approach y = foo(x); return p; call-to-return including all possible combinations SPLLIFT y • by combining flow functions for the case where an x• 0• ‣ This is achieved return print(y); ifdef is disabled with the one for the case where it is enabled ‣ Software product lines (SPLs) use conditional compilation (e.g. via #ifdef) to define many products as variations of a common code base result is a lifted flow function ‣ The Fig. 3: Exploded super graph for the single example product from Figure 1b; main method shown on left-hand side, foo shown to the right ‣ Problem: traditional static analyses can only be applied to preprocessed software products Enabled-case flow function f F 0 • ‣ But even small product lines can induce thousands of products: different combinations of features cause a combinatorial explosion a • F ‣ Result: existing approaches do not scale; SPLs are currently effectively unanalyzable • 0 0 • b • F • a Lifted flow function f LIFT Disabled-case flow function f ¬F _ ¬F • b a • ¬F • 0 0 • b • F b • = ¬F • a a • • b • 0 ¬F ¬F • a • b Fig. 4: Lifting of different flow functions in SPLLIFT lifting of a flow function that generates the data-flow fact b after the statement if a is valid before the statement But how about intra-procedural flows through branching state- B. Inter-procedural flow functions void main() { int x = password(); int y = 0; #ifdef F x = 0; Feature'F #elif G y = foo(x); Feature'G #endif print(y); } int foo(int p) { #ifdef H p = 0; #endif return p; } {'} ✓ int foo(int p) { return p; } {F} void main() { int x = password(); int y = 0; x = 0; print(y); } ✓ int foo(int p) { return p; } ✓ void main() { int x = password(); int y = 0; x = 0; y = foo(x); print(y); } int foo(int p) { return p; } traditional approach: to decide whether the password may leak to the print statement, one must analyze all 2|{F,G,H}|=8 possible products {F,H} {G} X {H} void main() { int x = password(); int y = 0; y = foo(x); print(y); } void main() { int x = password(); int y = 0; print(y); } password(); For conditional branches of the form if (p)goto l, the lifted flow function is0basically of the cases for the unconditional y• • x •a combination branch and the “normal data flow”, which models the case in which the branch falls through. We show the respective case in Figure 4c. y = 0; branches; The disabled case ¬F is handled just as forint unconditional if a branch is disabled, it just falls through, no matter whether it is y • false conditional0 or • not.x • ✓ {G,H} ✓ 0• x• ✓ void main() { int x = password(); int y = 0; x = 0; print(y); } void main() { int x = password(); int y = 0; y = foo(x); print(y); } int foo(int p) { p = 0; return p; } int foo(int p) { p = 0; return p; } {F,G,H} void main() { int x = password(); int y = 0; x = 0; y = foo(x); print(y); } ✓ int foo(int p) { p = 0; return p; } Scale of the problem x• C. Edges between 0 nodes data-flow edge It is an interesting question toconditional ask whether we should conditionalize edges between 0 nodes. As explained earlier, in plain IFDS/IDE, 0 nodes are always connected, unconditionally. We decided for the violating information flow design shown in Figure 4 where edges between 0 nodes are in fact conditionalized with by feature annotations just as any other edges. G 0• y• ¬G 0• data-flow edge [F ] x = 0; ¬F int foo(int p) { p = 0; return p; } int foo(int p) { return p; } The call and return flow functions model inter-procedural control flows caused by call and return statements. The general idea behind modeling those functions is the same as in the intra-procedural case, however this time we need to consider a different flow function for the disabled case, i.e., when ¬F holds. Remember that a call flow function leads from a call site to its callee, and a return flow function from a return statement to just after the call site. Using the identity function to model such situations would be incorrect: If we were to use the identity function then this would propagate information from the call site to the callee (respectively vice versa) although under ¬F the call (or return) never occurs. We must hence use the killall function to model this situation,control-flow as shown in Figure edge 4d (middle column). This function kills all data-flow facts, modeling that no flow between call site and callee occurs if the invoke statement is disabled. SPLLIFT applied to example SPL Figure 4b shows how we lift flow functions for unconditional branches labeled with a feature annotation F . If the throw or goto statement is enabled, data flow only exists towards the nodes of the branch target (the primed nodes in Figure 4b). Both edges within this figure are assumed to be labeled with F . If the statement is disabled, falsethefalse data flowstrue only along fall-through branch, as the branch does not actually • use the identity function in this case. x •Againywe 0 •execute. Just as before, the lifted flow function f LIFT is obtained through a int x = disjunction of both functions. product {G} indeed contains a leak Feature'H void main() { int x = password(); int y = 0; print(y); } {F,G} ? ments? We conduct our analysis on the Jimple intermediate representation, a three-address code, for which we need to distinguish unconditional branches through throw e and goto l, from conditional branches of the form if (p)goto l. G p• ¬F ^ G [H] p = 0; ¬H [G] y = foo(x); 0• p• return p; G y• 0• print(y); p• ¬F ^ G ^ ¬H G Result: ¬F ⋏ G ⋏ ¬ H Fig. 5: SPLLIFT asSPL it isLIFT applied to the entire example product line of Figure an edge labeled withcan feature constraint determines in a single pass that1a;the password only leakC represents the function x. x ^ C. Constraints at nodes represent initial values (at the top) or intermediate results of the constraint computation. if G is enabled but F and H are disabled. This is consistent with the result determined by the traditional analysis. exclusive-OR group to a [. . . ] pairwise mutual exclusion of the group members.” We use this form of translation to obtain a Boolean feature-model constraint from the graphical feature models defined for our benchmark subjects. Empirical Evaluation ‣ ‣ V. I MPLEMENTATION WITH S OOT AND CIDE LIFT remarkable performance ‣ SPL We have implementedshows SPLLIFT based on the Soot program analysis Fig. 6: Example program in the Colored IDE (CIDE) and transformation framework [13], the Colored Integrated Develop1 ment Environment (CIDE) [14] and the JavaBDD library. We have approach would have taken days or ‣ In some cases the traditional implemented an IDE solver [10] in Soot that works directly on Soot’s One performance critical aspect of our implementation is what data LIFT only takes minutes years, while SPL to compute. intermediate representation “Jimple”. Jimple is a three-address code structures we use to implement the feature constraints that SPLLIFT representation of Java programs that is particularly simple to analyze. tracks. After some initial experiments with a hand-written data Jimple statements are never nested, and all control-flow to constructs are previously solved a problem which no scalable ‣ Hence structure representing constraints insolution Disjunctive Normal Form, we reduced to simple conditional and unconditional branches. Soot can switched to an implementation based on Binary Decision Diagrams existed produce Jimple code from Java source code or bytecode, and compile (BDDs) [19], using JavaBDD. BDDs have the advantage that they Jimple back into bytecode or into other intermediate representations. are relatively compact, and are normalized, which allows us to easily Configurations Configurations Possible Types Reaching Possible Definitions Types Uninitialized Reaching Definitions Variables Uniniti To be able to actually parse software product lines, we used CIDE, detect contradictions. LIFT The size of a BDD can heavily LI LIFT LIFT LIFT and exploitA2 LIFT Benchmark valid Benchmark Soot/CG SPL valid Soot/CG A2 SPL SPL A2 SPL SPL A2A2 SPL an extension of the Eclipse IDE [15]. In CIDE, software produce lines depend on its variable ordering. In our case, because we did not BerkeleyDB unknown BerkeleyDB 7m33s unknown 24s years 7m33s 12m04s 24s years years 10m18s 12m04s years years 10m1 are expressed as plain Java programs. This makes them comparatively perceive the BDD operations to be a bottleneck, we just pick one labels of #ifdefs used GPL 1,872 4m35s GPL 42s 1,872directives 9h03m39s 4m35s 42s 9h03m39s days 7m09s 8m48s days days 7m0 easy to parse: there are no explicit compiler such as #ifdef 8m48s ordering and leave the search for an optimal ordering to future work. Lampiro 1m52sneed to handle. 4sInstead, 4 code1m52s 13s 3m30s 13s 1m25s 42s 3m09s 3m30s 1m2 in OpenSSL that a 4parser Lampiro would variations are 42s4s MM08 26 by marking MM08 2m57s code fragments3swith 26 different 2m06s 2m57s 59s3s Limitations 24m29s 2m06s 2m13s 59s 27m39s 24m29s 2m1 expressed colors. Each Current Our current color is associated with a feature name. In result, every CIDE SPL is LIFT LIFTimplementation is not as sensitive to feature annotaTABLE II: comparison TABLE II: ofcomparison Performance SPL comparison A2SPL ; values of in as gray SPL show over A2is;approach estimates valuesduein togray show estim LIFTtions it traditional could be.coarse This mostly the fact that coarse IFDS/IDE alsoPerformance a valid Java Performance program. CIDE forbids so-calledover “undisciplined” of with “A2”. annotations, i.e., enforces that users mark code regions that span requires the presence of a call graph, and currently we compute this Scale of the problem is immense: as an example, we investigated the call graph without Reaching taking feature annotations into account. While we entire statements, classes. Previous research has shown Benchmark Featuremembers Model orBenchmark Possible TypesFeature Reaching Model Definitions Possible Types Uninitialized Variables Definitions Uninitialized Variables follow call-graph feature sensitive fashion (as defined10m18s by that this is typically not a practical limitation [18]. Figure 6 shows 12m04s usage of #ifdef constructs in the OpenSSL crypto library regarded 24s regarded 24s edges in a10m18s 12m04s our call flow our running example programBerkeleyDB with the appropriately features 11m35s BerkeleyDB ignored 23s marked ignored 23sfunction), the feature-insensitive 10m47s 11m35s call graph is also used 10m47s to compute points-to sets. This precludes us from precisely handling in CIDE. average A2 21s average A2 9m35s 21s 7m12s9m35s 7m12s OpenSSL contains 1874 #ifdefs with 391 different labels situations such as the following: Implementation regarded 8m48s 42s 7m09s8m48s ignored 18sJavaBDD, ignored and 8m21s 18s implementation 7m29s8m21s Based on CIDE, GPL Soot, our IFDS average A2 17s average A2 7m31s 17s 6m42s7m31s regarded 4s regarded 42s 4s 1m25s 42s Available as open source at: http://bodden.de/spllift/ ignored Lampiro 4s ignored 48s 4s 1m13s 48s average A2 3s average A2 42s 3s 49s 42s regarded 3s regarded 59s 3s 2m13s 59s ignored MM08 2s ignored 45s 2s 1m49s 45s average A2 2s average A2 31s 2s 1m37s 31s 1 JavaBDD ‣ This yields 2391 ≈ 5・10117 GPL combinations! ‣ As comparison: The observable universe has only about 1080 atoms. ‣ ‣ Lampiro MM08 regarded 42s website: http://javabdd.sourceforge.net/ Heros 7m09s 7m29s 6m42s 1m25s 1m13s 49s 2m13s 1m49s 1m37s LIFT LIFT http://sse.ec-‐spride.de/ TABLE III: Performance impact TABLE of feature III: Performance model on impact SPL of . Values featureinmodel gray show on SPL the average . Values time in gray of theshow A2 the analysis. average This time number of thecan A2 be seen as lower bound for any be seen feature-aware as lower bound analysis. for any feature-aware analysis. a