Flaky Tests and Bugs in Apache Software (eg Hadoop)
Transcription
ApacheCon Core North America (May 12, 2016, at Vancouver)
Flaky Tests and Bugs in Apache Software (e.g. Hadoop)
Akihiro Suda
NTT Software Innovation Center
Copyright© 2016 NTT Corp. All Rights Reserved.

Who am I
• Software Engineer at NTT Corporation
  • NTT: the largest telecom in Japan
• Working on improving the reliability of distributed systems
• Some contributions to ZooKeeper / Hadoop, including critical bug fixes (non-committer)
• github: https://github.com/AkihiroSuda

Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
  • Existing work at Apache communities
  • Our work: Namazu (鯰, catfish) https://github.com/osrg/namazu

Good News: Apache software is well tested!
(Data measured on 14/01/2016 using CLOC)

Software     Production code (LOC)   Test code (LOC)
MapReduce     95K                     87K
YARN         178K                    121K
HDFS         152K                    150K
ZooKeeper     33K                     27K
HBase        571K                    222K
Spark        167K                    128K
Flume         46K                     34K
Cassandra    168K                     78K

Bad News: https://builds.apache.org/job/%s-trunk/
(Data captured on 14/01/2016)
[Figure: build history for MapReduce, YARN, HDFS, ZooKeeper, and HBase; axes: build and time; blue = success, red = failure]
I've never seen a fully successful Hadoop build, even on my local machine...

Bad News (data captured on 4/4/2016)
JIRA QL: project = ?
AND text ~ "test fail*" (just for approximation)

Slide 7: issues matching the query, per project
Software     #Matched        #All issues
MapReduce    2,441 (38%)      6,373
YARN         2,290 (63%)      4,756
HDFS         5,141 (53%)      9,672
ZooKeeper      828 (35%)      2,384
HBase        6,595 (42%)     15,542
Spark          794 ( 6%)     14,047
Flume          342 (12%)      2,882
Cassandra    1,656 (15%)     11,430

Roughly speaking, half of Hadoop development is dedicated to debugging test failures.
Interestingly, flakiness does not seem to be uniform across projects (discussed later).

Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
  • Existing work at Apache communities
  • Our work: Namazu (鯰, catfish) https://github.com/osrg/namazu

Not all test failures are critical for production..
97% of unit test failures in Apache software are said to be harmless for production ("false alarms")
• Information source: "An Empirical Study of Bugs in Test Code" (A. Vahabzadeh et al., ICSME'15)

So flaky tests don't matter, since they don't affect production?
They still matter!
For developers..
It's a barrier to the promotion of CI
• If many tests are flaky, developers tend to ignore CI failures and so overlook real bugs
It's also a psychological barrier to contribution
• A developer may be blamed for a test failure
For users..
It's a barrier to risk assessment for production
• No one can tell flaky tests from real bugs

So flaky tests don't matter, since they don't affect production?
SemaphoreCI suggests a "No broken windows" strategy for flaky tests
https://semaphoreci.com/community/tutorials/how-to-deal-with-and-eliminate-flaky-tests
image: http://guides.lib.jjay.cuny.edu/nypd/brokenwindows

Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
  • Existing work at Apache communities
  • Our work: Namazu (鯰, catfish) https://github.com/osrg/namazu

Basic cause: async operation
• A typical flaky test is caused by a malformed async operation like this (A. Vahabzadeh et al., ICSME'15 / Q. Luo et al., ACM FSE'14 / YARN-4478)

    invokeAsyncOperation();
    // some tests lack even this sleep
    sleep(certainHardcodedTimeout);
    assertTrue(checkSomethingGoodHasHappened());

• Basically, it can be fixed by increasing the timeout and the number of retries
  • But it's not easy to find a reasonable timeout value (e.g. YARN-{4804, 4807, 4929...})
  • A long timeout is expensive

Testbed (e.g. CI) can cause test failures as well
• Host configuration
• Host performance
• Docker is great! But it still has some issues

CI host configuration can cause test failures
• HADOOP-12687
  • Many YARN tests fail when /etc/hosts has multiple loopback entries
• ZOOKEEPER-2252
  • Test: nslookup("a") should fail
  • It does not fail when a host named "a" actually exists
• INFRA-11811
  • JDK was not set up properly on a Jenkins slave
• Such a test can fail only when the job is assigned to a specific buildbot, so it looks like a flaky test

CI host performance: they're not made equal
(Data captured on 25/04/2016)
• Hadoop's buildbot https://builds.apache.org/computer/

CI host performance: they're not made equal
• Spark's buildbot https://amplab.cs.berkeley.edu/jenkins/computer/

CI host performance: they're not made equal
• Significant difference in the response time!
Target     Hadoop    Spark
Average    1163ms    3ms
Max        1482ms    6ms
Min          30ms    0ms

• This may be related to the fact that Spark has only a small number of test-related issues (e.g. YARN 63% vs Spark 6% (slide 7))

Docker issues
Docker is great for testing!
• Some Apache projects use Docker on their CI (via Apache Yetus)
• Apache BigTop also utilizes Docker for provisioning Hadoop
• People also love Docker for setting up test beds on their workstations and laptops
  • Me too, of course

Docker #18180: Java VM unkillable zombie
• Mentioned in several Apache-related issue tickets:
  • jupyter/docker-stacks#75: Spark hanging
  • docker-library/cassandra#43, #46
  • docker-solr/docker-solr#4
  • ALLURA-8039
  • AMBARI-14706
  • IGNITE-2377
  • YETUS-229
  • …
• Fortunately, Apache Buildbot (Yetus) didn't hit the bug, but it made people's local testbeds flaky in a weird way.
• Fixed in recent kernels (so, strictly speaking, it's not a Docker issue)

Other potential Docker-related issues
AUFS: fcntl(F_SETFL, O_APPEND) was not supported (#20199)
• Can cause data corruption (Dovecot is known to be affected)
• Fixed in recent AUFS
Overlay: you should not open a file with O_RDWR and O_RDONLY simultaneously (#10180)
• Can cause data corruption (RPM is known to be affected)
• Expected behavior, won't get fixed
More information: https://github.com/AkihiroSuda/docker-issues

Flaky tests are not limited to xUnit in CI..
• Some issues can occur only in a deployed environment rather than in CI
  • e.g. TCP packet corruption
• Very flaky and critical
TCP packet corruption
https://www.pagerduty.com/blog/the-discovery-of-apachezookeepers-poison-packet/
• TCP checksum was ignored in some IPsec configurations
• ZooKeeper intermittently misbehaved due to corrupted TCP packets
https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ipdata-to-mesos-kubernetes-docker-containers4986f88f7a19#.gq8chzply
• TCP checksum was ignored in some veth configurations
• Mesos and Kubernetes are affected

TCP packet corruption
• It's very hard to notice (and reproduce) flaky TCP packet corruption...
• Should distributed systems be TCP-corruption tolerant...?
  • The probability is very low in regular environments, but it is not zero (32-bit Ethernet CRC + 16-bit TCP checksum)
• JIRA issues: ZOOKEEPER-2175, HDFS-8161…

Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
  • Existing work at Apache communities
  • Our work: Namazu (鯰, catfish) https://github.com/osrg/namazu

Efforts to find/reproduce a flaky test
• determine-flaky-tests-hadoop.py
• Apache Kudu's CI (dist_test)
• Google's TAP
• Our work: Namazu https://github.com/osrg/Namazu
• and similar great tools

determine-flaky-tests-hadoop.py
• Picks up failed tests using the Jenkins API
• Included in hadoop.git/dev-support (HADOOP-11045)

    $ determine-flaky-tests-hadoop.py --job Hadoop-YARN-trunk
    ****Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-YARN-trunk
    ...
    Among 15 runs examined, all failed tests <#failedRuns: testName>:
    7: TestContainerManagerRecovery.testApplicationRecovery
    ...
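The summary printed above (for each test, the number of examined runs in which it failed) can be sketched in a few lines of Go. This is only an illustration of the counting step, not the script's actual implementation, which is written in Python and talks to the Jenkins API; the `countFailures` helper and the `FailureCount` type are hypothetical names.

```go
package main

import "sort"

// FailureCount pairs a test name with the number of runs in which it failed.
type FailureCount struct {
	Test string
	Runs int
}

// countFailures takes, for each examined run, the names of the tests that
// failed in that run, and returns per-test failure counts sorted by count
// (descending) and then by name. A test listed twice in one run counts once.
func countFailures(runs [][]string) []FailureCount {
	counts := map[string]int{}
	for _, failed := range runs {
		seen := map[string]bool{}
		for _, name := range failed {
			if !seen[name] {
				seen[name] = true
				counts[name]++
			}
		}
	}
	out := make([]FailureCount, 0, len(counts))
	for name, n := range counts {
		out = append(out, FailureCount{Test: name, Runs: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Runs != out[j].Runs {
			return out[i].Runs > out[j].Runs
		}
		return out[i].Test < out[j].Test
	})
	return out
}
```

A test that fails in some runs but not in others is a flakiness candidate; a test that fails in every run is more likely a plain regression.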
determine-flaky-tests-hadoop.py
• Great tool, but it doesn't support running a specific test repeatedly
• Also, there is a Maven dependency issue (YARN-4478)
  • B depends on A
  • TestB is never executed if TestA fails
  • So if TestA is flaky, we can't evaluate the flakiness of TestB!

Kudu's CI: flaky test dashboard
http://dist-test.cloudera.org:8080/ (Apr 25)
Recently open-sourced and introduced in Apache: Big Data (Monday)
https://github.com/cloudera/dist_test

Kudu's CI: flaky test dashboard
• Tests are run repeatedly on CI to find flaky tests
  • KUDU_FLAKY_TEST_ATTEMPTS
  • KUDU_FLAKY_TEST_LIST
(From https://github.com/apache/incubator-kudu/commit/1a24338a)

    Fix flakiness of client_failover-itest
    The reason this test was flaky is that there is a race between.. ..
    100x Looped and they all passed: http://dist-test.cloudera.org/job?job_id=mpercy.1454486819.10566
    Author: Mike Percy, Jan 29, 2016 8:01 AM
    Committer: Todd Lipcon, Feb 4, 2016 2:14 PM
    Commit: 1a24338ad60a8842d1ae5e227f8f03e58faea8c0

Google's TAP
• Google's internal CI
• 1.6M test failures per day
  • 73K (4.5%) are flaky
• Repeats a failing test 10 times to label flaky tests
• Information source: "An Empirical Analysis of Flaky Tests" (Q. Luo et al., ACM FSE'14)

Challenge: poor non-determinism
• Modern CIs run jobs repeatedly to find / reproduce flaky tests
• But they don't control non-determinism
  • They can overlook a flaky test
  • They cannot reproduce a failure, and so cannot analyze it
• Our suggestion: increase non-determinism for finding and reproducing flaky tests

NAMAZU: PROGRAMMABLE FUZZY SCHEDULER
https://github.com/osrg/namazu
NOTE: Namazu was formerly named "Earthquake"
Namazu: programmable fuzzy scheduler
Increases non-determinism for finding and reproducing flaky tests
鯰 (namazu) means a catfish in Japanese
https://github.com/osrg/namazu
[Diagram: events from Filesystem, Packet, Java, Go [planned], and Linux threads are given a fuzzed (randomized) schedule]

Namazu: programmable fuzzy scheduler
Namazu uses non-invasive techniques
• can be easily applied to any environment
• can avoid false-positives

Target            Technique
Filesystem        FUSE
Packet            Openflow, Netfilter
Java              AspectJ, Byteman
Go [planned]      AspectGo [wip] (https://github.com/AkihiroSuda/golang-exp-aspectgo)
Linux threads     sched_setattr(2)

https://github.com/osrg/namazu

Namazu targets
• xUnit tests
  • 😃 Easy to get started; just run `mvn`
  • 😃 Can reproduce test failures observed in CI
  • 😞 Limited testable scope
• Integration tests on a distributed cluster
  • 😃 Can test everything
  • 😞 Need to write a script to set up the cluster
    • But Docker helps us a lot!

Namazu targets
We support both scenarios
• Single-node mode (for xUnit tests): $ mvn test
• Distributed mode (for integration tests): nodes talk to an orchestrator over RPC

NAMAZU + XUNIT TESTS
$ mvn test

Namazu + xUnit tests
• Namazu is a comprehensive framework...
• Quick start: "renice" threads for xUnit tests
  • POSIX.1 requires that threads share a single nice (priority) value, but the actual Linux implementation (NPTL) does not follow this.
  • Not always effective, but it's generic and easy to get started
Namazu + xUnit tests

    $ cd hadoop; ./start-build-env.sh
    [container]$ mvn test -Dtest=TestFoo#testBar

    $ PID=$(docker inspect $(docker ps -q -f ancestor=hadoopbuild-ubuntu) | jq .[0].State.Pid)
    $ sudo nmz inspectors proc -pid $PID

Namazu periodically sets random nice values for all the child processes and threads under $PID
It also utilizes non-default kernel schedulers (e.g. SCHED_BATCH)

Namazu + xUnit tests: Reproducibility

Testcase                                       Traditional   Namazu
YARN-4548 RM/TestCapacityScheduler             11%           82%
YARN-4556 RM/TestFifoScheduler                  2%           44%
ZOOKEEPER-2137 ReconfigTest                     2%           16%
YARN-4168 NM/TestLogAggregationService          1%            8%
YARN-1978 NM/TestLogAggregationService          0%            4%
YARN-4543 NM/TestNodeStatusUpdater              0%            1%

More information: osrg/namazu#125

Namazu + xUnit tests: Reproducibility
• "Renicing" is not always effective...
• But even when renicing is ineffective, you can sometimes reproduce the flaky test by injecting delays or reordering packets

Testcase                                       Traditional   Namazu
ZOOKEEPER-2080 ReconfigRecoveryTest            14.0%         61.9%

    $ sudo iptables ... -j NFQUEUE --queue-num 42
    $ sudo nmz inspectors ethernet -nfq-number 42

NAMAZU + INTEGRATION TESTS

Namazu + Integration tests
• ZooKeeper: distributed coordination service
  • used in Hadoop, Spark, Mesos, Kafka..
• ZooKeeper 3.5 (alpha) introduced dynamic reconfiguration
• We performed an integration test to evaluate the reliability of the reconfiguration
• We found a flaky bug!

Namazu + Integration tests
• We permuted some specific Ethernet packets in random order using Namazu
• TCP retransmissions are eliminated to reduce the possible state space
[Diagram: ZooKeeper cluster connected through Open vSwitch + Ryu SDN Framework + Namazu]
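Permuting a batch of captured packets in random order, as described above, comes down to a seeded shuffle. The Go sketch below is a minimal illustration, not Namazu's code: packets are represented as plain strings, and the `permute` helper is a hypothetical name. Using an explicit seed means that an ordering which triggered a failure can be generated again from the seed alone.

```go
package main

import "math/rand"

// permute returns a copy of events shuffled with a PRNG seeded by seed.
// The input slice is left untouched, and the same seed always yields the
// same order for the same input.
func permute(events []string, seed int64) []string {
	out := append([]string(nil), events...)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(out), func(i, j int) {
		out[i], out[j] = out[j], out[i]
	})
	return out
}
```

In a real setup the shuffled order would decide which held packet is released next; here it only reorders labels.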
Found ZOOKEEPER-2212
• Bug: a new node cannot participate in the ZK cluster properly
  • The new node cannot become a leader of the ZK cluster itself (more technically, it keeps being an "observer")
• Cause: distributed race (ZAB packet vs FLE packet)
  • ZAB: atomic broadcast protocol for data
  • FLE: leader election protocol for the ZK cluster itself
  • They use different TCP connections, so the packet order is non-deterministic
[Diagram: leader of ZK cluster and new ZK node, connected by ZAB [2888/tcp] and FLE [3888/tcp]]

Found ZOOKEEPER-2212
[Figure; data captured on 22/01/2016]

Found ZOOKEEPER-2212
• Expected: the ZK cluster works even when N/2 nodes have crashed
• Real: a single node failure can terminate the 3-node ensemble
  • The new node is not participating properly (it keeps being an "observer")

How hard is it to reproduce?
• Reproducibility: 0.0% with traditional testing, 21.8% with Namazu (tested 1,000 times)
• We could not reproduce the bug even after 5,000 runs of traditional testing (60 hours!)
• It can also be reproduced by "renicing" threads, but then the reproducibility is just 0.7%

Why can we hit the bug?
We define the distributed execution pattern based on code coverage:

\[
P = \begin{pmatrix}
p_{1,1} & \cdots & p_{1,N} \\
\vdots & \ddots & \vdots \\
p_{L,1} & \cdots & p_{L,N}
\end{pmatrix}
\]

• $L$: LOC
• $N$: number of nodes ($N = 3$ in this case)
• $p_{i,j}$: 1 if node $j$ covers the branch in line $i$, otherwise 0
• We used JaCoCo: Java Code Coverage Library (patch: ZOOKEEPER-2266)
Namazu achieves faster pattern growth. That's why we can hit the bug.

HOW TO USE NAMAZU?

How to use Namazu?
Easy to install

    $ sudo apt-get install lib{netfilter-queue,zmq3}-dev
    $ go get github.com/osrg/namazu/nmz

Easy to get started
• Provides a Docker-like CLI
• No code instrumentation needed
• No configuration needed (default: just renice threads)

    $ sudo nmz container run -it -v /foo:/foo ubuntu
    [container]$ cd /foo && mvn test

How to use Namazu?
For threads ("renicing")

    $ sudo nmz inspectors proc -pid $TARGET_PID

For filesystem

    $ sudo nmz inspectors fs -mount-point /nmzfs

For network packets

    $ sudo iptables ... -j NFQUEUE --queue-num 42
    $ sudo nmz inspectors ethernet -nfq-number 42

Need distributed mode? (for integration testing)
Just add `--orchestrator-url http://foobar:10080/api/v3` to the CLI.

Namazu API (Go)

    type ExplorePolicy interface {
        QueueEvent(Event)
        ActionChan() chan Action
    }

    func (p *MyPolicy) QueueEvent(event Event) {
        // You can also inject fault actions here
        action := event.DefaultAction()
        // The action is fired at a random time in [10ms, 30ms]
        p.timeBoundedQ.Enqueue(action, 10 * Millisecond, 30 * Millisecond)
    }

    func (p *MyPolicy) ActionChan() chan Action {
        return p.timeBoundedQ.DequeueChan
    }

Namazu defines a REST API, so you can also use other languages

API use case: found YARN-4301
• We found a bug: YARN cannot detect disk failure cases where mkdir()/rmdir() blocks
  • Case 1: mkdir() returns EIO explicitly
  • Case 2: mkdir() blocks
• We noticed while reading the code that the bug could occur in theory, and then actually reproduced it using Namazu
  • The point at which to inject the fault was known in advance, so we manually wrote a concrete scenario using the Namazu API
  • Much more realistic than JUnit + mocking
API use case: found YARN-4301
Interactive testing is often easier than writing a JUnit testcase

    func (p *MyPolicy) signalHandler() {
        signal.Notify(sigChan, syscall.SIGUSR1)
        for {
            <-sigChan
            // fault: blocks for 10 minutes
            p.sleep = 10 * time.Minute
        }
    }
    go p.signalHandler()
    func (p *MyPolicy) QueueEvent(event Event) {..}
    func (p *MyPolicy) ActionChan() chan Action {..}

We use SIGUSR1 here, but it would also be interesting to implement a human-friendly CLI or GUI for interactive testing

    $ go run mypolicy.go inspectors fs -mount-point /nmzfs

Set "yarn.nodemanager.local-dirs" to "/nmzfs/nm-local-dir", then send SIGUSR1 to Namazu when you (and YARN) are ready

API use case: found YARN-4301
[Figure only]

Another API use case: "semi"-deterministic replay
• If you have knowledge of the protocol, you can compute a hash for a packet
  • Note that you have to eliminate time-dependent and random bytes when you hash the packet
• Using the hash and the Namazu API, you can "semi"-deterministically replay the scenario
  • Not fully deterministic; it is only best effort
  • Record-less! You just need to remember the "seed" for replaying
• PoC: ZOOKEEPER-2212: up to 65% reproducibility
  • More information: osrg/namazu#137
  • See also (for Go): https://github.com/AkihiroSuda/go-replay

SIMILAR GREAT TOOLS

Similar great tool: Jepsen
• Network partitioner + linearizability tester
• Famous for the "Call Me Maybe" blog: http://jepsen.io/
  • "Call Me Maybe" by Carly Rae Jepsen (vevo): https://www.youtube.com/watch?v=fWNaR-rxAic
• Randomly injects network partitions using iptables
• "Linearizability" ∈ "Strong consistency"
• Integration test on a flaky network rather than a flaky xUnit test
Similar great tool: Jepsen
• Has been used to test several Apache projects
  • Cassandra: 9851, 10001, 10068, 10231, 10413, 10674
    • http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen
  • HBase
  • Kafka
  • Solr: 6530, 6583, 6610
    • http://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsenflaky-networks
  • ZooKeeper

Namazu + Jepsen?
• Namazu is much more generalized
  • The bugs we found/reproduced are basically beyond the scope of Jepsen (threads, disks..)
• Namazu can also be combined with Jepsen! This will be our next work..

Jepsen                           Namazu
• causes network partitions      • increases non-determinism
• tests linearizability          • injects filesystem faults
                                 ...

Similar great tool: CharybdeFS
• Makes the filesystem flaky using FUSE
• Used in testing ScyllaDB (a clone of Apache Cassandra)
• https://github.com/scylladb/charybdefs
• Similar to Namazu FS
  • Both support an API
  • Also similar to PetardFS (not active since 2007)
  • CharybdeFS can also be combined with Namazu
• CharybdeFS is specialized in FS; Namazu is much more comprehensive.

Similar great tool: DEMi (appeared in NSDI'16)
https://github.com/NetSys/demi
• Found some akka-raft bugs and reproduced a few Spark bugs
  • Challenge in reducing false-positives related to instrumentation
• DEMi and Namazu are complementary to each other
  • DEMi is powerful, but has some limitations
  • Namazu is comprehensive and easy to get started with

                            Namazu                                    DEMi
Target                      Generic (Network, Filesystem, Thread..)   Akka
Getting started             Easy                                      Need to write AspectJ code
Deterministic replay?       No                                        Yes
Bug cause minimization?     No                                        Yes

SO... HOW CAN WE FIX FLAKY TESTS?

How can we fix flaky tests?
• Namazu finds/reproduces flaky tests, but it doesn't automatically fix them 😞
• Basic approach for async-related flakiness: adjust the values for sleep() and retries in the test code

    invokeAsyncOperation();
    // some tests lack even this sleep
    sleep(certainHardcodedTimeout);
    assertTrue(checkSomethingGoodHasHappened());

How can we fix flaky tests?

    invokeAsyncOperation();
    // some tests lack even this sleep
    sleep(certainHardcodedTimeout);
    assertTrue(checkSomethingGoodHasHappened());

• Suggestion: the timeout (and the number of retries) should be a configurable parameter rather than a hard-coded value

Timeout value   Cost (time)   Risk (timeout)   Appropriate for
Long            High          Low              Slow machine (e.g. CI); conservative person
Short           Low           High             Fast machine; risk-tolerant person

CONCLUSION

Conclusion
• Apache software is well tested
  • But the tests are flaky
• Let's improve them
  • Improve asynchronous code
  • Repeat tests
• Our tool can control non-determinism so as to reproduce flaky tests
https://github.com/osrg/namazu