KPMG Location Analytics
Transcription
KPMG Location Analytics
Jori van Lier
April 2, 2015

Intro: Me
Jori van Lier
[email protected]

Intro: Rest of the team

Location Analytics Overview

The gist
• Measure WiFi data from smartphones (MAC addresses and signal strengths)
• Reconstruct the location of the smartphone (visitor, shopper…)
• Heatmaps, visitor counts, dwell times

Why?
Heatmaps
■ What is the most visited area in a location?
Store Layout Optimization
■ Based on customer preferences and buying patterns, what is the best store layout?
Visits & Capture Rate
■ How many people come to a location, and how does this vary across locations?
■ How many people who walk by a location actually enter it?
Returning Visits & Conversion
■ How often does the average customer visit your store? Did your latest marketing campaign increase customer retention?
■ How many customers who enter the store actually buy something?
Staffing Optimization
■ Based on visit patterns, predict peak hours and determine how much staff is required where. For example: ensure that the checkout area is manned before the crowd arrives.
Trends
■ Spot trends in customer behavior and make decisions before problems affect your sales.
Dwell Time
■ How long does the average visitor stay inside a location? Does the time spent convert to sales?
Benchmarking
■ Benchmark and A/B test data across locations and dates. Find out what works and what doesn't.

The beginning: "proof of concept" dashboard

App for employees

Standardized product: CrumbBase

Data Acquisition
Getting the data…

Data acquisition
• WiFi devices continually send WiFi probe packets (802.11 type 0, subtype 4).
• WiFi sensors have two antennas:
  1. Monitor mode, to measure traffic on the MAC layer
  2. Mesh mode, for communication among sensors
• A small barebones computer:
  • Aggregates the raw data
  • Hashes the MACs, encrypts the rest
  • Filters for "opt out"
  • Forwards data to the TTP (trusted third party)

Anonymization & opt-out process
• No single party has all the information needed to extract personal information.

Big Data Platform
An enabler to analyze the data…

KAVE: KPMG Analytics & Visualization Environment

The Overview
• Horizontally scalable
• Open source
• Configurable
• Modular
• Secure

The Implementation
• Remote (or local) hosting
• Dedicated hardware
• Virtualized system
• Secure internal network

Exploratory data science
• Most of our analyses start off with a dataset, a "hunch", and lots of plots…
• Tooling: the Python data stack (numpy, scipy, pandas, matplotlib, scikit-learn, IPython notebook)

Storm, for real-time analysis
• Fully developed analyses with a real-time character go into Storm.
• Storm is a distributed real-time computation system.
• Used by Twitter, Groupon, Flipboard, Yelp, Baidu…
(Diagram: a Storm topology of spouts and bolts)

Spark, for batch analysis
• Spark on Hadoop for batch processing
• From the Hadoop ecosystem we only use the Hadoop resource manager (YARN) and the Hadoop Distributed File System (HDFS).
• Hadoop MapReduce = slow; Spark = fast (in-memory)!
• Scala, Python or Java
• Awesome functional programming style:

    file = spark.textFile("hdfs://...")
    file.flatMap(lambda line: line.split()) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)

  Word count in Spark's Python API

Trilateration
From dBm measurements to X and Y coordinates…
• Friis free-space transmission equation
• Looking for the intersection

Analytics
Now that we have X and Y coordinates, we can turn the data into actionable insights.

Example Storm analysis
Our "WiFi Orientation Engine" Storm topology
(Topology diagram) Components: Data Acquisition Server, Trusted Third Party DRPC Server, WifiDrpcSpout, WifiDrpcDecryptSplit, WifiDrpcMonitor, WifiNormalization, WifiTrilaterationFitter, visitAnalysisBolt, regionVisitAnalysisBolt, dwellTimeAnalysisBolt, dwellTimeDailyAnalysisBolt, heatmapProducerBolt, Mongo Persistence Bolt (→ MongoDB), Hadoop Persistence Bolt (→ Hadoop HDFS)

visitAnalysisBolt: incoming data

    public final void execute(final Tuple tuple) {
        […]
        if (sourceComponent.equals("wifiNormalization")) {
            type = TupleType.SEEN;
        } else if (sourceComponent.equals(locationSourceBoltId)) {
            type = TupleType.VISIT;
        }
        […]
        Event event = (Event) tuple.getValueByField("event");
        addTupleToBucket(type, event.getTimestamp(), event.getSourceMac());
        […]
    }

There are two buckets: one for raw ("seen") data and one for fitted ("visit") data. Incoming tuples are added to the corresponding bucket.

visitAnalysisBolt: outgoing data (daily cumulative visitor counter)
Every minute, Storm triggers a "tick" tuple, which is the signal to emit data for that minute.

    private void emitDailyEvent(final Set<String> seenDevicesTotal,
                                final Set<String> visitDevicesTotal,
                                final Long timestamp) {
        int seenTotal = seenDevicesTotal.size();
        int visitTotal = visitDevicesTotal.size();
        int walkByTotal = seenTotal - visitTotal;
        VisitDailyEvent visitDailyEvent = new VisitDailyEvent();
        visitDailyEvent.setVisit(visitTotal, calcUncertainty(visitTotal));
        visitDailyEvent.setWalkBy(walkByTotal, calcUncertainty(walkByTotal));
        visitDailyEvent.setCaptureRate(seenTotal != 0
                ? Math.round((float) visitTotal / (float) seenTotal * 100.) : 0);
        visitDailyEvent.setMeasurementTimestamp(new Date(timestamp));
        visitDailyEvent.setApplication(application);
        visitDailyEvent.setLayer(RESLAYER_DAILY);
        this.outputCollector.emit(visitDailyEvent.toValues());
    }

This event is picked up by the MongoPersistenceBolt, which stores it in MongoDB.

visitAnalysisBolt: result in MongoDB

    {
        "_id" : ObjectId("551c1558e4b0583e96bf0c07"),
        "version" : 2,
        "processingTimestamp" : ISODate("2015-04-01T15:57:03.454Z"),
        "measurementTimestamp" : ISODate("2015-04-01T15:54:00Z"),
        "value" : {
            "visit" : { "value" : 1114, "error" : 33 },
            "walkBy" : { "value" : 1901, "error" : 44 },
            "captureRate" : 37
        },
        "history" : [ "Persisting" ]
    }

This document is emitted every minute. The counters reset at 0:00 local time.

Cumulative daily visitor count plot (and calibration…)

Our approach & PoC results

Our approach (1/2)
• Scrum: an agile, iterative approach
• The business prioritizes our backlog. We develop analyses and present the results in (bi-)weekly product demos.

Our approach (2/2)
• Once metrics have been developed and baselines have been set: experiments and A/B testing! E.g. change something in the storefront window and determine whether more visitors came in than before (and whether the difference is statistically significant).
  • If so: keep doing that!
  • If not: stop wasting effort (money) on that activity!
• This is similar to what websites have been doing all along to determine the best layout. Brick-and-mortar stores can now start doing this as well.

Results (1/3)
• Proof of concept: solution rolled out to a large retailer
• Done:
  • 2 months stabilizing the system (Dec/Jan)
  • 2 months developing new / custom metrics (Feb/March)
• Next 2 months: experiments:
  1. If we proactively inform visitors which areas are quiet, does this lead to less congestion?
     • Metric: occupancy spread more evenly
  2. If we send employees to the checkout area before the crowd arrives, based on a short-term queue-time prediction, will we see a reduction in queuing time?
     • Metric: lower dwell time
  3. If we offer an incentive to visit, will we see a larger % of people entering?
     • Metric: higher capture rate

Results (2/3)

Results (3/3)

Thanks!
Jori van Lier
[email protected]
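Editor's illustration. The trilateration slides describe turning dBm measurements into X/Y coordinates via the Friis free-space equation and "looking for the intersection" of range circles. The deck does not show this code, so the following is a minimal sketch in the deck's own Python stack: `rssi_to_distance` uses a log-distance path-loss model (exponent 2.0 matches free-space/Friis propagation; the -40 dBm reference at 1 m is an assumed calibration constant), and `trilaterate` finds the least-squares intersection point. Both function names are hypothetical.

```python
import numpy as np

def rssi_to_distance(rssi_dbm, tx_power_1m=-40.0, path_loss_exp=2.0):
    """Estimate distance (m) from received signal strength.

    Log-distance path-loss model; with path_loss_exp=2.0 this reduces to
    free-space (Friis) propagation. tx_power_1m is the expected RSSI at
    a 1 m reference distance (the -40 dBm default is an assumption).
    """
    return 10.0 ** ((tx_power_1m - rssi_dbm) / (10.0 * path_loss_exp))

def trilaterate(sensors, distances):
    """Least-squares intersection of the range circles around the sensors.

    sensors: (N, 2) array of sensor x/y positions, N >= 3.
    distances: length-N array of estimated device-to-sensor distances.
    Returns the (x, y) that best fits all range measurements.
    """
    s = np.asarray(sensors, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtract the first circle equation from the others to linearize:
    # 2(xi - x1)x + 2(yi - y1)y = d1^2 - di^2 + xi^2 - x1^2 + yi^2 - y1^2
    A = 2.0 * (s[1:] - s[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(s[1:] ** 2, axis=1) - np.sum(s[0] ** 2))
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xy

# Three sensors at known positions; a device at (3, 4) with noise-free ranges.
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, 65.0 ** 0.5, 45.0 ** 0.5]
xy = trilaterate(sensors, distances)  # close to (3, 4)
```

With noisy real-world RSSI the circles rarely intersect exactly, which is why a least-squares fit (rather than a geometric intersection) is the natural formulation.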
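Editor's illustration. The `emitDailyEvent` bolt shown earlier counts unique hashed MACs in the "seen" and "visit" buckets and emits visit, walk-by, and capture-rate values, each with an uncertainty from `calcUncertainty`. The deck does not define `calcUncertainty`; a Poisson-style sqrt(n) counting error is an assumption, but it is consistent with the MongoDB example (sqrt(1114) ≈ 33, sqrt(1901) ≈ 44). A Python sketch of the same logic, with hypothetical names:

```python
import math

def calc_uncertainty(count):
    """Counting error, assumed Poisson: sqrt(n).

    Assumption by the editor, but consistent with the MongoDB example
    in the slides (visit 1114 +/- 33, walkBy 1901 +/- 44).
    """
    return round(math.sqrt(count))

def daily_visit_event(seen_macs, visit_macs):
    """Mirror of emitDailyEvent: unique devices seen vs. confirmed visits.

    seen_macs / visit_macs: sets of hashed MAC addresses accumulated
    since 0:00 local time (visit_macs is a subset of seen_macs).
    """
    seen_total = len(seen_macs)
    visit_total = len(visit_macs)
    walk_by_total = seen_total - visit_total
    capture_rate = round(visit_total / seen_total * 100) if seen_total else 0
    return {
        "visit": {"value": visit_total, "error": calc_uncertainty(visit_total)},
        "walkBy": {"value": walk_by_total, "error": calc_uncertainty(walk_by_total)},
        "captureRate": capture_rate,
    }

# Totals matching the MongoDB document in the slides:
# 3015 devices seen, 1114 of them entered -> capture rate 37%.
event = daily_visit_event(set(range(3015)), set(range(1114)))
```

Using sets of hashed MACs makes each device count once per day, which is what lets the counter be cumulative yet reset cleanly at 0:00.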
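Editor's illustration. The approach slides call for A/B experiments and ask whether a difference in visitors is "statistically significant", without naming a test. One standard choice for comparing capture rates (a proportion of passers-by who enter) is a two-proportion z-test; the sketch below is the editor's suggestion, not the method the deck used, and the function name and numbers are hypothetical.

```python
import math

def capture_rate_z_test(visits_a, seen_a, visits_b, seen_b):
    """Two-proportion z-test on capture rate (visitors / passers-by).

    A = baseline period, B = experiment period. Returns the z statistic;
    |z| > 1.96 means the difference is significant at the usual
    two-sided 5% level.
    """
    p_a = visits_a / seen_a
    p_b = visits_b / seen_b
    pooled = (visits_a + visits_b) / (seen_a + seen_b)  # pooled proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / seen_a + 1 / seen_b))
    return (p_b - p_a) / se

# Hypothetical numbers: baseline 1114/3015 visitors vs. 1250/3000 after
# changing the storefront window.
z = capture_rate_z_test(1114, 3015, 1250, 3000)
significant = abs(z) > 1.96
```

Daily counts from the visitAnalysisBolt output provide exactly the visit and walk-by totals this test needs, so each experiment reduces to comparing two stored documents.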