Price Modeling: The right level of data aggregation
Authors: Amit Gupta, Hari Hariharan

©2014 Copyright Fractal Analytics, Inc., all rights reserved. Confidential and proprietary Information of Fractal Analytics Inc. Fractal is a registered trademark of Fractal Analytics Limited.
Copyright © Fractal 2013 – 2014

Abstract

When developing price models, market researchers face the constant debate around using market and retailer level data versus store-level data. Aggregated data carries aggregation bias, which affects parameter estimates. Research shows aggregation bias can be avoided by examining homogeneous entities and restraining the projection of estimates. Store-level data, in comparison, can lead to erroneous estimates due to noise caused by local effects. In addition, store-level data is expensive, often has restricted access and can be hard to work with. This paper provides a point-of-view and framework to identify the right level of data that appropriately suits the level of decision making for efficient and accurate forecasts.

Table of Contents

1. Motivation
2. Objectives and Research questions
3. Issue with Aggregated Data – Aggregation Bias
3.1 What is aggregation bias?
3.2 What are the antecedents of Aggregation Bias
3.3 Aggregation bias is bigger when a smaller component of an aggregated set is exposed
4. Dealing with Bias
4.1 Aggregation bias can be avoided without store-level data
5. Problems with store-level data
6. Proposed Framework
7. References

1. Motivation

CPG companies increasingly rely on econometric models to enable data driven marketing decision making in the area of pricing and promotions. They have realized that such rigorous scientific decision making has made significant bottom line impact. Businesses rely on either external consultants or have internal expertise to design, develop, implement and interpret the results of these models. All the businesses rely on the two primary sources of syndicated data, namely IRI and Nielsen, to create and deploy the econometric models.

When designing the research approach for conducting a price-and-promo analysis researchers have to make crucial decisions regarding the level of data with which to work. In such studies, the statistical models that need to be built can leverage the data at different levels of aggregation across channels (store, market or retailer), products (SKUs or Product Price Groups) and time (weekly, monthly etc.). In practice, since it is difficult to deal with the huge number of SKUs, CPG and Retailers tend to rely on product price groups for pricing decisions, and the use of weekly data has become the normal practice as well. The more debated question is whether to build models with store-level data or aggregated retailer/account level data or market level data.

This choice of data for analyses is driven by a few factors. The first factor is the availability and cost of obtaining data at the different levels of aggregation.
Second, the amount of time and effort required to work with the data should be considered. Third, and perhaps most important, is the level of data aggregation that provides the most accurate parameter estimates, which in turn facilitate accurate sales estimation and forecasts.

Industry experts also have different perspectives. Some believe that models built with aggregated data produce biased parameter estimates, given that the models are inherently non-linear while the aggregation is linear. The biased estimates in turn produce inaccurate forecasts. Store-level data has its own challenges as well. First, the data is based on a sample of stores. Second, store-level data carries a lot of noise (due to local factors) that cannot be accounted for in the models, leading to inaccurate estimates. For example, sales of sodas may run up in a store because there was a local football match nearby. Such local events are rarely captured in the data, so their effect ends up distorting the estimates.

2. Objectives and Research questions

The objective of the paper is to help marketers and researchers align the level of syndicated data aggregation with their marketing problem. The specific questions that this paper will address are:

• What are the different levels of data aggregation that can be used to model price?
• What is the accuracy of the estimates of the different methods of aggregation?
• How can we handle biases in data?
• How can we handle noise in data?
• How do we choose the right level of aggregation for a given decision?

Based on the above, we will provide a framework to assess the alignment of the data aggregation method with the marketing decision problem. The framework should help marketers and decision makers to choose the right level of data aggregation for their specific marketing decision problems.

3. Issue with Aggregated Data – Aggregation Bias

3.1 What is aggregation bias?

Aggregation bias refers to an inaccurate slant in analytic studies caused by aggregating data to build a smaller number of data points for analytics. This aggregation can be of data that belongs to different products, channels, markets or points in time.

Consider the example of two stores in a market. The first store has a promotional event while the second one doesn’t. When we model store-level data, the model will capture the lift in the first store. There is no lift beyond base sales in the second store. If, on the other hand, we model aggregated data, the model will capture the lift in sales given 50 percent ACV of feature, assuming both stores are of equal size.

Note 1: ACV (All Commodity Volume) is a measure of the width or coverage of a promotional event.

The lift captured through 50 percent ACV of promotion, when projected to 100 percent ACV, is likely to give a biased result, overstating the lift. This is aggregation bias, which emerged because the data for the two stores were aggregated to build a single piece of data to be analyzed.

3.2 What are the antecedents of aggregation bias?

Heterogeneity is one important antecedent of aggregation bias. In the example above, the two aggregated stores had different exposure to promotions: one had the promotion while the other did not.

Researchers show that biases occur when we aggregate entities that are not homogeneous, or in other words, don’t witness the same activity or input (see Link 1995; Wittink et al 1997).

Consider ‘aggregation’ from a wider perspective. We know that scanner data can be characterized using three different dimensions, and each may have different levels of aggregation (see Table 1).

Table 1: Dimensions in scanner data

S.No.   Dimension   Levels and aggregation
1       Channel     Transaction, shopper, store, market, retailer
2       Product     SKU, size, variant, brand
3       Time        Day, week, month

As Table 1 illustrates, to some extent aggregation may happen on each of the dimensions. For example, SKUs are aggregated to size or variant level, days are aggregated to weeks or months, and different shoppers within a store are aggregated to total store sales.

Why then are researchers only concerned with store data aggregated at the market level, and not with other aggregations? The reason is simple: aggregation bias occurs only when heterogeneous entities are grouped together despite their differing characteristics.

When we aggregate different shoppers within a store to get to store level sales, there is no aggregation bias since they are all exposed to the same activity in the store. Similarly, when we aggregate different SKUs that follow the same pricing strategy to get a size level group, there is no bias. This set of SKUs is termed a Promoted Product Group (PPG). When we aggregate days of a week to weekly sales, there is no bias since all seven days of a week witness the same activity most of the time. If we group a set of stores that follow the same pricing and promotions, there will be no bias because these stores are homogeneous.

3.3 Aggregation bias is bigger when a smaller component of an aggregated set is exposed

Research also suggests that the bias is bigger when a smaller set of stores in a market is exposed to an activity (see Link 1995). In other words, in the aggregated model, when we estimate a parameter for an activity that is 10 percent ACV wide as opposed to another with 70 percent width, the bias is much larger when projected to 100 percent ACV for the activity that is 10 percent wide.

Hence another significant factor that causes or impacts the extent of bias is the gap between the actual and projected amount of activity in the aggregated set of stores.

4. Dealing with Bias

The bias that emerges from aggregation can be dealt with in two different ways:

1. Avoid the bias by using the appropriate level of data aggregation.
2. Adjust the parameter estimates to mitigate the bias.

Researchers have established several ways of countering the bias by adjusting the parameter estimates. Research suggests that you will have to create a custom category specific solution to mitigate the effects of bias (see Wittink et al 1997). This paper focuses on the ways of avoiding the bias and not on how to adjust for the bias once it is there.

4.1 Aggregation bias can be avoided without store-level data

While there are ways to adjust the estimates to overcome the bias, it is better to try to avoid the bias altogether. There are specific reasons that cause and govern the extent of bias. These are:

(i) Heterogeneity in aggregated entities
(ii) The gap between the actual occurrence and projected activity

Once these reasons are known, it is possible to avoid the bias using the following methods:

Meaningful aggregation of homogeneous entities

Most pricing and promotion decisions are made at retailer (e.g. Kroger) and retailer-market (e.g. Kroger Ohio) combination level. If researchers use data aggregated at these levels they will avoid aggregation bias and at the same time also avoid noise that appears in disaggregated store-level data.

Variable definition to limit projections

Another source of bias is projecting the impact of activity that occurs in a small set of stores to a larger store set. The bias occurs in parameter estimates for the %-of-stores kind of variables (e.g. ACV), which are then projected to 100 percent of the stores.

Such a bias can be avoided by breaking the variable that captures such activity into multiple variables that capture different levels of activity. For instance, instead of one ACV Display variable you must create separate variables for 0 percent to 20 percent ACV of display, 20 percent to 40 percent ACV of display, 40 percent to 60 percent ACV of display, etc. By doing this, the models provide separate parameter estimates for different levels of activity, avoiding the need to project the parameter estimate from a smaller level to a larger level of activity, and hence avoiding the bias itself.

5. Problems with store-level data

Researchers often state that sales estimation and forecasts are more precise when done using estimates from the analysis of market-level data. Why is this the case when we know that the estimates from store-level data analysis are free from aggregation bias?

Store-level data often includes a lot of “noise”, which occurs because of random variations in sales that are not due to any marketing or pricing decisions but are instead organic: construction activity near a store could drive sales down, a change in temperature around a store may drive sales up, and so on. Such noise in the data may disturb the parameter estimate and make it inaccurate or unstable. In other words, the parameter estimates tend to be inaccurate when we look at such disaggregated data, even when these estimates are mostly free from any aggregation bias.

This raises the question: how helpful and reliable are estimates from store-level data that are free from aggregation bias, but tend to be inaccurate themselves? Table 2 suggests that the better approach is to use data aggregated for homogeneous entities.

Table 2: Issues with different levels of data

Accuracy of estimate
  Store-level data: Estimates can be inaccurate and unstable, given the noise
  Aggregated store-level data: Estimates are accurate and stable
Aggregation bias
  Store-level data: No aggregation bias
  Aggregated store-level data: Homogeneous: no aggregation bias. Heterogeneous: avoid aggregation bias by limiting the projection
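The mechanisms described in Sections 3.1, 3.3 and 4.1 can be illustrated with a small numeric sketch. It is not the authors' model: it assumes equal-size stores and a multiplicative (log-linear) lift model, which is one common non-linear form but is not specified in this paper. The function names `projected_lift` and `acv_band`, the 2x lift and the variable naming scheme are hypothetical and chosen only for illustration.

```python
import math
from typing import Optional


def projected_lift(acv_share: float, true_lift: float) -> float:
    """Lift at 100% ACV implied by an aggregate log-linear model.

    Assumption (not from the paper): every promoted store sells
    true_lift x its base, all stores are the same size, and the
    aggregate model is ln(sales / base) = gamma * acv_share.
    """
    if not 0.0 < acv_share <= 1.0:
        raise ValueError("acv_share must be in (0, 1]")
    # Linear aggregation: total sales relative to total base sales.
    aggregate_ratio = (1.0 - acv_share) + acv_share * true_lift
    # Coefficient the aggregate model needs to reproduce that ratio.
    gamma = math.log(aggregate_ratio) / acv_share
    # Projecting the coefficient to full coverage (acv_share = 1).
    return math.exp(gamma)


def acv_band(acv_pct: float, width: float = 20.0) -> Optional[str]:
    """Name of the banded ACV variable that a given ACV level falls into.

    Bands are (0, 20], (20, 40], ... by default; the handling of band
    edges and the naming scheme are illustrative assumptions.
    """
    if not 0.0 <= acv_pct <= 100.0:
        raise ValueError("acv_pct must be between 0 and 100")
    if acv_pct == 0.0:
        return None  # no activity, so no band variable is switched on
    index = math.ceil(acv_pct / width) - 1
    lower = int(index * width)
    upper = int(min(lower + width, 100.0))
    return f"acv_display_{lower}_{upper}"


if __name__ == "__main__":
    for share in (0.1, 0.5, 0.7, 1.0):
        lift = projected_lift(share, true_lift=2.0)
        print(f"{share:.0%} ACV: projected lift {lift:.2f}x, true lift 2.00x")
    print(acv_band(50.0))
```

Under these assumptions, the two-store case (50 percent ACV, true lift of 2x) projects to a lift of about 2.25x at 100 percent ACV, and the overstatement grows as coverage shrinks, which is consistent with the direction of bias described in Section 3.3. Banding the ACV variable means each band gets its own estimate, so no coefficient has to be projected far beyond the coverage at which it was observed.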
6. Proposed Framework

Framework for Assessing Alignment and Marketing Decision Making Focus

Columns: Focus of marketing decision making; Decision granularity, cadence and modeling approach; Data used for models; Assessment of alignment.

Focus: Price and promotion
  Decision at the account level & weekly decisions
    Data used: Store-level data for modeling and estimates at the account level
    Assessment: • Data is available • Estimates not biased • Uncommon situation • Risk of noise
    Data used: Aggregated store-level data for modeling and estimates at the account level
    Assessment: • Data is available • Estimates not biased • Uncommon situation
  Decision at retailer-market level & weekly decisions
    Data used: Store-level data for modeling at retailer-market level
    Assessment: • Data is available • Estimates not biased • Common situation • Risk of noise
    Data used: Aggregated store-level data for modeling at retailer-market level
    Assessment: • Data is available • Estimates not biased • Common situation

Focus: Media spend
  Regional level across accounts
    Data used: Store-level data for modeling and estimates at the regional level
    Assessment: • Data is available • Estimates not biased • Uncommon situation • Risk of noise
    Data used: Aggregated store-level data for modeling and estimates at the regional level
    Assessment: • Data is available • Estimates not biased • Common situation

Table 3: Overall Trade-off Matrix

Aggregation bias
  Market level: May appear because of heterogeneity
  Retailer level: Very limited bias given the stores within a retailer are largely homogeneous
  Retailer-market level: No bias given the stores are homogeneous in pricing decisions
  Store level: No bias as there is no store aggregation
Data availability
  Market level: Yes
  Retailer level: Yes
  Retailer-market level: Yes
  Store level: Not readily accessible
Cost
  Market level: Low
  Retailer level: Moderate
  Retailer-market level: High
  Store level: Very high
Implementation
  Market level: Quick
  Retailer level: Easy
  Retailer-market level: Comprehensive
  Store level: Tedious

In conclusion, we suggest the following guiding principles:

• Use the level of aggregation at which pricing decisions are made, and business planning is done. This ensures homogeneity and avoids aggregation bias
• Avoid projecting the impact of a narrow promotional activity to a much wider set
• Store-level data is not always the best solution and it is certainly not required to mitigate aggregation bias
• Store-level data can suffer from noise which is hard to model
• For pricing decisions the ideal level of aggregation is either Retailer or Retailer-Market level

7. References

Marcus Christen, Sachin Gupta, John C. Porter, Richard Staelin, Dick R. Wittink, “Using Market-level Data to Understand Promotions Effects in a Nonlinear Model”, Journal of Marketing Research (1997), 322-334
Steven Tenn, “Estimating Promotional Effects with Retailer-Level Scanner Data” (2003)
Ross Link, “Are aggregate scanner data models biased?” (1995)

About Fractal Analytics

Fractal Analytics is a global analytics firm that helps Fortune 500 companies gain a competitive advantage by providing them a deep understanding of consumers and tools to improve business efficiency. Producing accelerated analytics that generate data-driven decisions, Fractal Analytics delivers insight, innovation and impact through predictive analytics and visual story-telling.

Fractal Analytics was founded in 2000 and has 700 people in 12 offices around the world serving clients in over 100 countries.

Fractal Analytics is backed by TA Associates, a global growth private equity firm, and recently partnered with Aimia, a global loyalty and consumer insights firm.

The company has earned recognition by industry analysts and has been named one of the top five “Cool Vendors in Analytics” by research advisor Gartner. Fractal Analytics has also been recognized for its rapid growth, being ranked on the exclusive Inc. 5000 list for the past three years and also being named among the USPAACC’s Fast 50 Asian-American owned businesses for the past two years.

For more information, contact us at: +1 650 378 1284 [email protected]

Authors

Amit Gupta, VP Growth – Tech, [email protected]
Hari Haran, VP Global Consulting – CPG, Retail and FSI, [email protected]