Online Survey Sample and Data Quality Protocols
Socratic Technologies, Inc. © 1994-2014. Reproduction in whole or part without written permission is prohibited. Federal law provides severe civil and criminal penalties for unauthorized duplication or use of this material in physical or digital form, including for internal use. ISSN 1084-2624. sotech.com | 800-576-2728

Sample and Data Quality: Historical Perspective

Socratic Technologies, Inc. has developed sophisticated sample scanning and quality assessment programs to identify and correct problems that may lead to reduced data reliability and bias. From the earliest days of research there have been problems with sample quality (poor recruiting, inaccurate screening, bias in sample pools, etc.), and potential respondents have attempted to submit multiple surveys (paper and pencil), lied to get into compensated studies (mall intercepts and focus groups), and displayed lazy answering habits (all forms of data collection). In the age of Internet surveying this has become a highly discussed topic, because we now have the technology to measure sample problems and to detect exactly how many people are involved in "bad survey behaviors." While this puts a keen spotlight on the nature of the problems, we also have the technology to correct many of these issues in real time. So while we are more aware of potential issues than ever before, we are also better prepared than at any time in the past to deal with threats to data quality. This paper details the steps and procedures that we use at Socratic Technologies to ensure the highest data quality by correcting problems in both sample sourcing and bad survey behavior.

Sample Sources & Quality Procedures

The first line of defense in overall data quality is the sample source, and catching problems begins by examining the way panels are recruited. According to a variety of industry sources, pre-identified sample sources (versus Web intercepts using pop-up invitations or banner ads) now account for almost 80% of U.S. online research participants, and this proportion is growing. Examples include:

• Opt-in lists
• Customer databases
• National research panels
• Private communities

A common benefit of all of these sources is that they include a ready-to-use database from which a random or pre-defined sample can be selected and invited. In addition, pre-recruitment helps to solidify the evidence of opt-in permission for contact or to more completely establish an existing business relationship, at least one of which is needed to meet the requirements for email contact under the federal CAN-SPAM Act of 2003.

In truth, panels of all kinds contain some level of bias driven by the way the recruitment strategy is managed. At Socratic we rely on panels that are recruited primarily through direct invitation, and we exclude sample sources that are recruited using a "get paid for taking surveys" approach. This ensures that the people we invite to our surveys are not participating for strictly mercenary purposes, which has been shown to distort answers (i.e., answering questions in such a way as to "please" the researcher in exchange for future monetary rewards).

In addition, we work with panel partners who undertake thorough profile verification and database cleaning procedures on an ongoing basis.
Panels and sample sources are like wine: if you start with poor grapes, no matter what the skill of the winemaker, the wine is still poor. How panels are recruited determines the long-run quality of the respondents they produce.

Our approved panel partners regularly scan databases for:

• unlikely duplicated Internet server addresses
• series of similar addresses (abc@hotmail, bcd@hotmail, cde@hotmail, etc.)
• replicated mailing addresses (for incentive checks)
• other data that might indicate multiple sign-ups by the same individual
• impossible changes to profiling information (e.g., a 34 year old woman becoming an 18 year old man)
• lack of responsiveness (most drop panelists if they fail to respond to five invitations in a row)
• non-credible qualifications (e.g., persons who consistently report ownership or experience with every screening option)
• a history of questionable survey behavior (see "Cheating Probability Score" later in this document)
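To make the scanning logic concrete, here is a minimal Python sketch of the kinds of checks listed above. The record fields, thresholds, and flag names are illustrative assumptions, not Socratic's or any partner's actual schema or rules.

```python
import re
from collections import Counter

def normalize_address(addr: str) -> str:
    """Collapse case, punctuation, and spacing so near-duplicate mailing
    addresses (used for incentive checks) compare as equal."""
    return re.sub(r"[^a-z0-9]", "", addr.lower())

def scan_panel(panelists: list[dict]) -> dict[str, list[str]]:
    """Return panelist IDs flagged for each problem class (illustrative fields)."""
    flags = {"duplicate_address": [], "impossible_profile": [], "unresponsive": []}

    address_counts = Counter(normalize_address(p["mail_address"]) for p in panelists)

    for p in panelists:
        # Replicated mailing addresses may indicate multiple sign-ups by one person.
        if address_counts[normalize_address(p["mail_address"])] > 1:
            flags["duplicate_address"].append(p["id"])

        # Impossible changes to profiling information: reported age moving
        # backwards or reported gender flipping between profile updates.
        if p["age"] < p["previous_age"] or p["gender"] != p["previous_gender"]:
            flags["impossible_profile"].append(p["id"])

        # Lack of responsiveness: five ignored invitations in a row.
        if p["consecutive_non_responses"] >= 5:
            flags["unresponsive"].append(p["id"])

    return flags
```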
Socratic's Network of Global Panel Providers

The following list details the panel partners (subject to change) to whom we regularly turn for recruitment on a global basis, with the countries they cover:

3D interactive.com Australia · EuroClix B.V. Panelcliz Netherlands · Accurate Market Research Mexico · Flying Post UK, France, Germany · Focus Forward US · Adperio US · Gain Japan · Advaith Asia · Garcia Research Associates US · 42 Market Research France · AG3 Brazil, Argentina, Mexico, Chile · AIP Corporation Asia · HRH Greece · Internet Plaza Asia · IID Interface in Design Asia · Alterecho Belgium · Amry Research Russia, Ukraine · ARC Poland · Aurora UK · Aussie Survey UK & Australia · Authentic Response US · Beep world Austria, Switzerland, Germany · BestLife LATAM · Blueberries Israel · C&R Research Services, Inc. US · GMI All Countries · China, Hong Kong · India · Corpscan India · Cotterweb US · Data Collect Czech Republic · Opinions UAE, Saudi Arabia · Panel Base UK · Panel Service Africa South Africa · Panthera Interactive All Countries · Precision Sample US · Public Opinious Canada · Pure Profile UK, US, Australia · Russia, Ukraine, US, UK · Japan · iPanelOnline Asia · Resulta Asia · Ithink US · RPA Asia · Itracks Canada · Sample Bus Asia · Ivox Belgium · Schlesinger Assoc. US · Lab 42 · Seapanels Asia · All Countries · Livra LATAM · Community View All Countries · Rakuten Research · Cint All Countries · US · Opinion Outpost/SSI · Quick Rewards · Campus Fund Raiser US · All Countries · Russia, Ukraine · Opinion Health · Inzicht Netherlands, Belgium, France · Lightspeed Research (UK Kantar Group) Italy, Spain, Germany, Australia, New Zealand, Netherlands, France, Sweden, UK, Switzerland · Clear Voice / Oceanside · OMI · Panelbiz EU · Inquision South Africa, Turkey · Insight CN · Offerwise US · Luth Research US · M3 Research Nordics · Maktoob Research Middle East · Market intelligence US, EU · Market Tools Canada, Australia, US, UK, France · Spec Span US · Spider Metrix Australia, UK, Canada, New Zealand, South Africa, US · STR Center All Countries · Telkoma South Africa · Testspin/WorldWide All Countries · Think Now Research US · TKL Interactive US · TNS New Zealand New Zealand · All Countries · All Countries · Delvinia Canada · Market-xcel India, Singapore · Toluna · EC Global Panel · Masmi Ukraine, Hungary, Russia · United Sample · Eksen Turkey · Embrain Co. · Mc Million US · uthink Canada · Empanel US · Mo Web EU · Empathy Panel Ireland · My Points US, Canada · WebMD Market Research Services · Empowered Comm. Australia · Nerve planet LATAM, US · Asia · Userneeds Nordics · US · World One Research US, France, Germany, Spain · India, China, Japan · ePanel Marketing Research China · Net, Intelligence & Research Korea · Erewards/ResearchNOW · Netquest Portugal, Spain · YOUMINT India · All Countries · US, Canada, UK · Italy, France, Germany, Spain, UK · Zapera (You Gov) · Esearch · ODC Service · YOC Germany · All Countries

Anti-Cheating Protocols

Unlike other data collection modes, the server technology used in online surveys gives the researcher far more control over in-process problems related to cheating and bad behavior.

As a first step in identifying and rejecting bad survey behavior, we need to differentiate between cheating and lazy behavior; the solutions Socratic uses for handling each type of problem differ by class of delinquency.

Cheaters attempt to enter a survey multiple times in order to:
• Collect compensation
• Sabotage results

Lazy respondents don't really think, and do the least amount of work necessary to complete:
• Sometimes to get the compensation
• Other times because of the burden, boredom or fatigue of long, repetitious, difficult surveys

Many forms of possible cheating and lazy respondent behaviors can be detected using server-based data and response pattern recognition technologies. In some cases, bad respondents are immediately detected and rejected before they even begin the survey. This is critical for quality: because we don't accept or pay for "illegitimate" or "duplicated" respondents, the value of every completed interview increases. Other times, we allow people to enter the survey, but then use pattern recognition software to detect "answer sequences" that warrant "tagging and bagging." Note: while we inform cheaters that they're busted and won't be getting any incentive, we don't tell them how they were caught!

One of our key tools in assessing the quality of a respondent is the Socratic Cheating Probability Score (CPS). A Cheating Probability Score looks at many possible problems and classifies the risk associated with accepting an interview as "valid and complete." However, we also need to be careful not to use a medium probability score as an automatic disqualifier; just because the results are not what we expect doesn't mean they are wrong. Marginal scores should be used to "flag" an interview, which should then be reviewed before rejecting. High scores are usually rejected mid-survey, before the respondent is qualified as having "completed."

Here are some examples of how we use technology to detect and reject common respondent problems.

Repeat Survey Attempts

Some cheaters simply attempt to retake surveys over and over again. These are the easiest to detect and reject. To avoid self-selection bias, most large surveys today are done "by customized invitation" [CAN-SPAM 2003] and use a "handshake" protocol, pre-registering individuals with verified profiling data in order to establish a double or triple opt-in status.

Cheater Solutions: Handshake Protocols

A handshake protocol entails generating a unique URL suffix code, which is used for the link to the survey in the email invitation. It is tied to a specific individual's email address and/or panel member identification. Once it is marked as "complete" in the database, no other submissions are permitted on that person's account. An example of this random suffix code is as follows:

http://sotechsurvey.com/survey/?pid=wx54Dlo1
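A minimal sketch of how such a single-use handshake code might be issued and enforced is shown below. The in-memory store, function names, and code length are illustrative assumptions; a production system would persist invitation status in the panel database.

```python
import secrets

# Hypothetical in-memory store: suffix code -> invitation record.
_invites: dict[str, dict] = {}

def issue_invite(panelist_email: str) -> str:
    """Generate an unguessable URL suffix tied to one panelist and return the survey link."""
    code = secrets.token_urlsafe(6)  # short random suffix in the spirit of "wx54Dlo1"
    _invites[code] = {"email": panelist_email, "status": "open"}
    return f"http://sotechsurvey.com/survey/?pid={code}"

def admit(code: str) -> bool:
    """Allow entry only while the invitation exists and is still open."""
    invite = _invites.get(code)
    return invite is not None and invite["status"] == "open"

def mark_complete(code: str) -> None:
    """Once marked complete, no further submissions are accepted on this code."""
    if code in _invites:
        _invites[code]["status"] = "complete"

# The same link admits a respondent once, then is refused.
url = issue_invite("panelist@example.com")
pid = url.split("pid=")[1]
assert admit(pid)        # first attempt is accepted
mark_complete(pid)
assert not admit(pid)    # later attempts on the same code are rejected
```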
Supplementing the invitation handshake, a cookie check is utilized. At the time a survey is finished (complete or terminated), a cookie bearing the survey_id is placed on the user's machine. At the start of every survey, Socratic looks for a cookie bearing that survey_id; if it is found, the user is not allowed to take the survey again, and the respondent ID is immediately blocked, so that even if they remove the cookie later on they still won't be allowed back in.

But cookie checks are no longer sufficient by themselves to prevent multiple submission attempts; more advanced identification is needed. For this, Socratic utilizes an IP & Browser Configuration Check. This is a server-level test that is invisible to the respondent. Whenever a person's browser hits a Web site, it exchanges information with the Web server in order for the Web pages (or survey pages) to display correctly. For responses to all surveys, a check can be made for multiple elements:

IP Address
The first level of validation comes from checking the IP address of the respondent's computer. IP addresses are usually assigned based on a tightly defined geography, so if someone is supposed to be in California and their IP address indicates a China-based service, this would be flagged as a potential cheating attempt.

Browser String
Each browser sends a great deal of information about the user's system to the survey server. These strings are logged, and subsequent survey attempts are compared to determine whether exact matches are occurring. These are examples of browser strings that would be used to detect matches:
• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; Advanced Searchbar; .NET CLR 1.1.4322; .NET CLR 1.0.3705; KelkooToolbar 1.0.0)
• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Monzilla/4.0 (compatible; MSIE 6.1; Windows 95/98/NT/ME/2000/XP; 10290201-SDM); SV1; .NET CLR 1.0.3

Language Setting
Another browser-based information set that is transmitted is the language setting of the user's system. These too are logged and compared against subsequent survey attempts:
• en-us,x-ns1pG7BO_dHNh7,x-ns2U3
• en-us,en;q=0.8,en-gb;q=0.5,sv;
• zh-cn;q=1.0,zh-hk;q=0.9,zh-tw;
• en-us, ja;q=0.90, ja-jp;q=0.93

Internal Clock Setting
Finally, the user's computer has an internal timekeeping function that continuously tracks the time of day and date out to a number of decimal points. Each user's computer will vary slightly, even within the same time zone or within the same company's system.

When these four measurements are taken together, the probability of two exact settings on all readable elements is extremely low.
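The sketch below combines these four readable elements into a single comparable fingerprint, in the spirit of the check described above. The hashing approach, exact-match rule, and parameter names are assumptions made for illustration rather than Socratic's actual implementation.

```python
import hashlib

seen_fingerprints: set[str] = set()

def fingerprint(ip: str, user_agent: str, accept_language: str,
                clock_offset_ms: float) -> str:
    """Combine IP address, browser string, language setting, and reported
    clock offset into one comparable key."""
    raw = "|".join([ip, user_agent, accept_language, f"{clock_offset_ms:.3f}"])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def looks_like_repeat_attempt(ip: str, user_agent: str, accept_language: str,
                              clock_offset_ms: float) -> bool:
    """Flag an attempt whose full fingerprint exactly matches a prior attempt;
    two genuinely different respondents rarely match on all four elements."""
    fp = fingerprint(ip, user_agent, accept_language, clock_offset_ms)
    if fp in seen_fingerprints:
        return True
    seen_fingerprints.add(fp)
    return False
```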
Technology for cheating in online surveys has proliferated over the past 10 years, and in some areas of the world it has become a cottage industry. However, with the correct server technology, Socratic can detect the profiles of cheating applications and thwart them in real time, before a survey is completed.

Techno-Cheaters

Some cheaters are caught because they are trying to use technology to submit multiple surveys. Form populators and key-stroke replicators are examples of auto-fill technologies.

Techno-Cheater Solutions: Handshake Code Key Protocols

Total automation can be thwarted by creating non-machine-readable code keys that are used at the beginning of a survey to make sure a human being is responding rather than a computer "bot." We refer to this as a Handshake Code Key Protocol. One of the most popular Handshake Code Key Protocols is CAPTCHA [Source: UC Berkeley CAPTCHA Project: http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html]. To prevent bots and other automated form completers from entering our surveys, a distorted image of a word or number can be displayed on the start screen of all Socratic projects. In order to gain access to a survey, the user has to enter the word or number shown in the image into a text box; if the entry does not match the image, the user is not allowed to enter the survey. (Note: some dispensation and alternative forms of code keys are available for vision-impaired individuals.) As computers become more and more sophisticated in their ability to detect patterns, the CAPTCHA distortions have become more complex.

[Figure: example CAPTCHA images that can be "read" by image-recognition bots (as of 2013), alongside images that cannot.]

Lazy Respondent Behavior

Lazy behavior is far more prevalent as a survey problem than outright cheating, primarily because it's easier to defeat cheaters than people who aren't paying attention. With new, more sophisticated algorithms, however, it is now possible to limit the influence of lazy respondents in mid-survey.

A far more common problem with survey takers across all modes of data collection is people who just don't take the time and effort to answer questions carefully. This can result in rushed surveys or surveys with replicated-pattern issues. There are several reasons why respondents don't pay attention:

• Problem 1: Just plain laziness
• Problem 2: Survey design is torturous
●● Too long
●● Boring/repetitious
●● Too difficult
●● Not enough compensation
●● No affinity with the sponsor

But whatever the reason for lazy behavior, the symptoms are similar, and the preventative technologies are the same.

Speeders

In the case of rushed surveys ("Speeders"), speed of submission can be used to detect surveys completed too quickly. One statistical metric that Socratic uses is the Minimum Survey Time Threshold. By adapting a normative formula for estimating the length of a survey based on the number of various types of questions, one can calculate an estimated time to completion and determine whether the actual time to completion is significantly lower. This test is run at a predetermined point, before the survey has been completed. Based on the time since starting, the number of closed-ended questions (CEs), and the number of open-ended questions (OEs), a determination is made as to whether the respondent has taken an adequate amount of time to answer the questions:

If (Time < (((# of CEs * secs/CE) + (# of OEs * secs/OE)) * 0.5)) Then FLAG
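The threshold rule above translates directly into code. In this minimal sketch, the per-question norms (seconds per closed-end and per open-end question) are illustrative assumptions; in practice they would come from the normative survey-length formula.

```python
# Assumed normative answer times; real norms would come from length-estimation formulas.
SECS_PER_CLOSED_END = 10.0
SECS_PER_OPEN_END = 45.0

def is_speeder(elapsed_secs: float, n_closed_ends: int, n_open_ends: int) -> bool:
    """Flag the interview when elapsed time is under half the estimated time:
    Time < ((#CE * secs/CE) + (#OE * secs/OE)) * 0.5"""
    estimated = n_closed_ends * SECS_PER_CLOSED_END + n_open_ends * SECS_PER_OPEN_END
    return elapsed_secs < estimated * 0.5

# Example: 20 closed ends and 2 open ends give an estimate of 290 seconds, so the
# flagging threshold is 145 seconds; a respondent finishing in 100 seconds is flagged.
print(is_speeder(elapsed_secs=100, n_closed_ends=20, n_open_ends=2))  # True
```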
Replicated Patterns

Another common problem caused by lazy behavior is the appearance of patterned answers throughout a survey (e.g., choosing the first answer for every question, or selecting a single rating point for all attributes). These are fairly easy to detect, and the respondent can be "intercepted" in mid-survey and asked to reconsider patterned sequences. Socratic uses Pattern Recognition Protocols within a survey to detect and correct these types of problems. Here are some of the logic solutions we apply for common patterning problems (a sketch of both checks follows this list):

• XMas Treeing – This technique identifies those who "zig-zag" their answers (e.g., 1,2,3,4,5,4,3,2,1, etc.)
●● How to: When all attributes are completed, take the absolute value of all att-to-att differences; if the mean value is close to 1, flag the respondent.
• Straight Lining – This technique identifies those who straight-line answers to a survey (e.g., taking the first choice on every answer set, or entering 4,4,4,4,4,4 on a matrix, etc.)
●● How to: Subtract each attribute (sub-question) from the previous one and keep a running total. When all attributes are completed, take the absolute value; if the mean value is 0, flag the respondent.
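Here is a minimal sketch of both checks using the att-to-att difference logic described above; the tolerance used for "close to 1" is an assumed value.

```python
def att_to_att_diffs(ratings: list[int]) -> list[int]:
    """Absolute differences between each attribute rating and the previous one."""
    return [abs(b - a) for a, b in zip(ratings, ratings[1:])]

def is_straight_lining(ratings: list[int]) -> bool:
    """Flag when every attribute receives the same rating
    (mean absolute att-to-att difference of 0)."""
    diffs = att_to_att_diffs(ratings)
    return bool(diffs) and sum(diffs) / len(diffs) == 0

def is_xmas_treeing(ratings: list[int], tolerance: float = 0.1) -> bool:
    """Flag zig-zag answering (e.g. 1,2,3,4,5,4,3,2,1), where the mean
    absolute att-to-att difference is close to 1."""
    diffs = att_to_att_diffs(ratings)
    if not diffs:
        return False
    return abs(sum(diffs) / len(diffs) - 1.0) <= tolerance

print(is_straight_lining([4, 4, 4, 4, 4, 4]))        # True
print(is_xmas_treeing([1, 2, 3, 4, 5, 4, 3, 2, 1]))  # True
```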
Random Answers

While these Pattern Recognition Protocols pick up many common problems, they cannot detect random answer submission (e.g., 1,5,3,2,5,4,3,1,1, etc.). For this we need another type of logic: Convergent/Divergent Validity tests. This test relies on the assumption that similar questions should be answered in a similar fashion and that polar opposites should receive inverse reactions. For example, if someone strongly agrees that a product concept "is expensive," he or she should not also strongly agree that the same item "is inexpensive." With these types of tests in place, the survey designer has the same flexibility to intercept a survey with "validity issues" and request that the respondent reconsider their answers.

Cross-Survey Answer Block Sequences

Occasionally, other anti-cheating/anti-lazy-behavior protocols will fail to detect a well-executed illegitimate survey. For this purpose, Socratic also scans for repeated sequences using a Record Comparison Algorithm: questionnaires are continuously scanned, record-to-record, for major blocks of duplicated field contents (e.g., >65% identical answer sequences). Note: some level of discretion will be needed on surveys for which great similarity of opinion or homogeneity in the target population is anticipated. Future development is also planned to scan open-ended comments for duplicated phrases and blocks of similar text within live surveys; currently, this can only be done post hoc.

Post-Survey Panel Cleaning

The majority of problems related to data quality can be detected before a survey is completed. However, a variety of ongoing checks can add even more assurance that respondents are who they claim to be and are located where they claim to be. Panel cleaning is necessary for long-run viability. For the panels managed by Socratic Technologies, the quality assurance program extends beyond sample cleaning and mid-survey error testing; we also continuously monitor issues that can only be detected post hoc.

Address Verification

Every third or fourth incentive payment should be made by check or mailed notice to a physical address. If people want their reward, they have to drop any aliases or geographic pretext in order for delivery to be completed, and oftentimes cheaters are caught before an incentive is ever distributed. Duplicated addresses, P.O. boxes, etc. are an obvious give-away; we also look for slight name derivatives not usually caught by banks, including:

• nicknames [Richard Smith and Dick Smith]
• use of initials [Richard Smith and R. Smith]
• unusual capitalization [Richard Smith and RiCHard SmiTH]
• small misspellings [Richard Smith and Richerd Smith]

Conclusion

Many features and security checks are now available for assuring the validity of modern online research, spanning pre-survey panel quality, mid-survey cheating and lazy-behavior detection, and post-survey panel cleaning. While many of these techniques let Socratic "flag" possible cheating or lazy behavior, we believe that the analyst should not automatically reject interviews, but should examine marginal cases for possible validity. With these technologies in place, online research can now be more highly regulated than any other form of data collection.

Not all bad survey behavior is malicious; some is driven by poor and torturous survey design. Some discretion will always be a requirement of survey usability:

• Writing screeners that don't telegraph qualification requirements
• Keeping survey length and burden to a reasonable level
• Minimizing the difficulty of compliance
• Enhancing the engagement levels of boring tasks
• Maximizing the communication that participation is worthwhile and appreciated

CONTACT

San Francisco Headquarters
Socratic Technologies, Inc.
2505 Mariposa Street
San Francisco, CA 94110-1424
T 415-430-2200 (800-5-SOCRATIC)

Chicago Regional Office
Socratic Technologies, Inc.
211 West Wacker Drive, Suite 1500
Chicago, IL 60606-1217
T 312-727-0200 (800-5-SOCRATIC)

Contact Us: sotech.com/contact

Socratic Technologies, Inc. is a leader in the science of computer-based and interactive research methods. Founded in 1994 and headquartered in San Francisco, it is a research-based consultancy that builds proprietary, interactive tools that accelerate and improve research methods for the study of global markets. Socratic Technologies specializes in product development, brand articulation, and advertising research for the business-to-business and consumer products sectors.

Registered Trademarks, Service Marks and Copyrights

The following product and service descriptors are protected and all rights are reserved: Configurator Analysis, ReportSafe, Site-Within-Survey, Socratic Browser, Socratic CardSort, Socratic ClutterBook, Socratic ColorModeler, Socratic CommuniScore, Socratic Forum, Socratic Perceptometer, Socratic ProductExhibitor, Socratic Site Diagnostic, SSD, Socratic Te-Scope, Socratic Usability Lab, Socratic VisualDifferentiator, Socratic Web Boards, Socratic Web Survey 2.0, SWS 2.0, Socratic WebComm Toolset, Socratic WebPanel Toolset.