Voice versus SMS Interviews: Collecting Survey Data with Mobile, Multimodal Devices

Frederick G. Conrad
University of Michigan (MPSM)
University of Maryland (JPSM)
USA
VCIOM Grushin Conference
12 March, 2015

Acknowledgments
•  Joint work with Michael Schober
–  Parallel awards to Conrad and Schober
–  NSF grants SES-1026225 and SES-1025645
•  Collaborators: Christopher Antoun, Chan Zhang, Yanna Yan, Michael Johnston, Patrick Ehlen, Andrew Hupp, Lloyd Hemingway, Lucas Vickers, Stefanie Fail, Kelly Nichols, Leif Percifield, Courtney Kellner

New ways of communicating may affect the quality of survey responses
•  People increasingly communicate via smartphones
–  58% of US adults own one (Pew, January 2014)
–  70% of S. Koreans use one (Global Post, April 2014)
–  36% of Russians (Google's Our Mobile Planet, 2013)
–  Ownership is increasing
•  via multiple modes
–  Voice, Text (SMS), Email, Videochat, Social Media
•  possibly while multitasking

When people communicate on smartphones, they
•  choose a mode that fits their current setting and needs
–  e.g., urgent vs. can wait, public vs. private, noisy vs. quiet, bright vs. dim
•  switch modes while communicating
•  respond in a different mode than the one in which they were contacted
–  e.g., respond to a voicemail with a text

What does this mean for survey modes?
•  New modes:
–  People increasingly use text messaging for personal and professional communication
–  How does it compare to voice interviews in the precision and truthfulness of answers?
•  Ability to choose between several modes
–  all on the same device
–  How does mode choice affect
•  participation/completion?
•  response quality?

Current study
•  examines
–  data quality (satisficing, disclosure)
–  completion rates
–  respondent satisfaction
•  in 4 existing or plausible survey interview modes that work through native apps on the iPhone*
–  as opposed to specially designed survey apps
–  as opposed to a web survey in the phone's browser
•  when Rs are assigned to or choose the interview mode
*to assure a uniform interface for all Rs

Experimental Design
•  4 modes:
–  Human Voice
–  Human Text (SMS)
–  Automated Voice
–  Automated Text (SMS)

Surveys via text messaging?
•  Texting is becoming a potentially important way to reach potential Rs
–  some may attend to text messages more than to email or voicemail
–  Rs may expect to be able to participate in a survey via text
•  Some organizations now include SMS text in their suite of modes for mobile surveys
–  e.g., GeoPoll, Poll Everywhere, iVisionMobile
•  But most do not

Text as a mode of interaction
•  Turn-by-turn
–  Threaded (on a smartphone)
•  Responses don't need to be immediate
–  Allows multitasking
•  Works even with intermittent network/cell service
–  unlike voice
•  Does not require web capacity on the device
–  unlike a mobile web survey

Text Survey Interview
[screenshot]

Interface for Human Text Interviewer
[screenshot]

Human Text Respondent
[screenshot]
•  The automated text system would require a single-character response or the word "help"

Automated Voice
•  Respondents' spoken answers must be recognized by the system
–  speech IVR
•  Recognition was 95.6% accurate based on human coders' judgments
•  Survey estimates (means and percentages) from the system and from the coders were very similar despite slightly imperfect automated recognition
–  Johnston et al. (2013, in prep)

Autovoice: Sample Dialogs
Categorical:
I: How often do you read the newspaper? 'Every day', 'a few times a week', 'once a week', 'less than once a week', or 'never'?
R: Every day
I: Got it.
Numerical:
I: Thinking about the time since your eighteenth birthday (including the recent past that you've already told us about), how many male partners have you had sex with?
R: None
First hypothesis: "Nine"
I: I think you said '9'. Is that right? Yes or No. [explicit confirmation]
R: No
I: Thinking about the time since your eighteenth birthday (including the recent past that you've already told us about), how many male partners have you had sex with?
R: Zero
Last hypothesis: "Zero"
I: Thanks.
Last annotation: "Zero"
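
The explicit-confirmation exchange above follows a simple pattern: recognize the spoken answer, read the top hypothesis back for numerical items, and re-ask the question if the respondent rejects it. The following is a minimal, hypothetical Python sketch of that loop; play_prompt and recognize_speech stand in for whatever TTS and ASR components a real speech-IVR system would use, and the actual system's confirmation policy may differ from this sketch.

```python
# Minimal sketch of an explicit-confirmation loop for a numerical speech-IVR item.
# play_prompt() and recognize_speech() are hypothetical stand-ins for TTS/ASR components.

def ask_numerical_item(question, play_prompt, recognize_speech, max_attempts=3):
    """Ask a numerical question, confirming the top ASR hypothesis with the respondent."""
    for _ in range(max_attempts):
        play_prompt(question)
        hypothesis = recognize_speech()        # e.g., "nine" when the respondent said "none"
        play_prompt(f"I think you said '{hypothesis}'. Is that right? Yes or No.")
        if recognize_speech().strip().lower() in ("yes", "yeah", "right"):
            return hypothesis                  # confirmed answer is recorded
        # On "No", fall through and re-ask the question, as in the sample dialog above.
    return None                                # give up after repeated misrecognitions
```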
Properties of voice and text

Property | Voice | Text
Synchrony | Fully synchronous | Mostly asynchronous
Medium | Auditory | Visual
Language | Spoken/heard | Written/read
Conversational structure | Turn-by-turn, with potential for simultaneous speech | Turn-by-turn, rarely but possibly out-of-sequence
Persistence of turn | No | Yes
Persistence of entire conversation | No | Yes, threaded
Social presence of partner | Continuous (auditory) presence | Intermittent evidence (when texts arrive)
Character of multitasking | Simultaneous, especially when hands free, unless other task involves talking | Switching required between texting and other tasks
Impact of environmental conditions | Potential interference from ambient noise | Potential interference from visual glare
Impact of nearby others | Others may hear answers; potential audio interference from others' talk | Others unlikely to see text and answers on screen, though possible
Items
•  Safe-to-talk/text question
•  32 Qs from major US social surveys and methods studies
–  e.g., BRFSS, NSDUH, GSS, Pew Internet & American Life Project
–  Most have produced differences in satisficing or disclosure between conventional modes
•  8 (behavioral frequency) Qs require a numerical response that can be more or less precise (rounded)
–  Allows testing differences in rounding between modes
•  Battery: eight statements about diet (e.g., "I avoid fast food"), each with the same favor-oppose scale
–  Allows testing differences in straightlining between modes
•  14 Qs have more and less socially desirable answers
–  e.g., sexual history, drug use, exercise
–  Allows testing differences in disclosure across modes

Flow of Events (March – September 2012)
•  Recruitment: iPhone users recruited from Craigslist, Facebook, Google Ads, and Amazon Mechanical Turk; followed a link to the screener
•  Screening: web questionnaire verified that R was over 21 years old and had a US area code
•  Confirmation of iPhone use: text message sent to the phone number; the user's reply provided the user agent string
•  Random assignment to contact mode: R contacted in Human Voice, Human Text, Auto Voice, or Auto Text
•  Interview: Rs asked the 32 survey Qs
–  mode same as contact mode (N = 634), or
–  mode chosen by R (N = 626)
•  Debriefing: Rs were texted a link to a post-interview, web-based questionnaire
•  Payment: $20 iTunes gift code incentive

Possible effects of texting on the quality of answers
•  Satisficing
–  More: Rs may import "least effort strategies" from their usual texting practice
•  truncated, abbreviated interaction
–  Less: reduced time pressure
•  Rs can answer when they want to
•  Disclosure
–  More: reduced social presence; no interviewer face or voice
–  Less: permanent record of messages

Satisficing and Disclosure
•  Satisficing (both indicators are sketched below):
–  Rounded numerical answers: divisible by 10
–  Straightlining: same answer (e.g., "strongly favor") for at least 7 of 8 statements about diet
•  e.g., avoid red meat, limit fast food
•  Disclosure:
–  Undesirable responses, e.g., "less than one day a week" when asked "How often do you exercise?"
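
As a concrete illustration, here is a hedged sketch of the two satisficing indicators just defined, computed for one respondent. The input formats and names are hypothetical, not the study's analysis code; it simply applies the literal definitions above (an answer counts as rounded when divisible by 10; a respondent is straightlining when at least 7 of the 8 diet ratings are identical).

```python
# Hedged sketch of the two satisficing indicators defined above.
# Input formats and variable names are hypothetical, not the study's code.

def count_rounded(frequency_answers):
    """Count numerical answers divisible by 10 (the literal rounding definition above)."""
    return sum(1 for a in frequency_answers if a is not None and a % 10 == 0)

def is_straightlining(diet_ratings):
    """Flag a respondent who gives the same rating on at least 7 of the 8 diet statements."""
    most_common = max(set(diet_ratings), key=diet_ratings.count)
    return diet_ratings.count(most_common) >= 7

# One hypothetical respondent: 8 behavioral-frequency answers and 8 diet ratings.
frequencies = [30, 10, 2, 25, 20, 7, 15, 40]
diet = ["favor"] * 7 + ["oppose"]

print(count_rounded(frequencies))   # 4 (30, 10, 20, 40)
print(is_straightlining(diet))      # True: 7 of 8 identical
```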
Rounding: Rs rounded on fewer Qs in text than voice
[Chart: number of rounded* answers, by agent (Human/Automated) and medium (Voice/Text); *divisible by 10]

Rounding: "Movie watching last month", responses that end in zero
[Chart: % of Rs whose answer is rounded, by agent and medium]
*During the last month, how many movies did you watch in any medium?

Rounding: "Number of songs on your iPhone", responses that end in zero
[Chart: % of Rs whose answer is rounded, by agent and medium]
*How many songs do you currently have on your iPhone?

Straightlining: Fewer Rs straightlined in text than voice
[Chart: % of Rs straightlining*, by agent and medium]
*same answer for ≥ 7 out of 8 statements

Disclosure: Rs produced the most socially undesirable answer for more Qs in text than voice, and in automated than human modes
[Chart: number of Qs with the most socially undesirable answer (categorical response or numerical answer above a cutoff) per R, by agent and medium; this count is sketched below]

Disclosure: "Exercising less than 1 day per week", percent
[Chart: % of Rs whose answer is undesirable, by agent and medium]
*In a typical week, about how often do you exercise? Less than 1 time per week, 1 or 2 times per week, 3 times per week, or 4 or more times per week?
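
To make the disclosure measure concrete, here is a hedged sketch of the per-respondent count described above: a question contributes when the respondent gives its most socially undesirable categorical option, or a numerical answer above a cutoff. The question keys, category labels, and numeric cutoffs below are hypothetical placeholders, not the study's actual coding.

```python
# Hedged sketch of the per-respondent disclosure count described above.
# Question keys, undesirable categories, and cutoffs are hypothetical placeholders.

UNDESIRABLE_CATEGORY = {
    "exercise": "less than 1 time per week",
    "religious_services": "never",
}
NUMERIC_CUTOFF = {
    "binge_drinking_days": 0,     # any value above the cutoff counts
    "sex_partners_12mo": 5,       # illustrative cutoff only
}

def disclosure_count(answers):
    """Count questions on which a respondent gave the most socially undesirable answer."""
    count = 0
    for question, answer in answers.items():
        if UNDESIRABLE_CATEGORY.get(question) == answer:
            count += 1
        elif (question in NUMERIC_CUTOFF
                and isinstance(answer, (int, float))
                and answer > NUMERIC_CUTOFF[question]):
            count += 1
    return count

print(disclosure_count({"exercise": "less than 1 time per week",
                        "binge_drinking_days": 3,
                        "religious_services": "about once a month",
                        "sex_partners_12mo": 2}))   # -> 2
```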
Disclosure: "Binge drinking", percent any days in last 30
[Chart: % of Rs whose answer is undesirable, by agent and medium]
*During the past 30 days, on how many days did you have 5 or more drinks on the same occasion?

Disclosure: "Sex partners in last 12 months", mean
[Chart: mean reported number, by agent and medium]
*How many sex partners have you had in the last 12 months?

Disclosure: "Attend religious services", percent never
[Chart: % of Rs whose answer is undesirable, by agent and medium]
*How often do you attend religious services? At least once a week, almost every week, about once a month, seldom, or never?

Disclosure: "Sex during last 12 months", percent 4 or more times a week
[Chart: % of Rs whose answer is undesirable, by agent and medium]
*About how often did you have sex during the last 12 months?

Satisfaction higher in text than voice
[Chart: % very satisfied ("Overall, how satisfied were you with the interview?"), text vs. voice; p < 0.004]

Prefer text (vs. voice) for future iPhone interview?
[Chart: % of Rs, by agent (Human/Automated) and medium (Text/Voice)]

Efficiency of Text vs. Voice Interviews
•  Text interviews have clear data quality advantages over voice
•  What is the cost of these advantages?
•  How do response rates, completions, and breakoffs compare?

Response Rates Across All Modes
[Chart: response rate* by agent (Human/Automated) and medium (Voice/Text)]
•  The higher response rate in text could be due to (1) persistence of the invitation (no noncontact), (2) the ability to respond when convenient, and (3) more time to decide
*AAPOR RR1: # complete interviews / # invitations

Breakoffs Across All Modes
[Chart: % breakoffs* by agent and medium]
•  More breakoffs in text could be due to (1) less social presence (no voice) or (2) the asynchronous character of text (no need to answer quickly)
•  Substantially higher breakoff rates in automated than human modes are likely due to the absence of a human interviewer
*Breakoffs: # Rs who started but did not complete the interview / # Rs who started the interview
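
Both of the rates defined above are simple ratios, so a small worked example may help; the counts below are invented for illustration and the function names are not from the study.

```python
# Hedged sketch of the two rates defined above, using invented counts.

def aapor_rr1(complete_interviews, invitations):
    """AAPOR RR1 as defined above: completed interviews over invitations sent."""
    return complete_interviews / invitations

def breakoff_rate(started, completed):
    """Share of Rs who started the interview but did not complete it."""
    return (started - completed) / started

# Invented example numbers, for illustration only.
print(f"RR1: {aapor_rr1(complete_interviews=110, invitations=200):.1%}")   # RR1: 55.0%
print(f"Breakoffs: {breakoff_rate(started=130, completed=110):.1%}")       # Breakoffs: 15.4%
```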
Cumulative Completion of Cases
[Chart: % of completed cases by days after the initial invitation, for Automated Voice, Automated Text, Human Voice, and Human Text]
•  In automated text, Rs could reply at any time in order to start the interview

Time in Field per Case
[Chart: hours per case, by agent (Human/Automated) and medium (Voice/Text)]
•  In automated text, the system was available 24/7 and responded immediately (median response time = 3 seconds) after the R texted an answer

Why are answers more precise in text than voice interviews?
•  Text may reduce time pressure compared to voice interviews
–  Texting conventions seem to involve a "respond when convenient" assumption
•  In spoken conversation, an immediate response is expected
–  600 ms or less (Roberts & Francis, 2013)
–  One second (Jefferson, 1989)

How might less time pressure lead to more precision?
•  Text interviews are longer than voice interviews ... but involve fewer turns
•  So Rs may be using the time between turns productively
•  This could involve checking records and thinking about the answer before answering
•  More reported multitasking in text than voice interviews
•  Less rounding in longer than shorter text responses; no difference for voice

Interview Duration and Median Number of Turns per Question
•  Text interviews took longer to complete than voice interviews but involved fewer conversational turns
–  reflecting slower back-and-forth (more time between turns) in text than voice

Text vs. voice interaction: example 1
HUMAN TEXT
1 I: During the last month how many movies did you watch in any medium?
2 R: 3
Total elapsed time until next Q: 1:21
HUMAN VOICE
1 I: During the last month, how many movies did you watch in ANY medium.
2 R: OH, GOD. U:h man. That's a lot. How many movies I seen? Like 30.
3 I: 30.
Total elapsed time until next Q: 0:12

Text vs. voice interaction: example 2
HUMAN TEXT
1 I: During the last month how many movies did you watch in any medium?
2 R: Medium?
3 I: Here's more information. Please count movies you watched in theaters or on any device including computers, tablets such as an iPad, smart phones such as an iPhone, handhelds such as iPods, as well as on TV through broadcast, cable, DVD, or pay-per-view.
4 R: 3
Total elapsed time until next Q: 2:00
HUMAN VOICE
1 I: *During the last*
2 R: Huh?
3 I: Oh, sorry. Um, during the last month, how many movies did you watch in ANY medium.
4 R: Oh! Let's see, what did I watch. Um, should I say how many movies I watched or how many movies watched me? [laughs] All right let's-let me think about that. I think yesterday I watched u:m, not in its entirety but you know, coming and going. My kids are watching in. Um, I don't know maybe 2 or 3 times a week maybe?
5 I: Uh, so what would be your best estimate on how many, um, you saw in the whole month.
6 R: [pause] Um, I don't know I'd say maybe 3 movies if that many.
7 I: 3?
8 R: Is that going to the movies or watching the movies on tv. Like you said *any medium* right?
9 I: That's *any movies.* Yep.
10 R: Maybe 1 or 2 a month I'd say.
11 I: 1 or 2 a month? [breath] Uh, so what would be *closer*
12 R: *Yeah, because* I uh, um, occasionally I take the kids on a Tuesday to see a movie, depending on what's playing. So I'd maybe once or twice a month
13 I: Which would be closer, once or twice.
14 R: I would say twice.
15 I: Twice?
16 R: Mhm. Because it runs 4 Tuesdays which is cheaper to go
17 I: Right
18 R: so I'd say twice, yah. Because I do take them twice. Not last month but the month before
Total elapsed time until next Q: 1:36

Less rounding when the inter-turn interval* is longer in text interviews
[Chart: number of rounded answers for intervals longer vs. shorter than the median inter-turn interval, text vs. voice; medium × interval duration F(1, 591) = 4.32, p = .038]
•  Median inter-turn interval: text 15.75 sec, voice 4.04 sec
*for the 8 frequency items, i.e., where rounding was possible
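
The median-split comparison in the chart above can be expressed compactly: group a respondent's question-level records by whether the inter-turn interval exceeded the overall median, then count rounded answers in each group. The sketch below is only an illustration of that comparison with hypothetical field names and invented numbers; it is not the reported medium-by-interval analysis.

```python
# Hedged sketch of the median-split comparison above; data and names are invented.
from statistics import median

# (inter-turn interval in seconds, numeric answer) for the 8 frequency items
records = [(22.0, 20), (18.5, 3), (9.0, 30), (4.5, 10),
           (30.2, 7), (12.0, 40), (6.1, 50), (25.4, 12)]

cutoff = median(interval for interval, _ in records)

def rounded(answers):
    """Count answers divisible by 10, as in the rounding indicator used earlier."""
    return sum(1 for a in answers if a % 10 == 0)

longer = [a for interval, a in records if interval > cutoff]
shorter = [a for interval, a in records if interval <= cutoff]
print(rounded(longer), rounded(shorter))   # 1 4: fewer rounded answers after longer gaps
```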
Multitasking (self-reported)
[Charts: "Doing something else on iPhone" (% of Rs) and "Total number of other tasks being carried out during the interview", by agent (Human/Automated) and medium (Text/Voice)]

Mode Choice
•  If Rs choose the interview mode:
–  is data quality improved?
–  do the overall mode differences persist?
•  Mode choice introduction: "To get started, we need you to choose how you want to be interviewed -- whatever works best for you. There are four choices and any choice is fine with us. Do you want to 'talk with a person', 'talk with an automated interviewer', 'text with a person', or 'text with an automated interviewer'?"
•  Within each contact mode, the order of the interview mode options was rotated across Rs (16 orders)

Mode Choice (2)
•  The current implementation differs from prior implementations of survey mode choice
–  The literature is mostly about mail invitations with a choice of completing by mail or on the web (e.g., Messer & Dillman, 2011; Millar & Dillman, 2011)
•  It is generally found (e.g., Fulton & Medway, 2012) that choice reduces participation, which is attributed to the paradox of choice (Schwartz, 2004) or choice overload (Iyengar & Lepper, 2000)
•  But this could be due to the break in the process required to switch modes
–  Completing by web requires switching "devices" and typing in a URL
•  Choosing, and switching to, a different mode on a single device can be easier than switching to a different mode on a different "device"
–  May harm participation less
•  The act of choosing plus an easy switch may increase Rs' commitment, leading to better quality

Participation
[Chart: % starting the interview (answering Q1) and % completing the interview once started, with and without choice]
•  Start interview (answer Q1): 55.9% without choice vs. 48.9% with choice
•  Complete interview once started: 90.4% without choice vs. 94.9% with choice
•  Overall completion (the product of the two rates) was higher without choice (50.5%) than with choice (46.4%)

Participation: Breakoffs at the start not due to the paradox of choice but to switching costs
[Chart: % who don't answer Q1 after choosing a mode: Stay in Mode (n = 301) 0.7% vs. Switch Mode (n = 388) 11.1%]

Breakoffs at the start depend on the particular mode transitions
•  Relatively high when Rs switch from automated to human-administered modes: 4.3%-18.2%
–  Probably because continuation was not immediate: interviewers were on call 9am to midnight
–  Suggests on-demand (24/7) human interviewers could substantially reduce these breakoffs
•  Low (even zero) for other transitions
–  Human to Human modes: 1.7%
–  Human Text to Auto Voice: 0%
–  Auto Text to Auto Voice: 0%

Mode Transitions
[Chart: number of interviews by transition type (Stayed in Mode; Switched Medium; Switched Agent; Switched Agent & Medium) and original sample size (before mode choice) by contact mode; the four transition groups were of similar size (n = 149 to 170)]

Possible outcomes: Satisficing
•  If Rs can choose, they
–  might satisfice more because they choose a mode in which it is easier to take shortcuts
•  e.g., an automated mode: no human interviewer to press Rs to work hard
–  might satisfice less because being able to choose may increase commitment to the task

Satisficing: Average number of rounded numerical answers
[Chart: number of rounded answers by agent and medium, without vs. with choice; labeled values 2.58 and 2.28; p < 0.001]
•  The effect of choice is not due to the particular mode chosen: less rounding with choice than without after controlling for the chosen mode, p = 0.008

Satisficing: Percent of Rs straightlining
[Chart: % of Rs straightlining by agent and medium, without vs. with choice; labeled values 6.78% and 3.99%; p = 0.029]
•  The effect of choice is not due to the particular mode chosen: less straightlining (marginally) with than without choice after controlling for mode, p = 0.085

Reasons for choosing mode*
•  Open-ended answers were coded into 29 categories (three coders; agreement = 98.1%)
•  Most common categories (% providing the reason):
–  Ease/simplicity: 33.8%
–  Convenience/flexibility: 22.8%
–  Quickness (shortest interview time): 10.3%
–  Privacy: 9.8%
–  Like texting: 9.0%
–  Environment/location: 8.8%
*Why did you choose this interviewing method?
Reasons for choosing modes* (examples)
•  Human voice:
–  "More comfortable speaking with a real person"
•  Human text:
•  Automated text:
–  "I chose to text because I had a small child with me in my home during the interview and could not have concentrated on the questions if it was on the phone."
–  "To avoid background noise and to clearly understand the question and take my time to answer it."
–  "I am at work and wouldn't always be able to answer questions if I spoke to someone on the phone."
–  "Because I didn't want to talk on the phone nor did I want to text a person simply because I knew some of my responses would have been a little late"
•  Speech IVR:
–  "i didn't want to talk to anyone but, I was driving so I couldn't look at a screen"
–  "Talking to an automated person was less personal"
*Why did you choose this interviewing method?

Satisfaction higher with mode choice
[Chart: % very satisfied ("Overall, how satisfied were you with the interview?"): 69.8% with choice vs. 58.0% without choice, p < 0.001]
•  Increased satisfaction may result because people perceive the chosen alternative as more attractive (Festinger, 1948; Cooper, 2007)
•  Or the chosen mode may simply have better fit Rs' needs

Summary
•  Data quality
–  Text interviews led to less satisficing, more disclosure, and greater satisfaction than voice interviews
–  Text interviews were more efficient than voice interviews
–  Automated interviews led to more disclosure than, and no more satisficing than, human-administered interviews
•  Mode choice produced:
–  less rounding
–  less straightlining
–  fewer breakoffs (more completions)
–  higher R satisfaction
•  Certainly worth further exploring both SMS text interviews and mode choice on a single device

Text vs. Voice Interviews
•  It is unknown how far these findings generalize
–  to a representative sample?
–  to text on devices other than iPhones?
–  to other survey topics?
•  But this initial evidence suggests the viability of text as an interviewing mode
–  It raises the question of whether data from voice interviews in a smartphone era should be considered the gold standard for validity and data quality

Thank You
[email protected]