How can we use authorship and citations to better understand
SURVEYING USAGE OF ACADEMIC RESEARCH IN JOURNALISM
Logan Walls, Researcher; Isabelle Edwards, Researcher; Tin Ho, Project Manager

[Scatter plot: Number of Academic Papers Cited (x-axis, 0 to 45) vs. average Eigenfactor of papers cited (y-axis, 0 to 1.2e-06); Pearson r = 0.30, p-value < 0.0001]
*There is a single outlier node not shown on the graph.

Initially we interpreted the positive correlation between the number of academic papers a New York Times article cites and the average Eigenfactor of those papers as a difference in research styles between journalists. However, when we examined the same variables aggregated by journalist, the correlation was much weaker, suggesting that the pattern we observe between citation counts and Eigenfactor is driven more by content than by the journalist.

[Figure: Author Network, with nodes for academic authors and New York Times journalists]

PROCESS

RAW DATA COLLECTION
Obtained data from the New York Times application programming interface (API), collecting metadata about all of the articles that match our query. This includes web URLs, headlines, keywords, publication dates, word counts, and more for each article.
Used regex to extract domain names from URLs, and combined the domain names into a list of academic publication web domains.
Parsed the HTML received from the API with BeautifulSoup to find links to scholarly documents, saving links that appear to be citations (any URL in the list of academic publication domains). URLs that contain ‘pubmed’, ‘.gov’, ‘.edu’, ‘doi’, ‘abstract’, or ‘pdf’ were also treated as suspected citations.
Collected digital object identifiers (DOIs) from the scholarly documents. If a link leads to an HTML page, we parsed the page for DOIs using BeautifulSoup and regex; if it leads to a PDF, we parsed the XML file generated by the GROBID machine learning package for DOIs.
Obtained metadata through a DOI lookup service, parsing the response with regex to retrieve DOI metadata (including article title, authors, publisher, etc.).
Requested Eigenfactor and open-access information from Dr. Jevin West for each DOI.

DATA PROCESSING
Combined the NYT and link data by concatenating all NYT metadata into one table using GraphLab.
Sorted out nested data structures and extracted the relevant information.
Joined DOIs to each NYT article via the links from which the DOIs were retrieved.
Merged all metadata retrieved from the DOI lookup service into a single table.
Reconciled inconsistent field names and missing values.
Joined the metadata received from Dr. West into the table.
Computed the unigram set for each NYT article from its body text and filtered it by term frequency-inverse document frequency (TF-IDF) score.
Iteratively trained topic models on the filtered unigrams using GraphLab, adjusting parameters as needed.
Analyzed the data and created figures using GraphLab and Tableau, then looked for anomalies or interesting trends in the data and figures.
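The citation-detection steps in raw data collection can be sketched in Python, the language of the project's tools (BeautifulSoup, GraphLab). This is a minimal sketch under stated assumptions, not the project's code: `ACADEMIC_DOMAINS` is a hypothetical three-entry stand-in for the real domain list, the DOI pattern is Crossref's commonly suggested regex rather than whatever pattern the project used, and the functions take plain strings. In the project, BeautifulSoup supplied the links and page text (for example via `soup.find_all("a", href=True)` and `soup.get_text()`); the GROBID path for PDFs is not covered.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the list of academic publication web domains;
# the real list is not given on the poster.
ACADEMIC_DOMAINS = {"nejm.org", "jamanetwork.com", "thelancet.com"}

# Substrings that the poster treats as marking a suspected citation.
SUSPECT_SUBSTRINGS = ("pubmed", ".gov", ".edu", "doi", "abstract", "pdf")

# Host part of an http(s) URL, skipping an optional leading "www.".
DOMAIN_RE = re.compile(r"^https?://(?:www\.)?([^/:?#]+)", re.IGNORECASE)

# Crossref's suggested pattern for modern DOIs; it covers most DOIs, not all.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_domain(url: str) -> str | None:
    """Return the lower-cased domain of an http(s) URL, or None."""
    match = DOMAIN_RE.match(url)
    return match.group(1).lower() if match else None


def is_suspected_citation(url: str) -> bool:
    """True if the URL is on a listed academic domain (or a subdomain of one)
    or contains one of the suspect substrings."""
    domain = extract_domain(url)
    if domain is None:
        return False
    on_listed_domain = any(
        domain == d or domain.endswith("." + d) for d in ACADEMIC_DOMAINS
    )
    lowered = url.lower()
    return on_listed_domain or any(s in lowered for s in SUSPECT_SUBSTRINGS)


def find_dois(text: str) -> list[str]:
    """Return the distinct DOIs in a page's text, in order of first appearance."""
    dois: list[str] = []
    for raw in DOI_RE.findall(text):
        # Drop trailing punctuation that the pattern's character class picks up.
        doi = raw.rstrip(".,;)")
        if doi not in dois:
            dois.append(doi)
    return dois
```

The substring test is deliberately broad, as on the poster: a URL path containing a word such as "doing" also matches ‘doi’, which is why these links are only treated as suspected citations.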
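The TF-IDF filtering step in data processing can be sketched in plain Python. The poster does not say whether filtering used a score threshold or kept the top-scoring terms, so this sketch assumes a hypothetical per-article `keep` cutoff, with term frequency taken as a term's share of the article's tokens and inverse document frequency as log(N / df). It is a stand-in for whatever GraphLab utilities the project actually used.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Lower-cased alphabetic runs count as unigrams (a simplifying assumption).
WORD_RE = re.compile(r"[a-z]+")


def unigram_counts(body_text: str) -> Counter:
    """Count lower-cased alphabetic unigrams in an article body."""
    return Counter(WORD_RE.findall(body_text.lower()))


def tfidf_filtered_unigrams(bodies: list[str], keep: int = 50) -> list[set[str]]:
    """For each article, keep its `keep` highest-scoring unigrams by TF-IDF.

    Terms that appear in every article get IDF = log(1) = 0 and are dropped.
    """
    counts = [unigram_counts(body) for body in bodies]
    n_docs = len(bodies)

    # Document frequency: in how many articles each term appears.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    filtered = []
    for c in counts:
        total = sum(c.values())
        scores = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        }
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(
            (t for t, s in scores.items() if s > 0),
            key=lambda t: (-scores[t], t),
        )
        filtered.append(set(ranked[:keep]))
    return filtered
```

For example, across the three bodies "the cat sat", "the dog sat", and "the cat ran", "the" is dropped everywhere because it occurs in every article.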
DOI lookup: for each DOI, we sent a request to http://dx.doi.org for a response in Turtle format.
Ontology query: retrieved the web URLs of entities that have the “scholarly publication” attribute.
Topic modeling: generated topic groupings for the NYT articles.

PROBLEM STATEMENT
How can we use authorship and citations to better understand information diffusion between popular media and academic articles, for the purpose of informing the general public as well as the academic community?

RESULTS
[Figure: Number of Citations vs. Average Eigenfactor; y-axis: Avg. Eigenfactor of Papers Cited]
Eigenfactor is a metric used to measure the influence of academic publications. It uses an algorithm similar to PageRank, so an article's influence is determined not merely by the number of citations it receives but also by the influence of the papers that cite it. By plotting the number of academic citations in a New York Times article against the average Eigenfactor of those citations, we show a significant positive correlation (Pearson's r = 0.31, p < 0.0001).
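The DOI lookup step relies on DOI content negotiation: the resolver at the poster's http://dx.doi.org address returns metadata in the format named by the HTTP `Accept` header, here `text/turtle`. The sketch below builds such a request with Python's standard library (sending it needs network access) and pulls string literals for one predicate out of a Turtle response with a regex, in the spirit of the poster's regex parsing. Assumptions: the `turtle_literals` helper is hypothetical, the Dublin Core `title` IRI is only an example predicate, and the regex handles only single-line literals after full-IRI predicates; prefixed names such as `dcterms:title` would need an RDF parser such as rdflib.

```python
from __future__ import annotations

import re
import urllib.parse
import urllib.request

# Resolver address from the poster; the DOI Foundation now recommends https://doi.org/.
DOI_RESOLVER = "http://dx.doi.org/"

# Example predicate (Dublin Core "title"); which predicates a response uses is an assumption.
DC_TITLE = "http://purl.org/dc/terms/title"


def build_doi_request(doi: str) -> urllib.request.Request:
    """Build, but do not send, a request asking the resolver for Turtle."""
    url = DOI_RESOLVER + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def fetch_turtle(doi: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body (requires network access)."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def turtle_literals(ttl: str, predicate_iri: str) -> list[str]:
    """Return the string literals that directly follow `<predicate_iri>`.

    Backslash escapes inside a literal are kept as-is, not decoded.
    """
    pattern = re.compile(
        "<" + re.escape(predicate_iri) + r'>\s+"((?:[^"\\]|\\.)*)"'
    )
    return pattern.findall(ttl)
```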
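Eigenfactor's core idea, inherited from PageRank, is that a citation from an influential paper counts for more than one from an obscure paper. The sketch below is plain PageRank computed by power iteration on a toy citation graph; it is not the actual Eigenfactor computation, which adds its own refinements (for example, excluding self-citations). The graph and node names are hypothetical.

```python
from __future__ import annotations


def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank on a citation graph.

    `cites` maps each paper to the papers it cites. Scores sum to 1: a paper's
    score is the long-run share of time a reader spends on it when following
    citations, jumping to a random paper with probability 1 - damping.
    """
    nodes = sorted(set(cites).union(*cites.values()))
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for src, targets in cites.items():
            for dst in targets:
                # Each citation passes on an equal share of the citing paper's score.
                new[dst] += damping * rank[src] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


toy_graph = {"A": ["C"], "B": ["C"], "C": ["X"], "D": ["Y"]}
scores = pagerank(toy_graph)
```

In this toy graph, X and Y each receive exactly one citation, but X ends up with the higher score because its citing paper, C, is itself cited twice.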
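The correlation in the results, and the follow-up check on values aggregated by journalist, can be sketched as follows. `pearson_r` computes the coefficient directly from its definition; for a p-value, `scipy.stats.pearsonr` returns both the coefficient and a p-value. The poster does not say how values were aggregated by journalist, so `per_journalist_means` assumes one point per journalist made of the means of that journalist's per-article values; any rows passed to it here are hypothetical.

```python
from __future__ import annotations

import math
from collections import defaultdict


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def per_journalist_means(rows):
    """Collapse per-article rows (journalist, papers_cited, avg_eigenfactor)
    into one (mean papers cited, mean avg Eigenfactor) point per journalist."""
    groups = defaultdict(list)
    for journalist, n_cited, avg_ef in rows:
        groups[journalist].append((n_cited, avg_ef))
    xs, ys = [], []
    for journalist in sorted(groups):
        points = groups[journalist]
        xs.append(sum(p[0] for p in points) / len(points))
        ys.append(sum(p[1] for p in points) / len(points))
    return xs, ys
```

Comparing `pearson_r` on the per-article columns with `pearson_r(*per_journalist_means(rows))` is one way to run the article-level versus journalist-level comparison described in the results.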
The Author Network figure shows all authors in our data set. The blue nodes are academic paper authors, and the green nodes are New York Times journalists. Each line coming from a journalist node is a citation to an academic article. The network contains only academic authors who have received more than 5 citations and New York Times journalists who cite more than 5 articles. Node size is determined by the number of citations a node has received (academic authors) or the number of papers it has cited (journalists).

The journalists Deborah Blum and Nicholas Bakalar have cited many of the same academic articles (top left of the network). Similarly, the journalists Abby Ellin and Judith Graham cite many of the same academic authors. Journalists in the center of the network cite authors all over the network and do not overlap much with any of the other journalists. The journalists at the bottom of the network have made about as many citations as many of the other journalists, but have cited fewer academic authors. This could mean that they cited a small set of authors on several occasions, or that they cited many different authors, none of them more than five times. Journalists on the edges (with no connections to academic authors) do cite more than five authors, but do not cite any of those authors more than five times.

Special thanks to Jevin West and Emma Spiro of the University of Washington’s DataLab for guiding us along in our project.
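The filtering rules in the network caption can be sketched as a function over citation records. The record layout `(journalist, paper_id, authors)` and the counting choices are assumptions, not the project's code: a journalist's node size is taken as the number of distinct papers they cite, an academic author's as the number of citations their papers receive, and both must exceed five ("more than 5") for the node to appear. Drawing, layout, and coloring of the network are not covered.

```python
from __future__ import annotations

from collections import Counter


def build_author_network(citations, threshold: int = 5):
    """Filter a journalist-author citation network by the caption's rules.

    citations: iterable of (journalist, paper_id, authors) tuples, one per time
    a journalist's NYT article cites an academic paper (assumed layout).
    Returns (journalist_sizes, author_sizes, edges): node sizes for kept nodes,
    and a map from (journalist, author) to the citations between them.
    """
    received = Counter()      # citations received by each academic author
    papers_cited = {}         # distinct papers cited by each journalist
    pair_counts = Counter()   # citations from a journalist to an author
    for journalist, paper_id, authors in citations:
        papers_cited.setdefault(journalist, set()).add(paper_id)
        for author in set(authors):
            received[author] += 1
            pair_counts[(journalist, author)] += 1

    author_sizes = {a: n for a, n in received.items() if n > threshold}
    journalist_sizes = {
        j: len(p) for j, p in papers_cited.items() if len(p) > threshold
    }
    # Keep only edges whose endpoints both survived the filters.
    edges = {
        (j, a): n for (j, a), n in pair_counts.items()
        if j in journalist_sizes and a in author_sizes
    }
    return journalist_sizes, author_sizes, edges
```

Under these rules a journalist who cites more than five papers, none of whose authors pass the author threshold, is kept as a node with no edges, which matches the caption's description of journalists on the edges of the network.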