How can we use authorship and citations to better understand

SURVEYING USAGE OF ACADEMIC
RESEARCH IN JOURNALISM
Logan Walls - Researcher
Isabelle Edwards - Researcher
Tin Ho - Project Manager
RESULTS
PROBLEM STATEMENT
PROCESS
DATA PROCESSING
RAW DATA COLLECTION
Obtained data from New York
Times application programming
interface (API)
Pearson r = 0.30
P-value < 0.0001
Initially we interpreted this as a difference in research styles
between journalists, but upon examining the same variables
aggregated by journalist the correlation was much weaker,
suggesting that the pattern we observe between citation
counts and eigenfactor is more content-driven than journalist-driven.
[Figure: Number of Academic Papers Cited (x-axis, 0 to 45) plotted against average Eigenfactor of the papers cited (y-axis, 0.0e+00 to 1.2e-06)]
*There is a single outlier node not shown on the graph
Author Network
Collect metadata about all of
the articles which match our
query.
Includes web URLs, headlines,
keywords, publication dates,
word counts, and more for
each article
Used regex to extract domain
names from URLs
Combine domain names to
create a list of academic
publication web domains
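The domain-extraction step above can be sketched as follows. This is a minimal illustration using only Python's standard library; the function names `extract_domain` and `collect_domains`, the regular expression, and the sample URLs are assumptions made for the example, not the project's actual code.

```python
import re

# Assumed pattern: skip the scheme and an optional "www." prefix,
# then capture everything up to the first "/", ":", "?" or "#".
DOMAIN_RE = re.compile(r"^[a-z][a-z0-9+.-]*://(?:www\.)?([^/:?#]+)", re.IGNORECASE)


def extract_domain(url):
    """Return the lower-cased domain name of a URL, or None if it does not match."""
    match = DOMAIN_RE.match(url.strip())
    return match.group(1).lower() if match else None


def collect_domains(urls):
    """Combine the domains of many URLs into one sorted, de-duplicated list."""
    return sorted({d for d in map(extract_domain, urls) if d})


if __name__ == "__main__":
    sample = [
        "http://www.nejm.org/doi/full/example",
        "https://jama.jamanetwork.com/article.aspx?id=1",
    ]
    print(collect_domains(sample))
```

A URL parser such as `urllib.parse.urlparse` would handle edge cases (user info, ports) more robustly; the regex is kept here because the poster names regex as the tool used.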
Parsed HTML received from the
API with BeautifulSoup for
scholarly documents
Save links which appear to be
citations (any URL in the list of
academic publication domains)
URLs that contain ‘pubmed’,
‘.gov’, ‘.edu’, ‘doi’, ‘abstract’,
or ‘pdf’ are also suspected to
be citations
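The link-filtering rule described above could look roughly like this. The function name `is_suspected_citation` and the way the domain list is passed in are assumptions for illustration; only the six hint substrings come from the poster.

```python
from urllib.parse import urlparse

# Substrings the poster lists as hints that a link is a citation.
SUSPECT_TOKENS = ("pubmed", ".gov", ".edu", "doi", "abstract", "pdf")


def is_suspected_citation(url, academic_domains):
    """True if the URL's host is a known academic domain or the URL contains a hint."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in academic_domains:
        return True
    lowered = url.lower()
    # Plain substring matching: simple, but it can produce false positives
    # (for example, "doi" also occurs inside unrelated words).
    return any(token in lowered for token in SUSPECT_TOKENS)
```

Because the substring test is deliberately loose, links it flags are only suspected citations and would still need the DOI lookup that follows to be confirmed.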
Collected digital object
identifiers (DOI) from scholarly
documents
If the link leads to an HTML page,
parse the page for DOIs
using BeautifulSoup and regex
If the link leads to a PDF, parse
the XML file generated by the GROBID
machine learning package for
DOIs
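A minimal sketch of the regex part of DOI extraction, applied to text that has already been pulled out of an HTML page or a GROBID XML file. The pattern, the function name `find_dois`, and the sample DOIs are assumptions for illustration; the poster does not give the expression that was actually used.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def find_dois(text):
    """Return the unique DOIs found in text, in order of first appearance."""
    found = []
    for match in DOI_RE.finditer(text):
        # Heuristic: trailing punctuation is usually sentence residue, not part of the DOI.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


if __name__ == "__main__":
    page = '<meta name="citation_doi" content="10.1234/abc.2015.001"> doi:10.5555/12345678.'
    print(find_dois(page))
```

The trailing-punctuation rule is a trade-off: it cleans up DOIs quoted at the end of a sentence but would truncate the rare DOI that genuinely ends in one of those characters.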
Obtained metadata through DOI
lookup service
Parse response using regex to
retrieve DOI metadata
(including article title, authors,
publisher, etc.)
Analyzed data and created
figures using GraphLab and
Tableau
Requested Eigenfactor and
open-access information from
Dr. Jevin West for each DOI
Look at any anomalies or
interesting trends in the data
and figures
Combine the NYT and link data
by concatenating all NYT
metadata into one table using
GraphLab
Sort out nested data structures and extract relevant
information
Join DOIs to each NYT article
via the links from which the
DOIs were retrieved
Merge all metadata retrieved
from DOI lookup service into a
single table
Reconcile inconsistent
field-names and missing
values
Join the metadata received
from Dr. West into the table
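The join steps above were done in GraphLab; the plain-Python sketch below only illustrates the logic. The record layout (keys `article_id`, `links`, `doi`) and both function names are hypothetical.

```python
def join_dois_to_articles(articles, link_dois):
    """Attach to each article the DOIs retrieved from its outgoing links.

    articles:  list of dicts with hypothetical keys "article_id" and "links"
    link_dois: dict mapping a link URL to the list of DOIs found behind it
    """
    rows = []
    for article in articles:
        for link in article["links"]:
            for doi in link_dois.get(link, []):
                rows.append({"article_id": article["article_id"], "link": link, "doi": doi})
    return rows


def left_join_metadata(rows, metadata_by_doi):
    """Add per-DOI metadata to each row; rows without metadata are kept unchanged."""
    return [{**row, **metadata_by_doi.get(row["doi"], {})} for row in rows]
```

Keeping rows whose DOI has no metadata (a left join) is one way to leave missing values visible so they can be reconciled later, rather than silently dropping articles.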
Compute unigram set for each
NYT article using body-text
and filter by calculating
Term-Frequency-Inverse-Document-Frequency score
Iteratively train topic models on
the filtered unigrams using
GraphLab, adjusting parameters as needed
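The unigram filtering step above can be illustrated with a small standard-library sketch. The tokenizer, the exact TF-IDF variant (relative term frequency times log(N / document frequency)), the threshold, and the function names are all assumptions; GraphLab's own implementation may weight terms differently.

```python
import math
import re
from collections import Counter


def unigrams(text):
    """Lower-case word tokens of an article body."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(documents):
    """Return one {term: tf-idf} dict per document."""
    token_lists = [unigrams(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return scores


def filter_unigrams(documents, threshold):
    """Keep, per document, the set of unigrams whose tf-idf exceeds the threshold."""
    return [{t for t, s in doc.items() if s > threshold} for doc in tfidf_scores(documents)]
```

With this variant, a word that occurs in every article scores exactly zero, so very common words drop out before topic modeling without needing a separate stop-word list.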
For each DOI, send a request
to http://dx.doi.org for a
response in turtle format
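One way to make such a request is HTTP content negotiation, sketched below with Python's standard library. The function names are illustrative, and the use of an `Accept: text/turtle` header is an assumption about how the Turtle response was requested.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_doi_request(doi):
    """Build a request asking the DOI resolver for a Turtle (RDF) response."""
    url = "http://dx.doi.org/" + quote(doi)
    return Request(url, headers={"Accept": "text/turtle"})


def fetch_turtle(doi, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_doi_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating request construction from sending keeps the first function testable offline; `fetch_turtle` would be called once per DOI, ideally with some delay between calls to avoid overloading the lookup service.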
Use ontology query to retrieve
web URLs of entities which
have the “scholarly publication”
attribute
Generate topic groupings for
the NYT articles
Number of Citations vs. Average Eigenfactor
Avg. Eigenfactor of Papers Cited
How can we use authorship and citations to better
understand information diffusion between popular
media and academic articles for the purpose of
informing the general public as well as the academic
community?
Eigenfactor is a metric used to measure the influence of academic publications: by employing a similar algorithm to PageRank, the influence of each article is not merely determined
by the number of citations it receives, but also by the influence
of the papers which cite it. By plotting the number of academic citations in a New York Times article against the average eigenfactor of those citations we show a significant positive correlation (Pearson's r = 0.31, p < 0.0001).
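The correlation reported above can be reproduced in outline as follows. This is a sketch under assumed inputs: each article is represented by the list of Eigenfactor values of the papers it cites, and the function names are hypothetical. It does not compute the p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def citation_stats(eigenfactors_by_article):
    """Map each article's cited-paper Eigenfactors to (citation count, average Eigenfactor).

    Articles with no citations are skipped because they have no average.
    """
    return [(len(v), sum(v) / len(v)) for v in eigenfactors_by_article if v]


def correlation_for_articles(eigenfactors_by_article):
    """Pearson r between citation count and average Eigenfactor across articles."""
    stats = citation_stats(eigenfactors_by_article)
    counts = [count for count, _ in stats]
    averages = [avg for _, avg in stats]
    return pearson_r(counts, averages)
```

Running the same calculation on values grouped by journalist instead of by article gives the journalist-level comparison.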
[Figure: Author Network. Legend: Academic Author, Journalists. Individual node labels (names of academic authors and New York Times journalists) omitted.]
The figure above is a network of all authors in our data set. The blue nodes are academic paper authors, and the green nodes are New York Times journalists. Each line coming from a journalist node is a citation to an academic article. This network only contains academic nodes that have received more than 5 citations and New York Times nodes that cite more than 5 articles. The size of each node is determined by the number of citations it has received or papers it has cited.
The journalists Deborah Blum and Nicholas Bakalar cite many of the same academic articles (shown at the top left of the network). Similarly, the journalists Abby Ellin and Judith Graham cite many of the same academic authors. Journalists in the center of the network cite authors all over the network and do not overlap much with any of the other journalists. The journalists at the bottom of the network have a similar number of citations to many of the other journalists, but cite fewer academic authors. This could mean that they cited a smaller sample of authors on several occasions, or that they cited many different authors fewer than five times each. Journalists on the edges (with no connections to academic authors) do cite more than five authors, but do not cite any of those authors more than five times.
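The node filtering described above can be sketched as below. The input format (one `(journalist, academic_author)` pair per citation) and the function name are assumptions; counting citation pairs is also a simplification of "cites more than 5 articles", since one article can have several authors.

```python
from collections import Counter


def build_filtered_network(citations, min_count=5):
    """Filter a citation list down to the nodes and edges shown in the network.

    citations: iterable of (journalist, academic_author) pairs, one per citation.
    Returns (edges, journalists, authors), where edges maps each kept pair to
    its number of citations.
    """
    citations = list(citations)
    made = Counter(j for j, _ in citations)      # citations made per journalist
    received = Counter(a for _, a in citations)  # citations received per author
    journalists = {j for j, c in made.items() if c > min_count}
    authors = {a for a, c in received.items() if c > min_count}
    edges = Counter((j, a) for j, a in citations if j in journalists and a in authors)
    return edges, journalists, authors
```

Under this filter a journalist can pass the threshold while none of the authors they cite do, which produces nodes with no connections like those described at the edges of the network.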
Special thanks to Jevin West and Emma Spiro of the University of Washington’s DataLab for guiding us through our project.