Panama Papers: Tools to Investigate Data

Matthew Caruana Galizia & Mar Cabra
http://bit.ly/icijplatformseijc16
grave face
photo of the editor’s kids
no computer in sight
heaps of papers
The difference now is our tools and applications.
Four years ago...
260 GB
Nuix (to search documents locally)
Forum I (FUDforum, implemented by Sebastian Mondial)
Forum II (Vanilla, implemented by Chris Zubak-Skees)
Interdata (dtSearch, implemented by Duncan Campbell & Matt Fowler)
Offshore Leaks Database (done with La Nación’s data unit in Costa Rica)
The most popular product of recent years at ICIJ (and CPI)
Let’s build a stack!
controlling application
ocr engine
blacklight
file to text conversion
index
web server
operating system
operating system
operating system
Open source
*first*
Who are our users?
Skills
Needs
The developer
Knows all about data
(France)
The “Watergate-type reporter”
Investigated the President
(Paraguay)
What are our needs?
Communicate
Search documents
3 million files × 10 seconds per file ≈ 1 year
queue
35 machines extracting text from files
index
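The queue, worker and index layout above can be sketched in a few lines. This is only an illustration of the pattern, not ICIJ's pipeline: `extract_text`, `build_index` and the two sample files are hypothetical, threads stand in for the 35 machines, and a dictionary stands in for the search index.

```python
import queue
import threading
from collections import defaultdict


def extract_text(raw: bytes) -> str:
    # Hypothetical stand-in for real file-to-text conversion or OCR.
    return raw.decode("utf-8", errors="ignore")


def worker(jobs, index, lock):
    # Each worker pulls files off the shared queue until it sees the sentinel.
    while True:
        item = jobs.get()
        if item is None:
            break
        doc_id, raw = item
        tokens = set(extract_text(raw).lower().split())
        with lock:  # the index is shared, so guard writes
            for token in tokens:
                index[token].add(doc_id)


def build_index(files, n_workers=4):
    jobs = queue.Queue()
    index = defaultdict(set)  # token -> set of document ids
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, index, lock))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for item in files.items():
        jobs.put(item)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return index


# Made-up sample documents, for illustration only.
files = {"a.txt": b"Lease agreement invoice", "b.txt": b"Invoice for payment"}
index = build_index(files)
print(sorted(index["invoice"]))
```

Adding workers shortens the wall-clock time only as long as extraction, not the queue or the index, is the bottleneck.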
1 year ÷ 35 machines ≈ 11 days
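The estimate above can be checked with a few lines of arithmetic. This sketch assumes perfectly even parallelism with no queueing or indexing overhead; `processing_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def processing_days(n_files, seconds_per_file, n_machines=1):
    # Total work in seconds, split evenly across machines, expressed in days.
    total_seconds = n_files * seconds_per_file
    return total_seconds / n_machines / SECONDS_PER_DAY


single = processing_days(3_000_000, 10)
parallel = processing_days(3_000_000, 10, n_machines=35)
print(f"one machine: {single:.0f} days")
print(f"35 machines: {parallel:.1f} days")
```

Without rounding, one machine needs about 347 days and 35 machines just under 10. Rounding up to a full year first, as the slide does, gives between 10 and 11 days.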
Scanned document:
Extracted text:
Discover beneficial owners
Visual is good (for reporting)
MAGIC!!
I click on “dots” and I find stories!
I discover stories thanks to fuzzy searching
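To give a feel for why fuzzy searching surfaces stories, here is a minimal sketch using Python's standard `difflib`. It is not the search engine used on the platform; the `fuzzy_search` helper, the 0.8 cutoff and the name variants are all assumptions made for illustration.

```python
import difflib


def fuzzy_search(query, names, cutoff=0.8):
    # Compare case-insensitively, then map matches back to the original spelling.
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


# Made-up spelling variants, for illustration only.
names = [
    "Gemini Holding",
    "Gemmini Holdings",
    "Mossack Fonseca",
    "Gemstone Trading",
]
print(fuzzy_search("Gemini Holdings", names))
print(fuzzy_search("Mossak Fonseca", names))
```

An exact search for "Gemini Holdings" would find neither spelling in the list, while the fuzzy search returns both close variants and skips the unrelated names.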
Find shortest path
Wow!
Cypher queries
Public widgets
API
https://offshoreleaks.icij.org, with downloads in CSV and Neo4j formats
// Shortest connections (up to 10 hops) between officers named Smith and Grant
MATCH (a:Officer), (b:Officer)
WHERE a.name CONTAINS 'Smith'
  AND b.name CONTAINS 'Grant'
MATCH p = allShortestPaths((a)-[:OFFICER_OF|INTERMEDIARY_OF|REGISTERED_ADDRESS*..10]-(b))
RETURN p
LIMIT 50
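What the shortest-path query computes can be illustrated outside the database with a breadth-first search. This is a simplified sketch, not how Neo4j implements it: it returns a single shortest path rather than all of them, and the graph, node names and `shortest_path` helper are made up for the example.

```python
from collections import deque


def shortest_path(graph, start, goal, max_hops=10):
    # Breadth-first search over an undirected graph stored as adjacency sets.
    parents = {start: None}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        if hops == max_hops:
            continue  # same role as the *..10 bound in the query
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                frontier.append((neighbour, hops + 1))
    return None


# Hypothetical toy network of officers, companies and an intermediary.
edges = [
    ("Officer A", "Company X"),
    ("Company X", "Intermediary M"),
    ("Intermediary M", "Company Y"),
    ("Officer B", "Company Y"),
    ("Company X", "Address 1"),
]
graph = {}
for left, right in edges:
    graph.setdefault(left, set()).add(right)
    graph.setdefault(right, set()).add(left)

print(shortest_path(graph, "Officer A", "Officer B"))
```

Here the two officers are four hops apart, linked through two companies that share an intermediary.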
Next steps
named entity recognition
From: Igor Czernecki
Sent:
To: Mossack Fonseca & Co. (Attorneys-at-Law)
Cc: Saran Harris
Subject: Re: Payment instruction on the basis of lease agreement
Dear Mrs Rogers,
I would like Dagar to write an invoice for;
GEMINI HOLDING Sp. z o.o.
3/7 Friedleina Street, 30-009 Krakow, Poland
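As a rough idea of what pulling names out of an email like this involves, here is a deliberately naive, rule-based sketch. Real entity recognition normally relies on trained language models; the regular expressions, the list of company suffixes and the `extract_entities` helper below are assumptions for illustration only.

```python
import re

EMAIL = """From: Igor Czernecki
To: Mossack Fonseca & Co. (Attorneys-at-Law)
Cc: Saran Harris
Subject: Re: Payment instruction on the basis of lease agreement
Dear Mrs Rogers,
I would like Dagar to write an invoice for;
GEMINI HOLDING Sp. z o.o.
3/7 Friedleina Street, 30-009 Krakow, Poland
"""

# People and organisations named in the address headers.
HEADER = re.compile(r"^(From|To|Cc):[ \t]*(.+)$", re.MULTILINE)
# Lines ending in a company suffix (an illustrative, incomplete list).
COMPANY = re.compile(
    r"^(.+? (?:Sp\. z o\.o\.|Ltd\.?|S\.A\.|Inc\.?))[ \t]*$", re.MULTILINE
)


def extract_entities(text):
    entities = [(label, value.strip()) for label, value in HEADER.findall(text)]
    entities += [("COMPANY", name) for name in COMPANY.findall(text)]
    return entities


for label, value in extract_entities(EMAIL):
    print(f"{label}: {value}")
```

Rules like these miss the names in the body text ("Mrs Rogers", "Dagar") and the street address, which is why model-based recognition is the more useful next step.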
datashare
[email protected]
[email protected]
Thanks!
http://bit.ly/icijplatformseijc16