WHITEPAPER
The Impact of Always-on Connectivity for Geospatial Applications and Analysis
DATE: April 2016
Table of Contents
Devices, Computing and Connectivity Converge
Geospatial Analytics and Transportation
An Esri Take on New York’s Taxi Data
Zoomdata TaxiStats
About MemSQL Geospatial
Devices, Computing and Connectivity Converge
In the past ten years, technology shifts have reshaped the geospatial applications and analytics
landscape.
• The iPhone and Android ecosystems have fostered a world where almost everyone is a
beacon of information;
• Large-scale computing capabilities have given companies like Google and Facebook
the ability to keep track of billions of things, and companies like Amazon and
Microsoft are making similar computing power available to everyone;
• Global Internet coverage continues to expand, including innovative programs with
balloons and solar-powered drones.
These trends are causing billion-dollar shifts in the mapping and geospatially oriented
industries. For example:
In August 2015, a consortium of the largest German automakers including Audi, BMW,
and Daimler (Mercedes) bought Nokia’s Here mapping unit, the largest competitor to
Google Maps, for $3.1 billion.
In addition to automakers like the German consortium having a stake in owning and
controlling mapping data and driver user experiences, the largest private companies, like
Uber and Airbnb, depend on maps as an integral part of their applications.
Source: VentureBeat
In this paper, we’ll examine several showcase applications that demonstrate the modern
geospatial capabilities of an in-memory approach. In particular, we’ll focus on transportation.
Geospatial Analytics and Transportation
Uber has shown the world what is possible when capitalizing on the trends we called out
earlier: ubiquitous mobile phones, computing capabilities, and connectivity. In late 2015, Uber
announced it had served 1 billion rides, and in early 2016 it was operating in 400 cities across
68 countries [1].
Uber began when its co-founders were unable to get a taxi one evening. The frustration was
telling, since they held GPS-capable computers in their pockets and there was likely a
labor and asset pool capable of filling the taxi gap.
Of course, what makes Uber stand out today is its ability to link millions of riders and
corresponding drivers quickly, accurately, safely, and effortlessly. It is hard to see this as
anything but a game changer.
While Uber data is not available for the world to see, we are fortunate to be able to get a small
sense of the kind of information involved thanks to the release of taxi data from the New York
City Taxi and Limousine Commission.
MemSQL Supercar
Real-time geospatial capabilities in MemSQL identify the geographic location and
characteristics of natural or constructed features and boundaries, and the objects that reside
or move within them. For mobile, transportation and logistics, having instant access to
real-time geospatial data can mean greater visibility into smart device application use, fuel
efficiency, global supply chains and real-time inventory management. Industries gain true
competitive advantage when business-critical decisions can be made as quickly as the data is
captured.
The demonstration, titled Supercar, makes use of a dataset containing the details of 170
million real-world taxi rides. By sampling this dataset and creating real-time records while
simultaneously querying the data, Supercar simulates the ability to monitor and derive insights
across hundreds of thousands of objects on the go.
[1] http://expandedramblings.com/index.php/uber-statistics/
By natively integrating geospatial datatypes in its relational database, MemSQL enables simple
queries to derive informative results. The queries available in Supercar include:
• How many riders did we serve?
• What was the average rider wait time?
• What was the average trip distance?
• What was the average trip time?
• What was the average price/fare?
Simple Queries With Native Geospatial Intelligence
The demonstration uses the developer-focused mapping platform from Mapbox and combines it
with simple SQL queries generated on the fly. For example, users can pan across the map and
zoom in to a specific section; the visible area then defines the region over which the query runs.
One query example for passenger count is shown below. The coordinates of the polygon were
removed for simplicity's sake, but in practice they represent the latitude and longitude of the
four corners of the visible map area.
SELECT SUM(passenger_count) AS result
FROM trips
WHERE GEOGRAPHY_INTERSECTS(pickup_location, "POLYGON((...))")
   OR GEOGRAPHY_INTERSECTS(dropoff_location, "POLYGON((...))")
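As a rough illustration of how the map viewport could become the polygon in this query, the Python sketch below builds a closed WKT ring from the four corners of the visible area and pairs it with a parameterized form of the statement. The helper names `viewport_to_wkt` and `passenger_count_query`, the bounds arguments, and the `%s` placeholder style (common to MySQL-compatible Python drivers) are assumptions for illustration and are not taken from the Supercar code. The sketch also assumes the WKT convention of writing longitude before latitude.

```python
def viewport_to_wkt(west, south, east, north):
    """Return a WKT polygon for a rectangular map viewport.

    Hypothetical helper. Coordinates are written as "longitude latitude",
    and the ring repeats its first corner so that it is closed.
    """
    if west >= east or south >= north:
        raise ValueError("expected west < east and south < north")
    corners = [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon:.6f} {lat:.6f}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


# Parameterized form of the passenger-count query from the text.
PASSENGER_COUNT_SQL = (
    "SELECT SUM(passenger_count) AS result FROM trips WHERE "
    "GEOGRAPHY_INTERSECTS(pickup_location, %s) OR "
    "GEOGRAPHY_INTERSECTS(dropoff_location, %s)"
)


def passenger_count_query(west, south, east, north):
    """Return (sql, params) suitable for a DB-API cursor.execute call."""
    polygon = viewport_to_wkt(west, south, east, north)
    return PASSENGER_COUNT_SQL, (polygon, polygon)
```

Passing the polygon as a bound parameter rather than concatenating it into the SQL text keeps the statement fixed while the viewport changes on every pan and zoom.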
MemSQL Supercar Real-Time Geospatial Demo
MemSQL Supercar
https://www.youtube.com/watch?v=2txICCLUV-Y
Supercar Technical Details
“Supercar” is a simulation of 50,000 taxis roaming around the New York metro area, picking up
and dropping off passengers. Each vehicle reports its geolocation to the server once a second.
A “trips” thread uses real-world NYC taxi data to create requests for pickups and destinations
at a clip of several hundred per second. A taxi is chosen by performing a within_distance
geospatial query to find the closest 20 available vehicles with the features the rider asks for
(e.g., SUV, car seat, limo). A candidate is chosen at random and the
taxi starts moving to the pickup point. The price of the ride is determined dynamically based
on a geofence query and recent values for supply and demand within that geofence.
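The dispatch step described above can be approximated in plain Python. This is a sketch only: in Supercar the nearest-vehicle lookup is a geospatial query inside MemSQL, whereas here a haversine distance over an in-memory list stands in for it. The `Taxi` class, the `pick_taxi` function, and the 2 km default search radius are hypothetical names and values chosen for illustration.

```python
import math
import random
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Taxi:
    taxi_id: int
    lon: float
    lat: float
    available: bool = True
    features: frozenset = field(default_factory=frozenset)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pick_taxi(taxis, lon, lat, wanted=frozenset(), radius_m=2000, k=20, rng=random):
    """Pick one taxi at random from the k closest matching candidates.

    Returns None when no available taxi with the requested features
    lies within radius_m of the pickup point.
    """
    candidates = []
    for taxi in taxis:
        # Skip busy taxis and taxis missing any requested feature.
        if not taxi.available or not wanted <= taxi.features:
            continue
        dist = haversine_m(lon, lat, taxi.lon, taxi.lat)
        if dist <= radius_m:
            candidates.append((dist, taxi.taxi_id, taxi))
    # Sort by distance, breaking ties by id, then keep the k nearest.
    candidates.sort(key=lambda c: c[:2])
    nearest = [taxi for _, _, taxi in candidates[:k]]
    return rng.choice(nearest) if nearest else None
```

Choosing at random among the nearest candidates, rather than always taking the single closest vehicle, mirrors the behavior the text describes.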
Once the rider is dropped off, the taxi performs another query to determine where it is, the
price of that area, and the location of another area with a higher price. Having chosen a likely
place to wait for another fare, it moves toward it.
The “pricing” thread dynamically adjusts local prices for taxi fares once a second. It looks up
recent requests and taxi locations, grouped by the areas in which they occurred, and bumps the
price of each geofence up or down based on the ratio of supply and demand. A web-based user
interface plots the state of the system and allows the user to run real-time analytical queries
against the dataset.
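One way to picture the pricing thread is the small Python sketch below. The paper states only that prices move up or down with the ratio of supply and demand, so the step size, the price bounds, and the names `adjust_prices`, `STEP`, `MIN_MULTIPLIER`, and `MAX_MULTIPLIER` are assumptions made for illustration, not Supercar's actual rule.

```python
STEP = 0.05            # assumed per-tick change in the price multiplier
MIN_MULTIPLIER = 1.0   # assumed floor: never below the base fare
MAX_MULTIPLIER = 3.0   # assumed ceiling on the multiplier


def adjust_prices(prices, requests, available):
    """Return updated per-geofence price multipliers for one tick.

    prices    -- dict: geofence id -> current multiplier
    requests  -- dict: geofence id -> recent ride requests (demand)
    available -- dict: geofence id -> available taxis (supply)
    """
    updated = {}
    for fence, price in prices.items():
        demand = requests.get(fence, 0)
        supply = available.get(fence, 0)
        if demand > supply:
            price += STEP      # more riders than taxis: bump up
        elif demand < supply:
            price -= STEP      # more taxis than riders: bump down
        # Clamp to the assumed bounds and round away float noise.
        updated[fence] = round(min(MAX_MULTIPLIER, max(MIN_MULTIPLIER, price)), 2)
    return updated
```

In a simulation like the one described, a function of this shape would be called once per second with counts produced by a grouped query over recent requests and taxi positions.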
An Esri Take on New York’s Taxi Data
From the blog of Mansour Raad, Esri
http://thunderheadxpler.blogspot.com/2015/03/bigdata-memsql-and-arcgis-soi.html,
March 16, 2015
BigData, MemSQL and ArcGIS Interceptors
Last week, at the Developer Summit, we unveiled Server Object Interceptors. They have the
same API as Server Object Extensions, and are intended to extend an ArcGIS Server with
custom capabilities. An SOI intercepts REST and/or SOAP calls on a MapServer before and/or
after it executes the operation on an SOE or SO. Think servlet filters.
A use case of an SOI associated with a published MXD is to intercept an export image
operation on its MapService and digitally watermark the original resulting image. Another use
case of an interceptor is to use the associated user credentials in the single-sign-on request to
restrict the visibility of layers or data fields.
This is pretty neat and being the BigData Advocate, I started thinking how to use this
interceptor in a BigData context. The stars could not have been more aligned than when I
heard that the MemSQL folks have announced geospatial capabilities in their In-memory
database. See, I knew for a while that they were spitballing native geospatial types, but the fact
that they showcased it at Strata + Hadoop World made me reach back to them to see how we
can collaborate.
The idea is that since ArcGIS server does not natively support MemSQL, and since MemSQL
natively supports the MySQL wire protocol, I can use the MySQL JDBC driver to query
MemSQL from an SOI and display the result in a map.
The good folks at MemSQL bootstrapped a set of AWS instances with their “new” engine and
loaded the now-very-famous New York City taxis trips data. This (very very small) set consists
of about 170 million records with geospatial and temporal information such as pickup and
drop off locations and times. Each trip has additional attributes such as travel times, distances
and number of passengers. It was up to me now to query and display dynamically this
information in a standard WebMap on every map pan and zoom. What do I mean by “standard”
here, is that an out-of-the-box WebMap should be able to interact with this MemSQL database
without being augmented with a new layer type or any other functionality. Thus the usage of
an SOI. It will intercept the call to an export image operation with a map extent as an argument
in a “stand-in” MapService and will execute a spatial MemSQL call on the AWS instances. The
result set is drawn on an off-screen PNG image and is sent back to the requesting WebMap for
display as a layer on a map.
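The drawing step can be illustrated with a small Python sketch of the coordinate transform involved: placing each returned point into pixel space for the requested map extent and image size. The interceptor itself is an ArcGIS Server extension and is not reproduced here. The functions `to_pixel` and `rasterize` and their parameters are hypothetical, and the sketch assumes that the points and the extent share one coordinate system, so a linear mapping is enough.

```python
def to_pixel(x, y, extent, width, height):
    """Map a point to (column, row) in an image covering `extent`.

    extent is (xmin, ymin, xmax, ymax) in the same coordinate system as
    the point. Row 0 is the top of the image, so the y axis is flipped.
    Returns None for points outside the extent.
    """
    xmin, ymin, xmax, ymax = extent
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None
    col = int((x - xmin) / (xmax - xmin) * (width - 1))
    row = int((ymax - y) / (ymax - ymin) * (height - 1))
    return col, row


def rasterize(points, extent, width, height):
    """Count points per pixel; a stand-in for drawing on an off-screen image."""
    counts = {}
    for x, y in points:
        pixel = to_pixel(x, y, extent, width, height)
        if pixel is not None:
            counts[pixel] = counts.get(pixel, 0) + 1
    return counts
```

The per-pixel counts could then be turned into colors and encoded as a PNG by whatever imaging library the server side provides.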
Zoomdata TaxiStats
Real-time Business Intelligence companies like Zoomdata have also shown what is possible with
geospatial analytics.
TaxiStats features a real-time dashboard application with Zoomdata. The simulated pickup and
drop-off data from taxis is streamed into MemSQL as rides complete. The Zoomdata business
intelligence dashboard displays that data as it is collected while exploratory analytics run
simultaneously on the dataset. The dashboard includes:
• Real-time data for pickups by ZIP code on the map, total volume of rides, and rides by
time of day.
• A map and graph that can be filtered to explore and drill down.
• A live stream that can be paused or rewound to examine a specific time period.
TaxiStats Showcase Application
Zoomdata TaxiStats
https://www.youtube.com/watch?v=26lfq_qgcRI
About MemSQL Geospatial
MemSQL at a Glance
MemSQL is the leader in real-time databases for transactions and analytics. As a purpose-built
database for instant access to real-time and historical data, MemSQL uses a familiar SQL
interface and a horizontally scalable distributed architecture that runs on commodity hardware
or in the cloud. Innovative enterprises use MemSQL to better predict and react to
opportunities by extracting previously untapped value in their data to drive new revenue.
MemSQL is deployed across hundreds of nodes in high velocity big data environments. Based
in San Francisco, MemSQL is a Y Combinator company funded by prominent investors
including Accel Partners, Khosla Ventures, First Round Capital and Data Collective. Follow
us @MemSQL or visit www.memsql.com.
MemSQL Product Architecture
MemSQL combines real-time streaming, database, and data warehouse workloads for
sub-second processing and reporting in a single, scalable, easy-to-manage database. Build real-time
applications to instantly respond to dynamic business changes. Bring your data into the light of
day with precision insights, faster decisions, and immediate action.
MemSQL achieves these capabilities through a unique combination of features:
A Commitment to the Enterprise
MemSQL has always maintained an enterprise focus, ensuring our database delivers the
maturity and functionality to serve the most demanding workloads.
Full Transactional SQL
MemSQL is a scalable, performant database that retains the time-tested relational properties
of SQL.
Multi-model and Multi-Mode
MemSQL supports multiple data models beyond SQL including key-value, document/JSON,
and geospatial.
In-Memory Rowstore and Disk/SSD-based Columnstore
MemSQL features an in-memory row store and a disk/SSD-based column store in a single
database, achieving extremely low latency execution while allowing for data growth.
Distributed Architecture
MemSQL supports a distributed architecture that can scale out on commodity hardware. This
architecture also supports distributed query optimization and execution for the fastest
analytics possible at scale.
Deploy On-Premises or in the Cloud
MemSQL can be deployed on site on commodity hardware, or on any public cloud including
Amazon, Azure, Google, Digital Ocean, Softlayer and others. This provides complete flexibility
for a variety of use cases.
Building Modern Database Applications with MemSQL
In addition to well-understood database models, MemSQL allows you to go beyond what
previous databases or data warehouses were capable of. We invite you to consider some of
the following options.
High-Volume Transactional Workloads
MemSQL excels at high volume transactional workloads, including those where real-time
analytics come into play. With MemSQL you can ingest millions of records per second, and run
queries with results accurate to the last transaction.
Data Warehouses with Live Data
In the past, data warehouses were batch-loaded with data after-the-fact. With MemSQL, you
can send live data to the database and run complex analytical queries with ease, all in a
non-blocking infrastructure. MemSQL allows you to take an overnight process and turn it into a
continuous process.
Real-Time Data Pipelines with Apache Kafka and Spark
MemSQL Streamliner supports modern streaming workloads using the power of Apache Spark,
and enables our customers to stream, persist, and analyze hundreds of terabytes of data a day
without writing any code. Easily connect to Apache Kafka as a real-time message queue, or use
a custom extract to pull data from your preferred source.
MemSQL Geospatial
Starting with MemSQL 4, geospatial functions are part of the database, along with the
three main object types: polygons, paths, and points.
For a complete reference of MemSQL geospatial functions, please refer to
http://docs.memsql.com/latest/concepts/geospatial/
With the advent of mobile phones, ubiquitous computing, and global internet connectivity,
nearly every data point has a place. As such, geospatial analytics is becoming more important
than ever.
In particular, the scale and size of emerging geospatial datasets demand a similarly scalable
database. MemSQL, through its distributed architecture and support for geospatial functions,
is well suited to this demand.
Future Geospatial Developments
As geospatial demands increase, MemSQL plans to support them. This includes making
geospatial functions and data types first-class citizens for real-time data pipelines, and
supporting more models and a broader range of queries.
For more information, please visit www.memsql.com.