Maple – a Web Map Service for Verbal Visualisation Using Tag Clouds Generated from Map Feature Frequencies
Stefan Hahmann and Dirk Burghardt
Dresden University of Technology, Institute for Cartography, Helmholtzstraße 10, 01069 Dresden, Germany
Email: [email protected], [email protected]
Abstract
Tag cloud visualisation was introduced in the 1970s. As it is eye-catching
and engaging, it is used in many web applications and has become a very popular
visualisation technique. This paper presents an approach that uses this technique in combination with maps. Our method augments cartographic representations with additional verbal content, which is one of the strongest instruments
available to cartographers to communicate spatial information. The idea is that
only a few words extracted from the semantics of the features of the underlying map are suitable to characterise the map section as a whole. To demonstrate the approach we used the OpenStreetMap dataset. To allow a variety
of web map clients to use the results of the method, we implemented the prototype
as a Web Map Service (WMS) based on the corresponding Open Geospatial Consortium (OGC) specification.
Keywords: tag cloud, word cloud, web mapping, geovisualisation, knowledge
representation, exploration, semantics, Web Map Service, OpenStreetMap
1 Introduction: History of Tag Clouds and Related Work
More than 30 years ago, Milgram and Jodelet (1976) introduced the technique of
drawing words at different font sizes to present a visual overview of a text corpus
that emphasizes certain words. The results of their work on a “collective mental
map of Paris” are shown in Figure 1. The method they used may be seen as the
origin of “tag cloud visualisation”.
Fig. 1 Introducing tag clouds to cartography: Milgram and Jodelet’s “collective mental map of
Paris” (Milgram and Jodelet 1976)
Fig. 2 Web application World Explorer¹: Landmarks automatically generated from clustered
Flickr photo locations and their assigned tags. Bigger labels indicate bigger cluster sizes
In the age of computer visualisation, this technique has become very popular, especially with the rise of Web 2.0. A general overview of different layout algorithms for the generation of tag clouds is given by Viégas and Wattenberg (2008).
Several researchers have recently published on tag cloud visualisations for cartographic contexts.
The approach of Ahern et al. (2007) exploits the public Flickr² photo collection.
Tagged images with assigned GPS coordinates are clustered to identify regions with a
disproportionately high density of photos, and the dominant tag of each such region
is determined. The tag label of each cluster is placed at the cluster’s centroid, and
the label size is scaled according to the size of the cluster. Figure 2 shows results of the application World
Explorer, which implements this approach. A cartographic base map is overlaid
with the computed labels, which results in a map of what we would call popular
landmarks. We have chosen this name because the labels are based on the places most
frequently photographed by the users of the Flickr photo platform. The approach
may be used at multiple scales. Results are similar to those of Milgram and Jodelet’s collective mental map, with a bias towards tourist attractions.
¹ http://tagmaps.research.yahoo.com/worldexplorer.php.
² http://www.flickr.com.
Fig. 3 Visualisation of geospatial and non-geospatial context information using tag cloud
visualisations generated from geo-referenced German Wikipedia articles, modified after Paelke et al. (2010)
Fig. 4 The Taggram method – a layout algorithm that adapts the shape of a tag cloud to an
arbitrary geometric region (Nguyen and Schumann 2010)
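The label generation described for World Explorer can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: a fixed longitude/latitude grid stands in for the clustering step (we do not reproduce the clustering used by Ahern et al.), a grid cell counts as a "cluster" once it holds at least `min_photos` photos, and the label size grows linearly with the photo count. The names `Photo` and `landmark_labels` and all parameter defaults are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Photo:
    lon: float
    lat: float
    tags: list = field(default_factory=list)


def landmark_labels(photos, cell_deg=0.01, min_photos=3,
                    min_size=10.0, max_size=40.0):
    """Return (tag, lon, lat, font_size) for each dense grid cell.

    Grid binning is a simplified stand-in for photo clustering.
    """
    cells = defaultdict(list)
    for p in photos:
        cells[(int(p.lon // cell_deg), int(p.lat // cell_deg))].append(p)

    # Keep only cells with enough photos to count as a "cluster".
    clusters = [ps for ps in cells.values() if len(ps) >= min_photos]
    if not clusters:
        return []
    largest = max(len(ps) for ps in clusters)

    labels = []
    for ps in clusters:
        counts = Counter(t for p in ps for t in p.tags)
        if not counts:
            continue  # only untagged photos: nothing to label
        tag = counts.most_common(1)[0][0]          # dominant tag
        lon = sum(p.lon for p in ps) / len(ps)     # centroid of the cell's photos
        lat = sum(p.lat for p in ps) / len(ps)
        size = min_size + (max_size - min_size) * len(ps) / largest
        labels.append((tag, lon, lat, size))
    return labels


photos = [Photo(13.745, 51.055, ["frauenkirche", "dresden"]),
          Photo(13.746, 51.055, ["frauenkirche"]),
          Photo(13.744, 51.056, ["frauenkirche", "church"]),
          Photo(13.745, 51.054, ["zwinger"]),
          Photo(10.0, 50.0, ["elsewhere"])]     # isolated photo, no cluster
labels = landmark_labels(photos)
print(labels)
```

With these made-up photos, the four nearby pictures form one cluster labelled with its dominant tag at the maximum font size, while the isolated photo produces no label.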
Paelke et al. (2010) use the content of geo-referenced Wikipedia articles to represent context information on maps. They compute tag cloud visualisations from articles that can be located within a specified map section via the coordinates given
in the article. Figure 3 shows a result of this work. The benefit of this approach is
its potential to show geospatial as well as non-geospatial context information. For
example, the terms “Friedhof” (German for “cemetery”) and
“Weltkrieg” (German for “World War”) appear in the same tag cloud and are
thus likely to be associated with each other by the user of the application.
Nguyen and Schumann (2010) present a layout algorithm for tag clouds that
adapts the shape of the cloud to an arbitrary geometric region. Figure 4 shows a
result of this so-called taggram method.
2 The Approach: Using the OGC WMS Standard for On-Map Word Cloud Visualisation
The objective of the approach presented in this paper is to verbally visualise the
main semantic information contained within a map. For this purpose we use
the word cloud technique as an information analysis method in combination with
maps. The method adds verbal content to the map, as text is one
of the most powerful communicative resources available to cartographers.
Fig. 5 Section of the OpenStreetMap database schema: Node, Way and Relation represent geometric entities that can be annotated with zero to n Tag entities specifying their semantics
Fig. 6 Flowchart of the process that generates the word cloud layer
Fig. 7 A word cloud highlighting frequently used terms within the titles of the presentations held
at ICC 2009
In the introductory section we described methods for mapping
landmarks and context using tag visualisation techniques. In this section we shift
the focus towards the visualisation of semantic information. The approach is to
analyse the frequencies of the semantics of the map features contained in a map
and to show these frequencies on the map as word clouds. The idea is that the most
frequent semantic information within a specified area is well suited to characterise
the semantics of a cartographic dataset within this area as a whole. To the extent
that the cartographic model describes the real world, the approach can also verbally
visualise the characteristics of the real world within this area.
For the demonstration of our approach we use the OpenStreetMap (OSM) dataset. Figure 5 shows the section of the OSM database schema that models geometric objects and their semantics. The OSM tags carry the semantics of the dataset: each tag consists of a key-value pair, specifies a map feature and is
linked to a geometric object. Each geometric object can be associated with a multitude of tags that specify its meaning. Tags can reference points (table
node), lines and polygons (table way) or complex objects (table relation).
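The part of the schema shown in Figure 5 can be mirrored in a few Python data classes. This is a minimal illustrative sketch, not the actual OSM database schema: the class and field names are our own, tags are held as a key-value dictionary on each element instead of in separate tag tables, and a way is treated as closed (a polygon candidate) when its first and last node references coincide, which is the usual OSM convention but does not guarantee that the way represents an area.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Point geometry (table node)."""
    id: int
    lon: float
    lat: float
    tags: dict = field(default_factory=dict)   # key -> value


@dataclass
class Way:
    """Line or polygon (table way): an ordered list of node references."""
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)

    @property
    def is_closed(self) -> bool:
        # A closed ring needs at least three distinct nodes plus the repeat.
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    """Complex object (table relation): typed references to other elements."""
    id: int
    members: list                              # (type, ref, role) tuples
    tags: dict = field(default_factory=dict)


# Hypothetical example elements.
pub = Node(1, 13.752, 51.066, {"amenity": "pub", "name": "Example"})
street = Way(10, [1, 2, 3], {"highway": "residential"})
park = Way(11, [4, 5, 6, 4], {"leisure": "park"})
route = Relation(100, [("way", 10, "")], {"type": "route"})
print(street.is_closed, park.is_closed)
```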
Figure 6 shows a flowchart of the application, which overlays an OSM base map with a word cloud processed for the current map extent.
OpenLayers serves as the WMS client and the OSM Mapnik layer as the base map.
The map client sends the server a getMap request that conforms to
the Open Geospatial Consortium Web Map Service specification (Open GIS Consortium 2001). The server queries a mirrored OSM database, which returns a list
of tags and tag frequencies within the bounding box of the getMap request. For
point objects, the frequency of a tag increases with every occurrence of this tag on
a point object. For line and polygon objects, the frequency of a tag rises with the number of vertices of the line and polygon objects associated with this tag.
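The frequency rule above can be sketched in Python as follows. Several details are assumptions not stated in the text: only vertices that fall inside the getMap bounding box are counted, coordinates are already resolved from node references, and keys and values are counted together as (key, value) pairs so that later steps can aggregate them either way. The function names are hypothetical.

```python
from collections import Counter


def in_bbox(lon, lat, bbox):
    """bbox = (min_lon, min_lat, max_lon, max_lat), as in a getMap BBOX."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def tag_frequencies(points, ways, bbox):
    """points: (lon, lat, tags) tuples; ways: (coords, tags) tuples.

    Point tags count once per tagged point inside the box; line and
    polygon tags count once per vertex inside the box (the closing
    vertex of a closed ring is counted again here).
    """
    freq = Counter()
    for lon, lat, tags in points:
        if in_bbox(lon, lat, bbox):
            for key_value in tags.items():
                freq[key_value] += 1
    for coords, tags in ways:
        inside = sum(1 for lon, lat in coords if in_bbox(lon, lat, bbox))
        if inside:
            for key_value in tags.items():
                freq[key_value] += inside
    return freq


bbox = (0.0, 0.0, 10.0, 10.0)
points = [(1.0, 1.0, {"amenity": "pub"}),
          (2.0, 2.0, {"amenity": "pub"}),
          (20.0, 20.0, {"amenity": "cafe"})]            # outside the bbox
ways = [([(0.0, 0.0), (5.0, 5.0), (15.0, 15.0)], {"highway": "residential"})]
freq = tag_frequencies(points, ways, bbox)
print(freq)
```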
OpenLayers, which we used as the WMS client, allows switching between different layers. In the communication with the WMS server, switching between layers is realised by the LAYERS parameter of the getMap request, which allows
the client to specify which keys and/or values are presented by the overlaid
word cloud. As some keys such as “created_by” and “address” occur with disproportionately high frequency, we added the option of removing certain tags from
the tag frequency list. The filtered list is then processed by the word cloud layout
software, and the resulting image is sent back to the client as the getMap response.
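How the layer parameter and the tag filter might act on such a frequency list is sketched below. The layer naming is our own assumption: a layer called "keys" aggregates weights per key (as in Figure 10), any other layer name selects the values of that key (as in Figures 8, 9 and 11), and keys on the exclusion list are dropped unless they are the requested layer, so that “created_by” can still be shown on its own. `select_layer` and `EXCLUDED_KEYS` are hypothetical names.

```python
from collections import Counter

# Keys named in the text as occurring with disproportionately high frequency.
EXCLUDED_KEYS = frozenset({"created_by", "address"})


def select_layer(freq, layer, excluded=EXCLUDED_KEYS):
    """Turn a Counter of (key, value) -> weight into word -> weight."""
    words = Counter()
    for (key, value), weight in freq.items():
        if key in excluded and key != layer:
            continue                     # filtered unless explicitly requested
        if layer == "keys":
            words[key] += weight         # object types
        elif key == layer:
            words[value] += weight       # object values of one key
    return words


freq = Counter({("amenity", "pub"): 5, ("amenity", "restaurant"): 3,
                ("highway", "footway"): 7, ("created_by", "JOSM"): 20})
keys_layer = select_layer(freq, "keys")
amenity_layer = select_layer(freq, "amenity")
editor_layer = select_layer(freq, "created_by")
print(keys_layer, amenity_layer, editor_layer)
```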
The algorithm used to lay out the word clouds was described by Viégas
et al. (2009). An executable version³ of the engine that drives the popular word cloud visualisation website Wordle⁴ is available. Figure 7
shows a word cloud visualisation of the titles of the presentations held
at ICC 2009. Maple, the name of the application, is an adaptation of the name
Wordle to the context of cartographic maps.
³ Available at: http://www.alphaworks.ibm.com/tech/wordcloud/download.
⁴ Wordle – Beautiful Word Clouds, available at: http://www.wordle.net.
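The greedy placement strategy described for Wordle (Viégas et al. 2009), in which words are placed in order of decreasing weight and each word moves outwards along a spiral until it no longer overlaps an already placed word, can be sketched as below. This is a strong simplification and not the Wordle engine used in the prototype: words are approximated by rectangles whose width is an assumed 0.6 × font size per character, font size grows linearly with weight, and the spiral starts at the centre of the image. All names and constants are hypothetical.

```python
import math


def overlaps(a, b):
    """Axis-aligned boxes given as (x_centre, y_centre, width, height)."""
    return (abs(a[0] - b[0]) < (a[2] + b[2]) / 2
            and abs(a[1] - b[1]) < (a[3] + b[3]) / 2)


def spiral_layout(words, min_size=10.0, max_size=48.0,
                  char_width=0.6, spacing=0.5, d_theta=0.1):
    """words: dict word -> weight. Returns (word, size, box) tuples."""
    if not words:
        return []
    ordered = sorted(words.items(), key=lambda kv: kv[1], reverse=True)
    top = ordered[0][1]
    placed = []
    for word, weight in ordered:
        size = min_size + (max_size - min_size) * weight / top
        width, height = char_width * size * len(word), size
        theta = 0.0
        while True:
            r = spacing * theta                      # Archimedean spiral
            box = (r * math.cos(theta), r * math.sin(theta), width, height)
            if not any(overlaps(box, other) for _, _, other in placed):
                placed.append((word, size, box))
                break
            theta += d_theta                         # step further out
    return placed


cloud = spiral_layout({"pub": 10, "restaurant": 6, "cafe": 3, "bar": 2})
for word, size, (x, y, w, h) in cloud:
    print(f"{word:12s} size={size:5.1f} at ({x:7.1f}, {y:7.1f})")
```

Because the spiral radius grows without bound, every word eventually finds a free position; real implementations test glyph outlines rather than boxes, which packs words more tightly.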
3 Resulting Maps and Discussion
Figures 8, 9, 10 and 11 show the results of the implementation. Figure 8 presents
a map and an overlaid word cloud, which is computed for the area of the
“Neustadt” district of the German city of Dresden. It shows the frequency of
occurrences of tag values associated with map features having the key “amenity”.
The fact that Dresden Neustadt is a nightlife district is quickly apparent even to
a map user who does not know the area, because “pub” and “restaurant” appear
in a large font size.
Frequencies of values of map features with the key “highway”, which includes
all types of streets and footways, are shown in Figure 9 for an area in southern
Dresden. The large size of “footway” and “residential” shows that this OSM
section is mapped in considerable detail, down to public footpaths, and that it
represents a mostly residential area.
A word cloud processed from the frequencies of the keys within the centre of
the German city of Leipzig is shown in Figure 10. It illustrates why the OpenStreetMap project really deserves the “StreetMap” part of its name: the keys
“highway” and “name”, which normally co-occur on street features, are the most
prominent keys within this cloud. Note that the keys “created_by” and “address” have been filtered out. The key “railway” is also large, as Leipzig’s main station lies within this map extent.
Figure 11 shows a map overlaid with the names of the OSM editing tools that have
been used within this area. Remarkably, there are considerable differences
between regions, as different local OSM mapping communities seem to
prefer different editors. These results may be of interest to the mapping community, especially for the development and documentation
of OSM editing tools.
The tag cloud visualisation method allows the analysis of both object types (keys) and
object values. Figure 10 is an example of object type visualisation, and Figures 8,
9 and 11 are examples of object value visualisation.
Table 1 shows typical computing times for word cloud processing at different
scales within an area with a high density of OSM objects. The test environment was a machine equipped with an AMD Dual Core Opteron 2.6 GHz processor and 1 GB
RAM. Times for processing the word cloud from the tag frequency list are
nearly scale-independent, whereas database query times increase steeply with decreasing scale. For the prototype, a copy of the part of the OSM database that covers Germany was used. This currently comprises just 5%
of all data in the database. Nevertheless, about 40 million entities in table
‘nodes’, 5 million entities in table ‘ways’ and 80,000 entities in table ‘relation’
still need to be queried. Additionally, about 8 million entities in table
‘node_tags’, 14 million entities in table ‘way_tags’ and 300,000 entities in table
‘relation_tags’ need to be analysed for every word cloud request.
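The area column of Table 1 is consistent with a map window of fixed size on screen. Assuming a window of width $w$ and height $h$ (measured on the screen) and a map scale of $1:s$, the ground area covered grows with the square of the scale denominator, so halving $s$ quarters the area:

```latex
\begin{aligned}
A(s) &= (s\,w)(s\,h) = s^{2}\,w\,h, \qquad \frac{A(s/2)}{A(s)} = \frac{1}{4},\\
A(36000) &= 20~\mathrm{km}^2 \;\Rightarrow\; A(18000) = 5~\mathrm{km}^2,\; A(9000) = 1.25~\mathrm{km}^2,\\
A(4500) &= 0.3125~\mathrm{km}^2,\; A(2250) \approx 0.078~\mathrm{km}^2
\end{aligned}
```

These values agree closely with the tabulated areas. In a region of roughly uniform object density, each halving of the scale denominator would therefore also roughly quadruple the number of objects the database query has to retrieve.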
Fig. 8. OSM Mapnik base map and an overlaid word cloud computed from the values of the key
“amenity” in the area of Dresden Neustadt
Fig. 9. OSM Mapnik base map and an overlaid word cloud computed from the values of the key
“highway” in the area of southern Dresden
Fig. 10. OSM Mapnik base map and an overlaid word cloud computed from the OSM keys in
the centre of the city of Leipzig. Keys “created_by” and “address” are filtered out
Fig. 11. OSM Mapnik base map and an overlaid word cloud computed from the values of the
key “created_by”, which indicates the tool that was used to edit data, in the centre of Leipzig
Table 1. Computational time for word cloud processing at different scales; test environment:
AMD Dual Core Opteron 2.6 GHz, 1 GB RAM
Scale | Area (km²) | Overall computing time (s) | Database query time (s) | Cloud processing time (s)
1:36000 | 20 | 103 | 95 | 8
1:18000 | 5 | 37 | 29 | 8
1:9000 | 1.25 | 10 | 3 | 7
1:4500 | 0.32 | 7 | 0 | 7
1:2250 | 0.08 | 7 | 0 | 7
Word clouds are intuitively perceptible and, by their nature, do not suffer from
the labelling problems of bar charts, tree maps or bubble charts. Furthermore, they
are able to present the gist of a word corpus. On the downside, long words as well as
words with many ascenders and descenders receive undue attention, and exact
values cannot be read. The layout algorithm of a word cloud is more sophisticated than that of a tag cloud, as it uses typographical
whitespace more efficiently.
The main advantage of using a standardised WMS implementation is that a multitude of existing WMS clients can directly integrate the results. Our implementation also supports the transparency parameter of the getMap request, so an
overlay that does not completely hide the examined map is possible.
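A getMap request for such a transparent word cloud overlay might be built as follows. The sketch assumes WMS version 1.1.1 parameter names (SRS rather than the CRS of later versions) and uses a hypothetical endpoint `http://example.org/maple/wms` and layer name `amenity`; both are placeholders, and `getmap_url` is a hypothetical helper built on the standard library's `urlencode`.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layer, bbox, width=800, height=600,
               srs="EPSG:4326", transparent=True):
    """Build a WMS 1.1.1-style GetMap URL for a word cloud overlay."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,                              # which keys/values to show
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(c) for c in bbox),       # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",                        # PNG supports transparency
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    return endpoint + "?" + urlencode(params)


url = getmap_url("http://example.org/maple/wms", "amenity",
                 (13.74, 51.05, 13.76, 51.07))
print(url)
```

`urlencode` percent-encodes the commas in BBOX and the slash in FORMAT; WMS servers decode these like any other query string.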
A disadvantage of displaying the word cloud directly on a map is
that map readers are used to associating text shown on a map with the location directly beneath it. In the case of an overlaid word cloud that describes the whole
map section, this can lead to misinterpretation. Hence, this method may be more
useful when displaying a single district, whereas for a whole town it might not
provide significant insights. This issue could be avoided by showing the word
cloud in a separate area of the application rather than overlaying it directly on
the map.
Using tag frequencies based on vertex counts within a map area underestimates
the relevance of large objects with few vertices and overestimates the relevance
of small objects with many vertices. This affects lines and polygons
that occupy much map space with few vertices and, conversely, lines and
polygons that occupy little map space with many vertices. This bias is relevant
where the overlaid word clouds are intended to visualise the main
semantic information of a map section, as in Figures 8 and 9. Instead of vertex-based
tag frequencies, the length of lines and the area of polygons associated with
specific tags could be used as weights to improve the estimation of relevant tags
in the word cloud. For use cases where we only want to present statistics of a
dataset, as in Figures 10 and 11, the vertex-based frequency estimation is sufficient.
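The proposed length and area weighting could be sketched as follows, assuming coordinates in a projected, metric reference system (so that planar length and the shoelace area formula are meaningful) and assuming each way is already known to be either a line or a polygon. Line lengths (m) and polygon areas (m²) are kept in separate counters because their units differ; how to combine them into a single word cloud is left open, as in the text. The function names are hypothetical.

```python
import math
from collections import Counter


def line_length(coords):
    """Planar length of a polyline given as [(x, y), ...]."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Shoelace formula for a closed ring (first point == last point)."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


def weighted_tags(ways):
    """ways: (coords, tags, is_polygon) tuples -> (length, area) Counters."""
    lengths, areas = Counter(), Counter()
    for coords, tags, is_polygon in ways:
        for key_value in tags.items():
            if is_polygon:
                areas[key_value] += polygon_area(coords)
            else:
                lengths[key_value] += line_length(coords)
    return lengths, areas


footway = [(0, 0), (3, 4), (3, 10)]                        # 5 m + 6 m
block = [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)]   # 100 m x 100 m
lengths, areas = weighted_tags([
    (footway, {"highway": "footway"}, False),
    (block, {"landuse": "residential"}, True),
])
print(lengths, areas)
```

Under the vertex rule, the 100 m × 100 m block (four distinct vertices) would weigh barely more than the 11 m footway (three vertices); with area and length weights, the difference in map space becomes visible.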
4 Conclusions and Future Work
We have presented a method that presents the main semantic information
contained within a certain map section by visualising map feature frequencies as a
word cloud on the map. To a certain degree, this technique is able to verbally characterise the real-world environment shown in a
specific map section. Our demonstration is based on the OSM dataset, but the approach is also applicable to other cartographic databases, e.g. data provided by national mapping agencies. Even datasets that are not primarily cartographic, such as Twitter,
could be analysed.
Future work needs to address query performance, especially for the scenario of
a huge global OSM dataset and queries on mid- and small-scale map extents. Furthermore, verbal descriptions of the meaning of the OSM tags could be taken from
the OSM wiki website to produce more vernacular word clouds.
Field (2010) stresses shortcomings in the algorithm of word cloud creation and
recommends the use of a more sophisticated layout method. He argues that relative position and different colours of single words in the word cloud could be used
to group words according to certain attribute dimensions inherent in the data.
Last but not least, an empirical study needs to be carried out to determine whether
map users are able to interpret word clouds overlaid on maps and benefit from this
additional information. A study by Lohmann et al. (2009), which compares different tag cloud layouts with a focus on human task-related performance,
could serve as a starting point.
References
Ahern S, Naaman M, Nair R, Yang J (2007) World Explorer: Visualizing Aggregate Data from
Unstructured Text in Geo-Referenced Collections. 7th ACM/IEEE-CS Joint Conference on
Digital Libraries, Vancouver. ACM Press, New York. pp. 1–10
Field K (2010) Cartographically Wordy but not Necessarily Worthy. The Cartographic Journal
47(3): 195–197
Lohmann S, Ziegler J, Tetzlaff L (2009) Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration. Gross T et al. (eds.): Human-Computer Interaction –
INTERACT 2009, LNCS 5726, Springer, Berlin / Heidelberg. pp. 392–404
Milgram S, Jodelet D (1976) Psychological maps of Paris. Proshansky HM, Ittelson WH, Rivlin
LG (eds.): Environmental psychology, Second Edition, Holt, Rinehart and Winston, New
York. pp. 104–124
Nguyen DQ, Schumann H (2010) Taggram: Exploring Geo-Data on Maps through a Tag
Cloud-based Visualization. 14th International Conference on Information Visualisation (IV 2010), London. IEEE Computer Society, Los Alamitos. pp. 322–328
Open GIS Consortium (2001) Web Map Service Implementation Specification (Tech. Rep. OGC
01-068r2), Wayland, MA, USA
Paelke V, Dahinden T, Eggert D, Mondzech J (2010) Location Based Context Awareness
Through Tag-Cloud Visualizations. Joint International Conference on Theory, Data Handling
and Modelling in GeoSpatial Information Science, Hong Kong. pp. 290–295
Viégas FB, Wattenberg M (2008) Tag Clouds and the Case for Vernacular Visualization. Interactions 15(4): 49–52
Viégas FB, Wattenberg M, Feinberg J (2009) Participatory Visualization with Wordle. IEEE
Transactions on Visualization and Computer Graphics 15(6): 1137–1144
Copyright of Advances in Cartography and GIScience, Volume 1, is the property of Springer-Verlag and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder’s express written permission. However, users may print, download, or
email articles for individual use.