Accepted Full Paper Abstracts

Emre Eftelioglu, Yan Li, Xun Tang, James M. Kang, Christopher Farah and Shashi Shekhar.
Mining Network Hotspots with Holes: A Summary of Results

Abstract: Given a spatial network and a collection of activities (e.g., crime locations), the problem of Mining Network Hotspots with Holes (MNHH) finds network hotspots with a doughnut-shaped spatial footprint, where the concentration of activities is unusually high (e.g., statistically significant). MNHH is important for societal applications such as criminology, where it may focus the efforts of officials to identify a crime source. MNHH is challenging because of the large number of candidates and the high computational cost of the statistical significance test. Previous work focused either on geometry-based hotspots (e.g., circular, ring-shaped) in Euclidean space or on connected subgraphs (e.g., shortest paths), limiting the ability to detect statistically significant hotspots with holes on a spatial network. This paper proposes a novel Network Hotspot with Hole Generator (NHHG) algorithm to detect network hotspots with holes. The proposed algorithm features refinements that improve the performance of a naive approach. Case studies on real crime datasets confirm the superiority of NHHG over previous approaches. Experimental results on real data show that the proposed approach yields substantial computational savings without reducing result quality.

Cici Alexander, Lars Arge, Peder Klith Bøcher, Morten Revsbæk, Brody Sandel, Jens-Christian Svenning, Constantinos Tsirogiannis and Jungwoo Yang. Computing River Floods Using Massive Terrain Data

Abstract: Many times in history, river floods have resulted in huge catastrophes. To reduce the negative outcome of such floods, it is important to predict their extent before they happen. For this reason, specialists use algorithms that model river floods on digital terrain datasets. Nowadays, massive terrain datasets have become widely available. As flood modeling is important for a wide range of professions, it is crucial to process such datasets quickly, even on standard computers. Yet, these datasets can be several times larger than the main memory of a standard computer. Unfortunately, existing flood-modeling algorithms cannot handle this situation efficiently. Hence, they have to sacrifice output quality for time performance, or vice versa.

In this paper, we present a novel algorithm that, unlike any previous approach, can both provide high-quality river flood modeling and handle massive terrain data efficiently. Moreover, we redesigned an existing popular flood-modeling method (approved by the European Union and used by authorities in Denmark) so that it can efficiently process huge terrain datasets. Given a raster terrain G and a subset of its cells representing a river network, both algorithms estimate, for each cell in G, the height that the river must rise for the cell to get flooded. Based on our design, both algorithms can process terrain datasets that are much larger than the main memory of a computer. For an input raster that consists of N cells, and which is so large that it can only be stored on disk, each of the described algorithms can produce its output with only O(sort(N)) transfers of data blocks between the disk and the main memory. Here sort(N) denotes the minimum number of data transfers needed for sorting N elements stored on disk. We implemented both algorithms and compared their output with data acquired from a real flood event. We show that our new algorithm models the real event quite accurately, more accurately than the existing popular method. We evaluated the efficiency of the algorithms in practice by conducting experiments on massive datasets. Each algorithm could process a 268 GB dataset on a computer with only 22 GB of working main memory (twelve times smaller than the dataset itself) in at most 31 hours.
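As a rough illustration of the quantity both algorithms compute, the following Python sketch estimates, for a small in-memory raster, the water level at which each cell becomes connected to the river through cells at or below that level (a simple "bathtub with connectivity" reading of the problem). It is a hypothetical sketch under our own assumptions and does not attempt the paper's I/O-efficient, O(sort(N)) design; all names are ours.

```python
import heapq
import numpy as np

def flood_level(elev, river_mask):
    """For each cell, the lowest water level at which it becomes connected
    to the river through cells at or below that level (priority-flood
    traversal). The rise needed at a cell is this level minus the river's
    current stage. Illustrative only; not the paper's external-memory method."""
    nrows, ncols = elev.shape
    level = np.full(elev.shape, np.inf)
    heap = []
    for r, c in zip(*np.nonzero(river_mask)):
        level[r, c] = elev[r, c]
        heapq.heappush(heap, (elev[r, c], r, c))
    while heap:
        lvl, r, c = heapq.heappop(heap)
        if lvl > level[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols:
                # the water must reach at least the neighbour's elevation
                new_lvl = max(lvl, elev[nr, nc])
                if new_lvl < level[nr, nc]:
                    level[nr, nc] = new_lvl
                    heapq.heappush(heap, (new_lvl, nr, nc))
    return level
```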

Kwangsoo Yang. Distance-Constrained k Spatial Sub-Networks: A Summary of Results

Abstract: Given a graph and a set of spatial events, the goal of the Distance-Constrained k Spatial Sub-Networks (DCSSN) problem is to find k sub-networks that meet a distance constraint and maximize the number of spatial events covered by the sub-networks. The DCSSN problem is important for many societal applications, such as police patrol assignment and emergency response assignment. The problem is NP-hard; it is computationally challenging because of the large size of the transportation network and the distance constraint. This paper proposes a novel approach for finding k sub-networks that maximize the coverage of spatial events under the distance constraint. Experiments and a case study using Chicago crime datasets demonstrate that the proposed algorithm outperforms baseline approaches and reduces the computational cost of creating a DCSSN.
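The abstract does not spell out the proposed algorithm, but the maximum-coverage flavour of the problem can be illustrated with a simple greedy baseline: repeatedly pick the candidate sub-network within the distance budget that covers the most not-yet-covered events. The sketch below is hypothetical, assumes candidate sub-networks have already been enumerated, and is not the paper's method.

```python
def greedy_subnetworks(candidates, k, max_length):
    """Greedy baseline for a DCSSN-style problem. `candidates` maps an id
    to (total_length, set_of_covered_event_ids). Picks up to k candidates
    within the length budget, each time maximizing newly covered events."""
    chosen, covered = [], set()
    feasible = {cid: c for cid, c in candidates.items() if c[0] <= max_length}
    for _ in range(k):
        best = max(feasible, key=lambda cid: len(feasible[cid][1] - covered),
                   default=None)
        if best is None or not (feasible[best][1] - covered):
            break  # nothing new can be covered
        chosen.append(best)
        covered |= feasible.pop(best)[1]
    return chosen, covered
```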

Benjamin Niedermann and Martin Nöllenburg. An Algorithmic Framework for Labeling Road Maps

Abstract: Given an unlabeled road map, we consider, from an algorithmic perspective, the cartographic problem of placing non-overlapping road labels embedded in the roads. We first decompose the road network into logically coherent road sections, i.e., parts of roads between two junctions. Based on this decomposition, we present and implement a new and versatile framework for placing labels in road maps such that the number of labeled road sections is maximized. In an experimental evaluation with road maps of 11 major cities, we show that our proposed labeling algorithm is fast in practice and reaches near-optimal solution quality, where optimal solutions are obtained by mixed-integer linear programming. In direct comparison, our algorithm consistently outperforms the standard OpenStreetMap renderer Mapnik.

Takeshi Shirabe. On Distortion of Raster-based Least-cost Corridors

Abstract: Given a grid of cells each having a cost value, a variant of the least-cost path problem seeks a corridor—represented by a swath of cells rather than a sequence of cells—connecting two terminuses such that its total accumulated cost is minimized. While it is widely known that raster-based least-cost paths are subject to three types of distortion, i.e., elongation, deviation, and proximity, little is known about the potential distortion of their corridor counterparts. This paper examines a raster model of the least-cost corridor problem and analyses its solution in terms of each type of distortion. It is found that raster-based least-cost corridors, too, are subject to all three types of distortion, but in different ways: elongation distortion is always persistent, deviation distortion can be substantially reduced, and proximity distortion can be essentially eliminated.
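For readers unfamiliar with raster least-cost analysis, a common way to delineate corridors is to sum two accumulated-cost surfaces, one from each terminus, and keep cells whose combined cost is close to the least-cost path value. The sketch below illustrates that standard construction only; it is not the swath-based corridor model analyzed in the paper.

```python
import heapq
import numpy as np

def accumulated_cost(cost, source):
    """Dijkstra-style accumulated-cost surface from a source cell on a
    cost raster (4-connected; moving between neighbours costs the mean
    of their cell values)."""
    nrows, ncols = cost.shape
    acc = np.full(cost.shape, np.inf)
    acc[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > acc[r, c]:
            continue  # stale entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols:
                nd = d + 0.5 * (cost[r, c] + cost[nr, nc])
                if nd < acc[nr, nc]:
                    acc[nr, nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return acc

# Corridor surface: cells whose combined cost to both terminuses is small.
# corridor = accumulated_cost(cost, a) + accumulated_cost(cost, b)
```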

Benjamin Adams and Mark Gahegan. Exploratory chronotopic data analysis

Abstract: The intrinsic connection between place, space, and time in narrative texts is the subject of chronotopic literary analysis. We take the notion of the chronotope and apply it to exploratory analysis of unstructured big data. Exploratory chronotopic data analysis provides a data-driven perspective on how place, space, and time are connected in large, crowdsourced text collections. In this study, we processed the English Wikipedia text to find all co-occurrences of named places and dates and discovered that times are linked to places in a large majority of cases. We analyzed these millions of connections between places and dates and discovered a number of interesting trends. Because of the scale of the data involved, we suggest that chronotopic data analysis will lead to the development of new data models and methods for geographic information science and related fields, such as digital humanities.
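A minimal, hypothetical sketch of the kind of place-date co-occurrence extraction described above (a real pipeline would use proper named-entity recognition and date parsing rather than a gazetteer lookup and a year regex):

```python
import re
from collections import Counter

YEAR = re.compile(r"\b(1[0-9]{3}|20[0-2][0-9])\b")  # crude year matcher

def place_date_cooccurrences(sentences, gazetteer):
    """Count (place, year) pairs that occur in the same sentence.
    `gazetteer` is a set of place names; names and thresholds are ours."""
    pairs = Counter()
    for sent in sentences:
        years = YEAR.findall(sent)
        places = [p for p in gazetteer if p in sent]
        for p in places:
            for y in years:
                pairs[(p, y)] += 1
    return pairs
```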

Jan-Henrik Haunert and Wouter Meulemans. Partitioning Polygons via Graph Augmentation

Abstract: We study graph augmentation under the dilation criterion. In our case, we consider a plane geometric graph G = (V,E) and a set C of edges. We aim to add to G a minimal number of nonintersecting edges from C to bound the ratio between the graph-based distance and the Euclidean distance for all pairs of vertices described by C. Motivated by the problem of decomposing a polygon into natural subregions, we present an optimal linear-time algorithm for the case that G is a simple polygon P and C models an internal triangulation of P. The algorithm admits some straightforward extensions. Most importantly, in pseudopolynomial time, it can approximate a solution of minimum total length or, if C is weighted, compute a solution of minimum total weight. We show that minimizing the total length or the total weight is weakly NP-hard. Finally, we show how our algorithm can be used for two well-known problems in GIS: generating variable-scale maps and area aggregation.
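The dilation criterion itself is easy to check directly: a pair of vertices is satisfied when its graph-based distance is at most t times its Euclidean distance. The sketch below (using networkx, our own choice, not the paper's) only illustrates that check; it is not the linear-time augmentation algorithm.

```python
import math
import networkx as nx

def dilation_violations(G, pos, pairs, t):
    """Return vertex pairs whose graph distance exceeds t times their
    Euclidean distance. G is a networkx graph whose edge weights are
    Euclidean lengths; pos maps vertices to (x, y) coordinates."""
    bad = []
    for u, v in pairs:
        d_graph = nx.shortest_path_length(G, u, v, weight="weight")
        d_eucl = math.dist(pos[u], pos[v])
        if d_eucl > 0 and d_graph > t * d_eucl:
            bad.append((u, v))
    return bad
```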

Qinghan Liang, Silvia Nittel and Torsten Hahmann. From Data Streams to Fields: Extending Stream Data Models with Field Data Types

Abstract: With ubiquitous live sensors and sensor networks, increasingly large numbers of individual sensors are deployed in physical space. Sensor data streams are a fundamentally novel mechanism to create and deliver observations to information systems, enabling us to represent spatiotemporally continuous phenomena such as radiation accidents, pollen distributions, or toxic plumes almost as instantaneously as they happen in the real world. While data stream engines (DSE) are available to process high-throughput updates, DSE support for phenomena that are continuous in both space and time is not available. This places the burden of integrating potentially very large sets of concurrent sensor streams into higher-level abstractions on the user. In this paper, we propose a formal extension to stream data model languages based on the concept of fields to support high-level abstractions of continuous spatiotemporal phenomena that are known to the DSE and can therefore be supported through queries and processing optimization. The proposed field data types are formalized in a data-model-language-independent way using second-order signatures. We formalize both the set of supported field types and their embedding into stream data model languages.
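As an informal illustration of the field abstraction (not the paper's second-order-signature formalization), a field can be viewed as a function from location to value, constructed from the latest readings of a set of sensor streams, for example by inverse-distance weighting:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]

@dataclass
class Field:
    """A field as a function from location to value; informal sketch only."""
    value_at: Callable[[Point], float]

def idw_field(readings: List[Tuple[Point, float]], power: float = 2.0) -> Field:
    """Build a Field from point observations by inverse-distance weighting."""
    def value_at(p: Point) -> float:
        num = den = 0.0
        for (x, y), v in readings:
            d2 = (p[0] - x) ** 2 + (p[1] - y) ** 2
            if d2 == 0.0:
                return v  # query point coincides with a sensor
            w = 1.0 / d2 ** (power / 2.0)
            num += w * v
            den += w
        return num / den
    return Field(value_at)
```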

Dipto Sarkar, Renée Sieber and Raja Sengupta. GIScience considerations in spatial social networks

Abstract: A proliferation of literature incorporates social network analysis (SNA) to study geographic phenomena. We argue that these incorporations have mostly been superficial. What is needed is a stronger interrogation of the challenges and possibilities of a tight coupling of spatial and social network concepts, one that takes advantage of the strengths of each methodology. In this paper, we create a typology of existing research focused on the integration of geography into SNA: nodal, topographic, and spatial. We then describe three core concepts that co-exist in the two fields but are not necessarily complementary: distance, communities, and scale. We consider how they can be appropriated and how they can be more tightly coupled into spatial social networks. We argue that the only way we can move beyond a superficial integration is to holistically identify the challenges and consider new methods to address the complexities of an integration.

Peter Kiefer, Ioannis Giannopoulos, Andrew Duchowski and Martin Raubal. Measuring Cognitive Load for Map Tasks through Pupil Diameter

Abstract: In this paper we use pupil diameter as an indicator for measuring cognitive load for six different tasks on common web maps. Two eye tracking data sets were collected for different basemaps (37 participants and 1,328 trials in total). We found significant differences in mean pupil diameter between tasks, indicating low cognitive load for free exploration; medium cognitive load for search, polygon comparison, and line following; and high cognitive load for route planning and focused search. Pupil diameter also changed over time within trials, which can be interpreted as an increase in cognitive load for search and focused search, and a decrease for line following. Such results can be used for the adaptation of maps and geovisualizations based on their users’ cognitive load.

Joshua Lewis and Max Egenhofer. Point Partitions: A Qualitative Representation for Region-Based Spatial Scenes in R^2

Abstract: A complete qualitative scene description should capture the essential details of a configuration so that a topologically correct depiction can be recreated. This paper models a spatial scene through sequences of point partitions, that is, how the embedding space and objects are distributed around the intersections of the boundaries of regions. A set of 23 base patterns is identified, which suffices to capture complex scenes, including configurations with holes. To demonstrate the diagrammatic depiction of a spatial scene from point partition patterns, such a scene is recreated using the developed model. The paper also provides a means of transitioning between these more complex relations and the coarse relations of the 9-intersection.

Joseph Tuccillo and Barbara Buttenfield. Model-Based Clustering of Social Vulnerability to Urban Extreme Heat Events

Abstract: Geodemographic classification methods are applied to Denver, Colorado to develop a typology of social vulnerability to heat exposure. Environmental hazards are known to exhibit biophysical variations (e.g., land cover and housing characteristics) and social variations (e.g., demographic and economic adaptations to heat mitigation). Geodemographic model-based classification permits a more extensive set of input variables, with richer attributions, and it can account for the effects of spatial context on variable interactions. Additionally, it generates comparative assessments of environmental stress on multiple demographic groups. The paper emphasizes the performance of model-based clustering in geodemographic analysis, describing two stages of classification analysis. In so doing, this research examines ways in which high heat exposure intersects with socioecological variation to drive social vulnerability during extreme heat events. The first stage classifies tract-level variables for social and biophysical stressors. Membership probabilities from the initial (baseline) classification are then input to a second classification that integrates the biophysical and social domains within a membership probability space to form a final place typology. Final place categories are compared to three broad land surface temperature (LST) regimes derived from simple clustering of mean daytime and nighttime land surface temperatures. The results point to several broad considerations for heat mitigation planning that are aligned with extant research on urban heat vulnerability. However, the relative coarseness of the classification structure also reveals a need for further investigation of the internal structure of each class, as well as aggregation effects, in future studies.
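A schematic version of the two-stage, model-based (Gaussian mixture) classification described above might look as follows; the component counts, scaling, and variable groupings are our own assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def two_stage_classification(social_vars, biophysical_vars, k1=8, k2=6, seed=0):
    """Two-stage model-based clustering: tract-level variables are clustered
    per domain, and the resulting membership probabilities are clustered
    again to form a final place typology."""
    stage1_probs = []
    for X in (social_vars, biophysical_vars):
        Xs = StandardScaler().fit_transform(X)
        gmm = GaussianMixture(n_components=k1, random_state=seed).fit(Xs)
        stage1_probs.append(gmm.predict_proba(Xs))
    # second stage operates in the joint membership probability space
    membership_space = np.hstack(stage1_probs)
    final = GaussianMixture(n_components=k2, random_state=seed).fit(membership_space)
    return final.predict(membership_space)
```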

Xiaoxiao Liu and Stefania Bertazzon. Fine Scale Spatio-temporal Modelling of Urban Air Pollution

Abstract: Urban air pollution is a leading environmental health concern. However, the association between urban air pollution and health outcomes is not consistently reported in the literature, likely because of inaccurate exposure assessment induced by spatial error. In this study, a spatio-temporal model is presented, which integrates harmonic regression and land use regression (LUR) to estimate urban air pollution at a fine spatio-temporal scale. The space-time field is decomposed into a space-time mean and space-time residuals. The mean is estimated by linear combinations of harmonic regression components, and the spatial field is modelled with LUR. The residuals account for spatio-temporal deviation from the mean model. Using data from a regulatory monitoring network and geographic covariates from a LUR model, the study yields monthly nitrogen dioxide estimates at the postal code level for Calgary, Canada. The model yields a satisfactory fit (R² = 0.78). The space-time residuals exhibit non-significant to moderate spatial and temporal autocorrelation.
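A minimal sketch of such a mean model, combining annual harmonics with land-use covariates in one design matrix fitted by least squares, is given below; the number of harmonics and the covariates are placeholders, and the paper's exact specification may differ.

```python
import numpy as np

def harmonic_lur_design(month, landuse_covariates, n_harmonics=2):
    """Design matrix combining annual harmonics (temporal mean) with
    land-use regression covariates (spatial variation). `month` holds
    month indices (1-12) per observation; `landuse_covariates` is an
    (n_obs, n_covariates) array."""
    t = 2.0 * np.pi * (np.asarray(month, dtype=float) - 1.0) / 12.0
    cols = [np.ones_like(t)]
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(h * t), np.cos(h * t)]
    return np.column_stack(cols + [landuse_covariates])

# Fit by ordinary least squares, e.g.:
# X = harmonic_lur_design(month, landuse_covariates)
# coefficients, *_ = np.linalg.lstsq(X, no2_observations, rcond=None)
```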

Arthur van Goethem, Marc Van Kreveld and Bettina Speckmann. Circles in the water: towards island group labeling

Abstract: Many algorithmic results are known for automated label placement on maps. However, algorithms to compute labels for groups of features, such as island groups, are largely missing. In this paper we address this issue by presenting new, efficient algorithms for island label placement in various settings. We consider straight-line and circular-arc labels that may or may not overlap a given set of islands. We concentrate on computing the line or circle that minimizes the maximum distance to the islands, measured by the closest distance. We experimentally test whether the generated labels are reasonable for various real-world island groups, and compare different options. The results are positive and validate our geometric formalizations.

Jiaoli Chen and Shih-Lung Shaw. Representing the Spatial Extent of Places based on Flickr Photos with a Representativeness-Weighted Kernel Density Estimation

Abstract: Geotagged photos have been used by many researchers to estimate the spatial extent of places. This paper addresses an important challenge of using geotagged Flickr photos to delineate the spatial extent of a vague place, which is defined as a place without a clearly defined boundary. We argue that the variation of location popularity has a great impact on the estimation of the vague spatial extent of a place. We propose an approach to model the representativeness of each geotagged photo point based on its location popularity. A modified kernel density estimation method incorporating photo representativeness is developed and tested with eight selected places, which cover urban vs. non-urban areas, cases with vs. without an official boundary, and various spatial scales at the state, city, and district levels. Our results indicate major improvements of the proposed representativeness-weighted kernel density estimation method over the traditional kernel density estimation method in estimating the spatial extent of vague places.
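The core idea of weighting each photo by its representativeness can be illustrated with a generic weighted Gaussian kernel density estimate; the representativeness model itself is not reproduced here, and all parameter names are ours.

```python
import numpy as np

def weighted_kde(points, weights, grid, bandwidth):
    """Weighted Gaussian kernel density estimate in 2-D. `points` is (n, 2),
    `weights` are per-photo representativeness scores, `grid` is (m, 2)
    query locations, `bandwidth` is the isotropic kernel width."""
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                        # normalize weights
    diff = grid[:, None, :] - points[None, :, :]           # (m, n, 2)
    sq = (diff ** 2).sum(axis=2) / bandwidth ** 2          # (m, n)
    kern = np.exp(-0.5 * sq) / (2.0 * np.pi * bandwidth ** 2)
    return kern @ w                                        # density at each grid point
```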

Matt Duckham, Marc Van Kreveld, Ross Purves, Bettina Speckmann, Yaguang Tao, Kevin Verbeek and Jo Wood. Modeling Checkpoint-based Movement with the Earth Mover’s Distance

Abstract: Movement data comes in various forms, including trajectory data and checkpoint data. While trajectories give detailed information about the movement of individual entities, checkpoint data in its simplest form does not give identities, just counts at checkpoints. However, checkpoint data is of increasing interest since it is readily available, due in part to privacy considerations and in part as a by-product of other data collection. In this paper we propose to use the Earth Mover’s Distance as a versatile tool to reconstruct individual movements or flow based on checkpoint counts at different times. We analyze the modeling possibilities and provide experiments that validate model predictions, based on coarse-grained aggregations of data about actual movements of couriers in London, UK. While we cannot expect to reconstruct precise individual movements from highly granular checkpoint data, the evaluation does show that the approach can generate meaningful estimates of object movements.
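Reconstructing aggregate flow from checkpoint counts with the Earth Mover's Distance amounts to solving a transportation problem. The sketch below sets it up as a linear program with SciPy; it assumes the totals at the two time steps match and is only a schematic version of the modeling described in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def emd_flow(counts_from, counts_to, dist):
    """Minimum-cost flow (EMD) between checkpoint counts at two time steps;
    dist[i, j] is the travel cost from checkpoint i to checkpoint j.
    Returns the flow matrix and the total transport cost."""
    m, n = len(counts_from), len(counts_to)
    c = np.asarray(dist, dtype=float).ravel()
    A_eq = []
    for i in range(m):                       # outgoing flow matches counts_from
        row = np.zeros(m * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row)
    for j in range(n):                       # incoming flow matches counts_to
        col = np.zeros(m * n)
        col[j::n] = 1.0
        A_eq.append(col)
    b_eq = np.concatenate([counts_from, counts_to]).astype(float)
    res = linprog(c, A_eq=np.asarray(A_eq), b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(m, n), res.fun
```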

Carson Farmer and Carsten Keßler. Hierarchical Prism Trees for Scalable Time Geographic Analysis

Abstract: As location-aware applications and location-based services continue to increase in popularity, data sources describing a range of dynamic processes occurring in near real-time over multiple spatial and temporal scales are becoming the norm. At the same time, existing frameworks useful for understanding these dynamic spatio-temporal data, such as time geography, are unable to scale to the high volume, velocity, and variety of these emerging data sources. In this paper, we introduce a computational framework that turns time geography into a scalable analysis tool that can handle large and rapidly changing datasets. The Hierarchical Prism Tree (HPT) is a dynamic data structure for fast queries on spatio-temporal objects based on time geographic principles and theories, which takes advantage of recent advances in moving object databases and computer graphics. We demonstrate the utility of our proposed HPT using two common time geography tasks (finding similar trajectories and mapping potential space-time interactions), taking advantage of open data on space-time vehicle emissions from the EnviroCar platform.

Tuhin Paul, Kevin Stanley, Nathaniel Osgood, Scott Bell and Nazeem Muhajarine. Scaling Behavior of Human Mobility Distributions

Abstract: Recent technical advances have made high-fidelity tracking of populations possible. However, these datasets, such as GPS traces, can comprise millions of records, well beyond what even a skilled analyst can digest. To facilitate human analysis, these records are often expressed as aggregate distributions capturing behaviors of interest. While these aggregate distributions can provide substantial insight, the spatio-temporal resolution at which they are captured can impact the shape of the resulting distribution. We present an analysis of five spatial datasets and codify the impact of rebinning the data at different spatio-temporal resolutions. We find that all aggregate metrics considered are affected by rebinning, but that some distributions change regularly and predictably, while others do not. This work provides important insight into which metrics can be used to compare human behavior across datasets and the kinds of relationships between them that can be expected.
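Rebinning in this sense can be illustrated by aggregating (x, y, t) records into spatio-temporal bins of a chosen size and comparing the resulting count distributions across resolutions; the sketch below is illustrative only and uses our own parameter names.

```python
import numpy as np

def rebin(points, times, cell_size, time_bin):
    """Aggregate (x, y, t) records into spatio-temporal bins and return the
    count per occupied bin. Re-running with different `cell_size` /
    `time_bin` values shows how the aggregate distribution shifts with
    resolution."""
    points = np.asarray(points, dtype=float)
    keys = np.column_stack([
        np.floor(points[:, 0] / cell_size),
        np.floor(points[:, 1] / cell_size),
        np.floor(np.asarray(times, dtype=float) / time_bin),
    ]).astype(int)
    _, counts = np.unique(keys, axis=0, return_counts=True)
    return counts

# e.g. compare np.histogram(rebin(pts, ts, 100, 3600))
#      with    np.histogram(rebin(pts, ts, 1000, 86400))
```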

Krzysztof Janowicz, Yingjie Hu, Grant McKenzie, Song Gao, Blake Regalia, Gengchen Mai, Rui Zhu, Benjamin Adams and Kerry Taylor. Moon Landing or Safari? A Study of Systematic Errors and their Causes in Geographic Linked Data

Abstract: While the adoption of Linked Data technologies has grown dramatically over the past few years, it has not come without its own set of growing challenges. The triplification of domain data into Linked Data has not only given rise to a leading role for places and positioning information in the dense interlinkage of data about actors, objects, and events, but has also led to massive errors in the generation, transformation, and semantic annotation of data. Though the types of errors vary substantially, systematic errors often offer the most interesting research opportunities. This work presents the first comprehensive study of these systematic errors and their potential causes. It also discusses lessons learned and means to avoid some of these issues in the future.

Ashwin Shashidharan, Derek Berkel, Ranga Raju Vatsavai and Ross Meentemeyer. pFUTURES: A Parallel Framework for Cellular Automaton Based Urban Growth Models

Abstract: Simulating structural changes in landscapes is a routine task in computational geography. Owing to advances in sensing and data collection technologies, geospatial data is becoming available at finer spatial and temporal resolutions. However, in practice, these large datasets impede land simulation studies over large geographical regions due to computational and I/O challenges. The memory overhead of sequential implementations and long execution times further limit the possibilities of simulating future urban scenarios. In this paper, we present a generic framework for coordinating I/O and computation for geospatial simulations in a distributed computing environment. We present three parallel approaches and demonstrate the performance and scalability benefits of our parallel implementation pFUTURES, an extension of the FUTURES open-source multi-level urban growth model. Our analysis shows that although a time-synchronous parallel approach obtains the same results as a sequential model, an asynchronous parallel approach provides better scaling due to reduced I/O and communication overheads.

Chris Allen, Thomas Hervey, Werner Kuhn, Sara Lafia, Daniel W. Phillips and Behzad Vahedi. Exploring the Notion of Spatial Lenses

Abstract: We explore the idea of spatial lenses as pieces of software interpreting data sets in a particular spatial view of an environment. The lenses serve to prepare the data sets for subsequent analysis in that view. Examples include a network lens to view places in a literary text, or a field lens to interpret pharmacy sales in terms of seasonal allergy risks. The theory underlying these lenses is that of core concepts of spatial information, but here we exploit how these concepts enhance the usability of data rather than that of systems. Spatial lenses also supply transformations between multiple views of an environment, for example, between field and object views. They lift these transformations from the level of data format conversions to that of understanding an environment in multiple ways. In software engineering terms, spatial lenses are defined by constructors, generating instances of core concept representations from spatial data sets. Deployed as web services or libraries, spatial lenses would make larger varieties of data sets amenable to mapping and spatial analysis, compared to today’s situation, where file formats determine and limit what one can do. To illustrate and evaluate the idea of spatial lenses, we present a set of experimental lenses, implemented in a variety of languages, and test them with a variety of data sets, some of them non-spatial.
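In the constructor reading described above, a lens is simply a function that turns a raw data set into an instance of a core-concept representation. The sketch below shows a hypothetical network lens over a literary text; it is our own illustration, not code from the paper.

```python
from typing import Callable, Dict, List, Tuple

# A lens is a constructor: it maps a raw data set to an instance of a
# core-concept representation (field, network, object, event, ...).
Lens = Callable[[object], object]

def network_lens(text: str, gazetteer: List[str]) -> Dict[Tuple[str, str], int]:
    """Hypothetical network lens: view a literary text as a network of
    places, weighted by how often two places appear in the same sentence."""
    edges: Dict[Tuple[str, str], int] = {}
    for sentence in text.split("."):
        found = [p for p in gazetteer if p in sentence]
        for i, a in enumerate(found):
            for b in found[i + 1:]:
                key = tuple(sorted((a, b)))
                edges[key] = edges.get(key, 0) + 1
    return edges
```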