Thomas Mølhave (Moelhave)

Publications

Dissertation

Handling Massive Terrains and Unreliable Memory

Thomas Mølhave.

Aarhus University, 2009.

Abstract:
Recent technological advances have greatly increased the ability to acquire, store, and analyze data. These developments have significantly improved the potential of many commercial and scientific applications, and led to many new scientific discoveries. We are growing accustomed to accessing massive amounts of information from almost anywhere using devices ranging from cell phones and tiny GPS navigation systems, to ordinary computers and beyond. The large amount of information available presents a number of problems and opportunities. One of the main obstacles is that most software is not designed to handle large amounts of data, resulting in crashes or very long running times on even moderately-sized data sets. Another problem is that contemporary memory devices can be unreliable due to a number of factors, such as power failures, radiation, and cosmic rays. The content of a cell in unreliable memory can be silently altered and this can adversely affect most traditional algorithms. The focus of this dissertation is on algorithms and data structures specifically designed for solving a number of the problems involving large data sets and unreliable memory devices. The dissertation is divided into two parts. In Part I, we use the classical external memory model by Aggarwal and Vitter, and the cache-oblivious model recently proposed by Frigo et al., to design cache-efficient algorithms. We focus on problems involving terrain models which, due to modern terrestrial scanning techniques, can be very large. We present the TerraSTREAM software package, which solves many common computational problems on big terrains. We also present an I/O-efficient algorithm for computing contour maps of a terrain and a cache-oblivious algorithm for finding intersections between two sets of internally non-intersecting line segments.
In Part II, we use the faulty memory RAM, proposed by Finocchi and Italiano, to model unreliable memory circuits and design algorithms that are resilient to memory faults. We present a resilient priority queue as well as an optimal comparison-based resilient algorithm for searching in a sorted array. We also show how to use this algorithm to get a dynamic resilient dictionary. Finally, we present a model that combines the standard external memory model with the faulty memory RAM and present lower and upper bounds for I/O-efficient resilient dictionaries, an I/O-efficient resilient sorting algorithm and an I/O-efficient resilient priority queue.

Conference Papers

Maintaining Contour Trees of Dynamic Terrains

Pankaj K. Agarwal, Thomas Mølhave, Morten Revsbæk, Issam Safa, Yusu Wang, Jungwoo Yang.

SCG '15: Proceedings of the 31st Annual Symposium on Computational Geometry, 2015.

Abstract:
We study the problem of maintaining the contour tree T of a terrain Σ, represented as a triangulated xy-monotone surface, as the heights of its vertices vary continuously with time. We characterize the combinatorial changes in T and how they relate to topological changes in Σ. We present a kinetic data structure (KDS) for maintaining T efficiently. It maintains certificates that fail, i.e., an event occurs, only when the heights of two adjacent vertices become equal or two saddle vertices appear on the same contour. Assuming that the heights of two vertices of Σ become equal only O(1) times and these instances can be computed in O(1) time, the KDS processes O(κ+n) events, where n is the number of vertices in Σ and κ is the number of events at which the combinatorial structure of T changes, and processes each event in O(log n) time. The KDS can be extended to maintain an augmented contour tree and a join/split tree.
doi: 10.4230/LIPIcs.SOCG.2015.796

Computing Highly Occluded Paths Using a Sparse Network

Niel Lebeck, Thomas Mølhave, Pankaj K. Agarwal.

GIS '14: Proceedings of the 22nd ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2014.

Abstract:
Computing paths over a terrain that are highly occluded with respect to observers is an important problem in GIS. Given a fast algorithm for computing the visibility map, the path-planning step becomes the bottleneck. In this paper, we present an approach for quickly computing occluded paths over a terrain using a sparse network, a sparse 1-dimensional network over the terrain. We present different strategies for constructing the sparse network. Experimental results show that our approach results in significantly improved time for computing highly occluded paths between two query points, and that the different strategies offer a tradeoff between higher-quality paths and lower preprocessing times. Furthermore, there are strategies that achieve near-optimal paths with small preprocessing cost.
doi: 10.1145/2666310.2666394

Model-Driven Matching and Segmentation of Trajectories

Swaminathan Sankararaman, Pankaj K. Agarwal, Thomas Mølhave, Jiangwei Pan, Arnold P. Boedihardjo.

GIS '13: Proceedings of the 21st ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2013.

Abstract:
A fundamental problem in analyzing trajectory data is to identify common patterns between pairs or among groups of trajectories. In this paper, we consider the problem of matching similar portions between a pair of trajectories, each observed as a sequence of points sampled from it. We present new measures of trajectory similarity, both local and global, between a pair of trajectories to distinguish between similar and dissimilar portions. We then use this model to perform segmentation of a set of trajectories into fragments, contiguous portions of trajectories shared by many of them. Our model for similarity is robust under noise and sampling rate variations. The model also yields a score which can be used to rank multiple pairs of trajectories according to similarity, e.g. in clustering applications. We present quadratic time algorithms to compute the similarity between trajectory pairs under our measures together with algorithms to identify fragments in a large set of trajectories efficiently using the similarity model. Finally, we present an extensive experimental study evaluating the effectiveness of our approach on real datasets, comparing it with earlier approaches. Our experiments show that our model for similarity is highly accurate in distinguishing similar and dissimilar portions as compared to earlier methods even with sparse sampling. Further, our segmentation algorithm is able to identify a small set of fragments capturing the common parts of trajectories in the dataset.
doi: 10.1145/2525314.2525360

Computing Highly Occluded Paths on a Terrain

Niel Lebeck, Thomas Mølhave, Pankaj K. Agarwal.

GIS '13: Proceedings of the 21st ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2013.

Abstract:
Understanding the locations of highly occluded paths on a terrain is a fundamental GIS problem. In this paper we present a model and a fast algorithm for computing highly occluded paths on a terrain. It does not assume that the observer locations are known and yields a path likely to be occluded under a rational observer strategy. We present experimental results that examine several different observer strategies.
doi: 10.1145/2525314.2525363

Simplifying Massive Contour Maps

Lars Arge, Lasse Deleuran, Thomas Mølhave, Morten Revsbæk, Jakob Truelsen.

ESA '12: Proceedings of the 20th European Symposium on Algorithms, 2012.

Abstract:
We present a simple, efficient and practical algorithm for constructing and subsequently simplifying contour maps from massive high-resolution DEMs, under some practically realistic assumptions on the DEM and contours.
doi: 10.1007/978-3-642-33090-2_10

TerraNNI: Natural Neighbor Interpolation on a 3D Grid Using a GPU

Alex Beutel, Thomas Mølhave, Pankaj K. Agarwal, Arnold P. Boedihardjo, James A. Shine.

GIS '11: Proceedings of the 19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2011.

Abstract:
With modern focus on LiDAR technology the amount of topographic data, in the form of massive point clouds, has increased dramatically. Furthermore, due to the popularity of LiDAR, repeated surveys of the same areas are becoming more common. This trend will only increase as topographic changes prompt surveys over already scanned terrain, in which case we obtain large spatio-temporal data sets. In dynamic terrains, such as coastal regions, such spatio-temporal data can offer interesting insight into how the terrain changes over time. An initial step in the analysis of such data is to create a digital elevation model representing the terrain over time. In the case of spatio-temporal data sets those models often represent elevation on a 3D volumetric grid. This involves interpolating the elevation of LiDAR points on these grid points. In this paper we show how to efficiently perform natural neighbor interpolation over a 3D volumetric grid. Using a graphics processing unit (GPU), we describe different algorithms to attain speed and GPU-memory trade-offs. Our algorithm extends to higher dimensions. Our experimental results demonstrate that the algorithm is efficient and scalable.
doi: 10.1145/2093973.2093984

Exploiting Temporal Coherence in Forest Dynamics

Pankaj K. Agarwal, Thomas Mølhave, Hai Yu, James S. Clark.

SCG '11: Proceedings of the 27th Annual Symposium on Computational Geometry, 2011.

Abstract:
Understanding the impact of climate and land-use on forest ecosystems involves modeling and simulating complex spatial interactions at many different scales. With this goal in mind, we have developed an individual-based, spatially explicit forest simulator, which incorporates fine-scale processes that influence forest dynamics. In this paper we present new, faster algorithms for computing understory light and for dispersal of seeds — the two most computationally intensive submodules in our simulator. By exploiting temporal coherence, we circumvent the problem of doing the entire simulation at each step. We provide experimental results that support the efficiency and efficacy of our approach.
doi: 10.1145/1998196.1998210

I/O-Efficient Contour Queries on Terrains

Pankaj K. Agarwal, Thomas Mølhave, Bardia Sadri.

SODA '11: Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms, 2011.

Abstract:
A terrain M can be represented as a triangulation of the plane along with a height function associated with the vertices (and linearly interpolated within the edges and triangles) of M. We investigate the problem of answering contour queries on M: Given a height l and a triangle f of M that intersects the level set of M at height l, report the list of the edges of the connected component of this level set that intersect f, sorted in clockwise or counter-clockwise order. Contour queries are different from level-set queries in that only one contour (connected component of the level set) out of all those that may exist is expected to be reported. We present an I/O-efficient data structure of linear size that answers a contour query in O(log_B N + T/B) I/Os, where N is the number of triangles in the terrain and T is the number of edges in the output contour. The data structure can be constructed using O(Sort(N)) I/Os.

Natural neighbor interpolation based grid DEM construction using a GPU (Best paper award)

Alex Beutel, Thomas Mølhave, Pankaj K. Agarwal.

GIS '10: Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2010.

Abstract:
With modern LiDAR technology the amount of topographic data, in the form of massive point clouds, has increased dramatically. One of the most fundamental GIS tasks is to construct a grid digital elevation model (DEM) from these 3D point clouds. In this paper we present a simple yet very fast algorithm for constructing a grid DEM from massive point clouds using natural neighbor interpolation (NNI). We use a graphics processing unit (GPU) to significantly speed up the computation. To handle the large data sets and to deal with graphics hardware limitations, clever blocking schemes are used to partition the point cloud. For example, using standard desktop computers and graphics hardware, we construct a high-resolution grid with 150 million cells from two billion points in less than thirty-seven minutes. This is about one-tenth of the time required for the same computer to perform a standard linear interpolation, which produces a much less smooth surface.
doi: 10.1145/1869790.1869817

Cleaning Massive Sonar Point Clouds

Lars Arge, Kasper Green Larsen, Thomas Mølhave, Freek Walderveen.

GIS '10: Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2010.

Abstract:
We consider the problem of automatically cleaning massive sonar data point clouds, that is, the problem of automatically removing noisy points that for example appear as a result of scans of (shoals of) fish, multiple reflections, scanner self-reflections, refraction in gas bubbles, and so on. We describe a new algorithm that avoids the problems of previous local-neighbourhood based algorithms. Our algorithm is theoretically I/O-efficient, that is, it is capable of efficiently processing massive sonar point clouds that do not fit in internal memory but must reside on disk. The algorithm is also relatively simple and thus practically efficient, partly due to the development of a new simple algorithm for computing the connected components of a graph embedded in the plane. A version of our cleaning algorithm has already been incorporated in a commercial product.
doi: 10.1145/1869790.1869815

Scalable algorithms for large high-resolution terrain data

Thomas Mølhave, Pankaj K. Agarwal, Lars Arge, Morten Revsbæk.

COM.Geo '10: Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application, 2010.

Abstract:
In this paper we demonstrate that the technology required to perform typical GIS computations on very large high-resolution terrain models has matured enough to be ready for use by practitioners. We also demonstrate the impact that high-resolution data has on common problems. To our knowledge, some of the computations we present have never before been carried out by standard desktop computers on data sets of comparable size.
doi: 10.1145/1823854.1823878

Counting in the Presence of Memory Faults

Gerth Brodal, Allan Jørgensen, Gabriel Moruz, Thomas Mølhave.

ISAAC '09: Proceedings of the 20th Annual International Symposium on Algorithms and Computation, 2009.

Abstract:
The faulty memory RAM presented by Finocchi and Italiano is a variant of the RAM model where the content of any memory cell can get corrupted at any time, and corrupted cells cannot be distinguished from uncorrupted cells. An upper bound, δ, on the number of corruptions and O(1) reliable memory cells are provided. Reliable counters can be kept in the faulty memory by replicating the value of each counter Θ(δ) times and paying Θ(δ) time every time a counter is queried or incremented. In this paper we decrease the expensive increment cost to o(δ) and present upper and lower bound tradeoffs decreasing the increment time at the cost of the accuracy of the counters.
doi: 10.1007/978-3-642-10631-6_85

Impacts of 21st century sea-level rise on a major city (Aarhus, Denmark) - an assessment based on fine-resolution digital topography and a new flooding algorithm

Jesper Moeslund Eshøj, Peder Klith Bøcher, Jens-Christian Svenning, Thomas Mølhave, Lars Arge.

IOP Conf. Series: Earth and Environmental Science, 2009.

Abstract:
This study examines the potential impact of 21st century sea-level rise on Aarhus, the second largest city in Denmark, emphasizing the economic risk to the city's real estate. Furthermore, it assesses which adaptation measures can be taken to prevent flooding in areas particularly at risk. We combine a new national Digital Elevation Model in very fine resolution (2 meter), a new highly computationally efficient flooding algorithm that accurately models the influence of barriers, and geospatial data on real-estate values to assess the economic real-estate risk posed by future sea-level rise to Aarhus. Under the A2 and A1FI (IPCC) climate scenarios we show that relatively large residential areas in the northern part of the city as well as areas around the river running through the city are likely to become flooded in the event of extreme, but realistic weather events. In addition, most of the large Aarhus harbour would also risk flooding. As much of the area at risk represents high-value real estate, it seems clear that proactive measures other than simple abandonment should be taken in order to avoid heavy economic losses. Among the different possibilities for dealing with an increased sea level, the strategic placement of flood-gates at key potential water-inflow routes and the construction or elevation of existing dikes seem to be the most convenient, most socially acceptable, and maybe also the cheapest solution. Finally, we suggest that high-detail flooding models similar to those produced in this study will become an important tool for climate-change-integrated planning of future city development as well as for the development of evacuation plans.
doi: 10.1088/1755-1315/8/1/012022

Fault Tolerant External Memory Algorithms

Gerth Stølting Brodal, Allan Grønlund Jørgensen, Thomas Mølhave.

WADS '09: Proceedings of the 11th Algorithms and Data Structures Symposium, 2009.

Abstract:
Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano. However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable.
doi: 10.1007/978-3-642-10631-6_85

Cache-Oblivious Red-Blue Line Segment Intersection

Lars Arge, Thomas Mølhave, Norbert Zeh.

ESA '08: Proceedings of the 16th Annual European Symposium on Algorithms, 2008.

Abstract:
We present an optimal cache-oblivious algorithm for finding all intersections between a set of non-intersecting red segments and a set of non-intersecting blue segments in the plane. Our algorithm uses O((N/B) log_{M/B}(N/B) + T/B) memory transfers, where N is the total number of segments, M and B are the memory and block transfer sizes of any two consecutive levels of any multilevel memory hierarchy, and T is the number of intersections.
doi: 10.1007/978-3-540-87744-8_8

I/O-Efficient Algorithms for Computing Contours on a Terrain

Pankaj K. Agarwal, Lars Arge, Thomas Mølhave, Bardia Sadri.

SCG '08: Proceedings of the 24th Annual Symposium on Computational Geometry, 2008.

Abstract:
A terrain M is the graph of a bivariate function. We assume that M is represented as a triangulated surface with N vertices. A contour (or isoline) of M is a connected component of a level set of M. Generically, each contour is a closed polygonal curve; at "critical" levels these curves may touch each other or collapse to a point. We present I/O-efficient algorithms for the following two problems related to computing contours of M:

(i) Given a sequence l₁ < ... < l_s of real numbers, we present an I/O-optimal algorithm that reports all contours of M at heights l₁, ..., l_s using O(Sort(N) + T/B) I/Os, where T is the total number of edges in the output contours, B is the "block size", and Sort(N) is the number of I/Os needed to sort N elements. The algorithm uses O(N/B) disk blocks. Each contour is generated individually with its composing segments sorted in clockwise or counterclockwise order. Moreover, our algorithm generates information on how the contours are nested.
(ii) We can preprocess M, using O(Sort(N)) I/Os, into a linear-size data structure so that all contours at a given height can be reported using O(log_B N + T/B) I/Os, where T is the output size. Each contour is generated individually with its composing segments sorted in clockwise or counterclockwise order.
doi: 10.1145/1377676.1377698

TerraStream: From Elevation Data to Watershed Hierarchies

Andrew Danner, Thomas Mølhave, Ke Yi, Pankaj K. Agarwal, Lars Arge, Helena Mitasova.

GIS '07: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, 2007.

Abstract:
We consider the problem of extracting a river network and a watershed hierarchy from a terrain given as a set of irregularly spaced points. We describe TerraStream, a "pipelined" solution that consists of four main stages: construction of a digital elevation model (DEM), hydrological conditioning, extraction of river networks, and construction of a watershed hierarchy. Our approach has several advantages over existing methods. First, we design and implement the pipeline so each stage is scalable to massive data sets; a single non-scalable stage would create a bottleneck and limit overall scalability. Second, we develop the algorithms in a general framework so that they work for both TIN and grid DEMs. TerraStream is flexible and allows users to choose from various models and parameters, yet our pipeline is designed to reduce (or eliminate) the need for manual intervention between stages. We have implemented TerraStream and present experimental results on real elevation point sets that show that our approach handles massive multi-gigabyte terrain data sets. For example, we can process a data set containing over 300 million points (over 20GB of raw data) in under 26 hours, where most of the time (76%) is spent in the initial CPU-intensive DEM construction stage.
doi: 10.1145/1341012.1341049

Optimal Resilient Dynamic Dictionaries

Gerth Stølting Brodal, Rolf Fagerberg, Irene Finocchi, Fabrizio Grandoni, Giuseppe Italiano, Allan Grønlund Jørgensen, Gabriel Moruz, Thomas Mølhave.

ESA '07: Proceedings of the 15th Annual European Symposium on Algorithms, 2007.

Abstract:
We investigate the problem of computing in the presence of faults that may arbitrarily (i.e., adversarially) corrupt memory locations. In the faulty memory model, any memory cell can get corrupted at any time, and corrupted cells cannot be distinguished from uncorrupted ones. An upper bound δ on the number of corruptions and O(1) reliable memory cells are provided. In this model, we focus on the design of resilient dictionaries, i.e., dictionaries which are able to operate correctly (at least) on the set of uncorrupted keys. We first present a simple resilient dynamic search tree, based on random sampling, with O(log n + δ) expected amortized cost per operation, and O(n) space complexity. We then propose an optimal deterministic static dictionary supporting searches in Θ(log n+δ) time in the worst case, and we show how to use it in a dynamic setting in order to support updates in O(log n+δ) amortized time. Our dynamic dictionary also supports range queries in O(log n+δ+t) worst case time, where t is the size of the output. Finally, we show that every resilient search tree (with some reasonable properties) must take Ω(log n + δ) worst-case time per search.
doi: 10.1007/978-3-540-75520-3_32

Priority Queues Resilient to Memory Faults

Allan Grønlund Jørgensen, Gabriel Moruz, Thomas Mølhave.

WADS '07: Proceedings of the 10th International Workshop on Algorithms and Data Structures, 2007.

Abstract:
In the faulty-memory RAM model, the content of memory cells can get corrupted at any time during the execution of an algorithm, and a constant number of uncorruptible registers are available. A resilient data structure in this model works correctly on the set of uncorrupted values. In this paper we introduce a resilient priority queue. The deletemin operation of a resilient priority queue returns either the minimum uncorrupted element or some corrupted element. Our resilient priority queue uses O(n) space to store n elements. Both insert and deletemin operations are performed in O(log n+δ) amortized time, where δ is the maximum number of corruptions tolerated. Our priority queue matches the performance of classical optimal priority queues in the RAM model when the number of corruptions tolerated is O(log n). We prove matching worst case lower bounds for resilient priority queues storing only structural information in the uncorruptible registers between operations.
doi: 10.1007/978-3-540-73951-7_12

Journal Papers

TerraNNI: Natural Neighbor Interpolation on 2D and 3D Grids using a GPU (to appear)

Pankaj K. Agarwal, Alex Beutel, Thomas Mølhave.

Transactions on Spatial Algorithms and Systems, 2015.

Abstract:
With modern focus on remote sensing technology, such as LiDAR, the amount of spatial data, in the form of massive point clouds, has increased dramatically. Furthermore, repeated surveys of the same areas are becoming more common. This trend will only increase as topographic changes prompt surveys over already scanned areas, in which case we obtain large spatio-temporal data sets. An initial step in the analysis of such spatial data is to create a digital elevation model representing the terrain, possibly over time. In the case of spatial (resp. spatio-temporal) data sets, these models often represent elevation on a 2D (resp. 3D) grid. This involves interpolating the elevation of LiDAR points on these grid points. In this paper we show how to efficiently perform natural neighbor interpolation over a 2D and 3D grid. Using a graphics processing unit (GPU), we describe different algorithms to attain speed and GPU-memory trade-offs. Our experimental results demonstrate that our algorithms are not only significantly faster than earlier ones, but also scale to much bigger data sets than previous algorithms were able to handle.
doi: 10.1145/2786757

Attaching uncertainty to deterministic spatial interpolations

Souparno Ghosh, Alan E. Gelfand, Thomas Mølhave.

Statistical Methodology, 2012.

Abstract:
Deterministic spatial interpolation algorithms such as the natural neighbor interpolation (NNI) or the Cressman interpolation schemes are widely used to interpolate environmental features. In particular, the former has been applied to digital elevation models (DEMs), the latter to weather data and pollutant exposure. However, they are unsatisfying in that they fail to provide any uncertainty assessment. Such schemes are not model-based; rather, they provide a set of rules, usually geometrically motivated, by which point-level data is interpolated to a grid. We distinguish this setting from the case where the deterministic model is essentially a mapping from inputs to outputs, in which case a joint model can be formulated to assign uncertainty. In our setting we have no inputs, only an interpolated surface at some spatial resolution. We propose a general approach to handle the non-model-based setting. In fact, the approach can be used to assign uncertainty to any supplied surface regardless of how it was created. We first formulate a useful notion of uncertainty and then show, with additional external validation data, that we can attach uncertainty using a convenient version of a data fusion model. We also clarify the distinction between this setting and the more usual case where we are trying to build an explanatory model to explain an environmental surface. We discuss two settings for such interpolation, one where the surface is presumed to be continuous, such as elevation or temperature, and the other where the surface would be discontinuous, such as with precipitation where, at any location, there would be a point mass in the distribution at 0. We work within a hierarchical Bayesian framework and illustrate with a DEM within the Cape Floristic Region of South Africa.
doi: 10.1016/j.stamet.2011.06.001

Other Papers

Using TPIE for Processing Massive Data Sets in C++ (invited abstract)

Thomas Mølhave.

ACM SIGSPATIAL Special, 2012.

Volumetric Grid Construction using 3D Natural Neighbor Interpolation on the GPU

Alex Beutel, Thomas Mølhave, Pankaj K. Agarwal.

MASSIVE '11: Proceedings of the Workshop on Massive Data Algorithmics, 2011.

Abstract:
With modern focus on LiDAR technology the amount of topographic data, in the form of massive point clouds, has increased dramatically. Furthermore, due to the popularity of LiDAR repeated surveys of the same areas are beginning to become more common, a trend that will only increase as topographic changes prompt surveys over already scanned terrain to be made. In those cases we get large spatio-temporal datasets. In dynamic terrains, such as coastal regions, such spatio-temporal data can offer interesting insight into how the terrain changes over time. An initial step in the analysis of such data is to create a model representing the terrain. In the case of spatio-temporal datasets those models are often 3D volumetric grids. In this paper we show how to efficiently compute natural neighbor interpolation in 3 and higher dimensions. We use a graphics processing unit (GPU) to increase performance and we describe different algorithms to attain speed and GPU-memory trade-offs.

Modeling and Analyzing Massive Terrain Data

Pankaj K. Agarwal, Thomas Mølhave.

National Science Foundation TeraGrid Workshop on Cyber-GIS, 2010.

Fault Tolerant External Memory Algorithms

Gerth Stølting Brodal, Allan Grønlund Jørgensen, Thomas Mølhave.

MASSIVE '09: Proceedings of the Workshop on Massive Data Algorithmics, 2009.

Abstract:
Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano. However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable.

GIS ved MADALGO

Lars Arge, Thomas Mølhave.

Geoforum, 2009.

Optimal Resilient Dynamic Dictionaries

Gerth Stølting Brodal, Rolf Fagerberg, Allan Grønlund Jørgensen, Gabriel Moruz, Thomas Mølhave.

Department of Computer Science, Aarhus University, 2007.

Abstract:
In the resilient memory model any memory cell can get corrupted at any time, and corrupted cells cannot be distinguished from uncorrupted cells. An upper bound, δ, on the number of corruptions and O(1) reliable memory cells are provided. In this model, a data structure is denoted resilient if it gives the correct output on the set of uncorrupted elements. We propose two optimal resilient static dictionaries, a randomized one and a deterministic one. The randomized dictionary supports searches in O(log n+δ) expected time using O(log δ) random bits in the worst case, under the assumption that corruptions are not performed by an adaptive adversary. The deterministic static dictionary supports searches in O(log n+δ) time in the worst case. We also introduce a deterministic dynamic resilient dictionary supporting searches in O(log n + δ) time in the worst case, which is optimal, and updates in O(log n+δ) amortized time. Our dynamic dictionary supports range queries in O(log n+δ+k) worst case time, where k is the size of the output.


Abstracts

Tilgængeligt, Troværdigt og Handlingsrettet Terrændata [Accessible, Trustworthy and Actionable Terrain Data]

Lars Arge, Thomas Mølhave, Morten Revsbæk.
Kortdage '13 Presented at Kortdage, 2013.

Analyzing big terrain data from space

Lars Arge, Thomas Mølhave, Morten Revsbæk, Jakob Truelsen, Freek Walderveen.
Presented at European Space Agency - Big Data From Space, 2013.
Abstract:
Terrain data is gathered for increasingly large areas and at increasing levels of detail. Since the Shuttle Radar Topography Mission (SRTM) more than a decade ago, near-global [...] equator). In 2009 the ASTER GDEM was made available with global coverage in 1-arcsecond (30m) resolution, and as early as next year, in 2014, the radar-satellite-sourced 12m WorldDEM model from Astrium will be available with global coverage. GIS software has traditionally enabled end-users to answer a wide range of useful questions based on detailed terrain data, e.g. questions concerning flood risk from sea and rainfall or the visibility of objects in the terrain (e.g. power lines, radio towers and windmills). However, most GIS software is built on data processing algorithms that fundamentally assume the data fits in the main memory of the computing device. These algorithms incur a significant slowdown (often a factor on the order of 10^6) once the data becomes bigger than main memory, effectively making the computation infeasible. A back-of-the-envelope assessment shows that a 12m raster model with global coverage will contain on the order of 10^12 raster cells. This is significantly larger than the memory of most computing devices, and it will be infeasible to analyze a dataset of this size (or anywhere near it) with traditional GIS software. Many different approaches have been attempted to work around the shortcomings of traditional GIS software in handling big terrain data. The two most widely used are either to lower the detail/resolution of the data so that it fits in main memory (data simplification) or to split the data into smaller pieces and handle each piece separately and independently (data tiling). However, for many purposes, such as flood risk and visibility analysis, these approaches significantly reduce the quality of the output produced.
For example, data simplification might cause important information about dikes and other terrain features to be lost, significantly reducing the quality of the flood risk analysis, while data tiling makes it difficult to reason about hydrological features spanning tile boundaries. In recent years a major research effort has been put into developing so-called external memory algorithms for a wide range of computational problems. These algorithms do not assume that the data fits in the memory of the computing device and can therefore efficiently process extremely big data sets. SCALGO was founded by leading researchers in computational geometry and external memory algorithms and aims at commercializing external memory algorithms for a wide range of GIS problems. SCALGO has developed software that can answer e.g. flood risk questions and visibility questions on detailed global data without using the known workarounds of data simplification and data tiling. In our talk, we will highlight the vast potential of the high quality terrain data that will be available in the near future from space-based sensor platforms. We will give an overview of the algorithmic techniques needed to process this terrain data and present examples of how we have previously successfully analyzed flood risk and visibility on raster terrains containing billions of raster cells. For example, we will give an online demonstration of the results we have achieved from analyzing the near-global SRTM data.
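The back-of-the-envelope estimate in the abstract above can be reproduced with a few lines of arithmetic. This is a rough sketch only: the surface areas are approximate figures assumed here (they are not given in the abstract), the cells are treated as 12m by 12m squares, and the name `raster_cells` is hypothetical.

```python
# Approximate areas, assumed for illustration (not taken from the abstract).
EARTH_SURFACE_KM2 = 510e6  # total surface of the Earth, land and sea
LAND_SURFACE_KM2 = 149e6   # land surface only


def raster_cells(area_km2: float, cell_size_m: float) -> float:
    """Number of square cells with side cell_size_m needed to cover area_km2."""
    area_m2 = area_km2 * 1e6  # 1 km^2 = 10^6 m^2
    return area_m2 / (cell_size_m ** 2)


land_cells = raster_cells(LAND_SURFACE_KM2, 12.0)
total_cells = raster_cells(EARTH_SURFACE_KM2, 12.0)

print(f"land only:     {land_cells:.2e} cells")
print(f"whole surface: {total_cells:.2e} cells")
```

Under these assumptions both counts fall between 10^12 and 10^13, which agrees with the order of magnitude stated in the abstract. Even at a few bytes per cell this is several terabytes of elevation data.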

Beregning af National Oversvømmelsesrisiko [Computation of National Flood Risk]

Lars Arge, Thomas Mølhave, Morten Revsbæk.
Kortdage '12 Presented at Kortdage, 2012.
Detaljerede og brugbare landsdækkende konturkort [Detailed and Usable Countrywide Contour Maps]

Lars Arge, Lasse Deleuran, Thomas Mølhave, Morten Revsbæk, Jakob Truelsen.
Kortdage '12 Presented at Kortdage, 2012.
doi: 10.1145/2367574.2367579

Flood Risk Analysis Using Massive LiDAR Terrain Data

Lars Arge, Thomas Mølhave, Morten Revsbæk.
ELMF '11 Presented at the European LiDAR Mapping Forum, 2011.
Abstract:
As detailed LiDAR terrain data for large geographic areas is increasingly made public, the problems in processing the massive point clouds are becoming increasingly apparent. For example, while two detailed LiDAR terrain datasets were produced for the country of Denmark several years ago, the use of the data in advanced countrywide analysis and modeling applications has been virtually nonexistent. The LiDAR point cloud for Denmark's approximately 42,000 square kilometers consists of about 26 billion points, which creates serious problems for most software. In this talk we will illustrate this issue in connection with flood risk analysis. We will discuss why massive LiDAR data often exposes software scalability problems, and also how these problems can be overcome using advanced algorithms developed at the Center for Massive Data Algorithmics (MADALGO) at Aarhus University and commercialized by SCALGO. We will end the talk with an interactive visualization of how software based on these algorithms and running on a normal desktop computer has been used to analyze flood risk due to rising sea level and extreme rain for the entire country of Denmark. The sea-level rise flood risk analysis will soon be made publicly available by the Danish government. The demonstration will include examples of how the large and detailed countrywide LiDAR point cloud was essential in the flood risk analysis, and thus provide motivation for governments and other stakeholders to invest in large national-scale LiDAR surveys.

Hvor løber vandet hen? Oversvømmelsesberegninger på store højdemodeller [Where Does the Water Flow? Flood Computations on Large Elevation Models]

Lars Arge, Thomas Mølhave, Jakob Truelsen, Johnny K. Rasmussen.
Kortdage '09 Presented at Kortdage, 2009.
Abstract:
Existing GIS software is often not very good at handling large data sets. This has become obvious, for example, with the emergence of detailed LIDAR-based 2-meter elevation models for all of Denmark. In the talk we will discuss this in connection with the computation of flood risk; we will discuss why existing software has problems with such computations, and also how these problems can be remedied. The talk will include a demonstration of parts of the MADALGO-developed software package TerraSTREAM, which can handle even very large elevation models, as well as a Google Maps visualization of a flood risk computation for Denmark using TerraSTREAM and COWI's countrywide 2-meter elevation model. To our knowledge, no other software can perform such a computation on a detailed countrywide model. Although detailed elevation models enable a more precise computation of flood risk, e.g. because they contain important but relatively small details of the landscape (such as dikes), a fully precise computation of flood risk is very complicated. Often, however, a good first assessment can be obtained by focusing on the elevation model and, e.g., ignoring the effect of sewers and groundwater. The computations we will present therefore focus on the runoff of rainwater and on the effect of rising seas. Our runoff modeling consists of computing the so-called accumulated flow for each point in an elevation model (a cell in a grid model or a triangle in a TIN model), which is often used to estimate where flooding is to be feared after heavy rainfall. Our modeling of rising seas consists of computing exactly which points will be flooded at a given rise in water level. The computation thus does not merely find all parts of the model below the given water level, but rather finds exactly those parts of the model that can be reached from the coast without meeting a barrier higher than the given water level.
In this way dikes, for example, are taken into account, in contrast to many other modeling tools. The visualization of our computations on the 2-meter model of Denmark that is part of the talk will, among other things, illustrate the effect of dikes through a comparison of the computed flood risk using traditional and newer detailed elevation models.
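The sea-level computation described in the abstract above, which floods only those parts of the model that can be reached from the coast without crossing a barrier higher than the water level, can be illustrated with a breadth-first flood fill. This is a minimal in-memory sketch, assuming a small grid elevation model stored as a list of lists and a known set of sea cells. The name `flooded_cells` and its parameters are hypothetical and are not taken from TerraSTREAM, which targets models far larger than main memory.

```python
from collections import deque


def flooded_cells(heights, sea_cells, water_level):
    """Return the set of (row, col) cells reached by the sea at water_level.

    heights     -- 2D list of elevations (a small grid model)
    sea_cells   -- iterable of (row, col) cells known to be sea (the seeds)
    water_level -- water level in the same unit as heights
    """
    rows, cols = len(heights), len(heights[0])
    flooded = set(sea_cells)
    queue = deque(flooded)
    while queue:
        r, c = queue.popleft()
        # Water spreads to the four edge-adjacent neighbours.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in flooded:
                continue
            # A cell higher than the water level acts as a barrier.
            if heights[nr][nc] <= water_level:
                flooded.add((nr, nc))
                queue.append((nr, nc))
    return flooded


# Column 2 is a dike of height 5; the land behind it lies at height 1.
dem = [
    [0, 1, 5, 1, 1],
    [0, 1, 5, 1, 1],
]
sea = [(0, 0), (1, 0)]
behind_dike_safe = flooded_cells(dem, sea, water_level=2)
dike_overtopped = flooded_cells(dem, sea, water_level=5)
```

With a water level of 2 only the cells in front of the dike are flooded, although the cells behind it also lie below the water level. A plain threshold on elevation would wrongly mark those cells as flooded, which is the difference the abstract points out.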

Massive Terrain Data Processing: Scalable Algorithms

Andrew Danner, Thomas Mølhave, Ke Yi, Pankaj K. Agarwal, Lars Arge, Helena Mitasova.
FOSS4G '06 Presented at Free And Open Source Software for Geoinformatics, 2006.
Abstract:
Modern remote sensing methods such as LIDAR readily generate very large data sets of high-resolution elevation data. Several applications including stream mapping, landslide risk assessment, hydrological and erosion modeling can benefit from this high-resolution data, but processing the data sets, which can be tens or hundreds of gigabytes in size, poses a number of technical challenges. LIDAR point sets must be transformed into a digital elevation model (DEM) and derived products such as river networks, watersheds, or line-of-sight information before users can conduct relevant studies. We describe our approach as a pipeline consisting of a number of individual stages. In the first stage we convert raw LIDAR point sets to a digital elevation model using the spline approximation method with a substantially modified segmentation procedure to handle hundreds of millions of points. The constructed DEM may have some artifacts due to sampling noise or introduced by the approximation method. We therefore remove from the terrain topological noise that would impede water flow along a river network while preserving large natural depressions or sinks such as quarries or craters. The next stages use the denoised DEM for constructing various derived data or terrain analysis tools. For example, we have developed stages for computing flow networks and watershed hierarchies. We designed and implemented the pipeline such that the entire pipeline is scalable to large data sets. A single non-scalable stage in the pipeline would create a bottleneck and limit overall scalability. Experimental results on real LIDAR data show that our approach scales to data sets containing hundreds of millions of points (over 20GB of raw data). 
Our approach allows users to go from raw data to useful high-level information with little or no manual intervention; at the same time, our software is highly modular and each stage can be run individually if certain intermediate results are desired.
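One common building block for a flow network on a DEM is flow accumulation: every cell receives one unit of rain and passes everything it collects to a lower neighbour. The sketch below is an in-memory illustration only, assuming a simplified routing rule in which each cell drains to its lowest strictly lower neighbour among the eight surrounding cells. The name `flow_accumulation` is hypothetical, and nothing here is taken from the pipeline's actual implementation.

```python
def flow_accumulation(heights):
    """Return a grid with the number of cells draining through each cell.

    Assumption: each cell sends all of its flow to its lowest strictly
    lower neighbour (8-neighbourhood); cells without one are sinks.
    """
    rows, cols = len(heights), len(heights[0])
    acc = [[1] * cols for _ in range(rows)]  # one unit of rain per cell

    # Visit cells from highest to lowest, so every cell has received all
    # flow from its uphill contributors before it passes flow on.
    order = sorted(
        ((heights[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for h, r, c in order:
        target = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if heights[nr][nc] < h and (
                    target is None
                    or heights[nr][nc] < heights[target[0]][target[1]]
                ):
                    target = (nr, nc)
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


# A small slope draining towards the lower-right corner.
dem = [
    [9, 8, 7],
    [8, 5, 4],
    [7, 4, 1],
]
acc = flow_accumulation(dem)
```

Cells with high accumulation values trace the likely river network. Sorting the whole grid in memory is exactly the kind of step that stops scaling once the DEM no longer fits in main memory, which is why each stage of a scalable pipeline needs its own I/O-efficient algorithm.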


Thomas Mølhave (Moelhave)

About Me

Current position

CTO, Co-founder

Contact information

Education:

2009: PhD

2005: Master of Science

2004: Bachelor in Computer Science
