We rebuilt Canada’s Spatial Access Measures dataset to show what happens when public data meets modern tools. By converting the legacy CSVs to GeoParquet and running them through DuckDB-WASM and deck.gl, we created an interactive map that works entirely in the browser. It’s fast, lightweight, and ready to explore.


Image by Srikanth Sistu

We like to experiment with modern, open, and lightweight ways to make public data easier to use and share.

Canada’s Spatial Access Measures (SAM) dataset offered a perfect test case. SAM measures how easily residents can reach essential services, such as healthcare, education, or groceries, by walking, cycling, or public transit. Despite its value, the dataset sits in large CSV files without built-in geometry. Analysts have to join shapefiles, manage multiple files, and run cumbersome workflows just to visualize or query the data.

We wanted to see what this dataset would look like as a fully cloud-native geospatial asset that is optimized for performance, interactivity, and accessibility. Using GeoParquet, DuckDB-WASM, and deck.gl, we built an entirely browser-based application that lets anyone explore and query millions of rows of spatial data, instantly.

Each tool has a distinct role. GeoParquet is a columnar, compressed format with embedded geometry. DuckDB-WASM is a SQL engine that runs entirely in the browser, and deck.gl renders high-performance maps. Together they enable fully interactive spatial analysis, with no backend server or custom API.

Explore the cloud-native demo app

Check out the open repository

Spatial Access Measures demo app

From CSV to Cloud-Native

The original SAM dataset came as four massive CSVs totaling hundreds of megabytes, each containing accessibility metrics but no geometry. To map the data, you had to separately download dissemination block shapefiles from Statistics Canada and manually join them, a tedious process that left the data technically open but practically hard to use.

We converted the SAM dataset into a single GeoParquet file (a compressed, columnar format that embeds geometry alongside attributes). Each record was joined with its dissemination block polygon, encoded in Well-Known Binary (WKB) for compatibility with DuckDB-WASM (which doesn’t yet support GeoArrow geometries). The result: one compact, analysis-ready file that can be queried directly in the browser without any backend or API.
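To make the WKB encoding concrete, here is a minimal sketch of how a single polygon is laid out in Well-Known Binary, using only Python's standard library. It is an illustration of the format, not the project's conversion code (which lives in the notebook); the function name `polygon_to_wkb` and the coordinates are hypothetical.

```python
import struct

WKB_POLYGON = 3  # geometry type code for a Polygon in WKB


def polygon_to_wkb(rings):
    """Encode a polygon (a list of rings of (x, y) pairs) as little-endian WKB."""
    # Header: byte-order flag (1 = little-endian), geometry type, ring count.
    parts = [struct.pack("<BII", 1, WKB_POLYGON, len(rings))]
    for ring in rings:
        # Each ring: point count, then x/y pairs as 8-byte doubles.
        parts.append(struct.pack("<I", len(ring)))
        for x, y in ring:
            parts.append(struct.pack("<dd", x, y))
    return b"".join(parts)


# Hypothetical block outline in WGS84 (lon, lat).
# The ring is closed by repeating the first point.
block = [[
    (-79.40, 43.64), (-79.38, 43.64), (-79.38, 43.66),
    (-79.40, 43.66), (-79.40, 43.64),
]]
wkb = polygon_to_wkb(block)
print(len(wkb), "bytes")
```

In practice a geospatial library would produce these bytes during the join; the point of the sketch is that each geometry becomes an opaque binary value that can sit in an ordinary Parquet column next to the attributes.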

We also tuned the file for performance. Sorting rows by city name improved filtering speed, and adjusting row group size (~122,880 rows) balanced load time and memory efficiency. These optimizations let users explore data instantly through the browser, without heavy downloads or specialized software.
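Why sorting helps can be shown with a toy simulation. Parquet stores min/max statistics for each row group, and a reader can skip any group whose range cannot contain the filter value. The sketch below mimics that with plain Python; the helper names, the four cities, and the row group size of 3 are all assumptions chosen for illustration (the real file uses roughly 122,880 rows per group).

```python
from itertools import islice


def build_row_groups(rows, size):
    """Split rows into fixed-size groups and record the min/max city of each."""
    it = iter(rows)
    groups = []
    while chunk := list(islice(it, size)):
        cities = [r["city"] for r in chunk]
        groups.append({"min": min(cities), "max": max(cities), "rows": chunk})
    return groups


def groups_to_scan(groups, city):
    """Keep only the groups whose [min, max] range could contain the city."""
    return [g for g in groups if g["min"] <= city <= g["max"]]


# Toy data: four cities interleaved across twelve rows.
cities = ["Calgary", "Montreal", "Toronto", "Vancouver"]
unsorted_rows = [{"city": cities[i % 4], "score": i} for i in range(12)]
sorted_rows = sorted(unsorted_rows, key=lambda r: r["city"])

unsorted_hits = groups_to_scan(build_row_groups(unsorted_rows, 3), "Toronto")
sorted_hits = groups_to_scan(build_row_groups(sorted_rows, 3), "Toronto")
print(len(unsorted_hits), "groups scanned unsorted,", len(sorted_hits), "sorted")
```

With interleaved rows, every group spans a wide range of city names and none can be skipped. After sorting, each city occupies a contiguous run of groups, so a filter on one city touches only those.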

Why GeoParquet Works

Smaller and faster. Even with geometry added, the GeoParquet file is far smaller than the original CSVs. Compression and columnar encoding make it quick to load from cloud storage with no unzipping or preprocessing.

Spatially aware. Built-in geometries and bounding box indexes enable spatial filtering and queries through DuckDB’s spatial extension. The dataset stays in WGS84, ready for web maps out of the box.

Self-describing. Metadata captures schema and coordinate system information, so new users can open and understand the file without separate documentation or manual setup.

You can see the full transformation process in the Spatial_access_measures.ipynb notebook.

Running SQL in the Browser

Once the dataset lived in GeoParquet, the next step was to make it queryable entirely in the browser. DuckDB-WASM, a WebAssembly build of the analytical database DuckDB, makes this possible. It supports multi-million-row SQL queries that run locally on the client side, with no server or database connection.

We use DuckDB’s httpfs extension to stream only the relevant data via HTTP range requests. When a user filters by city or amenity type, DuckDB fetches just the needed columns and rows. Geometry is embedded and indexed, so spatial filters and joins are fast and efficient. Users can run queries like:

SELECT * FROM sam_data
WHERE city_name = 'Toronto' AND acs_idx_gs < 0.2;

and see results in under a second—no backend, no lag. It’s a fully interactive data exploration experience that feels like a web app with a tuned database, but runs entirely on the user’s device.

You can see this implementation in App.tsx.
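As a separate illustration of the range-request idea (not taken from App.tsx), the sketch below shows how a reader can locate a Parquet file's metadata without downloading the whole file. A Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a client can fetch the last 8 bytes, then the footer, and only then the column chunks it needs. The function names and file sizes are hypothetical, and DuckDB's actual request pattern may differ.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_BYTES = 8  # 4-byte footer length + 4-byte magic


def tail_range_header(file_size):
    """HTTP Range header value for the last 8 bytes of the file."""
    return f"bytes={file_size - TAIL_BYTES}-{file_size - 1}"


def footer_range_header(file_size, tail):
    """Given the 8-byte tail, return the Range value covering the footer metadata."""
    if len(tail) != TAIL_BYTES or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file tail")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - TAIL_BYTES - footer_len
    return f"bytes={start}-{file_size - TAIL_BYTES - 1}"


# Hypothetical numbers: a 50 MB file with a 20,000-byte footer.
size = 50_000_000
fake_tail = struct.pack("<I", 20_000) + PARQUET_MAGIC
print(tail_range_header(size))
print(footer_range_header(size, fake_tail))
```

The footer lists every row group and column chunk with its byte offset and statistics, which is what lets a query engine request only the ranges a filter actually needs.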

Interactive Mapping with deck.gl

For visualization, we use deck.gl—a WebGL-powered framework designed for high-performance maps. deck.gl handles complex geometries by pushing computation to the GPU, allowing smooth rendering of thousands (or millions) of features without freezing the browser.

Our map uses GeoArrowPolygonLayer to display dissemination blocks as choropleths. Accessibility scores are visualized with a purple-to-white gradient (low to high access), and users can filter by transport mode (walking, cycling, transit) or amenity type (groceries, healthcare, education). Hovering over a block reveals detailed scores and context.
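The color mapping behind a choropleth like this can be sketched as a linear interpolation between two colors. The example below is an assumption-laden illustration, not the app's code: the exact purple, the function name `score_to_rgb`, and the 0-to-1 score range are all hypothetical.

```python
PURPLE = (85, 39, 143)   # assumed color for low access
WHITE = (255, 255, 255)  # color for high access


def score_to_rgb(score, lo=0.0, hi=1.0):
    """Linearly interpolate from purple to white, clamping scores to [lo, hi]."""
    t = (score - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(PURPLE, WHITE))


for s in (0.0, 0.5, 1.0):
    print(s, score_to_rgb(s))
```

In a deck.gl layer, a function of this shape would typically be supplied as the fill-color accessor, so that each block's color is derived from its score at render time.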

Because data flows directly from DuckDB as Arrow tables with no intermediate JSON parsing, the visualization updates almost instantly. This binary-to-binary pipeline eliminates a major performance bottleneck in typical web maps and creates a responsive, GPU-accelerated experience that runs smoothly even at national scale.

A Faster, Lighter Experience

The initial map view loads in seconds while fetching only the relevant rows from the Parquet file. The same approach can scale to even larger datasets by partitioning Parquet files by region. DuckDB simply skips irrelevant partitions, maintaining speed and responsiveness.
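Partition skipping can be illustrated with a small sketch. A common layout is Hive-style partitioning, where the partition value is encoded in the file path as `key=value`, so a reader can decide which files to open from the paths alone. The paths, the `region` key, and the helper names below are hypothetical; the demo itself uses a single file.

```python
def partition_value(path, key):
    """Return the value of a Hive-style `key=value` path segment, or None."""
    for segment in path.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == key:
            return value
    return None


def files_for_region(paths, region):
    """Select only the files whose region partition matches the filter."""
    return [p for p in paths if partition_value(p, "region") == region]


# Hypothetical partitioned layout, one directory per province.
paths = [
    "sam/region=AB/part-0.parquet",
    "sam/region=ON/part-0.parquet",
    "sam/region=ON/part-1.parquet",
    "sam/region=QC/part-0.parquet",
]
ontario_files = files_for_region(paths, "ON")
print(ontario_files)
```

Files in other partitions are never requested, which is why query cost tracks the size of the selected region rather than the size of the whole dataset.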

Why This Matters for Data Publishers

By combining GeoParquet, DuckDB-WASM, and deck.gl, we’ve built a model for lightweight, performant, and accessible spatial data publishing. Data providers can host a single optimized file and a static web app, with no backend or API, and still deliver a fully interactive analytical experience.

This approach lowers infrastructure costs and simplifies maintenance while preserving openness and privacy. It also raises the standard for open data delivery: “click to explore in-browser” instead of “download and wrangle offline.”

The implications extend beyond the SAM dataset. Census data, infrastructure networks, environmental models, and other large public datasets can all benefit from this approach. As cloud-native formats mature and browser-based tools evolve, interactive geospatial analysis can live on the open web.

Explore the demo, browse the repo, and imagine what other legacy datasets could be brought to life in the browser.
