DuckDB Spatial¶
DuckDB is a fast, self-contained analytical database. DuckDB Spatial is an extension to DuckDB that adds geospatial support.
You can pass a DuckDB query into the top-level viz
function to quickly inspect data:
import duckdb
from lonboard import viz
sql = "SELECT * FROM spatial_table;"
query = duckdb.sql(sql)
viz(query)
Additionally, all relevant Lonboard layer classes have a from_duckdb
method for DuckDB query input.
import duckdb
from lonboard import Map, PolygonLayer
sql = "SELECT * FROM polygon_table;"
query = duckdb.sql(sql)
layer = PolygonLayer.from_duckdb(
query,
get_fill_color=[255, 0, 0],
)
m = Map(layer)
m
Warning
The DuckDB query must be run with
duckdb.sql()
or
duckdb.DuckDBPyConnection.sql()
and not with duckdb.execute()
or duckdb.DuckDBPyConnection.execute()
.
For example
import duckdb
from lonboard import viz
sql = "SELECT * FROM spatial_table;"
query = duckdb.sql(sql)
viz(query)
Warning
DuckDB Spatial does not currently expose coordinate reference system information, so the user must ensure that data has been reprojected to EPSG:4326.
Custom Connection¶
If you're using a custom DuckDB connection (not the global default connection), ensure that you pass the connection object into the con
named parameter of viz
or from_duckdb
:
import duckdb
from lonboard import viz
con = duckdb.connect()
sql = "SELECT * FROM spatial_table;"
query = con.sql(sql)
viz(query, con=con)
You can also render an entire table by using the table()
method:
import duckdb
from lonboard import viz
con = duckdb.connect()
con.execute("CREATE TABLE spatial_table AS ...;")
viz(con.table(), con=con)
Or with from_duckdb
:
import duckdb
from lonboard import Map, PolygonLayer
con = duckdb.connect()
sql = "SELECT * FROM polygon_table;"
query = con.sql(sql)
layer = PolygonLayer.from_duckdb(
query,
con=con,
get_fill_color=[255, 0, 0],
)
m = Map(layer)
m
In from_duckdb
, you can alternatively just pass the SQL string into the method:
import duckdb
from lonboard import Map, PolygonLayer
con = duckdb.connect()
layer = PolygonLayer.from_duckdb(
"SELECT * FROM polygon_table;",
con=con,
get_fill_color=[255, 0, 0],
)
m = Map(layer)
m
Warning
The DuckDB query must be run with
duckdb.DuckDBPyConnection.sql()
and not with duckdb.DuckDBPyConnection.execute()
.
Additionally, ensure you pass in the con
parameter:
import duckdb
from lonboard import viz
con = duckdb.connect()
sql = "SELECT * FROM spatial_table;"
query = con.sql(sql)
viz(query, con=con)
Implementation Notes¶
Lonboard integrates with DuckDB by using its Arrow export support.
As of DuckDB Spatial version 0.10.2, DuckDB Spatial's primary GEOMETRY
type uses a custom, non-stable binary format. When exported to Python via an Arrow table, this GEOMETRY
type is maintained as a binary blob that is impossible to reliably parse in Python. The only way to interpret this column in Python-based tools is to get DuckDB to cast this column to a standardized format like Well-Known Binary (WKB).
We could require that users always call ST_AsWKB
on the geometry column on every query, but this is unwieldy and easy to forget. Instead, by requiring that the user pass in a DuckDBPyRelation
object instead of a DuckDBPyConnection
, we're able to use the replacement scan feature of DuckDBPyRelation
to call back into DuckDB and perform our own ST_AsWKB
call on the data, before it's left DuckDB memory, and then export the WKB geometries out to Arrow.
This is why we sometimes need access to the DuckDBPyConnection
(the con
variable) that created the DuckDBPyRelation
, because we need to perform a new query on the same connection to convert the geometry column into WKB.
Note that geometry types other than GEOMETRY
, namely WKB_BLOB
, POINT_2D
, LINESTRING_2D
, POLYGON_2D
, and BOX_2D
, do not suffer from this limitation. We can directly interpret those types in Python and do not need to call back into DuckDB.