U.S. County-to-County Migration
This notebook is derived from the original deck.gl example in JavaScript, which you can see here.
This dataset originally came from the U.S. Census Bureau and represents people moving into and out of each county between 2009 and 2013.
You can view a hosted version of this notebook on Notebook Sharing Space (6MB download).
Imports
import geopandas as gpd
import numpy as np
import pandas as pd
import pyarrow as pa
import requests
import shapely
from matplotlib.colors import Normalize
from lonboard import Map, ScatterplotLayer
from lonboard.experimental import ArcLayer
from lonboard.layer_extension import BrushingExtension
Fetch the data from the version in the deck.gl-data repository.
url = "https://raw.githubusercontent.com/visgl/deck.gl-data/master/examples/arc/counties.json"
r = requests.get(url)
source_data = r.json()
arcs = []
targets = []
sources = []
pairs = {}

features = source_data["features"]
for i, county in enumerate(features):
    flows = county["properties"]["flows"]
    target_centroid = county["properties"]["centroid"]
    total_value = {
        "gain": 0,
        "loss": 0,
    }

    for to_id, value in flows.items():
        if value > 0:
            total_value["gain"] += value
        else:
            total_value["loss"] += value

        # If number is too small, ignore it
        if abs(value) < 50:
            continue

        pair_key = "-".join(map(str, sorted([i, int(to_id)])))
        source_centroid = features[int(to_id)]["properties"]["centroid"]
        gain = np.sign(flows[to_id])

        # add point at arc source
        sources.append(
            {
                "position": source_centroid,
                "target": target_centroid,
                "name": features[int(to_id)]["properties"]["name"],
                "radius": 3,
                "gain": -gain,
            }
        )

        # eliminate duplicate arcs
        if pair_key in pairs.keys():
            continue
        pairs[pair_key] = True

        if gain > 0:
            arcs.append(
                {
                    "target": target_centroid,
                    "source": source_centroid,
                    "value": flows[to_id],
                }
            )
        else:
            arcs.append(
                {
                    "target": source_centroid,
                    "source": target_centroid,
                    "value": flows[to_id],
                }
            )

    # add point at arc target
    targets.append(
        {
            **total_value,
            "position": [target_centroid[0], target_centroid[1], 10],
            "net": total_value["gain"] + total_value["loss"],
            "name": county["properties"]["name"],
        }
    )

# sort targets by absolute net migration, large -> small
targets = sorted(targets, key=lambda d: abs(d["net"]), reverse=True)
normalizer = Normalize(0, abs(targets[0]["net"]))
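The duplicate check in the loop above hinges on pair_key: sorting the two county indices before joining them gives the same key whichever county is visited first, so each pair of counties contributes at most one arc. The following is a minimal sketch of that idea on made-up data; the `toy_flows` dictionary and its values are hypothetical and not taken from the Census dataset.

```python
# Hypothetical toy flows: toy_flows[i][j] is the value county i records
# for county j (made-up numbers, for illustration only).
toy_flows = {
    0: {"1": 120, "2": -80},
    1: {"0": -120},
    2: {"0": 80},
}

seen_pairs = set()
toy_arcs = []

for i, flows in toy_flows.items():
    for to_id, value in flows.items():
        # Sorting makes the key independent of which county we visit first.
        pair_key = "-".join(map(str, sorted([i, int(to_id)])))
        if pair_key in seen_pairs:
            continue
        seen_pairs.add(pair_key)

        # Same orientation rule as the loop above: for a positive value the
        # arc runs from the other county to county i; otherwise the reverse.
        if value > 0:
            toy_arcs.append({"source": int(to_id), "target": i, "value": value})
        else:
            toy_arcs.append({"source": i, "target": int(to_id), "value": value})

print(sorted(seen_pairs))
print(toy_arcs)
```

This sketch tracks seen pairs in a set rather than the dictionary used in the notebook; either works for a membership test.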
We define some color constants, as well as a color lookup array.
A nice trick in numpy is that if you have a two-dimensional array like:
[
[166, 3, 3],
[ 35, 181, 184]
]
you can index it with an array of row indices to turn a one-dimensional array of integers into a two-dimensional array of colors. In this case, we'll use 0 and 1, the two valid indices along the array's first dimension, to select a color for each row of our data.
So when we call COLORS[colors_lookup], with colors_lookup holding one 0 or 1 per row of our dataset, that creates an output array something like:
[
[166, 3, 3],
[ 35, 181, 184],
[166, 3, 3],
[166, 3, 3]
]
with one color per row in our dataset. We can then pass this to any parameter that accepts a ColorAccessor.
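As a small self-contained illustration of the lookup trick described above, the sketch below builds a two-row color table and indexes it with an integer array. The names `palette`, `net`, and `lookup` and the sample values are made up for this example.

```python
import numpy as np

# Two-row lookup table: row 0 is the "migrate out" color,
# row 1 is the "migrate in" color.
palette = np.array([[166, 3, 3], [35, 181, 184]], dtype=np.uint8)

# Hypothetical net migration values, one per row of a dataset.
net = np.array([-40, 75, -10, -5])

# Map each value to a row index of the palette: 1 where positive, else 0.
lookup = np.where(net > 0, 1, 0)

# Integer-array indexing picks one palette row per element of `lookup`.
colors = palette[lookup]

print(colors.shape)
print(colors)
```

The result has shape (4, 3): one RGB color for each of the four input values.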
# migrate out
SOURCE_COLOR = [166, 3, 3]
# migrate in
TARGET_COLOR = [35, 181, 184]
# Combine into a single array to use as a lookup table
COLORS = np.vstack(
    [np.array(SOURCE_COLOR, dtype=np.uint8), np.array(TARGET_COLOR, dtype=np.uint8)]
)
SOURCE_LOOKUP = 0
TARGET_LOOKUP = 1
brushing_extension = BrushingExtension()
brushing_radius = 200000
Convert the sources list of dictionaries into a GeoPandas GeoDataFrame to pass into a ScatterplotLayer.
source_arr = np.array([source["position"] for source in sources])
source_positions = shapely.points(source_arr[:, 0], source_arr[:, 1])
source_gdf = gpd.GeoDataFrame(
    pd.DataFrame.from_records(sources)[["name", "radius", "gain"]],
    geometry=source_positions,
    crs="EPSG:4326",
)
# We use a lookup table (`COLORS`) to apply either the target color or the source color
# to the array
source_colors_lookup = np.where(source_gdf["gain"] > 0, TARGET_LOOKUP, SOURCE_LOOKUP)
source_fill_colors = COLORS[source_colors_lookup]
Create a ScatterplotLayer for source points:
source_layer = ScatterplotLayer.from_geopandas(
    source_gdf,
    get_fill_color=source_fill_colors,
    radius_scale=3000,
    pickable=False,
    extensions=[brushing_extension],
    brushing_radius=brushing_radius,
)
targets_arr = np.array([target["position"] for target in targets])
target_positions = shapely.points(targets_arr[:, 0], targets_arr[:, 1])
target_gdf = gpd.GeoDataFrame(
    pd.DataFrame.from_records(targets)[["name", "gain", "loss", "net"]],
    geometry=target_positions,
    crs="EPSG:4326",
)
# We use a lookup table (`COLORS`) to apply either the target color or the source color
# to the array
target_line_colors_lookup = np.where(target_gdf["net"] > 0, TARGET_LOOKUP, SOURCE_LOOKUP)
target_line_colors = COLORS[target_line_colors_lookup]
Create a ScatterplotLayer for target points:
target_ring_layer = ScatterplotLayer.from_geopandas(
    target_gdf,
    get_line_color=target_line_colors,
    radius_scale=4000,
    pickable=True,
    stroked=True,
    filled=False,
    line_width_min_pixels=2,
    extensions=[brushing_extension],
    brushing_radius=brushing_radius,
)
Note: the ArcLayer can't currently be created from a GeoDataFrame because it needs two point columns, not one. This is a large part of why it's still marked under the "experimental" module.
Here we pass a numpy array for each point column. This is allowed as long as the shape of the array is (N, 2) or (N, 3) (i.e. 2D or 3D coordinates).
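To make the shape requirement concrete, here is a minimal sketch that checks a coordinate array before handing it to a layer. The helper `check_point_array` is hypothetical and not part of lonboard; it only encodes the (N, 2) or (N, 3) rule stated above, and the sample coordinates are made up.

```python
import numpy as np


def check_point_array(coords):
    """Return coords as a float array if it has shape (N, 2) or (N, 3).

    Hypothetical helper for illustration; not a lonboard function.
    """
    arr = np.asarray(coords, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] not in (2, 3):
        raise ValueError(f"expected shape (N, 2) or (N, 3), got {arr.shape}")
    return arr


# Made-up longitude/latitude pairs.
points_2d = check_point_array([[-122.4, 37.8], [-74.0, 40.7]])

# The same points with a third (elevation) coordinate.
points_3d = check_point_array([[-122.4, 37.8, 10.0], [-74.0, 40.7, 10.0]])

print(points_2d.shape, points_3d.shape)
```

A flat list such as [1.0, 2.0, 3.0] would be rejected by this check, since it is one-dimensional rather than a list of points.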
value = np.array([arc["value"] for arc in arcs])
get_source_position = np.array([arc["source"] for arc in arcs])
get_target_position = np.array([arc["target"] for arc in arcs])
table = pa.table({"value": value})
arc_layer = ArcLayer(
    table=table,
    get_source_position=get_source_position,
    get_target_position=get_target_position,
    get_source_color=SOURCE_COLOR,
    get_target_color=TARGET_COLOR,
    get_width=1,
    opacity=0.4,
    pickable=False,
    extensions=[brushing_extension],
    brushing_radius=brushing_radius,
)
Now we can create a map from the three layers defined above.
As you hover over the map, it should render only the arcs near your cursor. You can modify brushing_extension.brushing_radius to control how large the brush is around your cursor.
map_ = Map(layers=[source_layer, target_ring_layer, arc_layer], picking_radius=10)
map_