
Use of the Jupyter Ecosystem

The Jupyter ecosystem consists of several tools and components that are designed to facilitate interactive computing, data analysis, and visualization. Here’s an overview of some of the main tools in the Jupyter ecosystem and their purposes:

  1. Jupyter Notebook:

    • Purpose: Jupyter Notebook is an open-source web application that allows users to create and share documents containing live code, equations, visualizations, and narrative text. Notebooks support various programming languages, including Python, R, Julia, and more.

    • Features:

      • Supports inline code execution, allowing users to write and execute code cells interactively.

      • Integrates with Markdown for creating formatted text cells (like this one!), enabling the creation of rich, narrative-driven documents.

      • Provides support for embedding multimedia content, such as images (like “Visualization A” below), videos (e.g. Visualization B below), and interactive visualizations (such as Visualization C below, a map of recent earthquake magnitudes in California, USA).

      • Allows the creation of interactive widgets for data exploration, parameter tuning, experiment tracking, metric logging (e.g. TensorBoard), and more (a minimal widget sketch follows Visualization A below).

      • Facilitates collaboration by enabling users to share notebooks via platforms such as GitHub.

Visualization A: “Sahara dust storm over Atlantic”, captured by a crew member onboard the International Space Station (ISS). Photographer: Alex Gerst, from the NASA Image and Video Library. Source URL: https://images.nasa.gov/details/iss040e092540.
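
As promised above, here is a minimal widget sketch using the ipywidgets package (an assumption here: ipywidgets must be installed alongside the notebook server, and the function and parameter names are illustrative). Moving the slider re-executes the function with the selected value.

from ipywidgets import interact

# A toy function to explore interactively; 'preview' and 'scale' are made-up names
def preview(scale=1.0):
    print(f"Scaled value: {scale * 42:.2f}")

# interact() infers a float slider from the (min, max, step) tuple
interact(preview, scale=(0.0, 2.0, 0.1))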

  2. JupyterLab:

    • Purpose: JupyterLab is a web-based interface that provides an integrated development environment (IDE) for Jupyter notebooks, code editors, terminals, and other interactive components. Notable features include the following:

      • Offers a flexible layout system with tabs, panes, and multiple panels, allowing users to customize their workspace.

      • Supports multiple file formats, including notebooks, scripts, Markdown files, and more, in a single interface.

      • Provides advanced code editing features, such as syntax highlighting, code completion, and code folding.

      • Integrates with version control systems like Git for collaborative development.

      • Offers a wide range of extensions and plugins to enhance functionality, such as debugging tools, data viewers, and interactive widgets.

  3. Jupyter Book:

    • Purpose: Jupyter Book is a tool for building interactive, web-based books from Jupyter Notebooks (like this website!). It allows authors to combine executable code, text, mathematical equations, and visualizations into a cohesive narrative.

    • Features:

      • Enables the creation of interactive online books that incorporate live code execution and interactive widgets.

      • Supports multiple output formats, including HTML, PDF, and ePub, making it easy to share and distribute books.

      • Provides built-in support for version control and continuous integration, ensuring that books stay up-to-date and reproducible.

      • Allows customization of book themes, layouts, and styling to match the author’s preferences.

      • Facilitates collaboration among authors and readers through features like commenting, discussion threads, and annotations.

    • Some great use cases for a Jupyter Book include documentation websites for a package or project, as well as a framework for publishing tutorials and courses (such as this one!). A minimal build sketch follows this list.
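
Jupyter Book itself is driven from the command line; as a rough sketch, a build can be triggered from Python like this (assuming the jupyter-book package is installed, and that mybook/ is a hypothetical book directory containing the usual _config.yml and _toc.yml):

import subprocess

# Invoke the jupyter-book CLI; the rendered site lands in mybook/_build/html
subprocess.run(["jupyter-book", "build", "mybook/"], check=True)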

These tools, along with others in the Jupyter ecosystem, cater to different aspects of interactive computing, data analysis, visualization, and communication. Whether you’re a researcher, data scientist, educator, or developer, the Jupyter ecosystem provides a versatile set of tools for working with data and sharing your findings with others.

More on Jupyter notebooks

In industry and academia alike, Jupyter notebooks are often used to provide a transparent and interactive way to share code and results. Publishing notebooks that detail the data analysis, visualization, and modeling steps is a practical way to demonstrate analyses and quick experiments. Let’s discuss some of the best practices used in industry when publishing Jupyter notebooks.

1. Preparing Your Notebook:

  • Organize your Jupyter notebook to provide a clear and structured narrative of your research or analysis workflow.

  • Include markdown cells to provide explanations, background information, and interpretations of your code and results.

  • Ensure that your notebook is well-commented and follows best practices for code readability and documentation (you can find guidance in the chapter on “Open Access to Data and Code”). A short sketch of assembling a structured notebook programmatically follows this list.
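
One way to bootstrap a well-organized notebook is to assemble it programmatically with the nbformat package; the following is a minimal sketch (the cell contents and output file name are illustrative, not a prescribed template):

import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

# Alternate markdown (narrative) and code cells to build a clear structure
nb = new_notebook()
nb.cells = [
    new_markdown_cell("# Analysis title\n\nBackground, data sources, and goals."),
    new_markdown_cell("## Step 1: Load the data"),
    new_code_cell("import pandas as pd\n# df = pd.read_csv('data.csv')  # hypothetical input"),
    new_markdown_cell("## Step 2: Interpret the results"),
]

# Write the notebook to disk, ready to open and extend in Jupyter
nbformat.write(nb, "structured_example.ipynb")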

2. Cleaning and Hiding Sensitive Information:

  • Remove or obfuscate any sensitive or confidential information, such as API keys, passwords, or personal data, before publishing your notebook.

  • Consider using tools like the nbstripout package to strip output cells and metadata from your notebook, both to reduce its size and to remove potentially sensitive output (a sketch of the underlying idea follows this list).
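
nbstripout is usually run from the command line or installed as a Git filter; purely as an illustration of what output stripping does (not nbstripout’s actual implementation), here is a minimal nbformat sketch with hypothetical file names:

import nbformat

# Load the notebook, clear every code cell's outputs, and save a clean copy
nb = nbformat.read("analysis.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "analysis_stripped.ipynb")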

3. Sharing on Hosting Platforms:

  • Make use of online platforms and services specifically designed for hosting and sharing Jupyter notebooks, such as GitHub and Jupyter Book.

  • Accompany your notebooks with a clear README file and an appropriate license.

4. Version Control and Collaboration:

  • Leverage version control systems like Git to track changes to your Jupyter notebooks over time and collaborate with others on shared projects.

  • Encourage collaboration and feedback by allowing others to fork, clone, and contribute to your notebook repository through pull requests and issue discussions.

  • Currently, functionality for reviewing Jupyter notebooks in PRs is still an area that can be improved. Changes are difficult to render natively on GitHub, but there are external open-source tools that can be integrated to help with this. One such tool is Notebook Viewer; a related workaround is sketched below.
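
Beyond dedicated viewers, one common workaround (named here as an assumption, not a tool mentioned above) is to pair each notebook with a plain-text script using the jupytext package, so reviewers can diff the script instead of the raw notebook JSON:

import jupytext

# Read the notebook and write a 'percent'-format script alongside it;
# 'notebook.ipynb' is an illustrative file name
nb = jupytext.read("notebook.ipynb")
jupytext.write(nb, "notebook.py", fmt="py:percent")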

5. Providing Examples and Tutorials:

  • Include examples, tutorials, and use cases alongside your published notebooks to help users understand how to use and adapt your code for their own purposes.

  • Markdown cells are critical for providing context, instructions, and guidance on how to execute and interpret the code in your notebook.

6. Engaging with the Community:

  • Share links to your published notebooks on appropriate forums or relevant community channels to reach a broader audience and solicit feedback.

Visualization B: The following video is titled “HST Zoom-Way-Out”; credit to Michael McClare and Jake Dean, from the NASA Image and Video Library. Source URL: https://images.nasa.gov/details/GSFC_20080520_HST_m10217_Zoom_Out.

from IPython.display import HTML

video_url = 'https://images-assets.nasa.gov/video/GSFC_20080520_HST_m10217_Zoom_Out/GSFC_20080520_HST_m10217_Zoom_Out~orig.mp4'

# Construct the HTML code to embed the video
video_html = f"""
<video width="640" height="360" controls>
  <source src="{video_url}" type="video/mp4">
  Your browser does not support the video tag.
</video>
"""

# Display the HTML code to embed the video
HTML(video_html)

Visualization C: The aforementioned interactive plot!

from datetime import datetime, timedelta

import geopandas as gpd
import pandas as pd
import plotly.graph_objects as go
import requests

# Calculate the date one week ago from today
END_DATE = datetime.today()
START_DATE = END_DATE - timedelta(days=7)

# Format the dates as strings in the required format for the API URL
START_DATE_STR = START_DATE.strftime('%Y-%m-%dT%H:%M:%S')
END_DATE_STR = END_DATE.strftime('%Y-%m-%dT%H:%M:%S')

# API URL to fetch earthquake data from USGS for California within the last week
URL_CALIFORNIA = f'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&minlatitude=32&maxlatitude=42&minlongitude=-124.5&maxlongitude=-114.1&starttime={START_DATE_STR}&endtime={END_DATE_STR}'

# Set initial center and zoom level for the map
CENTER_LAT = 36.7783
CENTER_LON = -119.4179
ZOOM_LEVEL = 5

# Fetch earthquake data from the API for California and read it into a GeoDataFrame
response_california = requests.get(URL_CALIFORNIA)
data_california = response_california.json()
gdf_california = gpd.GeoDataFrame.from_features(data_california['features'])

# Extract temporal information from the GeoDataFrame
gdf_california['time'] = pd.to_datetime(gdf_california['time'], unit='ms')

# Convert the datetime to a numerical format
gdf_california['time_numeric'] = gdf_california['time'].apply(lambda x: x.timestamp())

# Normalize the time to be between 0 and 1 for the color scale
gdf_california['time_normalized'] = (gdf_california['time_numeric'] - gdf_california['time_numeric'].min()) / (gdf_california['time_numeric'].max() - gdf_california['time_numeric'].min())

# Create an interactive map plot using Plotly
fig = go.Figure()

# Add a scatter plot layer for the earthquakes in California
fig.add_trace(go.Scattergeo(
    lon = gdf_california['geometry'].x,
    lat = gdf_california['geometry'].y,
    text = gdf_california.apply(lambda row: f'Magnitude: {row["mag"]}<br>Time: {row["time"]}', axis=1),
    mode = 'markers',
    marker = dict(
        size = gdf_california['mag'].abs() * 5,  # Adjust marker size based on magnitude
        color = gdf_california['time_normalized'],
        colorscale='icefire',
        colorbar_title='Recency'
    )
))

# Update visualization with map configuration and illustrative preferences
fig.update_geos(
    projection_type="natural earth",  # Set projection type
    landcolor="lightgray",  # Set land color
    oceancolor="lightblue",  # Set ocean color
    showocean=True,
    showland=True,
    showcountries=True,
    showlakes=True,
    showcoastlines=True,
    coastlinecolor="royalblue",  # Set coastline color
    resolution=110,
    center=dict(lon=CENTER_LON, lat=CENTER_LAT),  # Set center of the map
    fitbounds="locations"
)

# Set the plot title
fig.update_layout(
    title_text = 'Earthquakes in California (Last Week); marker size reflects magnitude.',
)

# Show the plot
fig.show()

For the above visualization, let’s find the time range of the data that was retrieved.

# Find the minimum and maximum timestamps in the 'time' column
min_time = gdf_california['time'].min()
max_time = gdf_california['time'].max()

# Print the time range
print("Time range of the visualized earthquake data:")
print("From:", min_time)
print("To:", max_time)
Time range of the visualized earthquake data:
From: 2024-06-30 21:00:31.270000
To: 2024-07-07 16:27:01.440000