TiTiler-CMR Compatibility Report
What Datasets are Compatible with TiTiler-CMR?
This report details TiTiler-CMR compatible and incompatible datasets.
What is tested?
TiTiler-CMR has /tiles and /statistics groups of endpoints, with /timeseries/statistics being an expansion of the /statistics group. The /tiles and /statistics endpoints read data from source files. Some files work with TiTiler-CMR and some do not; the reasons are detailed in this report. While datasets compatible with tiling are highly likely to also be compatible with statistics, this report only covers tiling compatibility.
This report provides:
- A searchable listing of NASA EOSDIS datasets and whether they have been found to be compatible or incompatible with TiTiler-CMR's tiling endpoints.
- An overview of the types of datasets that were found to be compatible and incompatible.
See titiler-cmr-compatibility's METHODOLOGY documentation for a detailed explanation of how datasets were tested.
What datasets are compatible?
Out of 10047 datasets tested, 719 were found to be compatible.
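As a quick check on what these counts mean as a share of all tested datasets, the arithmetic can be written out as below. This is only a worked calculation using the two numbers quoted above, not part of the testing code.

```python
tested = 10047     # datasets tested, as stated in this report
compatible = 719   # datasets found compatible, as stated in this report

compatible_share = 100 * compatible / tested
incompatible_share = 100 - compatible_share

print(f"{compatible_share:.2f}% compatible, {incompatible_share:.2f}% incompatible")
# prints "7.16% compatible, 92.84% incompatible"
```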
Searchable Dataset Table
To search across all datasets (where a granule was found) and check compatibility status, view the interactive table:
→ View Interactive Dataset Table
To determine if a particular dataset is compatible, search for it by name and check the Tiling Compatible column.
To regenerate the table, run the generate_table.py script.
What types of datasets are compatible?
- GES_DISC, PODAAC, ORNLDAAC and OBDAAC have the most compatible datasets.
- Most compatible datasets are processing level 3 or 4.
- Most compatible datasets are in NetCDF-4 format; however, ~200 are Cloud-Optimized GeoTIFFs (COGs).
Now for some 🥧 charts.
Compatible Datasets by Data Center
Percentage of compatible datasets by data center
| Data Center | % Tiling-Compatible Datasets |
|---|---|
| ASDC | 2.22% |
| ASF | 0.00% |
| GES_DISC | 10.09% |
| GHRCDAAC | 4.27% |
| LAADS | 0.00% |
| LPDAAC | 5.28% |
| NSIDC | 6.57% |
| OBDAAC | 13.79% |
| ORNLDAAC | 7.29% |
| PODAAC | 18.35% |
| SEDAC | 0.00% |
Compatible Datasets by Processing Level
Compatible Datasets by File Extension
Not all datasets have the format attribute, so we assess format using the extension attribute.
What datasets are incompatible and why?
Only about 7% of datasets were found to be compatible. What's up with the other 93%?
The most common reason datasets are deemed incompatible is an unsupported format.
Unsupported Formats
A majority of collections use an unsupported format. Supported formats and extensions were whitelisted; see titiler-cmr-compatibility's constants.py.
To determine if a collection had files of a supported or unsupported format, titiler-cmr-compatibility uses collection and granule metadata. If no format property is present in the metadata, the file extension of the random sample granule is used.
Collections were classified as having an unsupported format when, in this order:
- The format was explicitly defined in the metadata and that format was not in the list of supported formats.
- No format was specified in the metadata, and the file extension was not in the list of supported extensions.
- The file extension was in the list of supported extensions, but opening the file failed. Not all file extensions fully describe the file format (for example, .tif could be a cloud-optimized or not-cloud-optimized GeoTIFF, and .nc could be NetCDF-4 or an earlier NetCDF version), so the code attempts to open all files with a file extension in the list of supported extensions. Sometimes this resulted in errors including the sub-strings "is not the signature of a valid netCDF4 file" or "Cannot seek streaming HTTP file", indicating unsupported formats.
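The ordering of these checks can be sketched roughly as follows. This is a minimal illustration, not the code in titiler-cmr-compatibility: the function name, its parameters, and the contents of the two allow-lists are assumptions made for the example (the real lists live in constants.py and may differ). Only the two error sub-strings are taken from this report.

```python
from typing import Optional

# Hypothetical allow-lists for illustration only; the real values are
# defined in titiler-cmr-compatibility's constants.py and may differ.
SUPPORTED_FORMATS = {"netcdf-4", "cog"}
SUPPORTED_EXTENSIONS = {".nc", ".nc4", ".tif", ".tiff"}

# Error sub-strings quoted in this report as signs of an unsupported format.
UNSUPPORTED_ERROR_SUBSTRINGS = (
    "is not the signature of a valid netCDF4 file",
    "Cannot seek streaming HTTP file",
)


def has_unsupported_format(
    metadata_format: Optional[str],
    file_extension: str,
    open_error: Optional[str] = None,
) -> bool:
    """Return True when a collection would be classed as 'unsupported format'."""
    if metadata_format is not None:
        # 1. An explicit format in the metadata is checked first.
        if metadata_format.lower() not in SUPPORTED_FORMATS:
            return True
    elif file_extension.lower() not in SUPPORTED_EXTENSIONS:
        # 2. Without a metadata format, fall back to the file extension.
        return True
    # 3. The extension alone can be ambiguous, so an error message captured
    #    while opening the file is inspected for the known sub-strings.
    if open_error is not None:
        return any(s in open_error for s in UNSUPPORTED_ERROR_SUBSTRINGS)
    return False
```

For example, `has_unsupported_format(None, ".nc", "... is not the signature of a valid netCDF4 file")` returns `True` even though `.nc` is an accepted extension in this sketch, because the failure to open the file reveals an older NetCDF version.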
No X,Y Dimensions
The second most common reason datasets were deemed incompatible was that titiler.xarray.io could not determine the x and y dimensions from the dimensions of the dataset. This represents a data characteristic issue rather than a technical limitation, as x, y dimensions are necessary to visualize data on a map.
However, many of these datasets are characterised as processing level 3 or 4. Further investigation could be warranted.
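To make the x, y dimension requirement concrete, here is a minimal sketch of this kind of check. It does not reproduce titiler.xarray.io: the function name and the candidate dimension names are assumptions chosen for illustration, and the names the library actually recognizes are not listed in this report.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate names, for illustration only.
X_NAMES = ("x", "lon", "longitude")
Y_NAMES = ("y", "lat", "latitude")


def find_xy_dims(dims: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (x_dim, y_dim) if both can be identified, otherwise None."""
    # Match case-insensitively but return the original dimension names.
    lowered = {d.lower(): d for d in dims}
    x_dim = next((lowered[n] for n in X_NAMES if n in lowered), None)
    y_dim = next((lowered[n] for n in Y_NAMES if n in lowered), None)
    if x_dim is None or y_dim is None:
        return None
    return x_dim, y_dim
```

Under these assumptions, a gridded dataset with dimensions `("time", "lat", "lon")` yields `("lon", "lat")`, while a swath-style dataset with dimensions such as `("along_track", "cross_track")` yields `None` and would be reported as having no x, y dimensions.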
Group Structure
Many HDF5 and some NetCDF-4 datasets use a group (aka hierarchical) structure. At the time of writing, files with groups are not supported by TiTiler-CMR, but support could easily be added in the near future.
Other Incompatibility Issues
The rest of the datasets were deemed to be incompatible for the following reasons:
- No granules were returned from the CMR granules endpoint using the collection concept id.
- An error was thrown when attempting to open the file with xarray or rasterio. These errors, apart from those listed above, were often caused by various data-type related issues.
- Forbidden or Unauthorized errors.
- Timing out (each dataset was given 30 seconds to process).
- Out of memory (OOM) errors: these datasets are not represented in the results as OOM errors cause the lambda function to crash.
Timing out and OOM errors both signal that these datasets are potentially too large to enable dynamic tiling within an acceptable response time.
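A per-dataset time budget like the 30 seconds mentioned above could be applied along the following lines. This is a sketch under stated assumptions, not necessarily how titiler-cmr-compatibility enforces its limit (for instance, the limit may simply be the Lambda function's own timeout); `run_with_timeout` and `check` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Tuple


def run_with_timeout(
    check: Callable[[], Any], timeout_seconds: float = 30.0
) -> Tuple[str, Any]:
    """Run `check` and return ("ok", result) or ("timeout", None)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check)
    try:
        return "ok", future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The worker thread cannot be killed; it is left to finish on its own.
        return "timeout", None
    finally:
        # Do not block the caller while a timed-out check is still running.
        executor.shutdown(wait=False)
```

Note that a thread-based timeout only stops waiting for the result. It does not free memory held by the check, so it offers no protection against the OOM case described above.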
Summary and Next Steps
While this process provides a nice listing and number of datasets that are compatible with TiTiler-CMR, individual datasets will still require testing and assessment. However, we are confident that compatible datasets could be identified as such through metadata.
Next steps:
- Support grouped datasets in TiTiler-CMR and re-run assessment.
- Discuss how to "publish" or otherwise advertise datasets that can be used with TiTiler-CMR.