Tutorial 3: Large rasters#

This tutorial demonstrates floodsr with a larger raster and an end-to-end CLI run that fetches HRDEM and writes the super-resolved output in one shot.

What you’ll do:

  • verify the extended install from Extended Install

  • download and visualize a larger test raster

  • run floodsr tohr with --fetch-hrdem

  • inspect both the fetched HRDEM and final output

Install floodsr#

This tutorial requires the extended install described in Extended Install. floodsr uses this two-level installation approach to support as many users as possible:

  • the basic install for those who need something easy to install and dont need the extra machinery for handling large rasters, and

  • the extended install for those more comfortable with Python and who want to use the full power of floodsr for large rasters.

If you followed the instructions for the Basic Install, the simplest way to upgrade is to install a new environment per Extended Install.

And you should probably uninstall the old/previous basic environment to avoid confusion and free up disk space. See Installation for install and uninstall commands for each mode and execution context.

Proceed with Extended Install based on your execution context:

# Colab experimental setup
# !apt-get update -qq
# !apt-get install -y -qq gdal-bin libgdal-dev
# !pip install -q --upgrade pip
# !pip install -q "gdal[numpy]==$(gdal-config --version).*" rasterio geopandas pyproj shapely fiona
# !pip install -q floodsr

Prove the install#

Let’s check the versions now:

import matplotlib
import rasterio
import sys

print(f"matplotlib=={matplotlib.__version__}")
print(f"rasterio=={rasterio.__version__}")
print(f"python={sys.executable}")
matplotlib==3.10.8
rasterio==1.5.0
python=/opt/conda/envs/dev/bin/python
!floodsr --version

If these don’t work, double check you selected correct kernel (e.g., Python (floodsr-gdal)) for this notebook.

Imports#

import os
from pathlib import Path
from urllib.request import urlretrieve

import matplotlib.pyplot as plt
import numpy as np
import rasterio
from rasterio.enums import Resampling

Set Cache#

When rasters become large, caching downloads to disk can be a huge time saver.

This tutorial demonstrates a reusable cache directory so repeated HRDEM fetches and model resolution do not need to start from scratch each time. The size required by the cache (and your working directory for that matter) will of course depend on the size of the rasters you work with. For large rasters, this can be >10 GB.

Edit the below cell if you want floodsr to reuse a different cache directory.

# Edit this cell if you want `floodsr` to reuse a different cache directory.
base_cache_dir = "./_cache"

# windows example. un-comment and edit as needed.
# base_cache_dir = r"C:\Users\<you>\floodsr_cache"
# Resolve the cache path (and create if needed) once so later CLI cells can pass it directly.
base_cache_dir = Path(base_cache_dir).expanduser().resolve()
base_cache_dir.mkdir(parents=True, exist_ok=True)
print(f"cache_dir={base_cache_dir}")
cache_dir=/workspace/_cache

Set Paths#

let’s start by setting some paths to python variables for convenience

# Define the low-res input name and the derived output paths.
lowres_name = "inunriver_historical_000000000WATCH_1980_rp01000.tif"
hrdem_fp = Path(f"{Path(lowres_name).stem}_fetched_hrdem.vrt").resolve()
requested_result_fp = Path("result.tif").resolve()
result_fp = requested_result_fp

print(f"hrdem_fp={hrdem_fp}")
print(f"result_fp={result_fp}")
hrdem_fp=/home/cefect/LS/10_IO/2407_FHIMP/tmp/floodsr-tutorial_3-AurJ8Z/run/inunriver_historical_000000000WATCH_1980_rp01000_fetched_hrdem.vrt
result_fp=/home/cefect/LS/10_IO/2407_FHIMP/tmp/floodsr-tutorial_3-AurJ8Z/run/result.tif

Download Test Data#

For this tutorial, we use a tile from the old Aqueduct Floods Hazard Maps dataset.

A clip of this has been uploaded to our project repo for easy retrieval. Fetch it into your working directory using the urlretrieve utility function again.

lowres_fp = Path(lowres_name).resolve()

urlretrieve(
    "https://github.com/cefect/floodsr/releases/download/v0.0.9/inunriver_historical_000000000WATCH_1980_rp01000.tif",
    lowres_fp,
)
assert lowres_fp.is_file(), f"could not find {lowres_fp}"

This fetches a raster like:

Aqueduct screenshot

We can also plot it to get a sense of the input data.

with rasterio.open(lowres_fp) as src: # no helper function here because we need fancier loading below
    lowres = src.read(1).astype(float)
    if src.nodata is not None:
        lowres[lowres == src.nodata] = np.nan

    # print out some info about the input raster
    print(f"lowres raster: size={Path(lowres_fp).stat().st_size / 1024**2:.1f} MB, shape={src.shape}, res={src.res}, crs={src.crs}")


plt.figure(figsize=(6, 4))
im = plt.imshow(np.where(lowres > 0, lowres, np.nan), cmap="Blues")
plt.colorbar(im, fraction=0.046, pad=0.04)
_ = plt.title(f"Low-res input depth")
 
lowres raster: size=0.0 MB, shape=(30, 60), res=(0.008333333333333333, 0.008333333333333333), crs=EPSG:4326
../_images/eee8f780c2f6ee1e2be3dc2f8fc0bbcccd2f4a208ce06208241b525e7b6ca2db.png

yuck

you’ll notice this raster uses EPSG:4326, which is lat/lon coordinates. Hence the goofy resolution. We’ll want to keep this in mind when we set our --crs-policy.

Fetch HRDEM and Run tohr#

For this larger example, we use one CLI command to fetch HRDEM, resolve the model, and run super-resolution in one shot with the following arguments:

  • --in {lowres_fp} points floodsr at the low-resolution flood-depth raster we downloaded above.

  • -f / --fetch-hrdem tells floodsr to fetch HRDEM directly instead of requiring a local DEM file ahead of time.

  • --fetch-out {hrdem_fp.name} keeps a copy of the fetched HRDEM on disk so we can inspect it in the next section.

  • --fetch-force-tiling forces the HRDEM fetch step to run in tiled windows. This is helpful for larger extents because it avoids trying to assemble the whole DEM in one large read.

  • --cache-dir {base_cache_dir} tells both the fetch step and model-resolution step to reuse the cache directory we configured above.

  • --window-method hard tells the ToHR runtime to use non-overlapping hard windows instead of feathered blending. For this large raster example, we set this explicitly because the large-raster memory-safe path is tied to hard windowing (for now).

  • --crs-policy use-dem tells floodsr to treat the fetched DEM grid as the target grid when the input flood raster and DEM use different coordinate reference systems. In this example, the low-resolution flood raster is geographic (EPSG:4326) while the fetched HRDEM is projected (EPSG:3979).

  • --out result.tif requests a predictable output basename in the working directory. Depending on the active backend, the final artifact may be materialized as either a GeoTIFF or a VRT, and floodsr prints the resolved output path on the last output line.

  • --min-depth-threshold=0.1 masks out very small predicted depths in the final output.

See CLI Reference for the full CLI reference.

# Run the one-shot fetch + super-resolution command through the notebook CLI.
!floodsr tohr --in {lowres_fp} -f --fetch-out {hrdem_fp.name} --fetch-force-tiling --cache-dir {base_cache_dir} --window-method hard --crs-policy use-dem --out {requested_result_fp.name} --min-depth-threshold=0.1

Plot the results#

The previous command wrote the fetched HRDEM to --fetch-out, so we can inspect the DEM tile directly.

To speed up plotting, we use a resampled version of the result. The raster is so large that the plot looks the same when resampled like this; however, be careful not to mix up your plotting work from your analysis work.

with rasterio.open(hrdem_fp) as src:
    preview_scale = max(src.height / 1024, src.width / 1024, 1)
    preview_shape = (
        max(1, int(src.height / preview_scale)),
        max(1, int(src.width / preview_scale)),
    )
    dem = src.read(1, out_shape=preview_shape, resampling=Resampling.nearest).astype(float)
    if src.nodata is not None:
        dem[dem == src.nodata] = np.nan

plt.figure(figsize=(6, 4))
im = plt.imshow(dem, cmap="terrain")
plt.colorbar(im, fraction=0.046, pad=0.04)
plt.title("Fetched HRDEM")
plt.axis("off")
plt.tight_layout()

print(f"DEM full shape: {src.shape}")
print(f"DEM preview shape: {dem.shape}")
DEM full shape: (36698, 41378)
DEM preview shape: (908, 1024)
../_images/291562de7709b5aa52942abd01f41549fbeb5730b6e35a2979e86690e35efb51.png

And now we plot the hi-res result

with rasterio.open(result_fp) as src:
    preview_scale = max(src.height / 1024, src.width / 1024, 1)
    preview_shape = (
        max(1, int(src.height / preview_scale)),
        max(1, int(src.width / preview_scale)),
    )
    result = src.read(1, out_shape=preview_shape, resampling=Resampling.bilinear).astype(float)
    if src.nodata is not None:
        result[result == src.nodata] = np.nan

plt.figure(figsize=(6, 4))
im = plt.imshow(np.where(result > 0, result, np.nan), cmap="Blues")
plt.colorbar(im, fraction=0.046, pad=0.04)
plt.title("Output depth")
plt.axis("off")
plt.tight_layout()

print(f"Output full shape: {src.shape}")
print(f"Output preview shape: {result.shape}")
Output full shape: (36698, 41378)
Output preview shape: (908, 1024)
../_images/4cbd7ec58ca3ae1713ab5f89457fc5c2968d305dcb23f66ee7b7829aecab32c3.png