Delineation Engines¶

Agribound provides seven delineation engines, each suited to different use cases, satellite sources, and hardware configurations.

Engine Comparison¶

Engine	Key	Approach	Strengths	GPU Required	Reference
Delineate-Anything	`delineate-anything`	YOLO instance segmentation (2 model variants)	Fast; resolution-agnostic (1--10 m+); routes through FTW for S2 with native MPS support	Recommended	Lavreniuk et al. (2025)
Fields of The World	`ftw`	Semantic segmentation (14+ models: EfficientNet-B3/B5/B7, UNet, UPerNet)	Strong generalization; 25-country training set; bi-temporal input (planting + harvest); all models via `list_ftw_models()`	Yes	Kerner et al. (2025)
GeoAI Field Boundary	`geoai`	Mask R-CNN instance segmentation	Built-in NDVI support; auto-falls back to CPU on Apple Silicon (MPS). Without fine-tuning on region-specific reference data, GeoAI typically does not delineate any fields	No	Wu (2026)
DINOv3	`dinov3`	DINOv3 ViT backbone (SAT-493M satellite-pretrained) + DPT segmentation head	Satellite-native ViT features pretrained on 493M satellite images; LoRA fine-tuning; resolution-agnostic	Yes	Siméoni et al. (2025)
Prithvi-EO-2.0	`prithvi`	NASA/IBM ViT foundation model (embed / PCA / segment modes)	1024-D ViT embeddings from 6 HLS bands; PCA baseline for comparison. ViT embed mode requires fine-tuning for good results	Recommended (embed); No (PCA)	Szwarcman et al. (2024)
Embedding	`embedding`	Unsupervised clustering of pre-computed embeddings	No GPU needed; no labeled data required	No	Brown et al. (2025), Feng et al. (2025)
Ensemble	`ensemble`	Multi-engine or multi-model consensus (vote / union / intersection)	Best accuracy; supports running same engine with different models	Depends on engines	--

Engine Details¶

Delineate-Anything¶

Instance segmentation based on Ultralytics YOLO (DelineateAnything and DelineateAnything-S), trained on the FBIS-22M dataset. Resolution-agnostic: works across 1 m (NAIP) to 10 m+ (Sentinel-2) imagery.

For Sentinel-2, DA automatically routes through FTW's built-in instance segmentation with proper S2 preprocessing (/3000 normalization) and native MPS (Apple GPU) support. For all other sensors, the standalone DA pipeline with sensor-agnostic percentile normalization is used.

pip install agribound[delineate-anything]

Supported sources: landsat, sentinel2, hls, naip, spot, local

Fine-tuning: Supported (YOLO). Chips are converted to PNG with percentile-normalized uint8 RGB.

Reference: arXiv:2504.02534

Fields of The World (FTW)¶

Semantic segmentation using EfficientNet-B3/B5/B7, UNet, UPerNet, and DeepLabV3+ architectures. Ships with 14+ pre-trained models covering 25 countries. All models are available via agribound.list_ftw_models(). Produces field interior and boundary masks that are then polygonized.

pip install agribound[ftw]

Supported sources: landsat, sentinel2, hls, local

Fine-tuning: Not yet supported (requires paired temporal windows). Pre-trained weights are used directly.

Reference: Fields of The World (FTW) dataset

GeoAI Field Boundary¶

Mask R-CNN instance segmentation from the geoai-py package. Includes built-in NDVI computation for enhanced multi-spectral input.

pip install agribound[geoai]

Supported sources: sentinel2, naip, local

Reference: geoai-py package

Apple Silicon (MPS)

Mask R-CNN is unstable on Apple Silicon GPUs via MPS (Metal Performance Shaders). Metal command buffer errors cause crashes during both training and inference. Agribound automatically detects MPS and falls back to CPU for all GeoAI operations. All other engines (FTW, Delineate-Anything, Prithvi) work correctly on MPS.

DINOv3¶

DINOv3 Vision Transformer backbone with a DPT (Dense Prediction Transformer) segmentation head. Uses LoRA-efficient fine-tuning with a frozen backbone for fast adaptation on reference boundaries. Resolution-agnostic — works across all satellite sources.

pip install agribound[geoai]

Supported sources: landsat, sentinel2, hls, naip, spot, local

Requires fine-tuning: Yes — DINOv3 requires fine-tuning on reference boundaries to produce meaningful field segmentation. Set fine_tune=True with reference_boundaries.

Reference: Siméoni et al. (2025), DINOv3

Prithvi-EO-2.0¶

NASA/IBM foundation model (300M-parameter Vision Transformer) pretrained on HLS imagery with masked autoencoders. Supports three modes:

embed (default) — Extracts 1024-D ViT encoder embeddings from 224×224 patches, then K-means clusters them to delineate fields. Uses all 6 HLS bands (Blue, Green, Red, NIR, SWIR1, SWIR2) with Prithvi's pre-training normalization. GPU recommended. Without fine-tuning, ViT embeddings tend to produce very few, over-merged fields. Fine-tuning on reference boundaries is recommended for production use.
pca — Lightweight baseline that clusters PCA-reduced spectral bands (R, G, B, NIR) without running the ViT encoder. No GPU or transformers needed. Useful for comparison.
segment — Fine-tuned UPerNet decoder via terratorch. Requires a checkpoint from fine-tuning on reference boundaries.

pip install agribound[prithvi]

# ViT embedding mode (default)
agribound.delineate(..., engine="prithvi", engine_params={"mode": "embed"})

# PCA baseline
agribound.delineate(..., engine="prithvi", engine_params={"mode": "pca"})

# Fine-tuned segmentation
agribound.delineate(..., engine="prithvi",
                    engine_params={"mode": "segment", "checkpoint_path": "..."})

Supported sources: landsat, sentinel2, hls, local

Reference: Szwarcman et al. (2024), Prithvi-EO-2.0

Embedding Clustering¶

Unsupervised approach using K-means or spectral clustering on pre-computed pixel embeddings. Does not require a GPU. Designed for use with the Google Satellite Embedding V1 and TESSERA embedding datasets.

pip install agribound                # Google Embeddings (no extra deps)
pip install agribound[tessera]       # TESSERA Embeddings

Supported sources: google-embedding, tessera-embedding

Reference: Google AlphaEarth, TESSERA (Feng et al.)

Ensemble¶

Combines outputs from multiple engines using majority vote or polygon intersection. Runs the specified constituent engines and merges their results to improve robustness.

pip install agribound[all]

Supported sources: landsat, sentinel2, hls, naip, spot, local

When to use ensembles

Ensembles work best when multiple models run on the same sensor data. Each architecture has different biases, and vote-merging cancels out individual errors because every model sees the same pixels.

Ensembles across different sensors (e.g., Sentinel-2 + Landsat + NAIP) do not work well due to resolution mismatch (1 m vs 30 m polygons), temporal mismatch (different overpass dates), and spatial alignment errors. For multi-sensor analysis, compare per-source results independently rather than merging them.

SAM2 is not a standalone engine — it is an optional post-processing step that refines field boundaries. Each polygon's bounding box is fed to SAM2 as a prompt, and SAM2 produces a pixel-accurate mask that replaces the original geometry.

pip install agribound[samgeo]

Recommended usage: Apply SAM2 to the final ensemble output rather than per-engine, since refinement scales linearly with the number of polygons. For large study areas (thousands of fields), use sam_model="tiny" for faster processing.

For single-engine runs, enable via engine_params:

gdf = agribound.delineate(
    ...,
    engine_params={"sam_refine": True, "sam_model": "tiny"},
)

For ensemble workflows, call refine_boundaries() directly on the merged result:

from agribound.engines.samgeo_engine import refine_boundaries

gdf = refine_boundaries(ensemble_gdf, raster_path, config)

SAM2 model variants: "tiny", "small", "base_plus", "large" (default). Batch size configurable via engine_params["sam_batch_size"] (default 100).

Reference: Wu & Osco (2023), Ravi et al. (2024)

When to Use Each Engine¶

Scenario	Recommended Engine
High-resolution imagery (1--6 m), NAIP or SPOT	`delineate-anything`
Sentinel-2 in a country covered by FTW pre-trained models	`ftw`
General-purpose Sentinel-2 or NAIP with NDVI	`geoai`
Fine-tuning on reference boundaries (any sensor)	`dinov3` (SAT-493M)
Multi-temporal Landsat/HLS analysis (6 bands)	`prithvi` (embed mode)
No GPU, no reference data, global coverage	`embedding` + LULC filter
Maximum accuracy, multiple engines on same sensor	`ensemble`

Recommended Workflows¶

Situation	Approach	Example
Reference boundaries available	DINOv3 + SAM2 per source	Example 14
No reference boundaries	Embedding clustering + LULC filter + SAM2	Example 15
Multi-model ensemble	All engines on same sensor, majority vote	Example 12
Multi-year time series	Single engine per year, fine-tune once	Example 01
Quick local test	Delineate-Anything on local GeoTIFF	Example 10

GPU Requirements¶

All engines except embedding require a CUDA-capable GPU for inference. The device configuration parameter controls hardware selection:

config = AgriboundConfig(
    device="auto",  # auto-detect: cuda > mps > cpu
    ...
)

Supported values: auto, cuda, cpu, mps (Apple Silicon).

Warning

Running GPU-required engines on CPU is technically possible but will be extremely slow for anything beyond small test areas.

Delineation Engines¶

Engine Comparison¶

Engine Details¶

Delineate-Anything¶

Fields of The World (FTW)¶

GeoAI Field Boundary¶

DINOv3¶

Prithvi-EO-2.0¶

Embedding Clustering¶

Ensemble¶

SAM2 Boundary Refinement¶

When to Use Each Engine¶

Recommended Workflows¶

GPU Requirements¶