Core Module

core

Core module for effective precipitation calculations using Google Earth Engine.

This module provides the main EffectivePrecipitation class for calculating effective precipitation from climate datasets available on Google Earth Engine (GEE).

The module supports multiple effective precipitation methods:

  • Ensemble: Mean of 6 methods (default)
  • CROPWAT: FAO CROPWAT method
  • FAO/AGLW: FAO Dependable Rainfall (80% exceedance)
  • Fixed Percentage: Simple fixed percentage method
  • Dependable Rainfall: FAO Dependable Rainfall method
  • FarmWest: FarmWest method
  • USDA-SCS: Soil moisture depletion method (requires AWC and ETo)
  • TAGEM-SuET: Turkish irrigation method (requires ETo)
  • PCML: Physics-Constrained ML for Western U.S. (pre-computed GEE asset)
Example
from pycropwat import EffectivePrecipitation
ep = EffectivePrecipitation(
    asset_id='ECMWF/ERA5_LAND/MONTHLY_AGGR',
    precip_band='total_precipitation_sum',
    geometry_path='study_area.geojson',
    start_year=2015,
    end_year=2020,
    precip_scale_factor=1000,
    method='ensemble'
)
results = ep.process(output_dir='./output', n_workers=4)
See Also

  • pycropwat.methods: Individual effective precipitation calculation functions.
  • pycropwat.analysis: Post-processing and analysis tools.
  • pycropwat.utils: Utility functions for GEE and file operations.

EffectivePrecipitation

EffectivePrecipitation(asset_id: str, precip_band: str, geometry_path: Optional[Union[str, Path]] = None, start_year: int = None, end_year: int = None, scale: Optional[float] = None, precip_scale_factor: float = 1.0, gee_project: Optional[str] = None, gee_geometry_asset: Optional[str] = None, method: PeffMethod = 'ensemble', method_params: Optional[dict] = None)

Calculate effective precipitation from GEE climate data.

Supports multiple effective precipitation calculation methods: Ensemble (default), CROPWAT, FAO/AGLW, Fixed Percentage, Dependable Rainfall, FarmWest, USDA-SCS (requires AWC and ETo data), TAGEM-SuET (requires ETo data), and PCML (pre-computed GEE asset, Western U.S. only).

Parameters:

Name Type Description Default
asset_id str

GEE ImageCollection asset ID for precipitation data. Common options:

  • ECMWF/ERA5_LAND/MONTHLY_AGGR (ERA5-Land, global, ~11km),
  • IDAHO_EPSCOR/TERRACLIMATE (TerraClimate, global, ~4km),
  • IDAHO_EPSCOR/GRIDMET (GridMET, CONUS, ~4km),
  • OREGONSTATE/PRISM/AN81m (PRISM, CONUS, ~4km),
  • UCSB-CHG/CHIRPS/DAILY (CHIRPS, 50°S-50°N, ~5km),
  • NASA/GPM_L3/IMERG_MONTHLY_V06 (GPM IMERG, global, ~11km).
required
precip_band str

Name of the precipitation band in the asset. Examples:

  • ERA5-Land: total_precipitation_sum
  • TerraClimate: pr
  • GridMET: pr
  • PRISM: ppt
  • CHIRPS: precipitation
  • GPM IMERG: precipitation
required
geometry_path str, Path, or None

Path to shapefile or GeoJSON file defining the region of interest. Can also be a GEE FeatureCollection asset ID. Set to None if using gee_geometry_asset instead.

None
start_year int

Start year for processing (inclusive).

None
end_year int

End year for processing (inclusive).

None
scale float

Output resolution in meters. If None (default), uses native resolution of the dataset.

None
precip_scale_factor float

Factor to convert precipitation to mm. Default is 1.0. Common values: ERA5-Land (m to mm) = 1000, TerraClimate = 1.0, GridMET = 1.0.

1.0
gee_project str

GEE project ID for authentication. Required for cloud-based GEE access.

None
gee_geometry_asset str

GEE FeatureCollection asset ID for the region of interest. Takes precedence over geometry_path if both are provided.

None
method str

Effective precipitation calculation method. Default is 'ensemble'. Options:

  • 'ensemble' - Mean of 6 methods (default, requires AWC and ETo)
  • 'cropwat' - CROPWAT method (FAO standard)
  • 'fao_aglw' - FAO Dependable Rainfall (80% exceedance)
  • 'fixed_percentage' - Simple fixed percentage method
  • 'dependable_rainfall' - FAO Dependable Rainfall method
  • 'farmwest' - FarmWest method
  • 'usda_scs' - USDA-SCS soil moisture depletion method (requires AWC and ETo data via method_params)
  • 'suet' - TAGEM-SuET method (Turkish Irrigation Management System) (requires ETo data via method_params)
  • 'pcml' - Physics-Constrained ML (Western U.S. only, Jan 2000 - Sep 2024) Uses default GEE asset: projects/ee-peff-westus-unmasked/assets/effective_precip_monthly_unmasked
'ensemble'
method_params dict

Additional parameters for the selected method:

For 'fixed_percentage':

  • percentage (float): Fraction 0-1. Default 0.7.

For 'dependable_rainfall':

  • probability (float): Probability level 0.5-0.9. Default 0.75.

For 'usda_scs':

  • awc_asset (str): GEE Image asset ID for AWC data. Required. U.S.: projects/openet/soil/ssurgo_AWC_WTA_0to152cm_composite. Global: projects/sat-io/open-datasets/FAO/HWSD_V2_SMU.
  • awc_band (str): Band name for AWC. Default 'AWC'.
  • eto_asset (str): GEE ImageCollection asset ID for ETo. Required. U.S.: projects/openet/assets/reference_et/conus/gridmet/monthly/v1. Global: projects/climate-engine-pro/assets/ce-ag-era5-v2/daily.
  • eto_band (str): Band name for ETo. Default 'eto'. U.S. (GridMET): 'eto'. Global (AgERA5): 'ReferenceET_PenmanMonteith_FAO56'.
  • eto_is_daily (bool): Whether ETo is daily. Default False. Set True for AgERA5 daily data.
  • eto_scale_factor (float): Scale factor for ETo. Default 1.0.
  • rooting_depth (float): Rooting depth in meters. Default 1.0.
  • mad_factor (float): Management Allowed Depletion factor (0-1). Controls what fraction of soil water storage is available. Default 0.5.

None
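The documented defaults and ranges for method_params can be checked before any GEE call is made. The sketch below is not part of pycropwat: `check_method_params` is a hypothetical helper that only encodes the defaults, ranges, and required keys listed in the table above, and the library itself may validate differently or not at all.

```python
from typing import Any, Dict, Optional


def check_method_params(method: str,
                        params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: fill documented defaults and check documented ranges."""
    params = dict(params or {})  # copy so the caller's dict is not mutated

    if method == 'fixed_percentage':
        pct = params.setdefault('percentage', 0.7)
        if not 0.0 <= pct <= 1.0:
            raise ValueError(f"percentage must be a fraction in [0, 1], got {pct}")

    elif method == 'dependable_rainfall':
        prob = params.setdefault('probability', 0.75)
        if not 0.5 <= prob <= 0.9:
            raise ValueError(f"probability must be in [0.5, 0.9], got {prob}")

    elif method == 'usda_scs':
        # awc_asset and eto_asset are documented as required.
        missing = [k for k in ('awc_asset', 'eto_asset') if k not in params]
        if missing:
            raise ValueError(f"usda_scs requires: {', '.join(missing)}")
        params.setdefault('awc_band', 'AWC')
        params.setdefault('eto_band', 'eto')
        params.setdefault('eto_is_daily', False)
        params.setdefault('eto_scale_factor', 1.0)
        params.setdefault('rooting_depth', 1.0)
        params.setdefault('mad_factor', 0.5)
        if not 0.0 <= params['mad_factor'] <= 1.0:
            raise ValueError("mad_factor must be in [0, 1]")

    return params


# A percentage given as 80 instead of 0.8 is caught early.
try:
    check_method_params('fixed_percentage', {'percentage': 80})
except ValueError as err:
    print(err)
```

Running a check like this before constructing EffectivePrecipitation surfaces a mistyped fraction or a missing asset ID without waiting for a GEE request to fail.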

Attributes:

Name Type Description
geometry Geometry

The loaded geometry for the region of interest.

collection ImageCollection

The filtered and scaled precipitation image collection.

bounds list

Bounding box coordinates of the geometry.

Examples:

Basic usage with Ensemble method (default):

from pycropwat import EffectivePrecipitation
ep = EffectivePrecipitation(
    asset_id='ECMWF/ERA5_LAND/MONTHLY_AGGR',
    precip_band='total_precipitation_sum',
    geometry_path='roi.geojson',
    start_year=2015,
    end_year=2020,
    precip_scale_factor=1000
)
ep.process(output_dir='./output', n_workers=4)

Using GEE FeatureCollection asset:

ep = EffectivePrecipitation(
    asset_id='ECMWF/ERA5_LAND/MONTHLY_AGGR',
    precip_band='total_precipitation_sum',
    gee_geometry_asset='projects/my-project/assets/study_area',
    start_year=2015,
    end_year=2020,
    precip_scale_factor=1000,
    gee_project='my-gee-project'
)

Using FAO/AGLW method:

ep = EffectivePrecipitation(
    asset_id='IDAHO_EPSCOR/TERRACLIMATE',
    precip_band='pr',
    geometry_path='study_area.geojson',
    start_year=2000,
    end_year=2020,
    method='fao_aglw'
)

Using fixed percentage method (80%):

ep = EffectivePrecipitation(
    asset_id='IDAHO_EPSCOR/GRIDMET',
    precip_band='pr',
    geometry_path='farm.geojson',
    start_year=2010,
    end_year=2020,
    method='fixed_percentage',
    method_params={'percentage': 0.8}
)

Using USDA-SCS method with AWC and ETo data:

ep = EffectivePrecipitation(
    asset_id='ECMWF/ERA5_LAND/MONTHLY_AGGR',
    precip_band='total_precipitation_sum',
    geometry_path='arizona.geojson',
    start_year=2015,
    end_year=2020,
    precip_scale_factor=1000,
    method='usda_scs',
    method_params={
        'awc_asset': 'projects/my-project/assets/soil_awc',
        'awc_band': 'AWC',
        'eto_asset': 'IDAHO_EPSCOR/GRIDMET',
        'eto_band': 'eto',
        'eto_is_daily': True,
        'rooting_depth': 1.0
    }
)
See Also

  • pycropwat.methods: Individual effective precipitation calculation functions.
  • pycropwat.analysis: Post-processing and analysis tools.

Source code in pycropwat/core.py
def __init__(
    self,
    asset_id: str,
    precip_band: str,
    geometry_path: Optional[Union[str, Path]] = None,
    start_year: int = None,
    end_year: int = None,
    scale: Optional[float] = None,
    precip_scale_factor: float = 1.0,
    gee_project: Optional[str] = None,
    gee_geometry_asset: Optional[str] = None,
    method: PeffMethod = 'ensemble',
    method_params: Optional[dict] = None,
):
    self.asset_id = asset_id
    self.precip_band = precip_band
    self.geometry_path = geometry_path
    self.gee_geometry_asset = gee_geometry_asset
    self.start_year = start_year
    self.end_year = end_year
    self.scale = scale  # None means use native resolution
    self.precip_scale_factor = precip_scale_factor
    self.gee_project = gee_project
    self.method = method
    self.method_params = method_params or {}

    # Get the effective precipitation function
    self._peff_function = get_method_function(method)

    # USDA-SCS specific: cache for AWC data (loaded once)
    self._awc_cache = None

    # Input directory for saving downloaded data (set during process())
    self._input_dir = None

    # Check if this is PCML method (uses single multi-band Image instead of ImageCollection)
    self._is_pcml = (method == 'pcml' or self.precip_band == PCML_DEFAULT_BAND)

    # For PCML, use default asset if placeholder provided
    if self._is_pcml:
        if self.asset_id == 'PLACEHOLDER' or self.asset_id is None:
            self.asset_id = PCML_DEFAULT_ASSET
            logger.info(f"Using default PCML asset: {self.asset_id}")
        self.precip_band = PCML_DEFAULT_BAND

    # Validate that at least one geometry source is provided (not required for PCML)
    if geometry_path is None and gee_geometry_asset is None and not self._is_pcml:
        raise ValueError("Either geometry_path or gee_geometry_asset must be provided")

    # Initialize GEE
    initialize_gee(self.gee_project)

    # For PCML, use the asset's own geometry if no geometry provided
    if self._is_pcml and geometry_path is None and gee_geometry_asset is None:
        # Load PCML image first
        self._pcml_image = ee.Image(self.asset_id)
        # Use predefined Western U.S. bounding box since PCML image geometry is unbounded
        self.geometry = ee.Geometry.Polygon([PCML_WESTERN_US_BOUNDS])
        self.bounds = PCML_WESTERN_US_BOUNDS
        logger.info("Using predefined Western U.S. bounding box for PCML")
    else:
        # Load geometry from GEE asset or local file
        self.geometry = load_geometry(geometry_path, gee_asset=gee_geometry_asset)
        self.bounds = self.geometry.bounds().getInfo()['coordinates'][0]

    # Get date range
    self.start_date, self.end_date = get_date_range(start_year, end_year)

    # Load and filter image collection (or load PCML image)
    self._load_collection()

cropwat_effective_precip staticmethod

cropwat_effective_precip(pr: ndarray) -> np.ndarray

Calculate CROPWAT effective precipitation.

Parameters:

Name Type Description Default
pr ndarray

Precipitation in mm.

required

Returns:

Type Description
ndarray

Effective precipitation in mm.

Source code in pycropwat/core.py
@staticmethod
def cropwat_effective_precip(pr: np.ndarray) -> np.ndarray:
    """
    Calculate CROPWAT effective precipitation.

    Parameters
    ----------

    pr : np.ndarray
        Precipitation in mm.

    Returns
    -------
    np.ndarray
        Effective precipitation in mm.
    """
    ep = np.where(
        pr <= 250,
        pr * (125 - 0.2 * pr) / 125,
        0.1 * pr + 125
    )
    return ep
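As a worked example of the piecewise formula in the source above, the snippet below re-implements the same expression as a standalone function (so it runs without pycropwat or GEE) and evaluates it for a few monthly totals. The sample values are illustrative, not taken from any dataset.

```python
import numpy as np


def cropwat_effective_precip(pr):
    """Standalone copy of the formula shown in the source listing."""
    pr = np.asarray(pr, dtype=float)
    return np.where(
        pr <= 250,
        pr * (125 - 0.2 * pr) / 125,  # monthly totals up to 250 mm
        0.1 * pr + 125                # monthly totals above 250 mm
    )


monthly_pr = np.array([0.0, 100.0, 250.0, 300.0])  # mm per month
peff = cropwat_effective_precip(monthly_pr)
print(peff)  # approximately [0, 84, 150, 155] mm
```

Both branches give 150 mm at the 250 mm threshold, so the function is continuous there. Above the threshold only 10% of any additional rainfall counts as effective.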

process

process(output_dir: Union[str, Path], n_workers: int = 4, months: Optional[List[int]] = None, input_dir: Optional[Union[str, Path]] = None, save_inputs: bool = False) -> List[Tuple[Optional[Path], Optional[Path]]]

Process all months and save effective precipitation rasters.

Downloads precipitation data from Google Earth Engine, calculates effective precipitation using the configured method, and saves results as GeoTIFF files. Uses Dask for parallel processing of multiple months.

Parameters:

Name Type Description Default
output_dir str or Path

Directory to save output rasters. Will be created if it doesn't exist.

required
n_workers int

Number of parallel workers for Dask. Default is 4. Set to 1 for sequential processing.

4
months list of int

List of months to process (1-12). If None, processes all months in the date range. Useful for seasonal analyses.

None
input_dir str or Path

Directory to save downloaded input data (precipitation, AWC, ETo). If None and save_inputs is True, uses output_dir/../analysis_inputs/<name of output_dir>.

None
save_inputs bool

Whether to save downloaded input data as GeoTIFF files. Default is False. Useful for debugging or further analysis.

False
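How input_dir and save_inputs interact follows from the source listing for process. The sketch below isolates that logic in a hypothetical function, `resolve_input_dir`, which is not part of the library. It mirrors the path expression in the source but does not create any directory.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_input_dir(output_dir: Union[str, Path],
                      input_dir: Optional[Union[str, Path]] = None,
                      save_inputs: bool = False) -> Optional[Path]:
    """Hypothetical helper mirroring the input-directory logic in process()."""
    if not save_inputs:
        return None  # inputs are not saved at all
    if input_dir is not None:
        return Path(input_dir)  # an explicit directory wins
    output_dir = Path(output_dir)
    # Default: sibling 'analysis_inputs' folder, sub-folder named after output_dir
    return output_dir.parent / 'analysis_inputs' / output_dir.name


print(resolve_input_dir('results/era5', save_inputs=True))
print(resolve_input_dir('results/era5', input_dir='./inputs', save_inputs=True))
print(resolve_input_dir('results/era5'))
```

Note that input_dir is ignored unless save_inputs is True.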

Returns:

Type Description
list of tuple

List of tuples containing paths to saved files: (effective_precip_path, effective_precip_fraction_path). Returns (None, None) for months that failed to process.
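Because failed months come back as (None, None) rather than raising, callers may want to separate successes from failures. The following is a minimal sketch using made-up result tuples. `split_results` is a hypothetical helper, and it assumes a month counts as failed when its first element is None.

```python
from pathlib import Path
from typing import List, Optional, Tuple

Result = Tuple[Optional[Path], Optional[Path]]


def split_results(results: List[Result]) -> Tuple[List[Result], int]:
    """Hypothetical helper: return (saved results, number of failed months)."""
    saved = [r for r in results if r[0] is not None]
    n_failed = len(results) - len(saved)
    return saved, n_failed


# Made-up return value in the documented shape, with one failed month.
results = [
    (Path('output/effective_precip_2015_01.tif'),
     Path('output/effective_precip_fraction_2015_01.tif')),
    (None, None),
    (Path('output/effective_precip_2015_03.tif'),
     Path('output/effective_precip_fraction_2015_03.tif')),
]

saved, n_failed = split_results(results)
print(f"{len(saved)} months saved, {n_failed} failed")
```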

Notes

Output files are named:

  • effective_precip_YYYY_MM.tif - Effective precipitation in mm
  • effective_precip_fraction_YYYY_MM.tif - Effective/total ratio (non-PCML methods)
  • effective_precip_fraction_YYYY.tif - Annual (water year) fraction (PCML method only)

For the USDA-SCS method, AWC and ETo data are automatically downloaded and cached for efficiency.
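The monthly naming pattern above can be used to predict output paths, for example to check which months already exist on disk. In this sketch `expected_outputs` is a hypothetical helper, and the zero-padded two-digit month is an assumption read from the `YYYY_MM` pattern. The PCML annual fraction file is not covered.

```python
from pathlib import Path
from typing import Tuple, Union


def expected_outputs(output_dir: Union[str, Path],
                     year: int, month: int) -> Tuple[Path, Path]:
    """Hypothetical helper: monthly output paths for non-PCML methods."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    output_dir = Path(output_dir)
    stamp = f"{year:04d}_{month:02d}"  # assumes zero-padded month (YYYY_MM)
    return (
        output_dir / f"effective_precip_{stamp}.tif",
        output_dir / f"effective_precip_fraction_{stamp}.tif",
    )


peff_path, fraction_path = expected_outputs('./output', 2015, 6)
print(peff_path.name)      # effective_precip_2015_06.tif
print(fraction_path.name)  # effective_precip_fraction_2015_06.tif
```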

Examples:

Process all months in parallel:

ep = EffectivePrecipitation(...)
results = ep.process(output_dir='./output', n_workers=8)

Process only summer months:

results = ep.process(
    output_dir='./output',
    months=[6, 7, 8]  # June, July, August
)

Save input data for debugging:

results = ep.process(
    output_dir='./output',
    save_inputs=True,
    input_dir='./inputs'
)
See Also
process_sequential: Sequential processing for debugging.
Source code in pycropwat/core.py
def process(
    self,
    output_dir: Union[str, Path],
    n_workers: int = 4,
    months: Optional[List[int]] = None,
    input_dir: Optional[Union[str, Path]] = None,
    save_inputs: bool = False
) -> List[Tuple[Optional[Path], Optional[Path]]]:
    """
    Process all months and save effective precipitation rasters.

    Downloads precipitation data from Google Earth Engine, calculates
    effective precipitation using the configured method, and saves
    results as GeoTIFF files. Uses Dask for parallel processing of
    multiple months.

    Parameters
    ----------

    output_dir : str or Path
        Directory to save output rasters. Will be created if it
        doesn't exist.

    n_workers : int, optional
        Number of parallel workers for Dask. Default is 4.
        Set to 1 for sequential processing.

    months : list of int, optional
        List of months to process (1-12). If None, processes all months
        in the date range. Useful for seasonal analyses.

    input_dir : str or Path, optional
        Directory to save downloaded input data (precipitation, AWC, ETo).
        If None and save_inputs is True, uses ``output_dir/../analysis_inputs/<name of output_dir>``.

    save_inputs : bool, optional
        Whether to save downloaded input data as GeoTIFF files.
        Default is False. Useful for debugging or further analysis.

    Returns
    -------
    list of tuple
        List of tuples containing paths to saved files:
        ``(effective_precip_path, effective_precip_fraction_path)``.
        Returns ``(None, None)`` for months that failed to process.

    Notes
    -----
    Output files are named:

    - ``effective_precip_YYYY_MM.tif`` - Effective precipitation in mm
    - ``effective_precip_fraction_YYYY_MM.tif`` - Effective/total ratio (non-PCML methods)
    - ``effective_precip_fraction_YYYY.tif`` - Annual (water year) fraction (PCML method only)

    For the USDA-SCS method, AWC and ETo data are automatically downloaded
    and cached for efficiency.

    Examples
    --------
    Process all months in parallel:

    ```python
    ep = EffectivePrecipitation(...)
    results = ep.process(output_dir='./output', n_workers=8)
    ```

    Process only summer months:

    ```python
    results = ep.process(
        output_dir='./output',
        months=[6, 7, 8]  # June, July, August
    )
    ```

    Save input data for debugging:

    ```python
    results = ep.process(
        output_dir='./output',
        save_inputs=True,
        input_dir='./inputs'
    )
    ```

    See Also
    --------
    process_sequential : Sequential processing for debugging.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Set up input directory for saving downloaded data
    if save_inputs:
        if input_dir is not None:
            self._input_dir = Path(input_dir)
        else:
            # Default: parallel to output_dir in analysis_inputs
            self._input_dir = output_dir.parent / 'analysis_inputs' / output_dir.name
        self._input_dir.mkdir(parents=True, exist_ok=True)
        logger.info(f"Input data will be saved to: {self._input_dir}")
    else:
        self._input_dir = None

    # Generate list of (year, month) to process
    all_dates = get_monthly_dates(self.start_year, self.end_year)

    if months is not None:
        all_dates = [(y, m) for y, m in all_dates if m in months]

    logger.info(f"Processing {len(all_dates)} months with {n_workers} workers")

    # Create delayed tasks
    tasks = [
        delayed(self._process_single_month)(year, month, output_dir)
        for year, month in all_dates
    ]

    # Execute in parallel with progress bar
    with ProgressBar():
        results = compute(*tasks, num_workers=n_workers)

    return list(results)
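The date-selection step in the source above can be tried in isolation. In the sketch below, `monthly_dates` is a stand-in for the library's get_monthly_dates, whose implementation is not shown here. It is assumed to return every (year, month) pair from January of start_year through December of end_year.

```python
from typing import List, Optional, Tuple


def monthly_dates(start_year: int, end_year: int) -> List[Tuple[int, int]]:
    """Stand-in for get_monthly_dates (assumed behavior, inclusive years)."""
    return [(y, m) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def select_dates(start_year: int, end_year: int,
                 months: Optional[List[int]] = None) -> List[Tuple[int, int]]:
    """Apply the same months filter that process() applies."""
    all_dates = monthly_dates(start_year, end_year)
    if months is not None:
        all_dates = [(y, m) for y, m in all_dates if m in months]
    return all_dates


summer = select_dates(2015, 2016, months=[6, 7, 8])
print(summer)
# [(2015, 6), (2015, 7), (2015, 8), (2016, 6), (2016, 7), (2016, 8)]
```

Under this assumption each selected (year, month) pair becomes one task, so a six-year range with no filter produces 72 tasks, and the summer filter reduces that to 18.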

process_sequential

process_sequential(output_dir: Union[str, Path], months: Optional[List[int]] = None, input_dir: Optional[Union[str, Path]] = None, save_inputs: bool = False) -> List[Tuple[Optional[Path], Optional[Path]]]

Process all months sequentially (useful for debugging).

Same as process but without parallel processing. Useful for debugging issues, testing on small datasets, or when GEE rate limits are a concern.

Parameters:

Name Type Description Default
output_dir str or Path

Directory to save output rasters. Will be created if it doesn't exist.

required
months list of int

List of months to process (1-12). If None, processes all months in the date range.

None
input_dir str or Path

Directory to save downloaded input data (precipitation, AWC, ETo). If None and save_inputs is True, uses output_dir/../analysis_inputs/<name of output_dir>.

None
save_inputs bool

Whether to save downloaded input data. Default is False.

False

Returns:

Type Description
list of tuple

List of tuples containing paths to saved files: (effective_precip_path, effective_precip_fraction_path). Returns (None, None) for months that failed to process.

Examples:

Debug a single month:

ep = EffectivePrecipitation(...)
results = ep.process_sequential(
    output_dir='./output',
    months=[1]  # Process only January
)
See Also
process: Parallel processing method (recommended for production).
Source code in pycropwat/core.py
def process_sequential(
    self,
    output_dir: Union[str, Path],
    months: Optional[List[int]] = None,
    input_dir: Optional[Union[str, Path]] = None,
    save_inputs: bool = False
) -> List[Tuple[Optional[Path], Optional[Path]]]:
    """
    Process all months sequentially (useful for debugging).

    Same as :meth:`process` but without parallel processing. Useful for
    debugging issues, testing on small datasets, or when GEE rate limits
    are a concern.

    Parameters
    ----------

    output_dir : str or Path
        Directory to save output rasters. Will be created if it
        doesn't exist.

    months : list of int, optional
        List of months to process (1-12). If None, processes all months
        in the date range.

    input_dir : str or Path, optional
        Directory to save downloaded input data (precipitation, AWC, ETo).
        If None and save_inputs is True, uses ``output_dir/../analysis_inputs/<name of output_dir>``.

    save_inputs : bool, optional
        Whether to save downloaded input data. Default is False.

    Returns
    -------
    list of tuple
        List of tuples containing paths to saved files:
        ``(effective_precip_path, effective_precip_fraction_path)``.
        Returns ``(None, None)`` for months that failed to process.

    Examples
    --------
    Debug a single month:

    ```python
    ep = EffectivePrecipitation(...)
    results = ep.process_sequential(
        output_dir='./output',
        months=[1]  # Process only January
    )
    ```

    See Also
    --------
    process : Parallel processing method (recommended for production).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Set up input directory for saving downloaded data
    if save_inputs:
        if input_dir is not None:
            self._input_dir = Path(input_dir)
        else:
            # Default: parallel to output_dir in analysis_inputs
            self._input_dir = output_dir.parent / 'analysis_inputs' / output_dir.name
        self._input_dir.mkdir(parents=True, exist_ok=True)
        logger.info(f"Input data will be saved to: {self._input_dir}")
    else:
        self._input_dir = None

    all_dates = get_monthly_dates(self.start_year, self.end_year)

    if months is not None:
        all_dates = [(y, m) for y, m in all_dates if m in months]

    results = []
    for year, month in all_dates:
        result = self._process_single_month(year, month, output_dir)
        results.append(result)

    return results