Project Configuration

Project Organisation

To run a TopoPyScale project, you need to have the following file structure:

my_project/
    ├── inputs/
        ├── dem/ 
            ├── my_dem.tif
            ├── my_dem_mask.tif     (OPTIONAL: to mask part of the DEM)
            ├── my_dem_groups.tif   (OPTIONAL: to delineate groups of clusters)
            └── pts_list.csv        (OPTIONAL: to downscale to specific points)
        └── climate/
            ├── daily/
            ├── ERA5.zarr           (OPTIONAL: to downscale with zarr optimization)
            └── yearly/
                ├── PLEV*.nc
                └── SURF*.nc
    ├── outputs/
            ├── tmp/
            └── downscaled/
    ├── pipeline.py (OPTIONAL: script for the downscaling instructions)
    └── config.yml

TopoPyScale automatically generates the inputs/ and outputs/ folder structure in which climatic and topographic forcings will be stored. TopoPyScale assumes that the Python console is opened from the root of my_project/.
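
Because the working directory matters, it can help to locate the project root before starting a job. The snippet below is a minimal sketch using only the standard library; `find_project_root` is a hypothetical helper (not part of TopoPyScale) and it assumes that the root is the closest directory containing config.yml.

```python
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Return the closest directory at or above `start` that contains config.yml."""
    for candidate in [start, *start.parents]:
        if (candidate / "config.yml").is_file():
            return candidate
    raise FileNotFoundError(f"No config.yml found at or above {start}")
```

A typical use would be `os.chdir(find_project_root(Path.cwd()))` at the top of pipeline.py, so that relative paths such as inputs/climate/ resolve from the project root.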

File config.yml

The configuration file contains all parameters needed to run a downscaling job. It includes general information about the job, as well as the specific routines and values to use. Example config.yml files can be found in the repository TopoPyScale_examples.

The configuration is a YAML file, a common standard for storing configurations. Further help on YAML syntax can be found here, and the Python packages pyyaml and Munch allow you to interact with this kind of file. Be aware that YAML is indentation-sensitive.
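
To see how YAML values arrive in Python, the following sketch parses a trimmed-down fragment with pyyaml. The fragment and the project name are illustrative only, and the snippet shows pyyaml's behaviour, not how TopoPyScale itself handles these values. Note in particular that `None` is read as a plain string and that an unquoted date becomes a `datetime.date`.

```python
import yaml  # provided by the pyyaml package

# Illustrative fragment mirroring a few keys of a TopoPyScale config.yml
CONFIG_TEXT = """
project:
  name: Finse
  start: 2018-01-01
  end: 2018-01-31
  split:
    IO: False
    time: 2
    space: None
  parallelization:
    setting:
      dask:
        memory_limit: 1.2GB
"""

config = yaml.safe_load(CONFIG_TEXT)
project = config["project"]

print(type(project["start"]))          # a datetime.date, not a string
print(project["split"]["IO"])          # a Python bool
print(repr(project["split"]["space"]))  # the string 'None' (YAML null is written null or ~)
print(repr(project["parallelization"]["setting"]["dask"]["memory_limit"]))  # a string
```

If attribute-style access is preferred (`config.project.name`), the dictionary returned by `yaml.safe_load` can be wrapped with `munch.munchify` from the Munch package.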

For TopoPyScale, the configuration file must contain at least the following:

project:
  name: Name of the project
  description: Describe the project
  authors:
    - Author 1 (can add contact and affiliation here)
    - Author 2
    - Author 3
  date: Date at which the project is run. This is metadata
  directory: /path/to/project/

  # start and end date of the timeperiod of interest
  start: 2018-01-01
  end: 2018-01-31
  split:
    IO: False       # Flag to split downscaling in time or not
    time: 2         # number of years to split timeline in
    space: None     # NOT IMPLEMENTED

  # indicate which climate data to use. Currently only era5 available (see climate section below)
  climate: era5

  # This is for the option of fetching DEM with API (NOT YET SUPPORTED)
  extent:

  # method and setting for parallelizing (for clustering, solar geometry (multicore only), and downscaling (multicore and dask))
  parallelization:
    downscaling_method: dask                  # multicore or dask. multicore uses the Python library multiprocessing
    setting:
        multicore:
            CPU_cores: 6                      # number of cores to use (clustering, solar geometry and downscaling)
        dask:                                 # Options to use Dask for downscaling. See dask documentation (https://docs.dask.org)
            n_workers: 6                      # number of workers
            threads_per_worker: 1             # number of threads per worker 
            memory_target_fraction: 0.95      # fraction of memory to use
            memory_limit: 1.2GB               # max memory usage per worker.




#.....................................................................................................
climate:
  # For now TopoPyScale only supports ERA5-reanalysis input climate data
  precip_lapse_rate: False     # Apply precipitation lapse-rate correction (currently valid for Northern Hemisphere only)

  # Settings for the ERA5 atmospheric forcings
  era5:
    path: inputs/climate/   # Can be either an absolute path or a path relative to the project directory
    product: reanalysis     # ensemble not available yet
    timestep: 1H            # 1H, 3H, 6H, etc.

    # Choose pressure levels relevant to your project and available in ERA5 Pressure Levels
    plevels: [ 700,750,775,800,825,850,875,900,925,950,975,1000 ]
    download_threads: 1               # Number of threads to request downloads with cdsapi
    realtime: False                   # (Optional) Forces redownload of latest month of ERA5 data upon each run of code (allows daily updates for realtime applications)
    data_repository: cds              # repository from where to download data: cds (copernicus official ERA5), google_cloud_storage (Google archive of ERA5)
    cds_output_format: netcdf         # netcdf or grib. Grib is not supported by topoclass
    cds_download_format: unarchived   # unarchived or zip
    rm_daily: False                   # remove the daily downloads to save storage after download
    zarr_store: ERA5.zarr             # name of the zarr store containing the ERA5 data (local store. currently not yet compatible with remote store)

#.....................................................................................................
dem:
  path: C:/GIS/DEMs                       # (optional) Absolute path where the DEM file is stored
  file: myDEM.tif                         # Name of the dem file. Must be a raster.
  epsg: 32632                             # projection EPSG code
  horizon_increments: 10                  # horizon increment angle in degrees
  solar_position_method: nrel_numpy       # (optional) method to compute solar_geom with pvlib libraries.  

#.....................................................................................................
sampling:

  # choose downscaling using dem segmentation 'toposub' or a list of points 'points'. Possible values: toposub, points
  method: toposub

  # In case method == 'points', indicate a file with a list of points and the point coordinate projection EPSG code
  points:
    csv_file: pt_list.csv               # filename of list of points
    epsg: 4326                          # EPSG code of the points (x,y) coordinates in file
    name_column: pt_name                # (optional) column containing point_name. If not provided, point_name will be automatically assigned

  # In case method == 'toposub'
  toposub:
    clustering_method: minibatchkmean   # clustering method available: kmean, minibatchkmean
    n_clusters: 50                      # number of clusters to segment the DEM into
    random_seed: 2                      # random seed for the K-means clustering
    clustering_features: { 'x': 1, 'y': 1, 'elevation': 1, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1 }  # dictionary of the features to use in clustering with their relative importance. Relative importance is a multiplier applied after scaling
    clustering_mask: inputs/dem/catchment_mask.tif      # optional relative path to a .tif containing a mask (0/1)
    clustering_groups: inputs/dem/groups.tif            # optional relative path to a .tif containing cluster groups (int values), e.g. land cover
    clustering_group_weights: inputs/dem/gr_wgths.csv   # optional relative path to a .csv file in /inputs/dem/ containing the columns ['group', 'weights'] indicating the relative number of clusters per group. If this file does not exist, each group gets a number of clusters proportional to its relative area.

#.....................................................................................................
toposcale:
  interpolation_method: idw               # interpolation methods available: linear or idw
  LW_terrain_contribution: True           # (bool)    Turn ON/OFF terrain contribution to longwave

#.....................................................................................................
outputs:
  directory: outputs                    # (optional) absolute path where to store the final downscaled products.
  variables: all                        # list of variables to export in netcdf. ['t','p','SW']. Default None or all
  file:
    clean_outputs: False                # (bool)    remove the entire outputs/ directory prior to downscaling
    clean_FSM: True                     # (bool)    remove the entire sim/ directory
    df_centroids: df_centroids.pck      # (pickle)  dataframe containing the points of interest with their topographic features
    ds_param: ds_param.nc               # (netcdf)  topographic parameters (slope, aspect, etc.)
    ds_solar: ds_solar.nc               # (netcdf)  solar geometry
    da_horizon: da_horizon.nc           # (netcdf)  horizon angles
    landform: landform.tif              # (geotiff) raster of cluster labels [TopoSub]
    downscaled_pt: down_pt_*.nc         # (netcdf)  filename of the downscaled timeseries
    zarr_store: down.zarr               # (zarr)    name of the zarr store for the downscaled timeseries (optional, only working with Dask)

clean_up:
  rm_tmp_dirs: True                   # (optional: bool) remove the created tmp directories after downscaling?

The file config.yml is parsed by TopoPyScale when an instance of the class topoclass('config.yml') is created.

Possible values of configurations

Project

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| name | Finse | yes | string | metadata: name of the project |
| description | Downscaling for Finse | yes | string | metadata: description of the project |
| authors | | yes | list of strings | metadata: names of the downscaling authors |
| date | Nov 2021 | yes | string | metadata: creation date of the downscaling |
| directory | ./path_to_my_project/ | no | string | path where the downscaling project is located. If empty, the default is the current Python working directory |
| start | 2018-10-01 | yes | %Y-%m-%d | start date of the downscaling. Must currently be a date available in the ERA5 dataset |
| end | 2018-12-31 | yes | %Y-%m-%d | end date of the downscaling. Must currently be a date available in the ERA5 dataset. Note that TopoPyScale downloads ERA5 data monthly |
| split | | | | |
| IO | False | yes | True, False | Use True to split the downscaling project in time for long timeseries. This decreases memory usage |
| time | 1 | only if split.IO is True | | Number of years per chunk of the climate data timeseries |
| space | None | no | | not yet implemented |
| extent | None | no | | not yet implemented |
| climate | era5 | yes | era5 | source of climate data. ERA5 is the only supported dataset at the moment |
| parallelization | | | | Settings and method to parallelize |
| downscaling_method | multicore | yes | multicore, dask | method to parallelize downscaling |
| setting | | | | |
| multicore | | yes | | |
| CPU_cores | 4 | yes | integer | number of cores to use |
| dask | | no, only if using dask | | |
| n_workers | 6 | | integer | number of workers |
| threads_per_worker | 1 | | integer | number of threads per worker |
| memory_target_fraction | 0.95 | | float (0-1) | fraction of memory to use |
| memory_limit | 1.2GB | | string | max memory usage per worker |
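
To make the interplay of start, end and split.time concrete, here is a minimal sketch that validates the %Y-%m-%d format and cuts the period into chunks. `split_period` is a hypothetical helper, and cutting on calendar-year boundaries is an assumption made for illustration; TopoPyScale's internal splitting may differ.

```python
from datetime import date, datetime


def split_period(start: str, end: str, n_years: int) -> list:
    """Cut [start, end] into consecutive chunks spanning at most n_years calendar years."""
    fmt = "%Y-%m-%d"  # same format as the start/end fields of config.yml
    first = datetime.strptime(start, fmt).date()
    last = datetime.strptime(end, fmt).date()
    if last < first:
        raise ValueError("end must not precede start")
    if n_years < 1:
        raise ValueError("n_years must be at least 1")

    chunks = []
    chunk_start = first
    while chunk_start <= last:
        # A chunk ends on 31 December of its last calendar year, or at `last`
        chunk_end = min(date(chunk_start.year + n_years - 1, 12, 31), last)
        chunks.append((chunk_start, chunk_end))
        chunk_start = date(chunk_end.year + 1, 1, 1)
    return chunks


for chunk in split_period("2018-10-01", "2021-03-31", 2):
    print(chunk[0].isoformat(), "to", chunk[1].isoformat())
```

With these inputs the period is cut into two chunks, the first ending on 2019-12-31 and the second starting on 2020-01-01.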

Climate

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| precip_lapse_rate | True | y | True, False | Apply precipitation lapse rate |
| era5 | | | | As of now TopoPyScale only supports ERA5 data |
| path | inputs/climate/ | y | string | path to store climate data (either relative to the project directory or absolute) |
| product | reanalysis | y | reanalysis | no other product available at the moment |
| timestep | 1H | y | 1H | timestep to run TopoPyScale. Currently only 1H available |
| plevels | [700,800,900,1000] | y | array | ERA5 pressure levels to use. The lowest pressure value (the highest level in the atmosphere) must lie above the highest elevation of the DEM |
| download_threads | 12 | y | integer | number of downloading threads to use |
| data_repository | google_cloud_storage | y | string | data repository to download data from: cds or google_cloud_storage |
| cds_output_format | netcdf | n | string | file format CDS will deliver: netcdf or grib |
| cds_download_format | unarchived | n | string | download format CDS will deliver: unarchived or zip |
| rm_daily | False | n | string | whether to remove daily downloads, to save storage after download |
| realtime | False | n | True, False | On each new run, redownloads the latest month (ERA5T) to obtain daily updates of partial months |
| zarr_store | ERA5.zarr | no | string | name of the zarr store containing the ERA5 data (local store, not yet compatible with remote stores) |
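
The constraint on plevels can be checked roughly before requesting data. The sketch below converts a pressure level to an altitude with the standard-atmosphere barometric formula; this formula is an assumption introduced here, not something TopoPyScale uses, and real geopotential heights vary with the weather, so treat the result as a first sanity check only. Both function names are hypothetical.

```python
def isa_altitude_m(pressure_hpa: float) -> float:
    """Approximate altitude (m) of a pressure level in the International Standard Atmosphere."""
    return 44330.0 * (1.0 - (pressure_hpa / 1013.25) ** (1.0 / 5.255))


def plevels_cover_dem(plevels, dem_max_elevation_m: float) -> bool:
    """True if the highest pressure level (lowest hPa value) sits above the DEM summit."""
    return isa_altitude_m(min(plevels)) > dem_max_elevation_m


plevels = [700, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975, 1000]
print(round(isa_altitude_m(min(plevels))))   # roughly 3000 m for 700 hPa
print(plevels_cover_dem(plevels, 1900.0))
```

If the check fails, adding lower-pressure levels (for example 650 or 600 hPa) to plevels is the usual remedy.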

dem

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| path | C:/GIS/DEMs | no | str (absolute path) | Absolute path where the DEM file is stored |
| file | ASTER_Finse.tif | yes | `*.tif` | filename of the DEM |
| epsg | 32632 | yes | EPSG CRS projection code | EPSG CRS projection code of the DEM geoTiff |
| horizon_increments | 10 | yes | 1-90 | sector angle used to compute horizon angles. Unit: degree |
| solar_position_method | nrel_c | no | See methods of pvlib | pvlib can use different methods to compute the sun position. Some may require specific installation |

Sampling

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| method | toposub | yes | toposub, points | choice to run downscaling for a list of points in a .csv file, or for a spatial job using the toposub clustering method |
| points | | | | |
| csv_file | station_list.csv | only if method is points | `*.csv`, `*.txt` | name of the .csv file containing the list of points. Must contain at least the fields x,y |
| epsg | 4326 | only if method is points | EPSG CRS projection code | EPSG CRS projection code of the x,y coordinates provided in the .csv file |
| name_column | pt_name | no | All column names of the csv file | Name of the column containing a unique ID of the points. This ID can be a number or a string. It is used to name the downscaled data at the points |
| toposub | | | | |
| clustering_method | minibatchkmean | only if method is toposub | kmean, minibatchkmean | clustering method. minibatchkmean is parallelized and a lot faster. See scikit-learn documentation |
| n_clusters | 10 | only if method is toposub | integer | number of clusters k-means will segment the DEM into |
| random_seed | 2 | only if method is toposub | integer | random seed to use in k-means |
| clustering_features | `{'x':1, 'y':1, 'elevation':4, 'slope':1, 'aspect_cos':1, 'aspect_sin':1, 'svf':1}` | only if method is toposub | python dict | Python dictionary listing the features used for clustering. The importance value is a scaling factor applied to a feature in case it should weigh more in segmenting the DEM. Default should be 1 |
| clustering_mask | clustering/catchment_mask.tif | optional (only used if method is toposub) | `'*/.tif'` | Path (or relative path) to a .tif file containing the mask (0/1 values) indicating which pixels of the DEM are used. Needs to have the same grid/resolution as the input DEM |
| clustering_groups | clustering/VEG_CODE.tif | optional (only used if method is toposub) | `'*/.tif'` | Path (or relative path) to a .tif file containing integer values of groups to split the clustering into (e.g. vegetation codes 1-9). Needs to have the same grid/resolution as the input DEM |
| clustering_group_weights | clustering/gr_wweights.csv | optional (only used if method is toposub) | `'*/.csv'` | Path (or relative path) to a .csv file containing at least the two columns group, weights. sum(weights) must equal 1 |
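
The group weights describe how n_clusters is shared between groups. The sketch below shows one way such a split could be computed, using the largest-remainder method so that the counts always add up to n_clusters. The rounding rule is an assumption made for illustration (the documentation does not state how TopoPyScale rounds), and `clusters_per_group` is a hypothetical helper.

```python
def clusters_per_group(n_clusters: int, weights: dict) -> dict:
    """Share n_clusters between groups proportionally to weights (which must sum to 1)."""
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        raise ValueError("sum(weights) must equal 1")

    exact = {group: n_clusters * w for group, w in weights.items()}
    counts = {group: int(value) for group, value in exact.items()}  # round down first

    # Hand the clusters lost to rounding to the groups with the largest remainders
    leftover = n_clusters - sum(counts.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for group in by_remainder[:leftover]:
        counts[group] += 1
    return counts


# Hypothetical land-cover groups 1, 2 and 3 with their relative weights
print(clusters_per_group(8, {1: 0.5, 2: 0.25, 3: 0.25}))
```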

Toposcale

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| interpolation_method | idw | y | idw, linear | interpolation method: inverse distance weight or linear interpolation |
| LW_terrain_contribution | True | y | True, False | Use longwave terrain contribution correction or not |

Outputs

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| directory | C:/ERA5/Downscaled | no | str (absolute path) | The absolute path where the final downscaled product is stored |
| variables | all | yes | all, or a list of variable names | Variables to export when using the to_netcdf() function |
| file | | | | |
| clean_outputs | True | yes | True, False | remove all files from the /outputs/ folder prior to downscaling. If False, all files in outputs/ are kept, but outputs/tmp/ is removed. False can be used to speed up a job if DEM morphometrics, solar geometries, and horizons have already been computed |
| clean_FSM | True | yes | True, False | remove all files from the /fsm_sims/ folder prior to downscaling and running FSM. If False, existing files will not be deleted |
| df_centroids | df_centroids.pck | yes | `*.pck` | filename to store the dataframe of the points/centroids where downscaling takes place. File is saved in outputs/ |
| ds_param | ds_param.nc | yes | `*.nc` | filename to store the dataset of the DEM morphometrics and the cluster/centroid labels map. File is saved in outputs/ |
| ds_solar | ds_solar.nc | yes | `*.nc` | filename to store the dataset of the solar geometry of the DEM. File is saved in outputs/ |
| da_horizon | da_horizon.nc | yes | `*.nc` | filename to store the DataArray of the DEM horizons. File is saved in outputs/ |
| landform | landform.tif | no | `*.tif` | filename to store the raster of the points/centroids downscaling map. File is saved in outputs/ |
| downscaled_pt | down_pt_*.nc | yes | `*.nc` | filename to store the dataset of the downscaled points/centroids. File is saved in outputs/ |
| zarr_store | down.zarr | no | string | name of the zarr store for the downscaled timeseries (optional, only working with Dask) |

clean_up

| Field | Example Value | Required | Possible Values | Description |
|---|---|---|---|---|
| delete_tmp_dirs | True | no | True, False | If True, the created tmp directories are deleted after downscaling the climate |

CSV file format for a list of points

The list of points is a comma-separated values file (.csv) which must contain at least the fields x,y. All other columns are loaded into a dataframe and can be used for further analysis, but are not required by TopoPyScale.

An example of a list of points:

Name,stn_number,latitude,longitude,x,y
Finsevatne,SN25830,60.5938,7.527,419320.867306002,6718447.86246835
Fet-I-Eidfjord,SN49800,60.4085,7.2798,405243.856317655,6698143.36494597
Skurdevikåi,SN29900,60.3778,7.5693,421114.679132306,6694343.36865902
Midtstova,SN53530,60.6563,7.2755,405730.30171528,6725742.26010349
FV50-Vestredalen,SN53990,60.7418,7.5748,422296.164018722,6734871.61164008
Klevavatnet,SN53480,60.7192,7.2085,402259.379226592,6732844.21093029

This file will be loaded into the mp.toposub.df_centroids dataframe.
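
Before launching a job, a points file can be checked against the x,y requirement. The following is a minimal standard-library sketch (TopoPyScale itself loads the file into a dataframe); `read_points` is a hypothetical helper and the sample reuses two rows of the example above.

```python
import csv
import io

POINTS_CSV = """Name,stn_number,latitude,longitude,x,y
Finsevatne,SN25830,60.5938,7.527,419320.867306002,6718447.86246835
Midtstova,SN53530,60.6563,7.2755,405730.30171528,6725742.26010349
"""


def read_points(handle) -> list:
    """Read a list of points and make sure the mandatory x and y columns exist."""
    reader = csv.DictReader(handle)
    missing = {"x", "y"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing mandatory column(s): {sorted(missing)}")
    points = []
    for row in reader:
        # Coordinates are expressed in the CRS given by sampling.points.epsg
        row["x"] = float(row["x"])
        row["y"] = float(row["y"])
        points.append(row)
    return points


points = read_points(io.StringIO(POINTS_CSV))
print(len(points), "points loaded; first point:", points[0]["Name"])
```

For a file on disk, `read_points(open("inputs/dem/pts_list.csv", newline=""))` would work the same way.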

Parallelization

TopoPyScale uses parallelization for a number of steps. For most of them, the Python library multiprocessing is used, either with multiple threads (to manage download requests to the CDS server) or with multiple cores (clustering, solar geometry, downscaling). An optional method using Dask is available for the downscaling step.

Settings for parallelization should be adapted and considered according to your machine (laptop, server, HPC, ...).
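
As a rough aid for choosing these settings, the sketch below estimates what a Dask configuration asks of the machine: the total number of threads and the combined memory ceiling of all workers. The helper names are hypothetical, and the parsing assumes decimal units (1 GB = 10^9 bytes) written as in the example configuration.

```python
import os
import re

_UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_memory(text: str) -> float:
    """Convert a string such as '1.2GB' into a number of bytes (decimal units assumed)."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([KMGT]B)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Cannot parse memory size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2).upper()]


def dask_budget(n_workers: int, threads_per_worker: int, memory_limit: str) -> dict:
    """Total threads and total memory ceiling requested by the dask settings."""
    return {
        "threads": n_workers * threads_per_worker,
        "memory_bytes": n_workers * parse_memory(memory_limit),
    }


budget = dask_budget(n_workers=6, threads_per_worker=1, memory_limit="1.2GB")
print(f"{budget['threads']} threads requested, {os.cpu_count()} logical CPUs available")
print(f"{budget['memory_bytes'] / 1e9:.1f} GB total worker memory ceiling")
```

If the requested threads exceed the available CPUs, or the memory ceiling exceeds the RAM of the machine, reducing n_workers or memory_limit is a sensible first step.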

Zarr

We have started using the Zarr file format to improve IO methods. Zarr is a recent archival format for multidimensional datasets. It behaves like a database, from which only the data of interest are loaded into memory. A number of Zarr repositories of the ERA5 dataset exist (Google, AWS, etc.), but we do not yet have a well-established method to pull data from them. However, when downloading data from CDS, it is now possible to download the data as netcdf as before and then convert them locally into a Zarr archive. This significantly improves the downscaling speed (x1.4). Example code will soon be available to demonstrate these newly added options.