
module TopoPyScale.topo_sub

Clustering routines for TopoSUB

S. Filhol, Oct 2021

TODO:

  • explore other clustering methods available in scikit-learn: https://scikit-learn.org/stable/modules/clustering.html
  • look into DBSCAN and its relatives

function ds_to_indexed_dataframe

ds_to_indexed_dataframe(ds)

Function to convert dataset to dataframe

See definition of function in topo_utils.py

Args:

  • ds (dataset): xarray dataset of N 2D DataArrays

Returns:


function scale_df

scale_df(
    df_param,
    scaler=StandardScaler(),
    features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1}
)

Function to scale features of a pandas dataframe

Args:

  • df_param (dataframe): features to scale
  • scaler (scaler object): Default is StandardScaler()
  • features (dict): dictionary of features to use as predictors, with their respective importance, e.g. {'x': 1, 'y': 1}

Returns:

  • dataframe: scaled data
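
To illustrate what a weighted feature scaling can look like, here is a minimal sketch using only the Python standard library. It is not the TopoPyScale implementation: the helper name `scale_columns` and the plain-dict data layout are hypothetical, and it assumes that "importance" means multiplying each standardized column by its weight.

```python
from statistics import fmean, pstdev


def scale_columns(columns, weights):
    """Standardize each column (zero mean, unit variance), then apply its weight.

    Hypothetical helper, not part of TopoPyScale.
    `columns` maps a feature name to a list of floats.
    Returns the scaled columns and the (mean, std) pairs needed to undo the scaling.
    """
    scaled, stats = {}, {}
    for name, weight in weights.items():
        values = columns[name]
        mean = fmean(values)
        std = pstdev(values) or 1.0  # constant column: avoid dividing by zero
        stats[name] = (mean, std)
        scaled[name] = [weight * (v - mean) / std for v in values]
    return scaled, stats


# Assumed toy data: elevation is given four times the weight of slope.
columns = {"elevation": [1000.0, 1500.0, 2000.0], "slope": [10.0, 20.0, 30.0]}
weights = {"elevation": 4, "slope": 1}
scaled, stats = scale_columns(columns, weights)
```

Under this assumption, a feature with weight 4 contributes four times as much to Euclidean distances as a feature with weight 1, which is why elevation would dominate the clustering with the default `features` dictionary.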

function inverse_scale_df

inverse_scale_df(
    df_scaled,
    scaler,
    features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1}
)

Function to inverse feature scaling of a pandas dataframe

Args:

  • df_scaled (dataframe): scaled data to transform back to the original values (inverse transform)
  • scaler (scaler object): original scikit-learn scaler
  • features (dict): dictionary of features to use as predictors, with their respective importance, e.g. {'x': 1, 'y': 1}

Returns:

  • dataframe: data in original format
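
The inverse operation can be sketched in the same spirit. The helper `unscale_columns` below is hypothetical and assumes the forward step was "standardize, then multiply by the feature weight", so the inverse divides by the weight before restoring the standard deviation and mean. It is a sketch under that assumption, not the library's code.

```python
def unscale_columns(scaled, stats, weights):
    """Undo a weighted standardization.

    Hypothetical helper, not part of TopoPyScale.
    `stats` maps a feature name to the (mean, std) used when scaling.
    Weights must be non-zero, otherwise the original values cannot be recovered.
    """
    restored = {}
    for name, weight in weights.items():
        mean, std = stats[name]
        restored[name] = [v / weight * std + mean for v in scaled[name]]
    return restored


# Assumed toy values: elevation was standardized with mean 1500 and std 500,
# then multiplied by a weight of 4.
scaled = {"elevation": [-4.0, 0.0, 4.0]}
stats = {"elevation": (1500.0, 500.0)}
weights = {"elevation": 4}
restored = unscale_columns(scaled, stats, weights)
```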

function kmeans_clustering

kmeans_clustering(
    df_param,
    n_clusters=100,
    features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1},
    seed=None,
    **kwargs
)

Function to perform K-means clustering

Args:

  • df_param (dataframe): features
  • features (dict): dictionary of features to use as predictors, with their respective importance, e.g. {'x': 1, 'y': 1}
  • n_clusters (int): number of clusters
  • seed (int): None, or an int used to seed the random number generator

kwargs:

Returns:

  • dataframe: df_centers
  • kmeans object: kmeans
  • dataframe: df_param
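
For readers unfamiliar with the algorithm, the following is a minimal, dependency-free sketch of K-means (Lloyd's algorithm) showing the role of `n_clusters` and `seed`. It is only an illustration: the function name `kmeans_sketch` is hypothetical, and the real function presumably delegates to scikit-learn rather than running a loop like this.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sketch(points, n_clusters, seed=None, max_iter=100):
    """Plain Lloyd's algorithm (hypothetical helper, not the TopoPyScale code)."""
    rng = random.Random(seed)  # the seed makes the initial centers reproducible
    centers = rng.sample(points, n_clusters)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels


# Assumed toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans_sketch(points, n_clusters=2, seed=0)
```

Which group receives label 0 depends on the seed, but the two groups end up in different clusters.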

function minibatch_kmeans_clustering

minibatch_kmeans_clustering(
    df_param,
    n_clusters=100,
    features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1},
    n_cores=4,
    seed=None,
    **kwargs
)

Function to perform mini-batch K-means clustering

Args:

  • df_param (dataframe): features
  • n_clusters (int): number of clusters
  • features (dict): dictionary of features to use as predictors, with their respective importance, e.g. {'x': 1, 'y': 1}
  • n_cores (int): number of processor cores

kwargs:

Returns:

  • dataframe: centroids
  • kmeans object: k-means model
  • dataframe: labels of input data

function search_number_of_clusters

search_number_of_clusters(
    df_param,
    method='minibatchkmean',
    cluster_range=array([100, 300, 500, 700, 900]),
    features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1},
    scaler_type=StandardScaler(),
    scaler=None,
    seed=2,
    plot=True
)

Function to help identify an optimum number of clusters using the elbow method

Args:

  • df_param (dataframe): pandas dataframe containing the input variables for the clustering method
  • method (str): method for clustering. Currently available: ['minibatchkmean', 'kmeans']
  • cluster_range (array int): array of numbers of clusters to derive scores for
  • features (dict): dictionary of features to use as predictors, with their respective importance, e.g. {'x': 1, 'y': 1}
  • scaler_type (scikit_learn obj): type of scaler to use, e.g. StandardScaler() or RobustScaler()
  • scaler (scikit_learn obj): scaler already fitted to the dataset. Implies that df_param is already scaled
  • seed (int): random seed for kmeans clustering
  • plot (bool): plot results or not

Returns:

  • dataframe: WCSS score, Davies-Bouldin score, Calinski-Harabasz score
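
The elbow method relies on how the within-cluster sum of squares (WCSS) drops as the number of clusters grows. The sketch below computes WCSS for hand-made partitions of a toy 1D dataset; the helper `wcss` and the data are assumptions made for illustration and are not taken from TopoPyScale.

```python
def wcss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean.

    Hypothetical helper, not part of TopoPyScale.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        center = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(point, center)) for point in members
        )
    return total


# Assumed toy data with two obvious groups, partitioned by hand for k = 1, 2, 4.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
scores = {
    1: wcss(points, [0, 0, 0, 0]),
    2: wcss(points, [0, 0, 1, 1]),
    4: wcss(points, [0, 1, 2, 3]),
}
```

Here the score falls sharply between one and two clusters and only marginally afterwards, so the "elbow" sits at two clusters. In practice the same comparison would be made over the values in `cluster_range`.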

function plot_center_clusters

plot_center_clusters(
    dem_file,
    ds_param,
    df_centers,
    var='elevation',
    cmap=<matplotlib.colors.ListedColormap object at 0x7f669fe9c8e0>,
    figsize=(14, 10)
)

Function to plot the location of the cluster centroids over the DEM

Args:

  • dem_file (str): path to dem raster file
  • ds_param (dataset): topo_param parameters ['elev', 'slope', 'aspect_cos', 'aspect_sin', 'svf']
  • df_centers (dataframe): containing cluster centroid parameters ['x', 'y', 'elev', 'slope', 'aspect_cos', 'aspect_sin', 'svf']
  • var (str): variable to plot as background
  • cmap (pyplot cmap): pyplot colormap to represent the variable.

function write_landform

write_landform(
    dem_file,
    df_param,
    project_directory='./',
    out_dir: Optional[str, Path] = None,
    out_name: Optional[str] = None
) -> Union[str, Path]

Function to write a landform file which maps cluster ids to dem pixels

Args:

  • dem_file (str): path to dem raster file
  • df_param (dataframe): topo_param parameters ['elev', 'slope', 'aspect_cos', 'aspect_sin', 'svf']

This file was automatically generated via lazydocs.