module TopoPyScale.topo_sub
Clustering routines for TopoSUB
S. Filhol, Oct 2021
TODO:
- explore other clustering methods available in scikit-learn: https://scikit-learn.org/stable/modules/clustering.html
- look into DBSCAN and its relatives
function ds_to_indexed_dataframe
Function to convert dataset to dataframe
See definition of function in topo_utils.py
Args:
ds(dataset): xarray dataset made of N 2D DataArrays
Returns:
dataframe: the converted dataset
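The actual implementation lives in topo_utils.py. A minimal sketch of the same idea, assuming the dataset's 2D variables share `y` and `x` dimensions: xarray's `Dataset.to_dataframe()` stacks the grid into one row per pixel. The function name and the `point_ind` index name are hypothetical.

```python
import numpy as np
import xarray as xr


def ds_to_indexed_dataframe_sketch(ds):
    """Flatten a Dataset of 2D (y, x) DataArrays into one row per pixel."""
    # to_dataframe() stacks the y and x dimensions into a MultiIndex,
    # with one column per data variable
    df = ds.to_dataframe(dim_order=["y", "x"])
    # turn the y/x coordinates back into ordinary columns
    df = df.reset_index()
    # plain integer id per pixel (hypothetical index name)
    df.index.name = "point_ind"
    return df


ds = xr.Dataset(
    {
        "elevation": (("y", "x"), np.array([[1000.0, 1100.0], [1200.0, 1300.0]])),
        "slope": (("y", "x"), np.array([[0.1, 0.2], [0.3, 0.4]])),
    },
    coords={"y": [10.0, 20.0], "x": [5.0, 15.0]},
)
df = ds_to_indexed_dataframe_sketch(ds)
```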
function scale_df
scale_df(
df_param,
scaler=StandardScaler(),
features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1}
)
Function to scale features of a pandas dataframe
Args:
df_param(dataframe): features to scale
scaler(scaler object): scikit-learn scaler. Default is StandardScaler()
features(dict): dictionary of features to use as predictors with their respective importance, e.g. {'x': 1, 'y': 1}
Returns:
dataframe: scaled data
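A minimal sketch of weighted feature scaling, assuming (not confirmed by this reference) that each importance weight multiplies its column after standardization, so a weight of 4 gives that feature a spread of 4 instead of 1. The function name is hypothetical. Unlike the documented signature, it defaults `scaler` to `None`: Python evaluates a default such as `StandardScaler()` once, so every call would otherwise refit the same shared object.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

DEFAULT_FEATURES = {'x': 1, 'y': 1, 'elevation': 4, 'slope': 1,
                    'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1}


def scale_df_sketch(df_param, scaler=None, features=None):
    """Standardize the feature columns, then apply importance weights."""
    scaler = StandardScaler() if scaler is None else scaler
    features = DEFAULT_FEATURES if features is None else features
    cols = list(features)
    # fit the scaler on the selected columns only: zero mean, unit variance
    scaled = scaler.fit_transform(df_param[cols].to_numpy(dtype=float))
    df_scaled = pd.DataFrame(scaled, columns=cols, index=df_param.index)
    # ASSUMPTION: a weight w multiplies the scaled column, so its spread becomes w
    for col, weight in features.items():
        df_scaled[col] *= weight
    return df_scaled


df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0],
                   "y": [0.0, 0.0, 1.0, 1.0],
                   "elevation": [1000.0, 1200.0, 1400.0, 1600.0]})
scaler = StandardScaler()  # keep a handle so the scaling can be inverted later
df_s = scale_df_sketch(df, scaler, {"x": 1, "y": 1, "elevation": 4})
```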
function inverse_scale_df
inverse_scale_df(
df_scaled,
scaler,
features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1}
)
Function to inverse feature scaling of a pandas dataframe
Args:
df_scaled(dataframe): scaled data to transform back to the original units (inverse transform)
scaler(scaler object): original scikit-learn scaler
features(dict): dictionary of features to use as predictors with their respective importance, e.g. {'x': 1, 'y': 1}
Returns:
dataframe: data in original format
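A round-trip sketch of the inverse, under the same assumption that weights multiply the standardized columns: the weights are divided out first, then the fitted scaler's own `inverse_transform` restores the original units. The function name is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

features = {"x": 1, "y": 1, "elevation": 4}
cols = list(features)
weights = np.array([features[c] for c in cols], dtype=float)

df = pd.DataFrame({"x": [0.0, 1.0, 2.0],
                   "y": [5.0, 6.0, 8.0],
                   "elevation": [900.0, 1500.0, 2100.0]})
scaler = StandardScaler()
# forward pass: standardize, then weight (ASSUMED weighting scheme)
df_scaled = pd.DataFrame(scaler.fit_transform(df[cols].to_numpy()) * weights,
                         columns=cols, index=df.index)


def inverse_scale_df_sketch(df_scaled, scaler, features):
    """Undo the weighting, then undo the scaler."""
    cols = list(features)
    weights = np.array([features[c] for c in cols], dtype=float)
    # reverse order of the forward pass: remove weights before inverse_transform
    unweighted = df_scaled[cols].to_numpy() / weights
    return pd.DataFrame(scaler.inverse_transform(unweighted),
                        columns=cols, index=df_scaled.index)


df_back = inverse_scale_df_sketch(df_scaled, scaler, features)
```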
function kmeans_clustering
kmeans_clustering(
df_param,
n_clusters=100,
features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1},
seed=None,
**kwargs
)
Function to perform K-means clustering
Args:
df_param(dataframe): features
features(dict): dictionary of features to use as predictors with their respective importance, e.g. {'x': 1, 'y': 1}
n_clusters(int): number of clusters
seed(int): None or int for the random seed generator
kwargs:
Returns:
dataframe: df_centers
kmean object: kmeans
dataframe: df_param
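A minimal sketch of weighted K-means with scikit-learn's `KMeans`, assuming `df_param` is already scaled and that weights stretch the feature axes so that heavier features count more in the Euclidean distance. The function name and the `cluster_labels` column are hypothetical; `n_init` is set explicitly because its default changed across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

DEFAULT_FEATURES = {'x': 1, 'y': 1, 'elevation': 4, 'slope': 1,
                    'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1}


def kmeans_clustering_sketch(df_param, n_clusters=100, features=None, seed=None, **kwargs):
    """Weighted K-means over selected (already scaled) feature columns."""
    features = DEFAULT_FEATURES if features is None else features
    cols = list(features)
    weights = np.array([features[c] for c in cols], dtype=float)
    # ASSUMPTION: weights multiply the columns before computing distances
    X = df_param[cols].to_numpy(dtype=float) * weights
    kwargs.setdefault("n_init", 10)
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, **kwargs).fit(X)
    # divide out the weights so centroids are in the same units as df_param
    df_centers = pd.DataFrame(kmeans.cluster_centers_ / weights, columns=cols)
    df_out = df_param.copy()
    df_out["cluster_labels"] = kmeans.labels_  # hypothetical column name
    return df_centers, kmeans, df_out


df = pd.DataFrame({
    "x": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2],
    "y": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2],
    "elevation": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
})
df_centers, model, df_out = kmeans_clustering_sketch(
    df, n_clusters=3, features={"x": 1, "y": 1, "elevation": 4}, seed=0)
```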
function minibatch_kmeans_clustering
minibatch_kmeans_clustering(
df_param,
n_clusters=100,
features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1},
n_cores=4,
seed=None,
**kwargs
)
Function to perform mini-batch K-means clustering
Args:
df_param(dataframe): features
n_clusters(int): number of clusters
features(dict): dictionary of features to use as predictors with their respective importance, e.g. {'x': 1, 'y': 1}
n_cores(int): number of processor cores
kwargs:
Returns:
dataframe: centroids
kmean object: kmean model
dataframe: labels of input data
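A minimal sketch using scikit-learn's `MiniBatchKMeans`, which updates centroids from random subsets of the data and scales better to large DEMs. How `n_cores` is used here is an assumption: scikit-learn's documentation suggests a `batch_size` above 256 times the number of cores to use all cores, so the sketch sizes batches that way. The function name, the weighting scheme and the `cluster_labels` column are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MiniBatchKMeans


def minibatch_kmeans_sketch(df_param, n_clusters=100, features=None, n_cores=4,
                            seed=None, **kwargs):
    """Weighted mini-batch K-means; returns centroids, model and labels."""
    features = features or {"x": 1, "y": 1, "elevation": 4}
    cols = list(features)
    weights = np.array([features[c] for c in cols], dtype=float)
    X = df_param[cols].to_numpy(dtype=float) * weights  # ASSUMED weighting
    # ASSUMPTION: n_cores only sizes the mini-batches (256 samples per core)
    kwargs.setdefault("batch_size", 256 * n_cores)
    kwargs.setdefault("n_init", 3)
    model = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, **kwargs).fit(X)
    df_centers = pd.DataFrame(model.cluster_centers_ / weights, columns=cols)
    df_labels = pd.DataFrame({"cluster_labels": model.labels_}, index=df_param.index)
    return df_centers, model, df_labels


# three well-separated synthetic blobs of 200 points each
rng = np.random.default_rng(0)
blobs = [rng.normal(loc=c, scale=1.0, size=(200, 3)) for c in (0.0, 50.0, 100.0)]
df = pd.DataFrame(np.vstack(blobs), columns=["x", "y", "elevation"])
df_centers, model, df_labels = minibatch_kmeans_sketch(
    df, n_clusters=3, features={"x": 1, "y": 1, "elevation": 1}, n_cores=2, seed=0)
```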
function search_number_of_clusters
search_number_of_clusters(
df_param,
method='minibatchkmean',
cluster_range=array([100, 300, 500, 700, 900]),
features={'x': 1, 'y': 1, 'elevation': 4, 'slope': 1, 'aspect_cos': 1, 'aspect_sin': 1, 'svf': 1},
scaler_type=StandardScaler(),
scaler=None,
seed=2,
plot=True
)
Function to help identify an optimum number of clusters using the elbow method
Args:
df_param(dataframe): pandas dataframe containing the input variables of the clustering method
method(str): clustering method. Currently available: ['minibatchkmean', 'kmeans']
cluster_range(array int): array of numbers of clusters to compute scores for
features(dict): dictionary of features to use as predictors with their respective importance, e.g. {'x': 1, 'y': 1}
scaler_type(scikit-learn obj): type of scaler to use, e.g. StandardScaler() or RobustScaler()
scaler(scikit-learn obj): scaler already fitted to the dataset; implies that df_param is already scaled
seed(int): random seed for k-means clustering
plot(bool): whether to plot the results
Returns:
dataframe: WCSS score, Davies-Bouldin score, Calinski-Harabasz score
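A simplified sketch of the scoring loop behind the elbow method, written against a plain feature array rather than `df_param` and `features`, and using only `KMeans`. For each candidate number of clusters it records the WCSS (scikit-learn's `inertia_`), the Davies-Bouldin score (lower is better) and the Calinski-Harabasz score (higher is better). The elbow is the value of k after which the WCSS stops dropping sharply. The function name and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score


def cluster_scores_sketch(X, cluster_range, seed=2):
    """WCSS, Davies-Bouldin and Calinski-Harabasz scores for each k."""
    rows = []
    for k in cluster_range:
        km = KMeans(n_clusters=int(k), random_state=seed, n_init=10).fit(X)
        rows.append({
            "n_clusters": int(k),
            "wcss": km.inertia_,  # within-cluster sum of squares
            "db_score": davies_bouldin_score(X, km.labels_),  # lower is better
            "ch_score": calinski_harabasz_score(X, km.labels_),  # higher is better
        })
    return pd.DataFrame(rows)


# four well-separated synthetic blobs in a 3-feature space
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [20, 0, 0], [0, 20, 0], [20, 20, 0]], dtype=float)
X = np.vstack([c + rng.normal(size=(50, 3)) for c in centers])
scores = cluster_scores_sketch(X, np.array([2, 3, 4, 5, 6]))
```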
function plot_center_clusters
plot_center_clusters(
dem_file,
ds_param,
df_centers,
var='elevation',
cmap=<matplotlib.colors.ListedColormap>,
figsize=(14, 10)
)
Function to plot the location of the cluster centroids over the DEM
Args:
dem_file(str): path to DEM raster file
ds_param(dataset): topo_param parameters ['elev', 'slope', 'aspect_cos', 'aspect_sin', 'svf']
df_centers(dataframe): cluster centroid parameters ['x', 'y', 'elev', 'slope', 'aspect_cos', 'aspect_sin', 'svf']
var(str): variable to plot as background
cmap(pyplot cmap): pyplot colormap used to represent the variable
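A minimal plotting sketch with matplotlib, assuming the DEM has already been read into a 2D array with a known map extent (reading `dem_file`, e.g. with rasterio, is omitted). The function name and the `"terrain"` colormap are illustrative choices, not the module's defaults.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_center_clusters_sketch(dem, extent, df_centers, cmap="terrain", figsize=(14, 10)):
    """Draw the DEM as background and the centroid (x, y) positions on top."""
    fig, ax = plt.subplots(figsize=figsize)
    # extent = (xmin, xmax, ymin, ymax) places the raster in map coordinates
    im = ax.imshow(dem, extent=extent, cmap=cmap, origin="upper")
    fig.colorbar(im, ax=ax, label="elevation")
    ax.scatter(df_centers["x"], df_centers["y"], c="k", s=15, label="cluster centroids")
    ax.legend()
    return fig, ax


dem = np.linspace(1000.0, 2000.0, 100).reshape(10, 10)  # synthetic DEM
df_centers = pd.DataFrame({"x": [1.0, 5.0, 8.0], "y": [2.0, 5.0, 9.0]})
fig, ax = plot_center_clusters_sketch(dem, (0.0, 10.0, 0.0, 10.0), df_centers)
```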
function write_landform
write_landform(
dem_file,
df_param,
project_directory='./',
out_dir: Optional[Union[str, Path]] = None,
out_name: Optional[str] = None
) → Union[str, Path]
Function to write a landform file which maps cluster ids to dem pixels
Args:
dem_file(str): path to DEM raster file
df_param(dataframe): pixel parameters, including the cluster id assigned to each DEM pixel
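The core of a landform map is putting each pixel's cluster id back at its grid position. A minimal sketch, assuming `df_param` holds one row per DEM pixel with `x`, `y` and a hypothetical `cluster_labels` column; pivoting on the coordinates makes the result independent of row order. Writing the array to a raster file with the DEM's georeferencing is omitted.

```python
import pandas as pd


def labels_to_grid_sketch(df_param, label_col="cluster_labels"):
    """Rebuild a 2D grid of cluster ids from per-pixel (x, y, label) rows."""
    # y values become grid rows, x values become grid columns
    grid = df_param.pivot(index="y", columns="x", values=label_col)
    # raster convention: the first row is the northernmost (largest y)
    grid = grid.sort_index(ascending=False)
    return grid.to_numpy()


df_param = pd.DataFrame({
    "x": [0, 1, 2, 0, 1, 2],
    "y": [20, 20, 20, 10, 10, 10],
    "cluster_labels": [0, 0, 1, 1, 2, 2],
}).sample(frac=1, random_state=0)  # shuffled: row order does not matter
landform = labels_to_grid_sketch(df_param)
```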
This file was automatically generated via lazydocs.