Further Evaluation and Clustering Functions

MCBB.distance_matrixFunction
 distance_matrix(sol::myMCSol, prob::myMCProblem, distance_func::Function, weights::AbstractArray; matrix_distance_func::Union{Function, Nothing}=nothing, histogram_distance_func::Union{Function, Nothing}=wasserstein_histogram_distance, relative_parameter::Bool=false, histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, bin_edges::AbstractArray)

Calculate the distance matrix between all individual solutions.

Histogram Method

If called with histograms=true, a histogram of the measure values across all system dimensions is computed for each measure and each run in the solution sol. The bin edges are determined with the Freedman-Diaconis rule and are identical across all runs for a given measure.

The distance matrix is then computed between these histograms with a suitable histogram distance function, histogram_distance_func.

This is intended to prevent symmetric configurations in larger systems from being distinguished from each other. Example: consider a system of 10 identical oscillators. With this distance calculation, a state in which oscillators 1-5 are synchronized and 6-10 are not falls into the same cluster as a state in which oscillators 6-10 are synchronized and 1-5 are not. If you do not want this behaviour, use the regular distance_matrix function without the histograms flag.
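The oscillator example can be illustrated with a small, self-contained sketch. It does not call MCBB; the data, the helper bin_counts and the simple count-difference distance are hypothetical stand-ins chosen only to show why a histogram over system dimensions ignores which oscillator carries which value.

```julia
# Per-oscillator value of one measure for two runs of a 10-oscillator system
# (hypothetical numbers): 0.0 marks a synchronized oscillator, 1.0 an unsynchronized one.
run_a = vcat(zeros(5), ones(5))   # oscillators 1-5 synchronized
run_b = vcat(ones(5), zeros(5))   # oscillators 6-10 synchronized

# Count how many values fall into each bin [edges[i], edges[i+1]);
# the last bin also includes its right edge. Out-of-range values are ignored.
function bin_counts(values, edges)
    counts = zeros(Int, length(edges) - 1)
    for v in values
        (first(edges) <= v <= last(edges)) || continue
        i = min(searchsortedlast(edges, v), length(counts))
        counts[i] += 1
    end
    return counts
end

edges = 0.0:0.5:1.0                      # shared binning for both runs
hist_a = bin_counts(run_a, edges)
hist_b = bin_counts(run_b, edges)

elementwise = sum(abs.(run_a .- run_b))          # compares oscillator i with oscillator i
histogram_based = sum(abs.(hist_a .- hist_b))    # compares only the value distributions

println("element-wise distance: ", elementwise)
println("histogram distance:    ", histogram_based)
```

The element-wise distance is large because every oscillator changed its state, while the histogram-based distance vanishes because both runs contain five synchronized and five unsynchronized oscillators.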

Sparse and memory mapped options

There are separate routines for computing very large matrices, using either memory-mapped arrays (see distance_matrix_mmap) or sparse arrays (see distance_matrix_sparse).

Arguments

  • sol: solution
  • prob: problem
  • distance_func: The function that calculates the distance between the measures/parameters of each pair of solutions. Signature should be (measure_1::Union{Array,Number}, measure_2::Union{Array,Number}) -> distance::Number. Example and default is (x,y)->sum(abs.(x .- y)).
  • weights: Instead of the actual measure, weights[i_measure]*measure is handed over to distance_func. Thus weights needs to be an array of length $N_{meas}+N_{par}$.
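As a rough illustration of how distance_func and weights interact, the following sketch builds a small distance matrix by hand. It does not use MCBB; the tuple layout of runs, the helper pairwise_distance and the choice to sum the weighted per-measure distances are assumptions made for this example only.

```julia
# Distance between two measures (numbers or arrays), as in the example above.
distance_func = (x, y) -> sum(abs.(x .- y))

# Hypothetical data: 3 runs, each with 2 measures and 1 parameter value,
# stored as (per-dimension measure, global measure, parameter).
runs = [
    ([0.0, 1.0], 2.0, 0.1),
    ([0.0, 1.5], 2.0, 0.1),
    ([1.0, 1.0], 4.0, 0.9),
]
weights = [1.0, 0.5, 2.0]      # N_meas + N_par = 2 + 1 entries

# Each measure/parameter is multiplied by its weight before distance_func is applied;
# the contributions are then summed (an assumption of this sketch).
function pairwise_distance(run_i, run_j, weights, distance_func)
    d = 0.0
    for k in eachindex(weights)
        d += distance_func(weights[k] * run_i[k], weights[k] * run_j[k])
    end
    return d
end

n = length(runs)
D = [pairwise_distance(runs[i], runs[j], weights, distance_func) for i in 1:n, j in 1:n]
display(D)
```

Setting an entry of weights to zero removes the corresponding measure or parameter from the distance, which is a convenient way to test how much each one contributes to the clustering.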

Kwargs

  • relative_parameter: If true, the parameter values are rescaled to [0,1] during the distance calculation
  • histograms::Bool: If true, the distance calculation is based on distance_matrix_histogram with the default histogram distance wasserstein_histogram_distance.
  • histogram_distance_func: The distance function between two histograms. Default is wasserstein_histogram_distance.
  • matrix_distance_func: The distance function between two matrices or arrays of length different from $N_{dim}$. Used e.g. for cross-correlation.
  • use_ecdf::Bool: If true, the histogram_distance_func function receives the empirical CDFs instead of the histograms.
  • k_bin::Number: Multiplier applied to the bin width of the Freedman-Diaconis rule. $k_{bin}>1$ increases the bin width and thus decreases the number of bins; $k_{bin}<1$ does the opposite. Default: $k_{bin}=1$
  • nbin_default::Int: If the IQR is very small and thus the number of bins larger than nbin_default, the number of bins is set back to nbin_default and the edges and width adjusted accordingly.
  • nbin::Int: If specified, ignore all other histogram binning calculations and use nbin bins for the histograms.
  • bin_edges::AbstractArray: If specified, ignore all other histogram binning calculations and use these as the edges of the histograms (one more element than bins, since all edges are given). Needs to be an Array with as many elements as measures; if automatic binning is wanted for one of the observables, the corresponding element of the array has to be nothing. E.g.: [1:1:10, nothing, 2:0.5:5].
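The binning options can be pictured with a minimal sketch of the Freedman-Diaconis rule, whose bin width is $2 \cdot IQR / n^{1/3}$. The functions fd_bin_width and n_bins below are hypothetical helpers written for this example, not MCBB routines, and the way the cap at nbin_default is applied is only an approximation of the behaviour described above.

```julia
using Statistics

# Freedman-Diaconis bin width, 2 * IQR / n^(1/3), scaled by the multiplier k_bin.
function fd_bin_width(data; k_bin=1)
    iqr = quantile(data, 0.75) - quantile(data, 0.25)
    return k_bin * 2 * iqr / cbrt(length(data))
end

# Number of bins needed to cover the data range, capped at nbin_default.
# A vanishing IQR would give infinitely many bins, so it falls back to the cap.
function n_bins(data; k_bin=1, nbin_default=50)
    width = fd_bin_width(data; k_bin=k_bin)
    width > 0 || return nbin_default
    lo, hi = extrema(data)
    return min(ceil(Int, (hi - lo) / width), nbin_default)
end

data = collect(0.0:1.0:7.0)       # 8 hypothetical measure values
println("bin width:           ", fd_bin_width(data))
println("bins with k_bin = 1: ", n_bins(data))
println("bins with k_bin = 2: ", n_bins(data; k_bin=2))
```

Doubling k_bin doubles the bin width, so the number of bins can only stay the same or shrink.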

Returns an instance of DistanceMatrix or DistanceMatrixHist

source
MCBB.distance_matrix_mmapFunction
distance_matrix_mmap(sol::myMCSol, prob::myMCProblem, distance_func::Function, weights::AbstractArray; matrix_distance_func::Union{Function, Nothing}=nothing, histogram_distance_func::Union{Function, Nothing}=wasserstein_histogram_distance, relative_parameter::Bool=false, histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, nbin_default::Int=50, el_type=Float32, save_name="mmap-distance-matrix.bin")

Computes the distance matrix like distance_matrix but uses memory-mapped arrays. Use this if the distance matrix is too large for the memory of your computer. Takes the same inputs as distance_matrix, but with the added el_type, which determines the eltype of the saved matrix, and save_name, the name of the file on the hard disk.

Due to the restrictions of memory-mapped arrays, saving and loading distance matrices computed like this with JLD2 only works within a single machine. To reload or transfer these matrices, use reload_mmap_distance_matrix.
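For readers unfamiliar with memory-mapped arrays, the following sketch shows the underlying mechanism with Julia's Mmap standard library. It is not how MCBB implements distance_matrix_mmap or reload_mmap_distance_matrix; the file location, the matrix size and the placeholder distances are assumptions for illustration.

```julia
using Mmap

n = 4                                   # hypothetical number of runs
path = joinpath(mktempdir(), "mmap-distance-matrix.bin")

# Create the file-backed matrix and fill it like an ordinary array.
io = open(path, "w+")
D = Mmap.mmap(io, Matrix{Float32}, (n, n))
for j in 1:n, i in 1:n
    D[i, j] = abs(i - j)                # placeholder for a real distance
end
Mmap.sync!(D)                           # flush the changes to disk
close(io)

# Later: map the same file again. Element type and dimensions are not
# stored in the file, so they have to be known when reloading.
io = open(path, "r")
D_reloaded = Mmap.mmap(io, Matrix{Float32}, (n, n))
close(io)

println(D_reloaded)
```

Because the raw file contains only the matrix entries, the element type and size have to be stored separately, which is one reason why such matrices need a dedicated reload step.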

source
MCBB.compute_distanceFunction
compute_distance(sol::myMCSol, i_meas::Int, distance_func::Function; use_histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, bin_edges::AbstractRange)

Computes a (part of the) distance matrix for only a single measure i_meas. Follows otherwise the same logic as distance_matrix but returns the matrix as an Array{T,2}.

source
MCBB.distance_matrix_sparseFunction
distance_matrix_sparse(sol::myMCSol, prob::myMCProblem, distance_func::Function, weights::AbstractArray; matrix_distance_func::Union{Function, Nothing}=nothing, histogram_distance_func::Union{Function, Nothing}=wasserstein_histogram_distance, relative_parameter::Bool=false, histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, nbin_default::Int=50, nbin::Union{Int, Nothing}=nothing, bin_edges::Union{AbstractArray, Nothing}=nothing, sparse_threshold::Number=Inf, el_type=Float32, check_inf_nan::Bool=true)

Computes the distance matrix as a sparse matrix. Takes the same arguments as distance_matrix, with the extra arguments

  • sparse_threshold: Only distances smaller than this value are saved
  • check_inf_nan: Only performs the Inf/NaN check if true.
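The effect of sparse_threshold can be sketched with the SparseArrays standard library. This is a hand-written illustration, not the MCBB implementation; the one-dimensional data and the absolute-difference distance are assumptions.

```julia
using SparseArrays

# Hypothetical 1D measure for five runs; the distance is the absolute difference.
measure = [0.0, 0.1, 0.2, 5.0, 5.1]
sparse_threshold = 1.0

rows, cols, dists = Int[], Int[], Float32[]
for j in eachindex(measure), i in eachindex(measure)
    d = abs(measure[i] - measure[j])
    # store only off-diagonal distances below the threshold
    if i != j && d < sparse_threshold
        push!(rows, i); push!(cols, j); push!(dists, d)
    end
end
D_sparse = sparse(rows, cols, dists, length(measure), length(measure))

println("stored entries: ", nnz(D_sparse), " of ", length(D_sparse))
```

Note that in such a matrix an entry that is not stored reads as 0, which here means "distance at or above the threshold" rather than "identical runs"; any routine consuming the matrix has to account for that.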
source
MCBB.DistanceMatrixType
DistanceMatrix{T}

Type for distance matrices. This type should behave just like any AbstractArray{T,2}. There's a convert to AbstractArray{T,2}.

It also holds additional information about the distance calculation.

Fields (and constructor)

  • data::AbstractArray{T,2}: The actual distance matrix
  • weights::AbstractArray{T,1}: The weights that were used to compute it
  • distance_func::Function: The function that was used to compute it
  • relative_parameter::Bool: Was the parameter rescaled?
source
MCBB.DistanceMatrixHistType
DistanceMatrixHist{T}

Type for distance matrices which were computed using Histograms. This type should behave just like any AbstractArray{T,2}. There's a convert to AbstractArray{T,2}.

It also holds additional information about the distance calculation.

Fields (and constructor)

  • data::AbstractArray{T,2}: The actual distance matrix
  • weights::AbstractArray{T,1}: The weights that were used to compute it
  • distance_func::Function: The function that was used to compute the distance between the global measures
  • matrix_distance_func::Union{Function, Nothing}: The function that was used to compute it
  • relative_parameter::Bool: Was the parameter rescaled?
  • histogram_distance::Function: Function used to compute the histogram distance
  • hist_edges: Array of arrays/ranges with all histogram edges
  • bin_width: Array of all histogram bin widths
  • ecdf: Was the ECDF used in the distance computation?
  • k_bin: Additional factor in bin_width computation
source
MCBB.metadata!Method
metadata!(dm::AbstractDistanceMatrix)

Empties the matrix data of the input AbstractDistanceMatrix, so that it only contains the metadata. This is useful if the matrix itself is already saved elsewhere (e.g. with Mmap).

source
MCBB.wasserstein_histogram_distanceFunction

One possible histogram distance for distance_matrix_histogram (also the default one). It calculates the 1-Wasserstein / Earth Mover's Distance between two histograms by first computing their ECDFs and then computing the discrete integral

$\int_{-\infty}^{+\infty}|ECDF(hist\_1) - ECDF(hist\_2)| dx = \sum_i | ECDF(hist\_1)_i - ECDF(hist\_2)_i | \cdot bin\_width$.

Returns a single (real) number. The input is the ecdf.
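The discrete integral above is short enough to write out. The following sketch uses hypothetical helper names (ecdf_from_counts, wasserstein_sketch) and assumes that both histograms share the same equally wide bins; it is not the MCBB function itself.

```julia
# Normalized empirical CDF from histogram bin counts.
ecdf_from_counts(counts) = cumsum(counts) ./ sum(counts)

# Discrete 1-Wasserstein distance between two ECDFs defined on the same bins.
wasserstein_sketch(ecdf_1, ecdf_2, bin_width) = sum(abs.(ecdf_1 .- ecdf_2)) * bin_width

counts_1 = [4, 0, 0]   # all mass in the first bin
counts_2 = [0, 0, 4]   # all mass in the third bin
bin_width = 0.5

d = wasserstein_sketch(ecdf_from_counts(counts_1), ecdf_from_counts(counts_2), bin_width)
println(d)
```

In this example all of the mass has to be moved by two bins of width 0.5, so the distance equals 1.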

Adapted from https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html

source
MCBB.ecdf_histFunction
ecdf_hist(hist::Histogram)

Returns the ECDF of a histogram (normalized) as an Array.

source
MCBB.cluster_distanceFunction
cluster_distance(sol::myMCSol, D::AbstractDistanceMatrix, cluster_results::ClusteringResult,  cluster_1::Int, cluster_2::Int; measures::Union{AbstractArray, Nothing}=nothing, distance_func=nothing, histogram_distance=nothing, matrix_distance_func=nothing, k_bin::Number=1)

Calculates the distance between the members of two clusters separately for each measure.

Inputs

  • sol: Solution object
  • D: distance matrix from distance_matrix
  • cluster_results: results from the clustering
  • cluster_1: Index of the first cluster to be analysed (noise/outlier cluster = 1)
  • cluster_2: Index of the second cluster to be analysed
  • measures: Which measures should be analysed, default: all.

Output

  • Array with
  • Summary dictionary, mean and std of the distances
source
MCBB.cluster_meansMethod
cluster_means(sol::myMCSol, clusters::ClusteringResult)

Returns the mean of each measure for each cluster.

source
MCBB.cluster_membershipFunction
cluster_membership(par::AbstractArray, clusters::ClusteringResult)

Calculates the proportion of members for each cluster for all parameter values.

Returns an instance of ClusterMembershipResult with fields:

  • par: the center values of the sliding windows; if multiple parameters are being varied, it is a meshgrid.
  • data: members of the clusters on the parameter grid
source
cluster_membership(prob::myMCProblem, clusters::ClusteringResult, window_size::AbstractArray, window_offset::AbstractArray; normalize::Bool=true, min_members::Int=0)
cluster_membership(prob::myMCProblem, clusters::ClusteringResult, window_size::Number, window_offset::Number; normalize::Bool=true,  min_members::Int=0)

Calculates the proportion of members for each cluster within a parameter sliding window.

  • prob: problem
  • sol: solution of prob
  • clusters: results from a DBSCAN run.
  • window_size: Size of the window. If multiple parameters are being varied, this has to be an array.
  • window_offset: Offset of the sliding window. If multiple parameters are being varied, this has to be an array.

Returns an instance of ClusterMembershipResult with fields:

  • par: the center values of the sliding windows; if multiple parameters are being varied, it is a meshgrid.
  • data: members of the clusters on the parameter grid

The results can be plotted directly with plot(results, kwargs...). See ClusterMembershipResult for details on plotting and operating on this type.
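The sliding-window idea for a single varied parameter can be sketched as follows. This is not the MCBB implementation: the function membership_sketch, the closed window [start, start + window_size], the placement of the first window at the smallest parameter value and the layout of the result (windows along rows, clusters along columns) are all assumptions of this example.

```julia
# Hypothetical inputs: one parameter value and one cluster label per run.
par = collect(1.0:1.0:9.0)
assignments = [1, 1, 1, 1, 2, 1, 2, 2, 2]
n_clusters = 2
window_size, window_offset = 4.0, 2.0

function membership_sketch(par, assignments, n_clusters, window_size, window_offset)
    starts = minimum(par):window_offset:(maximum(par) - window_size)
    centers = collect(starts) .+ window_size / 2
    data = zeros(length(starts), n_clusters)
    for (w, s) in enumerate(starts)
        in_window = findall(p -> s <= p <= s + window_size, par)
        for c in 1:n_clusters
            n_c = count(==(c), assignments[in_window])
            # normalize by the number of runs in the window (NaN if it is empty)
            data[w, c] = isempty(in_window) ? NaN : n_c / length(in_window)
        end
    end
    return centers, data
end

centers, data = membership_sketch(par, assignments, n_clusters, window_size, window_offset)
println(centers)
display(data)
```

Each row of data sums to one, so the columns can be read as the share of runs that end up in each cluster as the parameter increases.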

source
MCBB.ClusterMembershipResultType
ClusterMembershipResult{T,S}

Stores the results of cluster_membership and can be used for ClusterMembershipPlot.

Fields

  • par: Parameter Array or Mesh
  • data: Cluster Membership data on par-Parameter grid.
  • multidim_flag: Is the experiment multidimensional?

Plot

plot(cm::ClusterMembershipResult, kwargs...)

Plots the ClusterMembershipResult. Uses plot recipes and thus accepts all kwargs supported by Plots.jl.

Hints

The order of the labels for the legend is reversed.

Additional Kwargs

  • plot_index: Range or Array with the indices of the clusters to be plotted. Default: all.

Additional operations defined

  • can be indexed
  • can be sorted, see Base.sort!(cm::ClusterMembershipResult; ignore_first::Bool)
  • can be summed, see Base.sum(cm::ClusterMembershipResult, indices::AbstractArray{Int,1})
source
Base.sort!Method
sort!(cm::ClusterMembershipResult; ignore_first::Bool=false)

Sorts cm in place by the number of members of each cluster, from low to high. If ignore_first is true, the first cluster (with DBSCAN this is the outlier cluster) is ignored while sorting and remains the first cluster.

source
Base.sumMethod
Base.sum(cm::ClusterMembershipResult, indices::AbstractArray{Int,1})

Returns a ClusterMembershipResult with all indices clusters summed together.

source
MCBB.get_trajectoryFunction
get_trajectory(prob::MCBBProblem, sol::MCBBSol, clusters::ClusteringResult, i::Int; only_sol::Bool=true)

Solves and returns a trajectory that is classified in cluster i. Randomly selects one IC/parameter configuration, so multiple executions of this routine will yield different results! If only_sol==true it returns only the solution, otherwise it returns a tuple (solution, problem, i_run) where i_run is the number of the trial in prob and sol.

get_trajectory(prob::MCBBProblem, sol::MCBBSol, i::Int, only_sol::Bool=true)

Solves problem i and returns a trajectory. If only_sol==true it returns only the solution, otherwise it returns a tuple (solution, problem, i_run) where i_run is the number of the trial in prob and sol.

Example

Plot with, e.g.:

using PyPlot
# get_trajectory picks a random member of cluster 1, so the image differs between calls
IM = imshow(Matrix(get_trajectory(prob,sol,db_res,1)), aspect=2)
ylabel("System Dimension i")
xlabel("Time t")
cb = colorbar(IM, orientation="horizontal")
cb.ax.set_xlabel("Colorbar: Im(z)", rotation=0)
source
MCBB.cluster_measure_meanFunction
cluster_measure_mean(sol::myMCSol, clusters::ClusteringResult, i::Int)

Returns the mean of measure i for each cluster.

source
MCBB.cluster_measure_stdFunction
cluster_measure_std(sol::myMCSol, clusters::ClusteringResult, i::Int)

Returns the standard deviation of measure i for each cluster.
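Conceptually, both functions group the values of one measure by cluster label and reduce each group. A minimal sketch with the Statistics standard library, using hypothetical data instead of an MCBB solution object, could look like this:

```julia
using Statistics

# Hypothetical values of one measure and the cluster label of each run.
measure = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
assignments = [1, 1, 1, 2, 2, 2]

cluster_ids = sort(unique(assignments))
means = [mean(measure[assignments .== c]) for c in cluster_ids]
stds  = [std(measure[assignments .== c]) for c in cluster_ids]

println("means: ", means)
println("stds:  ", stds)
```

Note that std uses the corrected (n-1) estimator by default; whether MCBB applies the same convention is not stated in this reference.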

source
MCBB.cluster_measuresFunction
 cluster_measures(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult, window_size::AbstractArray, window_offset::AbstractArray)
 cluster_measures(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult, window_size::Number, window_offset::Number)

Calculates the measures for each cluster along a sliding window. Can also handle multiple parameters being varied.

  • prob: problem
  • sol: solution of prob
  • clusters: results from a DBSCAN run.
  • window_size: Size of the window. If multiple parameters are being varied, this has to be an array.
  • window_offset: Offset of the sliding window. If multiple parameters are being varied, this has to be an array.

Returns an instance of ClusterMeasureResult with fields:

  • par: the center values of the sliding windows; if multiple parameters are being varied, it is a meshgrid.
  • cluster_measures: (per dimension) measures on the parameter grid
  • cluster_measures_global: global measures on the parameter grid

Plot:

  • The i-th measure can be plotted with plot(res::ClusterMeasureResult, i::Int, kwargs...)

  • A single cluster and measure can be plotted with plot(res::ClusterMeasureResult, i_meas::Int, i_cluster::Int, kwargs...).

source
MCBB.ClusterMeasureResultType
ClusterMeasureResult

Results of cluster_measures.

Fields:

  • par
  • cluster_measures
  • cluster_measures_global

Plot:

The i-th measure of the j-th cluster can be plotted with plot(res::ClusterMeasureResult, i::Int, j::Int, kwargs...)

source
MCBB.cluster_measures_sliding_histogramsFunction
cluster_measures_sliding_histograms(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult, i_meas::Int, window_size::Number, window_offset::Number; kwargs...)

Calculates, for each window in the sliding window array, a histogram of all results of measure i_meas of all runs, separately for each cluster.

Input:

  • prob::myMCProblem: problem
  • sol::myMCSol: solution object
  • clusters::ClusteringResult: cluster results
  • i_meas::Int: index of the measure to be analyzed
  • window_size::AbstractArray: size of the window, number or Array with length according to the number of parameters
  • window_offset::AbstractArray: offset of the sliding window, number or Array with length according to the number of parameters

Keyword arguments

  • k_bin::Number: Bin Count Modifier. k_bin-times the Freedman-Diaconis rule is used for binning the data. Default: 1
  • normalization_mode::Symbol, normalization mode applied to Histograms. Directly handed over to normalize.
  • nbin::Int: Uses nbin bins for the histograms instead of the (automatic) Freedman-Diaconis rule
  • bin_edges::AbstractRange: Uses these edges for the histograms.
  • state_filter::AbstractArray: Only use these system dimensions as the basis for the computation, default: all. Attention: if the evaluation function already used a state_filter, this refers only to the system dimensions that were measured.

Returns an instance of ClusterMeasureHistogramResult with fields:

  • hist_vals: Ncluster, Nwindows..., N_bins - sized array with the value of the histograms for each window
  • par: midpoint of the sliding windows, "x-axis-labels" of the plot
  • hist_bins: center of the bins, "y-axis-label" of the plot

Can be plotted with plot(res::ClusterMeasureHistogramResult, kwargs...). See ClusterMeasureHistogramResult for details.

source
MCBB.ClusterMeasureHistogramResultType
ClusterMeasureHistogramResult

Stores results of cluster_measures_sliding_histograms.

Fields

  • hist_vals: Ncluster, Nwindows..., N_bins - sized array with the value of the histograms for each window
  • par: midpoint of the sliding windows, "x-axis-labels" of the plot
  • hist_edges: center of the bins, "y-axis-label" of the plot
  • multidim_flag

Plot

Can be plotted with plot(res::ClusterMeasureHistogramResult, i, kwargs...). With i being the number of the cluster.

source
MCBB.ClusterICSpacesType
ClusterICSpaces

This struct holds the distributions of the ICs (and parameters) as histograms, in each dimension and separately for each cluster, together with the data itself, the means and the stds. If the additional keyword arguments min_par, max_par are given, the analysis is limited to the specified parameter range.

Fields of the struct:

  • data: array of array of arrays, the ICs and pars for each cluster and dimension
  • histograms: Ncluster x Ndim Array of Histograms of ICs/Par
  • means: Means of each dimension for each cluster
  • stds: Stds of each dimension for each cluster
  • cross_dim_means: list of Means of ICs across IC-dimensions per Cluster
  • cross_dim_stds: list of Std of ICs across IC-dimensions per Cluster
  • cross_dim_kurts: list of Kurtosis of ICs across IC-dimensions per Cluster

Constructor

ClusterICSpaces(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult; min_par::Number=-Inf, max_par::Number=Inf, nbins::Int64=20)
  • prob: Problem
  • sol: solution of prob
  • clusters: DBSCAN results
  • min_par, max_par: restrict the analysis to parameters within this value range
  • nbins: Number of bins of the histograms
source
MCBB.cluster_n_noiseFunction
cluster_n_noise(clusters::ClusteringResult)

Returns the number of points assigned to the "noise" cluster (this is not returned automatically by the routine of Clustering.jl).

source
MCBB.measure_on_parameter_sliding_windowFunction
measure_on_parameter_sliding_window

Calculates measures (per cluster) on parameter sliding windows. This routine is called by cluster_membership and cluster_measures but can also be used for plotting measures on the parameter grid manually.

ATTENTION: If a cluster has no members within a window, the value is set to NaN. This should simply omit these points from being plotted (whereas missing and nothing are currently not compatible with most plotting packages).

measure_on_parameter_sliding_window(prob::myMCProblem, sol::myMCSol, i::Int, clusters::ClusteringResult, window_size::Number, window_offset::Number)

Returns the i-th measure for each cluster separately on the parameter sliding window grid

  • prob: Problem

  • sol: solution of prob

  • i: function returns the i-th measure

  • clusters: results from a DBSCAN run.

  • window_size: Size of the window. If multiple parameters are being varied, this has to be an array.

  • window_offset: Offset of the sliding window. If multiple parameters are being varied, this has to be an array.

measure_on_parameter_sliding_window(prob::myMCProblem, sol::myMCSol, i::Int, window_size::Number, window_offset::Number)

Returns the i-th measure on the parameter sliding window grid (does not calculate the measure for each cluster separately)

All methods return a tuple with:

  • parameter_windows: the center values of the sliding windows; if multiple parameters are being varied, it is a meshgrid.
  • cluster_measures: members of the clusters on the parameter grid
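
The NaN convention for empty windows can be illustrated with a short sketch for a single varied parameter. It does not call MCBB; the data, the closed windows starting at the smallest parameter value and the use of the mean as the per-window reduction are assumptions of this example.

```julia
using Statistics

# Hypothetical inputs: parameter, one measure and cluster label per run.
par         = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
measure     = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
assignments = [1, 1, 1, 2, 2, 2]
window_size, window_offset = 2.0, 2.0

starts = minimum(par):window_offset:(maximum(par) - window_size)
parameter_windows = collect(starts) .+ window_size / 2
cluster_measures = fill(NaN, length(starts), 2)   # NaN marks "no members in this window"

for (w, s) in enumerate(starts), c in 1:2
    # runs of cluster c whose parameter lies in the window [s, s + window_size]
    members = [i for i in eachindex(par) if s <= par[i] <= s + window_size && assignments[i] == c]
    if !isempty(members)
        cluster_measures[w, c] = mean(measure[members])
    end
end

println(parameter_windows)
display(cluster_measures)
```

Cluster 2 has no runs in the first window, so its entry stays NaN and would leave a gap in a line plot instead of a misleading zero.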
source
MCBB.k_distFunction
 k_dist(D::AbstractArray, k::Int=4)

Helper function for estimating an epsilon value for DBSCAN. In the original paper, Ester et al. suggest plotting the k-dist graph (especially for $k=4$) to estimate a value for eps given $minPts = k$. It computes the distance to the k-th nearest neighbour for all data points given their distance matrix.

  • D: Distance matrix
  • k: calculate the distance to the k-th neighbour

Returns sorted array with the k-dist of all elements of D.
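
A minimal sketch of the k-dist computation is given below. The function k_dist_sketch is a hypothetical re-implementation for illustration; in particular, it assumes that the zero self-distance on the diagonal is skipped, which may or may not match how MCBB counts neighbours.

```julia
# Distance to the k-th nearest neighbour of every point, sorted ascending.
# Assumption: D[i, i] == 0 is the point itself and is not counted as a neighbour.
function k_dist_sketch(D::AbstractMatrix, k::Int=4)
    n = size(D, 1)
    kd = [sort(D[i, :])[k + 1] for i in 1:n]   # index 1 is the self-distance
    return sort(kd)
end

points = [0.0, 1.0, 2.0, 3.0, 10.0]            # hypothetical 1D data with one outlier
D = [abs(a - b) for a in points, b in points]
kd = k_dist_sketch(D, 2)
println(kd)
```

In the sorted k-dist values the outlier shows up as a jump at the end; a value of eps just below that jump would treat it as noise while keeping the other points in one cluster.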

source
MCBB.KNN_distFunction
KNN_dist(D::AbstractArray, K::Int)

Returns the cumulative K-th nearest neighbour distance.

  • D: Distance matrix
  • K
source
MCBB.KNN_dist_relativeFunction
KNN_dist_relative(D::AbstractArray, rel_K::Float64=0.005)

Returns the cumulative distance to the rel_K*N nearest neighbour.

  • D: Distance matrix
  • rel_K
source