Further Evaluation and Clustering Functions
MCBB.distance_matrix
— Function distance_matrix(sol::myMCSol, prob::myMCProblem, distance_func::Function, weights::AbstractArray; matrix_distance_func::Union{Function, Nothing}=nothing, histogram_distance_func::Union{Function, Nothing}=wasserstein_histogram_distance, relative_parameter::Bool=false, histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, bin_edges::AbstractArray)
Calculate the distance matrix between all individual solutions.
Histogram Method
If called with the histograms flag set to true, the function computes, for each run in the solution sol and for each measure, a histogram of that measure over all system dimensions. The binning of the histograms is computed with the Freedman-Diaconis rule and is the same across all runs for each measure.
The distance matrix is then computed between these histograms with a suitable histogram distance function (histogram_distance_func).
This is intended to prevent symmetric configurations in larger systems from being distinguished from each other. Example: Given a system with 10 identical oscillators, a state where oscillators 1-5 are synchronized and 6-10 are not synchronized ends up in the same cluster as a state where oscillators 6-10 are synchronized and 1-5 are not synchronized. If you don't want this kind of behaviour, use the regular distance_matrix function.
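The binning step described above can be sketched with the standard library alone. This is a minimal illustration, not MCBB's implementation: the helper names `fd_bin_width`, `shared_edges` and `bin_counts` are hypothetical, and the sketch assumes the IQR of the pooled data is non-zero.

```julia
using Statistics  # stdlib: quantile

# Freedman-Diaconis bin width, 2*IQR/n^(1/3), scaled by the multiplier k_bin.
function fd_bin_width(x::AbstractVector{<:Real}; k_bin::Real=1)
    iqr = quantile(x, 0.75) - quantile(x, 0.25)
    return k_bin * 2 * iqr / length(x)^(1/3)
end

# Bin edges shared by all runs, computed from the pooled values of one measure.
function shared_edges(x::AbstractVector{<:Real}; k_bin::Real=1)
    w = fd_bin_width(x; k_bin=k_bin)
    w > 0 || error("IQR is zero; choose the bin edges manually")
    lo, hi = extrema(x)
    nbins = max(1, ceil(Int, (hi - lo) / w))
    return range(lo, hi; length=nbins + 1)
end

# Count the values per bin (the last bin is closed on the right).
function bin_counts(x::AbstractVector{<:Real}, edges::AbstractRange)
    counts = zeros(Int, length(edges) - 1)
    for v in x
        i = clamp(searchsortedlast(edges, v), 1, length(counts))
        counts[i] += 1
    end
    return counts
end

# Two runs of a 10-oscillator system that differ only by a permutation.
run_a = [0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.7, 1.1, 1.6, 2.0]
run_b = reverse(run_a)

edges = shared_edges(vcat(run_a, run_b))
hist_a = bin_counts(run_a, edges)
hist_b = bin_counts(run_b, edges)
```

Because a histogram ignores which oscillator carries which value, `hist_a` and `hist_b` are identical, so any histogram distance between the two runs is zero.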
Sparse and memory mapped options
There are separate routines for computing very large matrices, using either memory-mapped arrays (see distance_matrix_mmap) or sparse arrays (see distance_matrix_sparse).
Arguments
sol: solution
prob: problem
distance_func: The function that actually calculates the distance between the measures/parameters of each solution with each other. Signature should be (measure_1::Union{Array,Number}, measure_2::Union{Array,Number}) -> distance::Number. Example and default is (x,y)->sum(abs.(x .- y)).
weights: Instead of the actual measure, weights[i_measure]*measure is handed over to distance_func. Thus weights needs to be an array of length $N_{meas}+N_{par}$.
Kwargs
relative_parameter: If true, the parameter values are rescaled to [0,1] during the distance calculation.
histograms::Bool: If true, the distance calculation is based on distance_matrix_histogram with the default histogram distance wasserstein_histogram_distance.
histogram_distance_func: The distance function between two histograms. Default is wasserstein_histogram_distance.
matrix_distance_func: The distance function between two matrices or arrays of length different from $N_{dim}$. Used e.g. for cross-correlation.
ecdf::Bool: If true, the histogram_distance function gets the empirical CDFs instead of the histograms.
k_bin::Int: Multiplier to increase ($k_{bin}>1$) or decrease ($k_{bin}<1$) the bin width and thus decrease or increase the number of bins. It is a multiplier to the Freedman-Diaconis rule. Default: $k_{bin}=1$
nbin_default::Int: If the IQR is very small and thus the number of bins larger than nbin_default, the number of bins is set back to nbin_default and the edges and width are adjusted accordingly.
nbin::Int: If specified, ignore all other histogram binning calculations and use nbin bins for the histograms.
bin_edges::AbstractArray: If specified, ignore all other histogram binning calculations and use this as the edges of the histogram (has to have one more element than bins, hence all edges). Needs to be an Array with as many elements as measures. If one wants automatic binning for one observable, that element of the array has to be nothing. E.g.: [1:1:10, nothing, 2:0.5:5].
Returns an instance of DistanceMatrix or DistanceMatrixHist.
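How `distance_func` and `weights` interact can be illustrated on plain arrays. The following is a sketch under assumptions, not the package's code: `weighted_distance_matrix` and the layout of `runs` (one tuple of measures plus one parameter per run) are hypothetical stand-ins for the solution object.

```julia
# Default distance from the docstring: L1 distance between two measures.
l1(x, y) = sum(abs.(x .- y))

# Hypothetical helper: runs[i] holds the measures (and the parameter) of run i,
# weights[m] scales the m-th entry before it is passed to distance_func.
function weighted_distance_matrix(runs::Vector, distance_func::Function, weights::AbstractVector)
    n = length(runs)
    D = zeros(Float64, n, n)
    for i in 1:n, j in (i + 1):n
        d = 0.0
        for m in eachindex(weights)
            d += distance_func(weights[m] * runs[i][m], weights[m] * runs[j][m])
        end
        D[i, j] = D[j, i] = d   # the matrix is symmetric
    end
    return D
end

# Three runs with two per-dimension measures and one scalar parameter each.
runs = [
    ([0.0, 0.0], [1.0, 1.0], 0.1),
    ([0.0, 1.0], [1.0, 1.0], 0.2),
    ([2.0, 2.0], [0.0, 0.0], 0.9),
]
weights = [1.0, 0.5, 1.0]   # N_meas + N_par = 3 entries

D = weighted_distance_matrix(runs, l1, weights)
```

Setting a weight to zero removes that measure or parameter from the distance, which is a convenient way to test how much each one contributes to the clustering.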
MCBB.distance_matrix_mmap
— Function distance_matrix_mmap(sol::myMCSol, prob::myMCProblem, distance_func::Function, weights::AbstractArray; matrix_distance_func::Union{Function, Nothing}=nothing, histogram_distance_func::Union{Function, Nothing}=wasserstein_histogram_distance, relative_parameter::Bool=false, histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, nbin_default::Int=50, el_type=Float32, save_name="mmap-distance-matrix.bin")
Computes the distance matrix like distance_matrix but uses memory-mapped arrays. Use this if the distance matrix is too large for the memory of your computer. Takes the same inputs as distance_matrix, plus el_type, which determines the eltype of the saved matrix, and save_name, the name of the file on the hard disk.
Due to the restrictions of memory-mapped arrays, saving and loading distance matrices computed like this with JLD2 will only work within a single machine. One way to reload or transfer these matrices is reload_mmap_distance_matrix.
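For readers unfamiliar with memory-mapped arrays, the following sketch shows the underlying mechanism using only Julia's `Mmap` standard library. It does not call MCBB. The file name mirrors the default `save_name`, and the matrix entries are placeholders.

```julia
using Mmap  # stdlib

n = 4
path = joinpath(mktempdir(), "mmap-distance-matrix.bin")

# Create a file-backed n×n Float32 matrix; the data lives on disk, not in RAM.
io = open(path, "w+")
D = Mmap.mmap(io, Matrix{Float32}, (n, n))
for i in 1:n, j in 1:n
    D[i, j] = abs(i - j)      # placeholder for a real distance
end
Mmap.sync!(D)                 # flush pending changes to the file
close(io)

# The file only holds raw numbers, so the element type and the size
# must be known in order to map it again.
io2 = open(path, "r")
D_reloaded = Mmap.mmap(io2, Matrix{Float32}, (n, n))
close(io2)
```

The need to carry the element type and dimensions alongside the raw file is one reason a dedicated reload routine is helpful when moving such a matrix between machines.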
MCBB.compute_distance
— Function compute_distance(sol::myMCSol, i_meas::Int, distance_func::Function; use_histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, bin_edges::AbstractRange)
Computes a (part of the) distance matrix for only a single measure i_meas. Follows otherwise the same logic as distance_matrix but returns the matrix as an Array{T,2}.
MCBB.distance_matrix_sparse
— Function distance_matrix_sparse(sol::myMCSol, prob::myMCProblem, distance_func::Function, weights::AbstractArray; matrix_distance_func::Union{Function, Nothing}=nothing, histogram_distance_func::Union{Function, Nothing}=wasserstein_histogram_distance, relative_parameter::Bool=false, histograms::Bool=false, use_ecdf::Bool=true, k_bin::Number=1, nbin_default::Int=50, nbin::Union{Int, Nothing}=nothing, bin_edges::Union{AbstractArray, Nothing}=nothing, sparse_threshold::Number=Inf, el_type=Float32, check_inf_nan::Bool=true)
Computes the distance matrix as a sparse matrix. Takes the same arguments as distance_matrix, with these extra arguments:
* `sparse_threshold`: Only distances smaller than this value are saved
* `check_inf_nan`: Only performs the Inf/NaN check if true.
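The effect of `sparse_threshold` can be sketched with the `SparseArrays` standard library. The helper `sparse_distances` is hypothetical and works on a plain vector of points rather than on a solution object.

```julia
using SparseArrays  # stdlib

# Hypothetical helper: store only distances smaller than sparse_threshold.
function sparse_distances(points::AbstractVector, distance_func::Function; sparse_threshold::Real=Inf)
    n = length(points)
    rows, cols, vals = Int[], Int[], Float32[]
    for i in 1:n, j in 1:n
        i == j && continue
        d = distance_func(points[i], points[j])
        if d < sparse_threshold
            push!(rows, i); push!(cols, j); push!(vals, d)
        end
    end
    return sparse(rows, cols, vals, n, n)
end

points = [0.0, 0.1, 0.2, 5.0]
D = sparse_distances(points, (x, y) -> abs(x - y); sparse_threshold=1.0)
```

Note that in this sketch an unstored entry such as `D[1, 4]` reads as zero, so downstream code has to treat unstored pairs as "farther apart than the threshold" rather than as identical points.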
MCBB.AbstractDistanceMatrix
— Type abstract type AbstractDistanceMatrix{T} <: AbstractArray{T,2} end
Abstract datatype for all distance matrix types. Currently, MCBB provides:
* DistanceMatrix
* DistanceMatrixHist
MCBB.DistanceMatrix
— Type DistanceMatrix{T}
Type for distance matrices. This type should behave just like any AbstractArray{T,2}. There's a convert to AbstractArray{T,2}.
It also holds additional information about the distance calculation.
Fields (and constructor)
data::AbstractArray{T,2}: The actual distance matrix
weights::AbstractArray{T,1}: The weights that were used to compute it
distance_func::Function: The function that was used to compute it
relative_parameter::Bool: Was the parameter rescaled?
MCBB.DistanceMatrixHist
— Type DistanceMatrixHist{T}
Type for distance matrices which were computed using histograms. This type should behave just like any AbstractArray{T,2}. There's a convert to AbstractArray{T,2}.
It also holds additional information about the distance calculation.
Fields (and constructor)
data::AbstractArray{T,2}: The actual distance matrix
weights::AbstractArray{T,1}: The weights that were used to compute it
distance_func::Function: The function that was used to compute the distance between the global measures
matrix_distance_func::Union{Function, Nothing}: The function that was used to compute the distance between matrices or arrays (see distance_matrix)
relative_parameter::Bool: Was the parameter rescaled?
histogram_distance::Function: Function used to compute the histogram distance
hist_edges: Array of arrays/ranges with all histogram edges
bin_width: Array of all histogram bin widths
ecdf: Was the ECDF used in the distance computation?
k_bin: Additional factor in the bin_width computation
MCBB.metadata!
— Method metadata!(dm::AbstractDistanceMatrix, )
Empties the data of the input AbstractDistanceMatrix, so that it only contains metadata. This is useful if the matrix itself is already saved elsewhere (e.g. with Mmap).
MCBB.wasserstein_histogram_distance
— Function
One possible histogram distance for distance_matrix_histogram (also the default one). It calculates the 1-Wasserstein / Earth Mover's Distance between two histograms from their ECDFs by computing the discrete integral
$\int_{-\infty}^{+\infty}|ECDF(hist\_1) - ECDF(hist\_2)| dx = \sum_i | ECDF(hist\_1)_i - ECDF(hist\_2)_i | \cdot bin\_width$.
Returns a single (real) number. The input is the ECDF of each histogram.
Adapted from https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html
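The discrete integral above is short enough to write out. The sketch below assumes both histograms share the same equal-width bins. The function names `ecdf_from_counts` and `wasserstein_from_ecdfs` are hypothetical and only mirror what the docstrings describe.

```julia
# Normalized ECDF from histogram bin counts.
ecdf_from_counts(counts::AbstractVector{<:Real}) = cumsum(counts) ./ sum(counts)

# Discrete 1-Wasserstein distance between two ECDFs on identical equal-width bins:
# sum_i |ECDF_1[i] - ECDF_2[i]| * bin_width
wasserstein_from_ecdfs(e1, e2, bin_width::Real) = sum(abs.(e1 .- e2)) * bin_width

counts_1 = [4, 0, 0]   # all mass in the first bin
counts_2 = [0, 0, 4]   # all mass in the third bin
d = wasserstein_from_ecdfs(ecdf_from_counts(counts_1), ecdf_from_counts(counts_2), 0.5)
```

Here all mass has to move two bins of width 0.5, so the distance is 1.0, which matches the "earth mover" intuition.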
MCBB.ecdf_hist
— Function ecdf_hist(hist::Histogram)
Returns the ECDF of a histogram (normalized) as an Array.
MCBB.cluster_distance
— Function cluster_distance(sol::myMCSol, D::AbstractDistanceMatrix, cluster_results::ClusteringResult, cluster_1::Int, cluster_2::Int; measures::Union{AbstractArray, Nothing}=nothing, distance_func=nothing, histogram_distance=nothing, matrix_distance_func=nothing, k_bin::Number=1)
Calculates the distance between the members of two clusters separately for each measure.
Inputs
sol: Solution object
D: distance matrix from distance_matrix
cluster_results: results from the clustering
cluster_1: Index of the first cluster to be analysed (noise/outlier cluster = 1)
cluster_2: Index of the second cluster to be analysed
measures: Which measures should be analysed, default: all.
Output
- Array with
- Summary dictionary, mean and std of the distances
MCBB.cluster_means
— Method cluster_means(sol::myMCSol, clusters::ClusteringResult)
Returns the mean of each measure for each cluster.
MCBB.cluster_membership
— Function cluster_membership(par::AbstractArray, clusters::ClusteringResult)
Calculates the proportion of members for each cluster for all parameter values.
Returns an instance of ClusterMembershipResult with fields:
par: the center value of the sliding windows; in case multiple parameters are being varied, it is a meshgrid.
data: members of the clusters on the parameter grid
cluster_membership(prob::myMCProblem, clusters::ClusteringResult, window_size::AbstractArray, window_offset::AbstractArray; normalize::Bool=true, min_members::Int=0)
cluster_membership(prob::myMCProblem, clusters::ClusteringResult, window_size::Number, window_offset::Number; normalize::Bool=true, min_members::Int=0)
Calculates the proportion of members for each cluster within a parameter sliding window.
prob: problem
sol: solution of prob
clusters: results from a DBSCAN run.
window_size: Size of the window. In case multiple parameters are being varied, it has to be an array.
window_offset: Offset of the sliding window. In case multiple parameters are being varied, it has to be an array.
Returns an instance of ClusterMembershipResult with fields:
par: the center value of the sliding windows; in case multiple parameters are being varied, it is a meshgrid.
data: members of the clusters on the parameter grid
The results can be plotted directly with plot(results, kwargs...). See ClusterMembershipResult for details on plotting and operating on this type.
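The sliding-window idea can be sketched for a single varied parameter. This is an illustration under assumptions, not the package's routine: `membership_on_windows` is a hypothetical helper, cluster labels are taken to be 1-based, and `window_offset` is assumed to be the step between consecutive window starts.

```julia
# Hypothetical helper: fraction of runs per cluster inside each parameter window.
function membership_on_windows(par::AbstractVector{<:Real}, assignments::AbstractVector{Int},
                               n_clusters::Int, window_size::Real, window_offset::Real)
    starts = minimum(par):window_offset:(maximum(par) - window_size)
    centers = collect(starts) .+ window_size / 2
    data = zeros(Float64, length(starts), n_clusters)
    for (w, s) in enumerate(starts)
        in_window = findall(p -> s <= p <= s + window_size, par)
        for r in in_window
            data[w, assignments[r]] += 1
        end
        total = sum(data[w, :])
        total > 0 && (data[w, :] ./= total)   # normalize counts to proportions
    end
    return centers, data
end

par = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
assignments = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]   # cluster index of each run
centers, data = membership_on_windows(par, assignments, 2, 0.5, 0.25)
```

Each row of `data` belongs to one window and sums to one. Skipping the normalization step would yield raw member counts instead, which is presumably what `normalize=false` selects.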
MCBB.ClusterMembershipResult
— Type ClusterMembershipResult{T,S}
Stores the results of cluster_membership and can be used for ClusterMembershipPlot.
Fields
par: Parameter Array or Mesh
data: Cluster membership data on the par parameter grid.
multidim_flag: Is the experiment multidimensional?
Plot
plot(cm::ClusterMembershipResult, kwargs...)
Plots the ClusterMembershipResult. Uses Plots recipes and thus accepts all kwargs supported by Plots.jl.
Hints
The order of the labels for the legend is reversed.
Additional Kwargs
plot_index
: Range or Array with the indices of the clusters to be plotted. Default: all.
Additional operations defined
* can be indexed
* can be sorted, `Base.sort!(cm::ClusterMembershipResult; ignore_first::Bool)`
* can be summed, `Base.sum(cm::ClusterMembershipResult, indices::AbstractArray{Int,1})`
Base.sort!
— Method sort!(cm::ClusterMembershipResult; ignore_first::Bool=false)
Sorts cm in place by the count of members of the clusters from low to high. If ignore_first is true, the first cluster (with DBSCAN this is the outlier cluster) is ignored while sorting and remains the first cluster.
Base.sum
— Method Base.sum(cm::ClusterMembershipResult, indices::AbstractArray{Int,1})
Returns a ClusterMembershipResult with the clusters listed in indices summed together.
MCBB.get_trajectory
— Function get_trajectory(prob::MCBBProblem, sol::MCBBSol, clusters::ClusteringResult, i::Int; only_sol::Bool=true)
Solves and returns a trajectory that is classified in cluster i. Randomly selects one IC/parameter configuration, so multiple executions of this routine will yield different results! If only_sol==true it returns only the solution, otherwise it returns a tuple (solution, problem, i_run) where i_run is the number of the trial in prob and sol.
get_trajectory(prob::MCBBProblem, sol::MCBBSol, i::Int, only_sol::Bool=true)
Solves problem i and returns a trajectory. If only_sol==true it returns only the solution, otherwise it returns a tuple (solution, problem, i_run) where i_run is the number of the trial in prob and sol.
Example
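Plot with e.g.
using PyPlot
# solve one randomly chosen trajectory from cluster 1 and show it as a heatmap
IM = imshow(Matrix(get_trajectory(prob,sol,db_res,1)), aspect=2)
ylabel("System Dimension i")
xlabel("Time t")
cb = colorbar(IM, orientation="horizontal")
# label the colorbar axis (dot syntax for Python attributes)
cb.ax.set_xlabel("Colorbar: Im(z)", rotation=0)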
MCBB.cluster_measure_mean
— Function cluster_measure_mean(sol::myMCSol, clusters::ClusteringResult, i::Int)
Returns the mean of measure i for each cluster.
MCBB.cluster_measure_std
— Function cluster_measure_std(sol::myMCSol, clusters::ClusteringResult, i::Int)
Returns the std of measure i for each cluster.
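Conceptually, both routines group the values of one measure by cluster label and reduce each group. A minimal sketch on plain vectors follows. The helper `per_cluster` is hypothetical, and for simplicity each run contributes a single scalar value of the measure.

```julia
using Statistics  # stdlib: mean, std

# Hypothetical helper: apply f (e.g. mean or std) to one measure, cluster by cluster.
function per_cluster(f::Function, measure::AbstractVector{<:Real},
                     assignments::AbstractVector{Int}, n_clusters::Int)
    return [f(measure[assignments .== c]) for c in 1:n_clusters]
end

measure = [1.0, 3.0, 10.0, 14.0]   # value of measure i for each run
assignments = [1, 1, 2, 2]         # cluster index of each run

means = per_cluster(mean, measure, assignments, 2)
stds = per_cluster(std, measure, assignments, 2)
```

A large standard deviation within one cluster relative to the gap between cluster means can hint that the clustering parameters merge distinct behaviours.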
MCBB.cluster_measures
— Function cluster_measures(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult, window_size::AbstractArray, window_offset::AbstractArray)
cluster_measures(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult, window_size::Number, window_offset::Number)
Calculates the measures for each cluster along a sliding window. Can also handle multiple parameters being varied.
prob: problem
sol: solution of prob
clusters: results from a DBSCAN run.
window_size: Size of the window. In case multiple parameters are being varied, it has to be an array.
window_offset: Offset of the sliding window. In case multiple parameters are being varied, it has to be an array.
Returns an instance of ClusterMeasureResult with fields:
par: the center value of the sliding windows; in case multiple parameters are being varied, it is a meshgrid.
cluster_measures: (per dimension) measures on the parameter grid
cluster_measures_global: global measures on the parameter grid
Plot:
The i-th measure can be plotted with plot(res::ClusterMeasureResult, i::Int, kwargs...)
A single cluster and measure can be plotted with plot(res::ClusterMeasureResult, i_meas::Int, i_cluster::Int, kwargs...).
MCBB.ClusterMeasureResult
— Type ClusterMeasureResult
Results of cluster_measures.
Fields:
par
cluster_measures
cluster_measures_global
Plot:
The i-th measure of the j-th cluster can be plotted with plot(res::ClusterMeasureResult, i::Int, j::Int, kwargs...)
MCBB.cluster_measures_sliding_histograms
— Function cluster_measures_sliding_histograms(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult, i_meas::Int, window_size::Number, window_offset::Number; kwargs...)
Calculates, for each window in the sliding window array, a histogram of all results of measure i_meas of all runs, separately for each cluster.
Input:
prob::myMCProblem: problem
sol::myMCSol: solution object
clusters::ClusteringResult: cluster results
i_meas::Int: index of the measure to be analyzed
window_size::AbstractArray: size of the window, number or Array with length according to the number of parameters
window_offset::AbstractArray: offset of the sliding window, number or Array with length according to the number of parameters
Keyword arguments
k_bin::Number: Bin count modifier. k_bin times the Freedman-Diaconis rule is used for binning the data. Default: 1
normalization_mode::Symbol: normalization mode applied to the histograms. Directly handed over to normalize.
nbin::Int: Uses nbin bins for the histograms instead of the (automatic) Freedman-Diaconis rule
bin_edges::AbstractRange: Uses these edges for the histograms.
state_filter::AbstractArray: Only use these system dimensions as the basis for the computation, default: all. Attention: if the evaluation function already used a state_filter, this refers only to the system dimensions that were measured.
Returns an instance of ClusterMeasureHistogramResult with fields:
hist_vals: N_cluster, N_windows..., N_bins - sized array with the value of the histograms for each window
par: midpoint of the sliding windows, "x-axis-labels" of the plot
hist_bins: center of the bins, "y-axis-label" of the plot
Can be plotted with plot(res::ClusterMeasureHistogramResult, kwargs...). See ClusterMeasureHistogramResult for details.
MCBB.ClusterMeasureHistogramResult
— Type ClusterMeasureHistogramResult
Stores results of cluster_measures_sliding_histograms.
Fields
hist_vals: N_cluster, N_windows..., N_bins - sized array with the value of the histograms for each window
par: midpoint of the sliding windows, "x-axis-labels" of the plot
hist_edges: center of the bins, "y-axis-label" of the plot
multidim_flag
Plot
Can be plotted with plot(res::ClusterMeasureHistogramResult, i, kwargs...), with i being the number of the cluster.
MCBB.ClusterICSpaces
— Type ClusterICSpaces
This function/struct returns the distributions of ICs (and parameters) as histograms in each dimension, separately for each cluster. It also returns the data itself, means and stds. If the additional keyword arguments min_par, max_par are given, it limits the analysis to the specified parameter range.
Fields of the struct:
data: array of array of arrays, the ICs and pars for each cluster and dimension
histograms: N_cluster x N_dim Array of Histograms of ICs/Par
means: Means of each dimension for each cluster
stds: Stds of each dimension for each cluster
cross_dim_means: list of means of ICs across IC-dimensions per cluster
cross_dim_stds: list of stds of ICs across IC-dimensions per cluster
cross_dim_kurts: list of kurtosis of ICs across IC-dimensions per cluster
Constructor
ClusterICSpaces(prob::myMCProblem, sol::myMCSol, clusters::ClusteringResult; min_par::Number=-Inf, max_par::Number=Inf, nbins::Int64=20)
prob: Problem
sol: solution of prob
clusters: DBSCAN results
min_par, max_par: restrict the analysis to parameters within this value range
nbins: Number of bins of the histograms
MCBB.cluster_n_noise
— Function cluster_n_noise(clusters::ClusteringResult)
Returns the number of points assigned to the "noise" cluster (somehow this is not automatically returned by the routine of Clustering.jl).
MCBB.measure_on_parameter_sliding_window
— Function measure_on_parameter_sliding_window
Calculates measures (per cluster) on parameter sliding windows. This routine is called by cluster_membership and cluster_measures but can also be used for plotting measures on the parameter grid manually.
ATTENTION: If a cluster has no members within a window, the value is set to NaN. This should simply omit these points from being plotted (while missing and nothing are currently not compatible with most plotting packages).
measure_on_parameter_sliding_window(prob::myMCProblem, sol::myMCSol, i::Int, clusters::ClusteringResult, window_size::Number, window_offset::Number)
Returns the i-th measure for each cluster separately on the parameter sliding window grid.
prob: Problem
sol: solution of prob
i: function returns the i-th measure
clusters: results from a DBSCAN run.
window_size: Size of the window. In case multiple parameters are being varied, it has to be an array.
window_offset: Offset of the sliding window. In case multiple parameters are being varied, it has to be an array.
measure_on_parameter_sliding_window(prob::myMCProblem, sol::myMCSol, i::Int, window_size::Number, window_offset::Number)
Returns the i-th measure on the parameter sliding window grid (does not calculate the measure for each cluster separately).
All methods return a tuple with:
parameter_windows: the center value of the sliding windows; in case multiple parameters are being varied, it is a meshgrid.
cluster_measures: members of the clusters on the parameter grid
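The NaN convention for empty windows is easy to see in a small sketch. As before, this is not the package's routine: `measure_on_windows` is a hypothetical helper for a single varied parameter, with scalar measures, 1-based cluster labels, and `window_offset` assumed to be the step between window starts.

```julia
using Statistics  # stdlib: mean

# Hypothetical helper: mean of one measure per cluster inside each parameter window.
# Windows without members of a cluster keep the value NaN.
function measure_on_windows(par, measure, assignments, n_clusters, window_size, window_offset)
    starts = minimum(par):window_offset:(maximum(par) - window_size)
    centers = collect(starts) .+ window_size / 2
    out = fill(NaN, n_clusters, length(starts))
    for (w, s) in enumerate(starts), c in 1:n_clusters
        sel = (par .>= s) .& (par .<= s + window_size) .& (assignments .== c)
        any(sel) && (out[c, w] = mean(measure[sel]))
    end
    return centers, out
end

par = [0.0, 1.0, 2.0, 3.0, 4.0]
measure = [1.0, 2.0, 3.0, 10.0, 20.0]
assignments = [1, 1, 1, 2, 2]
centers, out = measure_on_windows(par, measure, assignments, 2, 2.0, 2.0)
```

Cluster 2 has no members in the first window, so `out[2, 1]` stays NaN and most plotting packages will leave a gap at that point.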
MCBB.k_dist
— Function k_dist(D::AbstractArray, k::Int=4)
Helper function for estimating an epsilon value for DBSCAN. In the original paper, Ester et al. suggest plotting the k-dist graph (especially for $k=4$) to estimate a value for eps given $minPts = k$. It computes the distance to the k-th nearest neighbour for all data points given their distance matrix.
D: Distance matrix
k: calculate the distance to the k-th neighbour
Returns a sorted array with the k-dist of all elements of D.
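The k-dist computation can be sketched directly on a dense distance matrix. The function `k_dist_sketch` is a hypothetical re-implementation for illustration. In particular, it assumes that the zero self-distance on the diagonal does not count as a neighbour.

```julia
# Distance of every point to its k-th nearest neighbour, sorted ascending.
# Row i of D holds the distances from point i to all points.
function k_dist_sketch(D::AbstractMatrix{<:Real}, k::Int=4)
    n = size(D, 1)
    # Position 1 of each sorted row is the zero self-distance, so the k-th
    # neighbour sits at position k + 1 (assumption of this sketch).
    kd = [sort(D[i, :])[k + 1] for i in 1:n]
    return sort(kd)
end

x = [0.0, 1.0, 2.0, 3.0, 10.0]         # five points on a line
D = [abs(a - b) for a in x, b in x]    # their distance matrix
kd = k_dist_sketch(D, 2)
```

Plotting `kd` against its index gives the k-dist graph. The jump from 2.0 to 8.0 in this toy example marks the outlier at 10.0, and a value of eps just below such a knee is a common starting point.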
MCBB.KNN_dist
— Function KNN_dist(D::AbstractArray, K::Int)
Returns the cumulative K-th nearest neighbour distance.
D: Distance matrix
K
MCBB.KNN_dist_relative
— Function KNN_dist_relative(D::AbstractArray, rel_K::Float64=0.005)
Returns the cumulative distance to the rel_K*N nearest neighbour.
D: Distance matrix
rel_K