Running it on an HPC
The heavy lifting in terms of parallelization is done by `MonteCarloProblem`/`solve` from DifferentialEquations; however, we have to make most parts of the setup known to all processes with `@everywhere` and provide a submit script.
Below you will find an example of a script that runs a Kuramoto network and saves the results. Due to memory constraints, I recommend solving only the `DEMCBBProblem` on the HPC, then saving the solutions and doing the remaining calculations (distance matrix and clustering) on your own PC.
For saving and loading data, I use `JLD2`. When loading an object, all functions and variables that were in scope during its computation need to be in scope again. That is why I work with a single script that is used both on the remote machine and locally, depending on whether the variable `cluster` is `true` or `false`.
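As a minimal sketch of why the same definitions must run on both machines (the struct `MyResult` is a hypothetical stand-in for the objects computed on the cluster): JLD2 can only reconstruct an object if its type is defined in the loading session, so the definitions have to be executed before `@load`.

```julia
using JLD2

# Hypothetical result type; its definition must be in scope
# both when saving (on the cluster) and when loading (locally).
struct MyResult
    values::Vector{Float64}
end

res = MyResult(rand(10))
@save "result.jld2" res      # on the cluster

# Later, in a fresh session that has executed the same struct definition:
@load "result.jld2" res      # fails if MyResult is not defined
```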
Julia Script (hpc_example.jl)
The Julia script is similar to the examples shown in the other guides in this documentation. Running Julia code in parallel on an HPC requires the `ClusterManagers` package to add the processes that are allocated by the regular Slurm script. If you are running this in parallel without a batch system, change `addprocs(SlurmManager(...))` to `addprocs(...)`.
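As a sketch, the two setups differ only in the call that adds the workers (the `cluster` flag and `N_worker` mirror the full script below):

```julia
using Distributed

cluster = true          # set to false on your local machine
N_worker = 4            # e.g. parse(Int, ARGS[1]) inside a batch job

if cluster
    # Inside a Slurm allocation: ClusterManagers spawns one worker per task
    using ClusterManagers
    addprocs(SlurmManager(N_worker))
else
    # Without a batch system, plain local workers are enough
    addprocs(N_worker)
end
```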
- For usage on the HPC set `cluster=true`
- For usage on your PC (to evaluate the results) set `cluster=false`
The script needs to be called with a command line argument that specifies the number of processes to start, e.g.
$ julia hpc_example.jl 4
for starting it with 4 processes. In the Slurm script this is done with the environment variable `SLURM_NTASKS`.
DISCLAIMER: In theory, not all of these `@everywhere` commands should be needed, but it did not work for me without them. The scripts below were tested on an HPC running Slurm for resource allocation.
```julia
cluster = true

using Distributed

if cluster
    using ClusterManagers
    N_tasks = parse(Int, ARGS[1])
    N_worker = N_tasks
    addprocs(SlurmManager(N_worker))
else
    using Plots
end

using JLD2, FileIO, Clustering, StatsBase, Parameters

@everywhere begin
    using LightGraphs
    using DifferentialEquations
    using Distributions
    using MCBB

    N = 40
    K = 0.5
    nd = Normal(0.5, 0.1)  # distribution of eigenfrequencies: mean = 0.5 Hz, std = 0.1 Hz
    w_i_par = rand(nd, N)
    net = erdos_renyi(N, 0.2)
    A = adjacency_matrix(net)
    ic = zeros(N)
    ic_dist = Uniform(-pi, pi)
    kdist = Uniform(0, 8)
    ic_ranges = () -> rand(ic_dist)
    N_ics = 10000
    K_range = () -> rand(kdist)
    pars = kuramoto_network_parameters(K, w_i_par, N, A)
    rp = ODEProblem(kuramoto_network, ic, (0., 2000.), pars)
    tail_frac = 0.9
end

@everywhere function eval_ode_run_kuramoto(sol, i)
    (N_dim, __) = size(sol)
    state_filter = collect(1:N)
    eval_funcs = [empirical_1D_KL_divergence_hist]
    global_eval_funcs = []
    eval_ode_run(sol, i, state_filter, eval_funcs, global_eval_funcs, cyclic_setback=true)
end

if cluster
    ko_mcp = DEMCBBProblem(rp, ic_ranges, N_ics, pars, (:K, K_range), eval_ode_run_kuramoto, tail_frac)
    kosol = solve(ko_mcp)
    @save "kuramoto_sol.jld2" kosol ko_mcp
else
    @load "kuramoto_sol.jld2" kosol ko_mcp
    D = @time distance_matrix(kosol, parameter(ko_mcp), [1, 0.75, 0.5, 1]);
    k = 4
    db_eps = median(KNN_dist_relative(D))
    db_res = dbscan(D, db_eps, k)
    cluster_members = cluster_membership(ko_mcp, db_res, 0.2, 0.05);
    plot(cluster_members)
    savefig("kura-membership")
end
```
Slurm Script
```bash
#!/bin/bash

#SBATCH --qos=short
#SBATCH --job-name=MCBB-test
#SBATCH --account=YOURACCOUNT
#SBATCH --output=MCBB-test-%j-%N.out
#SBATCH --error=MCBB-test-%j-%N.err
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --workdir=YOUR/WORK/DIR
#SBATCH --mail-type=END
#SBATCH --mail-user=YOUR@MAIL.ADDRESS

echo "------------------------------------------------------------"
echo "SLURM JOB ID: $SLURM_JOBID"
echo "$SLURM_NTASKS tasks"
echo "------------------------------------------------------------"

module load julia/1.0.2
module load hpc/2015

julia /path/to/the/script/hpc_example.jl $SLURM_NTASKS
```
The `module load ...` commands load Julia on the HPC that I use; your HPC might require a different setup.
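For reference, a typical workflow for submitting the job could look like this (the module names and the script filename `hpc_example.sh` are assumptions; check what your cluster provides):

```shell
# List the Julia modules your cluster provides (names differ between HPCs)
module avail julia

# Submit the batch script (assuming it is saved as hpc_example.sh)
sbatch hpc_example.sh

# Check the status of your jobs
squeue -u $USER
```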
Various Tips & Tricks
- At least on some HPCs, such a script can fail after a Julia library update when the packages are not precompiled yet. Just run the script again or force a precompilation from an interactive session.
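Forcing the precompilation does not strictly require an interactive session; a one-liner on the login node works as well (sketch; `Pkg.precompile()` compiles all packages in the active environment):

```shell
# Precompile all installed packages before submitting the job
julia -e 'using Pkg; Pkg.precompile()'
```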