Slurm job scheduling system
Refer to the Slurm documentation for more details.
According to the sbatch man page, we can specify job dependencies with --dependency=<dependency list>.
This could allow us to avoid having a master Python process running for the
duration of the optimisation.
See, e.g., afterok:job_id[:job_id...]. But we still need to maintain the internal state of the optimisation routine, so perhaps this isn’t so helpful.
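As a sketch of the afterok idea, one job can be made to wait on another by capturing the first job's ID at submission time. The script names here are hypothetical, and this cannot run outside a Slurm installation:

```shell
# Submit step1.sh and capture its job ID; --parsable makes sbatch
# print just the ID rather than the usual "Submitted batch job N".
jid=$(sbatch --parsable step1.sh)

# step2.sh starts only if step1.sh completed with exit code 0.
sbatch --dependency=afterok:"$jid" step2.sh
```

Note that the dependent job still occupies a queue slot while waiting, and its state only carries what is written to disk by the first job, which is why this does not solve the internal-state problem above.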
We can also pass --job-name=<jobname> to Slurm when scheduling jobs, which we can use to ensure that the jobs listed in the process queue are more informative than mere job numbers.
We can also use --output=<filename pattern> to record output to specific file(s).
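The two options above can be combined in a batch-script header; the job name and run command here are hypothetical, and %x/%j are Slurm's standard filename-pattern replacements for job name and job ID:

```shell
#!/bin/bash
#SBATCH --job-name=mcas-opt        # shown in squeue instead of a bare number
#SBATCH --output=%x-%j.out         # e.g. mcas-opt-12345.out

srun ./run_step                    # hypothetical payload
```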
Finally, according to the documentation for --time=<time>:
A time limit of zero requests that no time limit be imposed.
This argument accepts time limits in a variety of formats, including the ability to specify limits in terms of the number of days.
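Some of the accepted formats, as illustrative alternatives (only one --time line would appear in a real script):

```shell
#SBATCH --time=0              # request no time limit
#SBATCH --time=90             # bare number = minutes
#SBATCH --time=01:30:00       # hours:minutes:seconds
#SBATCH --time=30-00:00:00    # days-hours:minutes:seconds
```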
Note
Partitions such as “cascade” (on which we’re running MCAS) can enforce
limits, and the cascade partition’s time limit is “30-00:00:0” (i.e.,
30 days). This can be identified by running sinfo -p cascade.
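To print only the time limit rather than the full default table, sinfo accepts a format string; %P and %l are standard sinfo format fields for partition name and time limit:

```shell
sinfo -p cascade -o "%P %l"
```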