This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 14.11.6
==========================
-- If SchedulerParameters value of bf_min_age_reserve is configured, then
a newly submitted job can start immediately even if there is a higher
priority non-runnable job which has been waiting for less time than
bf_min_age_reserve.
-- qsub wrapper modified to export "all" with -V option
-- RequeueExit and RequeueExitHold configuration parameters modified to accept
numeric ranges. For example "RequeueExit=1,2,3,4" and "RequeueExit=1-4" are
equivalent.
-- Correct the job array specification parser to accept brackets in job array
expression (e.g. "123_[4,7-9]").
-- Fix for misleading job submit failure errors sent to users. Previous error
could indicate why specific nodes could not be used (e.g. too small memory)
when other nodes could be used, but were not for another reason.
-- Fix squeue --array to display correctly the array elements when the
% separator is specified at the array submission time.
-- Fix priority not being calculated correctly due to memory issues.
-- Fix a transient pending reason 'JobId=job_id has invalid QOS'.
-- A non-administrator change to job priority will not be persistent except
for holding the job. Users wanting to change a job priority on a persistent
basis should reset its "nice" value.
-- Print buffer sizes as unsigned values when failed to pack messages.
-- Fix race condition where sprio would print factors without weights applied.
-- Document the sacct option JobIDRaw which for arrays prints the jobid instead
of the arrayTaskId.
-- Allow users to modify MinCPUsNode, MinMemoryNode and MinTmpDiskNode of
their own jobs.
-- Increase the jobid print field in SQUEUE_FORMAT in
opt_modulefiles_slurm.in.
-- Enable compiling without optimizations and with debugging symbols by
default. Disable this by configuring with --disable-debug.
-- job_submit/lua plugin: Add mail_type and mail_user fields.
-- Use standard statvfs(2) syscall if available, in preference to
non-standard statfs.
-- Add a new option -U/--Users to sshare to display only users
information, parent and ancestors are not printed.
-- Purge 50000 records at a time so that locks can be released periodically.
-- Fix potentially uninitialized variables
-- ALPS - Fix issue where a frontend node could become unresponsive and never
be added back into the system.
-- Gate epilog complete messages as done with other messages
-- If we have more than a certain number of agents (50) wait longer when gating
rpcs.
-- FrontEnd - ping non-responding or down nodes.
-- switch/cray: If CR_PACK_NODES is configured, then set the environment
variable "PMI_CRAY_NO_SMP_ENV=1"
-- Fix invalid memory reference in SlurmDBD when putting a node up.
-- Allow opening of plugstack.conf even when a symlink.
-- Fix scontrol reboot so that rebooted nodes will not be set down with reason
'Node xyz unexpectedly rebooted' but will be correctly put back to service.
-- CRAY - Throttle the post NHC operations as to not hog the job write lock
if many steps/jobs finish at once.
-- Disable changes to GRES count while jobs are running on the node.
-- CRAY - Fix issue with scontrol reconfig.
-- slurmd: Remove wrong reporting of "Error reading step ... memory limit".
The logic was treating success as an error.
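The RequeueExit/RequeueExitHold range syntax described in this release can be
illustrated with a minimal sketch. The helper name expand_exit_codes is
hypothetical and this is not Slurm's own parser; it only shows why
"RequeueExit=1,2,3,4" and "RequeueExit=1-4" describe the same set of exit
codes.

```python
def expand_exit_codes(spec):
    """Expand a comma-separated list of exit codes and ranges.

    Hypothetical helper for illustration only, not Slurm's parser.
    Accepts values such as "1,2,3,4", "1-4" or "1-4,9".
    """
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "1-4" covers both end points.
            low, high = part.split("-", 1)
            codes.update(range(int(low), int(high) + 1))
        else:
            codes.add(int(part))
    return sorted(codes)


# Both spellings from the release note yield the same exit codes.
same = expand_exit_codes("1,2,3,4") == expand_exit_codes("1-4")
```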
* Changes in Slurm 14.11.5
==========================
-- Correct the squeue command taking into account that a node can
have NULL name if it is not in DNS but still in slurm.conf.
-- Fix slurmdbd regression which would cause a segfault when a node is set
down with no reason.
-- BGQ - Fix issue with job arrays not being handled correctly
in the runjob_mux plugin.
-- Print FAIR_TREE, if configured, in "scontrol show config" output for
PriorityFlags.
-- Add SLURM_JOB_GPUS environment variable to those available in the Prolog.
-- Load lua-5.2 library if using lua5.2 for lua job submit plugin.
-- GRES logic: Prevent bad node_offset due to not preserving no_consume flag.
-- Fix wrong variables used in the wrapper functions needed for systems that
don't support strong_alias
-- Fix code for Apple computers, where SOL_TCP is not defined.
-- Cray/BASIL - Check for mysql credentials in /root/.my.cnf.
-- Fix sprio showing wrong priority for job arrays until priority is
recalculated.
-- Account to batch step all CPUs that are allocated to a job not
just one since the batch step has access to all CPUs like other steps.
-- Fix job getting EligibleTime set before meeting dependency requirements.
-- Correct the initialization of QOS MinCPUs per job limit.
-- Set the debug level of information messages in cgroup plugin to debug2.
-- For job running under a debugger, if the exec of the task fails, then
cancel its I/O and abort immediately rather than waiting 60 seconds for
I/O timeout.
-- Fix associations not getting default qos set until after a restart.
-- Set the value of total_cpus not to be zero before invoking
acct_policy_job_runnable_post_select.
-- MySQL - When requesting cluster resources, only return resources for the
cluster(s) requested.
-- Add TaskPluginParam=autobind=threads option to set a default binding in the
case that "auto binding" doesn't find a match.
-- Introduce a new SchedulerParameters variable nohold_on_prolog_fail.
If configured, don't requeue jobs on hold if a Prolog fails.
-- Make it so sched_params isn't read over and over when an epilog complete
message comes in
-- Fix squeue -L <licenses> not filtering out jobs with licenses.
-- Changed the implementation of xcpuinfo_abs_to_mac() to be identical to
_abs_to_mac() to fix CPUs allocation using cpuset cgroup.
-- Improve the explanation of the unbuffered feature in the
srun man page.
-- Make TaskPlugin=cgroup work for core spec. Previously the value needed to
be task/cgroup.
-- Fix reports not using the month usage table.
-- BGQ - Sanity check given for translating small blocks into slurm bg_records.
-- Fix bug preventing the requeue/hold or requeue/special_exit of job from the
completing state.
-- Cray - Fix for launching batch step within an existing job allocation.
-- Cray - Add ALPS_APP_ID_ENV environment variable.
-- Increase maximum MaxArraySize configuration parameter value from 1,000,001
to 4,000,001.
-- Added new SchedulerParameters value of bf_min_age_reserve. The backfill
scheduler will not reserve resources for pending jobs until they have
been pending for at least the specified number of seconds. This can be
valuable if jobs lack time limits or all time limits have the same value.
-- Fix support for --mem=0 (all memory of a node) with select/cons_res plugin.
-- Fix bug that can permit someone to kill job array belonging to another user.
-- Don't set the default partition on a license only reservation.
-- Show a NodeCnt=0, instead of NO_VAL, in "scontrol show res" for a license
only reservation.
-- BGQ - When using static small blocks make sure when clearing the job the
block is set up to its original state.
-- Start job allocation using lowest numbered sockets for block task
distribution for consistency with cyclic distribution.
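The bf_min_age_reserve behavior introduced in this release can be read as a
simple age test. The sketch below is only an illustration of that rule as
worded in the note above; the function name is hypothetical and the real
backfill scheduler considers much more than job age.

```python
def backfill_should_reserve(pending_seconds, bf_min_age_reserve=0):
    """Return True if resources would be reserved for a pending job.

    Illustrative only: per the release note, the backfill scheduler does not
    reserve resources for a pending job until it has been pending for at
    least bf_min_age_reserve seconds.
    """
    return pending_seconds >= bf_min_age_reserve


# With bf_min_age_reserve=600, a job pending for 2 minutes holds no
# reservation yet, while a job pending for 15 minutes does.
young_job_reserved = backfill_should_reserve(120, bf_min_age_reserve=600)
old_job_reserved = backfill_should_reserve(900, bf_min_age_reserve=600)
```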
* Changes in Slurm 14.11.4
==========================
-- Make sure assoc_mgr locks are initialized correctly.
-- Correct check of enforcement when filling in an association.
-- Make sacctmgr print out classification correctly for clusters.
-- Add array_task_str to the perlapi job info.
-- Fix for slurmctld abort with GRES types configured and no CPU binding.
-- Fix for GRES scheduling where count > 1 per topology type (or GRES types).
-- Make CR_ONE_TASK_PER_CORE work correctly with task/affinity.
-- job_submit/pbs - Fix possible deadlock.
-- job_submit/lua - Add "alloc_node" to job information available.
-- Fix memory leak in mysql accounting when usage rollup happens.
-- If users specify ALL together with other variables using the
--export sbatch/srun command line option, propagate the users'
environ to the execution side.
-- Fix job array scheduling anomaly that can stop scheduling of valid tasks.
-- Fix perl api tests for libslurmdb to work correctly.
-- Remove some misleading logs related to non-consumable GRES.
-- Allow --ignore-pbs to take effect when read as an #SBATCH argument.
-- Fix Slurmdb::clusters_get() in perl api from not returning information.
-- Fix TaskPluginParam=Cpusets from logging error message about not being able
to remove cpuset dir which was already removed by the release_agent.
-- Fix the file name substitution for job stderr when %A, %a %j and %u
are specified.
-- Remove minor warning when compiling slurmstepd.
-- Fix database resources so they can add new clusters to them after they have
initially been added.
-- Use the slurm_getpwuid_r wrapper of getpwuid_r to handle possible
interrupts.
-- Correct the scontrol man page and command listing which node states can
be set by the command.
-- Stop sacct from printing non-existent stat information for
Front End systems.
-- Correct srun and acct_gather.conf man pages, mention Filesystem instead
of Lustre.
-- When a job using multiple partitions starts, send to slurmdbd only
the partition in which the job runs.
-- ALPS - Fix depth for MemoryAllocation in BASIL with CLE 5.2.3.
-- Fix assoc_mgr hash to deal with users that don't have a uid yet when making
reservations.
-- When a job uses multiple partitions, set the environment variable
SLURM_JOB_PARTITION to be the one in which the job started.
-- Print spurious message about the absence of cgroup.conf at log level debug2
instead of info.
-- Enable CUDA v7.0+ use with a Slurm configuration of TaskPlugin=task/cgroup
ConstrainDevices=yes (in cgroup.conf). With that configuration
CUDA_VISIBLE_DEVICES will start at 0 rather than the device number.
-- Fix job array logic that can cause slurmctld to abort.
-- Report job "shared" field properly in scontrol, squeue, and sview.
-- If a job is requeued because of RequeueExit or RequeueExitHold, send event
REQUEUED to slurmdbd.
-- Fix build if hwloc is in non-standard location.
-- Fix slurmctld job recovery logic which could cause the last task in a job
array to be lost.
-- Fix slurmctld initialization problem which could cause requeue of the last
task in a job array to fail if executed prior to the slurmctld loading
the maximum size of a job array into a variable in the job_mgr.c module.
-- Fix fatal in controller when deleting a user association of a user which
had been previously removed from the system.
-- MySQL - If a node state and reason are the same on a node state change
don't insert a new row in the event table.
-- Fix issue with "sreport cluster AccountUtilizationByUser" when using
PrivateData=users.
-- Fix perlapi tests for libslurm perl module.
-- MySQL - Fix potential issue when PrivateData=Usage and a normal user
runs certain sreport reports.
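The stderr file name substitution fixed in this release uses the symbols %A
(job array master job ID), %a (job array task index), %j (job ID) and %u
(user name). A minimal sketch of such a single-pass substitution follows.
The function name and the sample values are assumptions made for
illustration; this is not the code used by Slurm.

```python
import re


def expand_filename_pattern(pattern, values):
    """Replace %A, %a, %j and %u in a file name pattern.

    `values` maps the symbol letter ("A", "a", "j", "u") to its text.
    Symbols without a value are left untouched. Illustrative sketch only.
    """
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    # One pass over the pattern, so substituted text is never re-scanned.
    return re.sub(r"%([Aaju])", substitute, pattern)


# Hypothetical array task 4 of array job 123, running as user "alice".
name = expand_filename_pattern(
    "job_%A_%a_%j_%u.err",
    {"A": "123", "a": "4", "j": "127", "u": "alice"},
)
```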
* Changes in Slurm 14.11.3
==========================
-- Prevent vestigial job record when canceling a pending job array record.
-- Fix job array hash table bug, could result in slurmctld infinite loop or
invalid memory reference.
-- In srun honor ntasks_per_node before looking at cpu count when the user
doesn't request a number of tasks.
-- Fix ghost job when submitting job after all jobids are exhausted.
-- MySQL - Enhanced coordinator security checks.
-- Fix for task/affinity if an admin configures a node for having threads
but then sets CPUs to only represent the number of cores on the node.
-- Make it so previous versions of salloc/srun work with newer versions
of Slurm daemons.
-- Avoid delay on commit for PMI rank 0 to improve performance with some
MPI implementations.
-- auth/munge - Correct logic to read old format AccountingStoragePass.
-- Reset node "RESERVED" state as appropriate when deleting a maintenance
reservation.
-- Prevent a job manually suspended from being resumed by gang scheduler once
free resources are available.
-- Prevent invalid job array task ID value if a task is started using gang
scheduling.
-- Fix documentation bugs in slurm.conf.5. DenyAccount should be DenyAccounts.
-- For backward compatibility with older versions of OMPI not compiled
with --with-pmi restore the SLURM_STEP_RESV_PORTS in the job environment.
-- Update the html documentation describing the integration with openmpi.
-- Fix sacct when searching by nodelist.
-- Fix cosmetic info statements when dealing with a job array task instead of
a normal job.
-- BGQ - Put print statement under a DebugFlag. This was just an oversight.
-- BLUEGENE - Remove check that would erroneously remove the CONFIGURING
flag from a job while the job is waiting for a block to boot.
-- Fix segfault in slurmstepd when job exceeded memory limit.
-- Fix race condition that could start a job that is dependent upon a job array
before all tasks of that job array complete.
* Changes in Slurm 14.11.2
==========================
-- Fix issue with association hash not getting the correct index which
could result in seg fault.
-- Avoid huge malloc if GRES configured with "Type" and huge "Count".
-- Fix jobs from starting in overlapping reservations that won't finish before
a "maint" reservation begins.
-- When node gets drained while in state mixed display its status as draining
in sinfo output.
-- Allow priority/multifactor to work with sched/wiki(2) if all priorities
have no weight. This allows for association and QOS decay limits to work.
-- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
-- Fix scancel to be able to cancel multiple jobs that are space delimited.
-- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
a fatal error. This partially reverts logic added in version 14.03.9.
-- sview - Fix displaying of suspended steps elapsed times.
-- Increase number of messages that get cached before throwing them away
when the DBD is down.
-- Restore GRES functionality with select/linear plugin. It was broken in
version 14.03.10.
-- Fix bug with GRES having multiple types that can cause slurmctld abort.
-- Fix squeue issue with not recognizing "localhost" in --nodelist option.
-- Make sure the bitstrings for a partition's Allow/DenyQOS are up to date
when running from cache.
-- Add smap support for job arrays and larger job ID values.
-- Fix possible race condition when attempting to use QOS on a system running
accounting_storage/filetxt.
-- Fix issue with accounting_storage/filetxt and job arrays not being printed
correctly.
-- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
for error condition rather than errno, which might have a vestigial error
code.
-- Improve information recording for jobs deferred due to advanced
reservation.
-- Export eio_new_initial_obj to the plugins and initialize kvs_seq on
mpi/pmi2 setup to support launching.
* Changes in Slurm 14.11.1
==========================
-- Get libs correct when doing the xtree/xhash make check.
-- Update xhash/tree make check to work correctly with current code.
-- Remove the reference 'experimental' for the jobacct_gather/cgroup
plugin.
-- Add QOS manipulation examples to the qos.html documentation page.
-- If 'squeue -w node_name' specifies an unknown host name print
an error message and return 1.
-- Fix race condition in job_submit plugin logic that could cause slurmctld to
deadlock.
-- Job wait reason of "ReqNodeNotAvail" expanded to identify unavailable nodes
(e.g. "ReqNodeNotAvail(Unavailable:tux[3-6])").
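The expanded "ReqNodeNotAvail" wait reason carries a host list expression such
as tux[3-6]. A script consuming that reason might expand it as sketched below.
This is a simplified illustration with hypothetical function names: it handles
only a single prefix followed by one bracketed list of numbers and ranges,
whereas real Slurm host lists can be more complex.

```python
import re


def expand_hostlist(expr):
    """Expand a simple host list such as "tux[3-6]" into host names.

    Simplified sketch: one prefix and one bracketed list of numbers/ranges.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([0-9,\-]+)\]", expr)
    if match is None:
        return [expr]  # Plain host name, nothing to expand.
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low  # A single number is a one-element range.
        width = len(low)    # Preserve zero padding such as "08".
        for number in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{number:0{width}d}")
    return hosts


def unavailable_nodes(reason):
    """Extract the node names from a ReqNodeNotAvail(Unavailable:...) reason."""
    match = re.fullmatch(r"ReqNodeNotAvail\(Unavailable:(.+)\)", reason)
    return expand_hostlist(match.group(1)) if match else []


nodes = unavailable_nodes("ReqNodeNotAvail(Unavailable:tux[3-6])")
```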
* Changes in Slurm 14.11.0
==========================
-- ALPS - Fix issue with core_spec warning.
-- Allow multiple partitions to be specified in sinfo -p.
-- Install the service files in /usr/lib/systemd/system.
-- MYSQL - Add id_array_job and id_resv keys to $CLUSTER_job_table. THIS
COULD TAKE A WHILE TO CREATE THE KEYS SO BE PATIENT.
-- CRAY - Resize bitmaps on a restart if we find we have more blades
than before.
-- Add new eio API function for removing unused connections.
-- ALPS - Fix issue where batch allocations weren't correctly confirmed or
released.
-- Define DEFAULT_MAX_TASKS_PER_NODE based on MAX_TASKS_PER_NODE from
slurm.h as per documentation.
-- Update the FAQ about relocating slurmctld.
-- In the memory cgroup enable memory.use_hierarchy in the cgroup root.
-- Add SLURM_CLUSTER_NAME to job environment.
* Changes in Slurm 14.11.0rc3
=============================
-- Allow envs to override autotools binaries in autogen.sh
-- If the job pends with DependencyNeverSatisfied keep it pending even after
the job which it was depending upon was cleaned.
-- Let operators (in addition to user root and SlurmUser) see job script for
other users' jobs.
-- Perl API modified to return node state of MIXED rather than ALLOCATED if
only some CPUs allocated.
-- Double Munge connect retry timeout from 1 to 2 seconds.
-- sview - Remove unneeded code that was resolved globally in commit
98e24b0dedc.
-- Collect and report the accounting of the batch step and its children.
-- Add configure checks for faccessat and eaccess, and make use of one of
them if available.
-- Make configure --enable-developer also set --enable-debug
-- Introduce a SchedulerParameters variable kill_invalid_depend, if set
then jobs pending with invalid dependency are going to be terminated.
-- Move spank_user_task() call in slurmstepd after the task_g_pre_launch()
so that the task affinity information is available to spank.
-- Make /etc/init.d/slurm script return value 3 when the daemon is
not running. This is required by Linux Standard Base Core
Specification 3.1
* Changes in Slurm 14.11.0rc2
=============================
-- Logs for jobs which are explicitly requeued will say so rather than saying
that a node in their allocation failed.
-- Updated the documentation about the remote licenses served by
the Slurm database.
-- Ensure that slurm_spank_exit() is only called once from srun.
-- Change the signature of net_set_low_water() to use 4 bytes instead of 8.
-- Export working_cluster_rec in libslurmdb.so as well as move some function
definitions needed for drmaa.
-- If using cons_res or serial cause a fatal in the plugin instead of causing
the SelectTypeParameters to magically set to CR_CPU.
-- Enhance task/affinity auto binding to consider tasks * cpus-per-task.
-- Fix regression in priority/multifactor which would cause memory corruption.
Issue is only in rc1.
-- Add PrivateData value of "cloud". If set, powered down nodes in the cloud
will be visible.
-- Sched/backfill - Eliminate clearing start_time of running jobs.
-- Fix various backwards compatibility issues.
-- If failed to launch a batch job, requeue it in hold.
* Changes in Slurm 14.11.0rc1
=============================
-- When using cgroup name the batch step as step_batch instead of
batch_4294967294
-- Changed LEVEL_BASED priority to be "Fair_Tree"
-- BGQ - Add cnode based reservations.
-- Alongside totalview_jobid implement totalview_stepid available
to sattach.
-- Add ability to include other files in slurm.conf based upon the ClusterName.
-- Add reservation information in the sacct and sreport output.
-- Add job priority calculation check for overflow and fix memory leak.
-- Add SchedulerParameters option of pack_serial_at_end to put serial jobs at
the end of the available nodes rather than using a best fit algorithm.
-- Allow regular users to view default sinfo output when
privatedata=reservations is set.
-- PrivateData=reservation modified to permit users to view the reservations
which they have access to (rather than preventing them from seeing ANY
reservation).
-- job_submit/lua: Fix job_desc set field logic
* Changes in Slurm 14.11.0pre5
==============================
-- Fix sbatch --export=ALL, it was treated by srun as a request to explicitly
export only the environment variable named "ALL".
-- Improve scheduling of jobs in reservations that overlap other reservations.
-- Modify sgather to make global file systems easier to configure.
-- Added sacctmgr reconfig to reread the slurmdbd.conf in the slurmdbd.
-- Modify scontrol job operations to accept comma delimited list of job IDs.
Applies to job update, hold, release, suspend, resume, requeue, and
requeuehold operations.
-- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The
lua script no longer needs to explicitly load meta-tables, but information
is available directly using names slurm.reservations, slurm.jobs,
slurm.log_info, etc. Also, the job_submit.lua script is reloaded when
updated without restarting the slurmctld daemon.
-- Allow users to specify --resv_ports to have value 0.
-- Cray MPMD (Multiple-Program Multiple-Data) support completed.
-- Added ability for "scontrol update" to reference jobs by JobName (and
filtered optionally by UserID).
-- Add support for an advanced reservation start time that remains constant
relative to the current time. This can be used to prevent the starting of
longer running jobs on select nodes for maintenance purpose. See the
reservation flag "TIME_FLOAT" for more information.
-- Enlarge the jobid field to 18 characters in squeue output.
-- Added "scontrol write config" option to save a copy of the current
configuration in a file containing a time stamp.
-- Eliminate native Cray specific port management. Native Cray systems must
now use the MpiParams configuration parameter to specify ports to be used
for communications. When upgrading Native Cray systems from version 14.03,
all running jobs should be killed and the switch_cray_state file (in
SaveStateLocation of the nodes where the slurmctld daemon runs) must be
explicitly deleted.
* Changes in Slurm 14.11.0pre4
==============================
-- Added job array data structure and removed 64k array size restriction.
-- Added SchedulerParameters options of bf_max_job_array_resv to control how
many tasks of a job array should have resources reserved for them.
-- Added more validity checking of incoming job submit requests.
-- Added srun --export option to set/export specific environment variables.
-- Scontrol modified to print separate error messages for job arrays with
different exit codes on the different tasks of the job array. Applies to
job suspend and resume operations.
-- Fix race condition in CPU frequency set with job preemption.
-- Always call select plugin on step termination, even if the job is also
complete.
-- Srun executable names beginning with "." will be resolved based upon the
working directory and path on the compute node rather than the submit node.
-- Add node state string suffix of "$" to identify nodes in maintenance
reservation or scheduled for reboot. This applies to scontrol, sinfo,
and sview commands.
-- Enable scontrol to clear a node's scheduled reboot by setting its state
to "RESUME".
-- As per sbatch and srun documentation, when the --signal option is used,
signal only the steps unless, in the case of a batch job, B is
specified, in which case signal only the batch script.
-- Modify AuthInfo configuration parameter to accept credential lifetime
option.
-- Modify crypto/munge plugin to use socket and timeout specified in AuthInfo.
-- If we have a state for a step on completion put that in the database
instead of guessing off the exit_code.
-- Added squeue -P/--priority option that can be used to display pending jobs
in the same order as used by the Slurm scheduler even if jobs are submitted
to multiple partitions (job is reported once per usable partition).
-- Improve the pending reason description for various QOS limits. For each
QOS limit that causes a job to be pending print its specific reason.
For example if job pends because of GrpCpus the squeue command will
print QOSGrpCpuLimit as pending reason.
-- sched/backfill - Set expected start time of job submitted to multiple
partitions to the earliest start time on any of the partitions.
-- Introduce a MAX_BATCH_REQUEUE define that indicates how many times a job
can be requeued upon prolog failure. When the number is reached the job
is put on hold with reason JobHoldMaxRequeue.
-- Add sbatch job array option to limit the number of simultaneously running
tasks from a job array (e.g. "--array=0-15%4").
-- Implemented a new QOS limit MinCPUs. Users running under a QOS must
request a minimum number of CPUs which is at least MinCPUs otherwise
their job will pend.
-- Introduced a new pending reason WAIT_QOS_MIN_CPUS to reflect the new QOS
limit.
-- Job array dependency based upon state is now dependent upon the state of
the array as a whole (e.g. afterok requires ALL tasks to complete
successfully, afternotok is true if ANY task does not complete successfully,
and after requires all tasks to at least be started).
-- The srun -u/--unbuffered options set the stdout of the task launched
by srun to be line buffered.
-- The srun options -l/--label and -u/--unbuffered can be specified together.
The previous limitation has been removed.
-- Provide sacct display of gres accounting information per job.
-- Change the node status size from uint16_t to uint32_t.
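The job array dependency rules described in this release (afterok,
afternotok, after) can be summarized with a small sketch. The task state names
used here are simplified placeholders and the function is hypothetical; it
restates the rules from the note above rather than Slurm's implementation.

```python
# Simplified task states; a task counts as "started" once it is no longer
# pending. These names are placeholders, not Slurm's full state set.
STARTED_STATES = {"RUNNING", "COMPLETED", "FAILED"}


def array_dependency_satisfied(dep_type, task_states):
    """Evaluate a dependency against the state of a job array as a whole."""
    if dep_type == "afterok":
        # Requires ALL tasks to complete successfully.
        return all(state == "COMPLETED" for state in task_states)
    if dep_type == "afternotok":
        # True if ANY task does not complete successfully.
        return any(state == "FAILED" for state in task_states)
    if dep_type == "after":
        # Requires all tasks to at least be started.
        return all(state in STARTED_STATES for state in task_states)
    raise ValueError("unsupported dependency type: %s" % dep_type)


states = ["COMPLETED", "FAILED", "RUNNING"]
results = {
    dep: array_dependency_satisfied(dep, states)
    for dep in ("afterok", "afternotok", "after")
}
```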
* Changes in Slurm 14.11.0pre3
==============================
-- Move xcpuinfo.[c|h] to the slurmd since it isn't needed anywhere else
and will avoid the need for all the daemons to link to libhwloc.
-- Add memory test to job_submit/partition plugin.
-- Added new internal Slurm functions xmalloc_nz() and xrealloc_nz(), which do
not initialize the allocated memory to zero for improved performance.
-- Modify hostlist function to dynamically allocate buffer space for improved
performance.
-- In the job_submit plugin: Remove all slurmctld locks prior to job_submit()
being called for improved performance. If any slurmctld data structures are
read or modified, add locks directly in the plugin.
-- Added PriorityFlag LEVEL_BASED described in doc/html/level_based.shtml
-- If Fairshare=parent is set on an account, that account's children will be
effectively reparented for fairshare calculations to the first parent of
their parent that is not Fairshare=parent. Limits remain the same,
only its fairshare value is affected.
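The Fairshare=parent reparenting rule described in this release can be
sketched as a walk up the account tree. The dictionary layout and account
names below are assumptions made for illustration and do not reflect Slurm's
internal association structures.

```python
def effective_fairshare_parent(accounts, name):
    """Return the ancestor used as parent for fairshare calculations.

    `accounts` maps an account or user name to a dict with "parent" and
    "fairshare" entries. Ancestors whose fairshare is "parent" are skipped,
    so the result is the first ancestor that is not Fairshare=parent.
    """
    parent = accounts[name]["parent"]
    while parent is not None and accounts[parent]["fairshare"] == "parent":
        parent = accounts[parent]["parent"]
    return parent


# Hypothetical tree: root -> science -> physics -> alice, where both
# "science" and "physics" are configured with Fairshare=parent.
accounts = {
    "root": {"parent": None, "fairshare": 1},
    "science": {"parent": "root", "fairshare": "parent"},
    "physics": {"parent": "science", "fairshare": "parent"},
    "alice": {"parent": "physics", "fairshare": 10},
}
alice_parent = effective_fairshare_parent(accounts, "alice")
```

In this sketch only the fairshare parent changes; limits would still be taken
from the configured parents, as the note above states.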
* Changes in Slurm 14.11.0pre2
==============================
-- Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This
allows jobs to use specialized resources on nodes allocated to them if the
job designates --core-spec=0.
-- Add new SchedulerParameters option of build_queue_timeout to throttle how
much time can be consumed building the job queue for scheduling.
-- Added HealthCheckNodeState option of "cycle" to cycle through the compute
nodes over the course of HealthCheckInterval rather than running all at
the same time.
-- Add job "reboot" option for Linux clusters. This invokes the configured
RebootProgram to reboot nodes allocated to a job before it begins execution.
-- Added squeue -O/--Format option that makes all job and step fields available
for printing.
-- Improve database slurmctld entry speed dramatically.
-- Add "CPUs" count to output of "scontrol show step".
-- scancel -b signals only the batch step, not any other step or any
children of the shell script.
-- MySQL - enforce NO_ENGINE_SUBSTITUTION
-- Added CpuFreqDef configuration parameter in slurm.conf to specify the
default CPU frequency and governor to be set at job end.
-- Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached
90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and
TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and
srun commands.
-- In slurm.conf add the parameter SrunPortRange=min-max. If this is configured
then srun will use its dynamic ports only from the configured range.
-- Make debug_flags 64 bit to handle more flags.
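For the job email triggers added in this release, the elapsed run time at
which each trigger applies follows directly from the job's time limit. The
sketch below is illustrative arithmetic only; the function name is
hypothetical and the rounding Slurm actually applies is not stated in this
file.

```python
def mail_trigger_offsets(time_limit_minutes):
    """Return the elapsed seconds at which each time limit trigger applies.

    Integer division is an assumption made here to keep whole seconds.
    """
    limit_seconds = time_limit_minutes * 60
    return {
        "TIME_LIMIT_50": limit_seconds * 50 // 100,
        "TIME_LIMIT_80": limit_seconds * 80 // 100,
        "TIME_LIMIT_90": limit_seconds * 90 // 100,
        "TIME_LIMIT": limit_seconds,
    }


# A job with a 60 minute time limit.
offsets = mail_trigger_offsets(60)
```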
* Changes in Slurm 14.11.0pre1
==============================
-- Modify etc/cgroup.release_common.example to specify the full path to the
scontrol command. Also find cgroup mount point by reading cgroup.conf file.
-- Improve qsub wrapper support for passing environment variables.
-- Modify sdiag to report Slurm RPC traffic by user, type, count and time
consumed.
-- In select plugins, stop triggering extra logging based upon the debug flag
-- Added SchedulerParameters options of bf_yield_interval and bf_yield_sleep
to control how frequently and for how long the backfill scheduler will
relinquish its locks.
-- To support larger numbers of jobs when the StateSaveDirectory is on a
file system that supports a limited number of files in a directory, add a
subdirectory called "hash.#" based upon the last digit of the job ID.
-- More gracefully handle missing batch script file. Just kill the job and do
not drain the compute node.
-- Add support for allocation of GRES by model type for heterogeneous systems
(e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
-- Record and enable display of nodes anticipated to be used for pending jobs.
-- Modify squeue --start option to print the nodes expected to be used for
pending job (in addition to expected start time, etc.).
-- Add association hash to the assoc_mgr.
-- Better logic to handle resized jobs when the DBD is down.
-- Introduce MemLimitEnforce yes|no in slurm.conf. If set to "no", Slurm will
not terminate jobs if they exceed requested memory.
-- Add support for non-consumable generic resources for resources that are
limited, but can be shared between jobs.
-- Introduce 5 new Slurm errors in slurm_errno.h related to job to better
-- Modify scontrol to print error message for each array task when updating
the entire array.
-- Added gres_drain and gres_used fields to node_info_t.
-- Added PriorityParameters configuration parameter in slurm.conf.
-- Introduce automatic job requeue policy based on exit value. See RequeueExit
and RequeueExitHold descriptions in slurm.conf man page.
-- Modify slurmd to cache launched job IDs for more responsive job suspend and
gang scheduling.
-- Permit jobs steps full control over cpu_bind options if specialized cores
are included in the job allocation.
-- Added ChosLoc configuration parameter to specify the pathname of the
Chroot OS tool.
-- Send SIGCONT/SIGTERM when a job is selected for preemption with GraceTime
configured rather than waiting for GraceTime to be reached before notifying
the job.
-- Do not resume a job with specialized cores on a node running another job
with specialized cores (only one can run at a time).
-- Add specialized core count to job suspend/resume calls.
-- task/affinity and task/cgroup - Correct specialized core task binding with
user supplied invalid CPU mask or map.
-- Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance,
PowerSave or UserSpace).
-- Add support for a job step's CPU governor and/or frequency to be reset on
suspend/resume (or gang scheduling). The default for an idle CPU will now
be "ondemand" rather than "userspace" with the lowest frequency (to recover
from hard slurmd failures and support gang scheduling).
-- Added PriorityFlags option of Calculate_Running to continue recalculating
the priority of running jobs.
-- Replace round-robin front-end node selection with least-loaded algorithm.
-- CRAY - Improve support of XC30 systems when running natively.
-- Add new node configuration parameters CoreSpecCount, CPUSpecList and
MemSpecLimit which support the reservation of resources for system use
with Linux cgroup.
-- Add child_forked() function to the slurm_acct_gather_profile plugin to
close open files, leaving application with no extra open file descriptors.
-- Cray/ALPS system - Enable backup controller to run outside of the Cray to
accept new job submissions and most other operations on the pending jobs.
-- Have sacct print job and task array id's for job arrays.
-- If <sys/prctl.h> is present, name major threads in slurmctld, for example
the backfill thread: slurmctld_bckfl, the rpc manager: slurmctld_rpcmg, etc.
The name can be seen for example using top -H.
-- Provide more precise error message when job allocation can not be satisfied
(e.g. memory, disk, cpu count, etc. rather than just "node configuration
not available").
-- Create a new DebugFlags named TraceJobs in slurm.conf to print detailed
information about jobs in slurmctld. The information includes job ids, state
and node count.
-- When a job dependency can never be satisfied do not cancel the job but keep
pending with reason WAIT_DEP_INVALID (DependencyNeverSatisfied).
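As a rough illustration of the front-end node selection change listed above
(least-loaded rather than round-robin), the following Python sketch picks the
node currently running the fewest jobs. The FrontEnd class, its job_count
field and the node names are hypothetical and are not Slurm data structures;
the load metric and tie-breaking used by slurmctld may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrontEnd:
    # Hypothetical record for this sketch; not a Slurm structure.
    name: str
    job_count: int


def pick_least_loaded(nodes: List[FrontEnd]) -> FrontEnd:
    """Return the front-end node with the fewest running jobs.

    Ties go to the earliest node in the list (an assumption made
    for this sketch only).
    """
    if not nodes:
        raise ValueError("no front-end nodes available")
    return min(nodes, key=lambda n: n.job_count)


nodes = [FrontEnd("fe0", 4), FrontEnd("fe1", 1), FrontEnd("fe2", 1)]
chosen = pick_least_loaded(nodes)
chosen.job_count += 1  # account for the newly placed job
```

Unlike round-robin, the next call takes the updated job counts into account,
so a node that is already busy is not handed more work simply because it is
next in line.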
* Changes in Slurm 14.03.12
===========================
-- Make it so previous versions of salloc/srun work with newer versions
of Slurm daemons.
-- Avoid delay on commit for PMI rank 0 to improve performance with some
MPI implementations.
-- Correct the sbatch pbs parser to process -j.
-- Squeue modified to not merge tasks of a job array if their wait reasons
differ.
-- Use the slurm_getpwuid_r wrapper of getpwuid_r to handle possible
interrupts.
-- Allow --ignore-pbs to take effect when read as an #SBATCH argument.
* Changes in Slurm 14.03.11
===========================
-- ALPS - Fix depth for Memory items in BASIL with CLE 5.2
(changed starting in 5.2.3).
-- ALPS - Fix issue when tracking memory on a PerNode basis instead of
PerCPU.
-- Modify assoc_mgr_fill_in_qos() to allow for a flag to know if the QOS read
lock was locked outside of the function or not.
-- Give even better estimates on pending node count if no node count
is requested.
-- Fix jobcomp/mysql plugin for MariaDB 10+/MySQL 5.6+ to work with reserved
word "partition".
-- If requested (scontrol reboot node_name) reboot a node even if it has
a maintenance reservation that is not active yet.
-- Fix issue where exclusive allocations wouldn't lay tasks out correctly
with CR_PACK_NODES.
-- Do not requeue a batch job from slurmd daemon if it is killed while in
the process of being launched (a race condition introduced in v14.03.9).
-- Do not let srun overwrite SLURM_JOB_NUM_NODES if already in an allocation.
-- Prevent a job's end_time from being too small after a basil reservation
error.
-- Fix sbatch --ntasks-per-core option from setting invalid
SLURM_NTASKS_PER_CORE environment value.
-- Prevent scancel abort when no job satisfies filter options.
-- ALPS - Fix --ntasks-per-core option on multiple nodes.
-- Double max string that Slurm can pack from 16MB to 32MB to support
larger MPI2 configurations.
-- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
a fatal error. This partially reverts logic added in version 14.03.9.
-- sview - Fix displaying of suspended steps elapsed times.
-- Increase number of messages that get cached before throwing them away
when the DBD is down.
-- Fix jobs from starting in overlapping reservations that won't finish before
a "maint" reservation begins.
-- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
-- Restore GRES functionality with select/linear plugin. It was broken in
version 14.03.10.
-- Fix possible race condition when attempting to use QOS on a system running
accounting_storage/filetxt.
-- Sanity check for correct QOS on startup.
* Changes in Slurm 14.03.10
===========================
-- Treat non-zero SlurmSchedLogLevel without SlurmSchedLogFile as a fatal
error.
-- Correct sched_config.html documentation: SchedulingParameters
should be SchedulerParameters.
-- When using gres and cgroup ConstrainDevices set correct access
permission for the batch step.
-- Fix minor memory leak in jobcomp/mysql on slurmctld reconfig.
-- Fix bug that prevented preservation of a job's GRES bitmap on slurmctld
restart or reconfigure (bug was introduced in 14.03.5 "Clear record of a
job's gres when requeued" and only applies when GRES mapped to specific
files).
-- BGQ: Fix race condition when job fails due to hardware failure and is
requeued. Previous code could result in slurmctld abort with NULL pointer.
-- Prevent negative job array index, which could cause slurmctld to crash.
-- Fix issue with squeue/scontrol showing correct node_cnt when only tasks
are specified.
-- Check the status of the database connection before using it.
-- ALPS - If an allocation requests -n set the BASIL -N option to the
number of tasks / number of nodes.
-- ALPS - Don't set the env var APRUN_DEFAULT_MEMORY, it is not needed anymore.
-- Give better estimates on pending node count if no node count is requested.
-- BLUEGENE - Fix issue where requeuing jobs could cause an assert.
* Changes in Slurm 14.03.9
==========================
-- If slurmd fails to stat(2) the configuration, print the string describing
the error code.
-- Fix for mixing core base reservations with whole node based reservations
to avoid overlapping erroneously.
-- BLUEGENE - Remove references to Base Partition.
-- sview - Fix display of blocks when sview is compiled on a non-bluegene
system and then used to view a BGQ system.
-- Fix bug in update reservation. When modifying the reservation the end time
was set incorrectly.
-- The start time of a reservation that is in ACTIVE state cannot be modified.
-- Update the cgroup documentation about release agent for devices.
-- MYSQL - fix for setting up preempt list on a QOS for multiple QOS.
-- Correct a minor error in the scancel.1 man page related to the
--signal option.
-- Enhance the scancel.1 man page to document the sequence of signals sent
-- Fix slurmstepd core dump if the cgroup hierarchy is not completed
when terminating the job.
-- Fix hostlist_shift to be able to give correct node names on names with a
different number of dimensions than the cluster.
-- BLUEGENE - Fix invalid pointer in corner case in the plugin.
-- Make sure on a reconfigure the select information for a node is preserved.
-- Correct logic to support job GRES specification over 31 bits (problem
in logic converting int to uint32_t).
-- Remove logic that was creating GRES bitmap for node when not needed (only
needed when GRES mapped to specific files).
-- BLUEGENE - Fix sinfo -tr; previously it would only print idle nodes correctly.
-- BLUEGENE - Fix for licenses_only reservation on bluegene systems.
-- sview - Verify pointer before using strchr.
-- Fix -M option on tools talking to a Cray from a non-Cray system.
-- CRAY - Fix rpmbuild issue for missing file slurm.conf.template.
-- Fix race condition when dealing with removing many associations at
different times when reservations are using the associations that are
being deleted.
-- When a node's state is set to power_down/power_up, then execute
SuspendProgram/ResumeProgram even if previously executed for that node.
-- Fix logic determining when job configuration (i.e. running node power up
logic) is complete.
-- Setting the state of a node in powered down state to "resume" will
no longer cause it to reboot, but only clear the "drain" state flag.
-- Fix srun documentation to remove the claim that SLURM_NODELIST is
equivalent to the -w option (since it isn't).
-- Fix issue with --hint=nomultithread and allocations with steps running
arbitrary layouts (test1.59).
-- PrivateData=reservation modified to permit users to view the reservations
which they have access to (rather than preventing them from seeing ANY
reservation). Backport from 14.11 commit 77c2bd25c.
-- Fix PrivateData=reservation when using associations to give privileges to
a reservation.
-- Better checking to see if select plugin is linear or not.
-- Add support for time specification of "fika" (3 PM).
-- Provide better estimate of minimum node count for pending jobs using more
job parameters.
-- ALPS - Add SubAllocate to cray.conf file for those who like the way <=2.5
did the ALPS reservation.
-- Safer check to avoid invalid reads when shutting down the slurmctld with
lots of jobs.
-- Fix minor memory leak in the backfill scheduler when shutting down.
-- Add ArchiveResvs to the output of sacctmgr show config and init the variable
on slurmdbd startup.
-- SLURMDBD - Only set the archive flag if purging the object
(i.e. ArchiveJobs PurgeJobs). This is only a cosmetic change.
-- Fix for job step memory allocation logic if step requests GRES and memory
allocations are not managed.
-- Fix sinfo to display mixed nodes as allocated in '%F' output.
-- Sview - Fix cpu and node counts for partitions.
-- Ignore NO_VAL in SLURMDB_PURGE_* macros.
-- ALPS - Don't drain nodes if epilog fails. It leaves them in drain state
with no way to get them out.
-- Fix issue with task/affinity oversubscribing cpus erroneously when
using --ntasks-per-node.
-- MYSQL - Fix load of archive files.
-- Treat Cray MPI job calling exit() without mpi_fini() as fatal error for
that specific task and let srun handle all timeout logic.
-- Fix small memory leak in jobcomp/mysql.
-- Correct tracking of licenses for suspended jobs on slurmctld reconfigure or
restart.
-- If failed to launch a batch job requeue it in hold.
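The "fika" time specification added in this release is a named time of day
meaning 3 PM. A minimal Python sketch of how such keywords could map to a
clock time follows. The NAMED_TIMES table and the resolve_named_time() helper
are illustrative assumptions and not Slurm's parser; the other entries are
the named times documented for the --begin option, and rolling over to the
next day once the time has passed is likewise assumed from that
documentation.

```python
from datetime import datetime, time, timedelta

# Named times of day; "fika" is the entry added in this release.
NAMED_TIMES = {
    "midnight": time(0, 0),
    "noon": time(12, 0),
    "fika": time(15, 0),
    "teatime": time(16, 0),
}


def resolve_named_time(word: str, now: datetime) -> datetime:
    """Return the next occurrence of a named time at or after `now`.

    Hypothetical helper: raises KeyError for unknown words.
    """
    candidate = datetime.combine(now.date(), NAMED_TIMES[word.lower()])
    if candidate <= now:
        # The time has already passed today, so use tomorrow.
        candidate += timedelta(days=1)
    return candidate


start = resolve_named_time("fika", datetime(2015, 1, 5, 9, 30))
```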
* Changes in Slurm 14.03.8
==========================
-- Fix minor memory leak when job doesn't have nodes on it (meaning the job
has finished).
-- Fix sinfo/sview to be able to query against nodes in reserved and other
states.
-- Make sbatch/salloc read in (SLURM|(SBATCH|SALLOC))_HINT in order to
handle sruns in the script that will use it.
-- srun properly interprets a leading "." in the executable name based upon
the working directory of the compute node rather than the submit host.
-- Fix Lustre misspellings in hdf5 guide
-- Fix wrong reference in slurm.conf man page to what --profile option should
be used for AcctGatherFilesystemType.
-- Update HDF5 document to point out the SlurmdUser is who creates the
ProfileHDF5Dir directory as well as all its sub-directories and files.
-- CRAY NATIVE - Remove error message for sruns run inside an salloc that
had --network= specified.
-- Defer job step initiation if required GRES are in use by other steps rather
than immediately returning an error.
-- Deprecate --cpu_bind from sbatch and salloc. These never worked correctly
and only caused confusion. Since the cpu_bind options mostly refer to a
step, we opted to only allow srun to set them in future versions.
-- Modify sgather to work if Nodename and NodeHostname differ.
-- Changed use of JobContainerPlugin where it should be JobContainerType.
-- Fix for possible error if job has GRES, but the step explicitly requests a
GRES count of zero.
-- Make "srun --gres=none ..." work when executed without a job allocation.
-- Change the global eio_shutdown_time to a field in eio handle.
-- Advanced reservation fixes for heterogeneous systems, especially when
reserving cores.
-- If --hint=nomultithread is used in a job allocation make sure any sruns
run inside the allocation can read the environment correctly.
-- If batchdir can't be made set errno correctly so the slurmctld is notified
correctly.
-- Remove repeated batch complete if batch directory isn't able to be made
since the slurmd will send the same message.
-- sacctmgr fix default format for list transactions.
-- BLUEGENE - Fix backfill issue with backfilling jobs on blocks already
reserved for higher priority jobs.
-- When creating job arrays the job specification files for each element
are hard links to the first element specification files. If the controller
fails to make the links the files are copied instead.
-- Fix error handling for job array create failure due to inability to copy
job files (script and environment).
-- Added patch in the contribs directory for integrating make version 4.0 with
Slurm and renamed the previous patch "make-3.81.slurm.patch".
-- Don't wait for an update message from the DBD to finish before sending rc
message back. In slow systems with many associations this could speed
responsiveness in sacctmgr after adding associations.
-- Eliminate race condition in enforcement of MaxJobCount limit for job arrays.
-- Fix anomaly allocating cores for GRES with specific device/CPU mapping.
-- cons_res - When requesting exclusive access make sure we set the number
of cpus in the job_resources_t structure so as nodes finish the correct
cpu count is displayed in the user tools.
-- If the job_submit plugin calls take longer than 1 second to run, print a
warning.
-- Make sure transfer_s_p_options transfers all the portions of the
s_p_options_t struct.
-- Correct the srun man page: the SLURM_CPU_BIND_VERBOSE, SLURM_CPU_BIND_TYPE
and SLURM_CPU_BIND_LIST environment variables are set only when the
task/affinity plugin is configured.
-- sacct - Initialize variables correctly to avoid incorrect structure
reference.
-- Performance adjustment to avoid calling a function multiple times when it
only needs to be called once.
-- Give more correct waiting reason if job is waiting on association/QOS
MaxNode limit.
-- DB - When sending lft updates to the slurmctld only send non-deleted lfts.
-- BLUEGENE - Fix documentation on how to build a reservation less than
a midplane.
-- If Slurmctld fails to read the job environment consider it an error
and abort the job.
-- Add the name of the node a job is running on to the message printed by
slurmstepd when terminating a job.
-- Remove unsupported options from sacctmgr help and the dump function.
-- Update sacctmgr man page removing reference to obsolete parameter
MaxProcSecondsPerJob.
-- Added more validity checking of incoming job submit requests.
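The job array entry above describes job specification files being hard links
to the first element's files, with a copy made if the link cannot be created.
That fallback pattern can be sketched in Python as follows. The
link_or_copy() helper and the file names are hypothetical, used only for
illustration; this is not the slurmctld code.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link dst to src; fall back to a full copy if linking fails."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        # For example, the filesystem does not support hard links.
        shutil.copy2(src, dst)
        return "copied"


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "script.0")   # first array element's file
    second = os.path.join(tmp, "script.1")  # later element reuses it
    with open(first, "w") as fh:
        fh.write("#!/bin/sh\nhostname\n")
    how = link_or_copy(first, second)
    with open(second) as fh:
        content = fh.read()
```

Either way the second file ends up with the same content; the hard link
simply avoids storing the script once per array element.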
* Changes in Slurm 14.03.7
==========================
-- Add note to MaxNodesPerUser and multiple jobs running on the same node
counting as multiple nodes.
-- PerlAPI - fix renamed call from slurm_api_set_conf_file to
slurm_conf_reinit.
-- Fix gres race condition that could result in job deallocation error message.
-- Correct NumCPUs count for jobs with --exclusive option.
-- When creating reservation with CoreCnt, check that Slurm uses
SelectType=select/cons_res, otherwise don't send the request to slurmctld
and return an error.
-- Save the state of scheduled node reboots so they will not be lost should the
slurmctld restart.
-- In select/cons_res plugin - Ensure the node count does not exceed the task
count.
-- switch/nrt - Do not explicitly unload windows for a job on termination,
only unload its table (which automatically unloads its windows).
-- When HealthCheckNodeState is configured as IDLE don't run the
HealthCheckProgram for nodes in any other states than IDLE.
-- Remove all slurmctld locks prior to job_submit() being called in plugins.
If any slurmctld data structures are read or modified, add locks directly
in the plugin.
-- Minor sanity check to verify the string sent in isn't NULL when using
bit_unfmt.
-- CRAY NATIVE - Fix issue on heavy systems to only run the NHC once per
job/step completion.
-- Remove unneeded step cleanup for pending steps.
-- Fix issue where if a batch job was manually requeued the batch step
information wasn't stored in accounting.
-- When a job is released from a requeue hold state clean up its previous
exit code.
-- Correct the srun man page about how the output from the user application
is sent to srun.
-- Increase the timeout of the main thread while waiting for the i/o thread.
Allow up to 180 seconds for the i/o thread to complete.
-- When using sacct -c to read the job completion data compute the correct
job elapsed time.
-- Perl package: Define some missing node states.
-- When using AccountingStorageType=accounting_storage/mysql zero out the
database index for the array elements avoiding duplicate database values.
-- Reword the explanation of cputime and cputimeraw in the sacct man page.
-- JobCompType allows "jobcomp/mysql" as valid name but the code used
"job_comp/mysql" setting an incorrect default database.
-- Try to load libslurm.so only when necessary.
-- When nodes scheduled for reboot, set state to DOWN rather than FUTURE so
they are still visible to sinfo. State set to IDLE after reboot completes.
-- Apply BatchStartTimeout configuration to task launch and avoid aborting
srun commands due to long running Prolog scripts.
-- Fix minor memory leaks when freeing node_info_t structure.
-- If a batch script is requeued, give running steps the correct exit
code/signal; previously it was always -2.
-- If step exitcode hasn't been set, display the -2 with sacct instead
of acting like it is a signal and exitcode.
-- Send calculated step_rc for batch step instead of raw status as
done for normal steps.
-- If a job times out, set the exit code in accounting to 1 instead of the
signal 1.
-- Update the acct_gather.conf.5 man page removing the reference to
InfinibandOFEDFrequency.
-- Fix gang scheduling for jobs submitted to multiple partitions.
-- Enable srun to submit job to multiple partitions.
-- Update slurm.conf man page. When Epilog or Prolog fail the node state
is set to DRAIN.
-- Start a job in the highest priority partition possible, even if it requires
preempting other jobs and delaying initiation, rather than using a lower
priority partition. Previous logic would preempt lower priority jobs, but
then might start the job in a lower priority partition and not use the
resources released by the preempted jobs.
-- Fix SelectTypeParameters=CR_PACK_NODES for srun making both job and step
resource allocation.
-- BGQ - Make it possible to pack multiple tasks on a core when not using
the entire cnode.
-- MYSQL - if unable to connect to mysqld close connection that was inited.
-- DBD - when connecting make sure we wait MessageTimeout + 5 since the
timeout when talking to the Database is the same timeout so a race
condition could occur in the requesting client when receiving the response
if the database is unresponsive.
* Changes in Slurm 14.03.6
==========================
-- Added examples to demonstrate the use of the sacct -T option to the man
page.
-- Fix for regression in 14.03.5 with sacctmgr load when Parent has "'"
around it.
-- Update comments in sacctmgr dump header.
-- Fix for possible abort on change in GRES configuration.
-- CRAY - Fix modules file (backport from 14.11 commit 78fe86192b).
-- Fix race condition which could result in requeue if batch job exit and node
registration occur at the same time.
-- switch/nrt - Unload job tables (in addition to windows) in user space mode.
-- Differentiate between two identical debug messages about purging vestigial
job scripts.
-- If the socket used by slurmstepd to communicate with slurmd exists when
slurmstepd attempts to create it, for example left over from a previous
requeue or crash, delete it and recreate it.
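The stale socket handling described in the last entry above (delete a
leftover socket file, then recreate it) follows a common pattern for Unix
domain sockets. A minimal Python sketch is shown below; the
bind_fresh_unix_socket() helper and the "step.sock" name are hypothetical and
do not reflect the actual slurmstepd code or its socket paths.

```python
import os
import socket
import tempfile


def bind_fresh_unix_socket(path: str) -> socket.socket:
    """Bind a listening Unix socket at path, removing any stale file first."""
    try:
        os.unlink(path)  # left over from a previous requeue or crash
    except FileNotFoundError:
        pass  # nothing to clean up
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    sock.listen(1)
    return sock


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "step.sock")
    first = bind_fresh_unix_socket(path)
    first.close()                     # closing does not remove the file
    leftover = os.path.exists(path)   # the stale file is still there
    second = bind_fresh_unix_socket(path)
    rebound = os.path.exists(path)
    second.close()
```

Without the unlink step the second bind would fail with "Address already in
use", since the socket file persists after the first socket is closed.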
* Changes in Slurm 14.03.5
==========================
-- If an srun runs in an exclusive allocation and doesn't use the entire
allocation and CR_PACK_NODES is set, lay out tasks appropriately.
-- Correct Shared field in job state information seen by scontrol, sview, etc.
-- Print Slurm error string in scontrol update job and reset the Slurm errno
before each call to the API.
-- Fix task/cgroup to handle -mblock:fcyclic correctly
-- Fix for core-based advanced reservations where the distribution of cores
across nodes is not even.
-- Fix issue where association maxnodes wouldn't be evaluated correctly if a
QOS had a GrpNodes set.
-- GRES fix with multiple files defined per line in gres.conf.
-- When a job is requeued make sure accounting marks it as such.
-- Print the state of requeued job as REQUEUED.
-- If a job's partition was taken away from it, don't allow a requeue.
-- Make sure we lock on the conf when sending slurmd's conf to the slurmstepd.
-- Fix issue with sacctmgr 'load' not able to gracefully handle a badly
formatted file.
-- sched/backfill: Correct job start time estimate with advanced reservations.
-- Error message added when in proctrack/cgroup the step freezer path isn't
able to be destroyed for debug.
-- Added extra indexes into the database for better performance when
deleting users.
-- Fix issue with wckeys when tracking wckeys, but not enforcing them,
you could get multiple '*' wckeys.
-- Fix bug which could report to squeue the wrong partition for a running job
that is submitted to multiple partitions.
-- Report correct CPU count allocated to job when allocated whole node even if
not using all CPUs.
-- If job's constraints cannot be satisfied put it in pending state with reason
BadConstraints and don't remove it.
-- sched/backfill - If job started with infinite time limit, set its end_time
one year in the future.
-- Clear QOS GrpUsedCPUs when resetting raw usage if QOS is not using any cpus.
-- Remove log message left over from debugging.
-- When using CR_PACK_NODES fix make --ntasks-per-node work correctly.
-- Report correct partition associated with a step if the job is submitted to
multiple partitions.
-- Fix to allow removing of preemption from a QOS
-- If the proctrack plugins fail to destroy the job container print an error
message and avoid looping forever; give up after 120 seconds.
-- Make srun obey POSIX convention and increase the exit code by 128 when the
process is terminated by a signal.
-- Sanity check for acct_gather_energy/rapl
-- If the sbatch command specifies the option --signal=B:signum send the signal
to the batch script only.
-- If we cancel a task and we have no other exit code send the signal and
exit code.
-- Added note about InnoDB storage engine being used with MySQL.
-- Set the job exit code when the job is signaled and set the log level to
debug2() when processing an already completed job.
-- Reset diagnostics time stamp when "sdiag --reset" is called.
-- squeue and scontrol to report a job's "shared" value based upon partition