This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 17.02.4
==========================
-- Do not attempt to schedule jobs after changing the power cap if there are
already many active threads.
-- Job expansion example in FAQ enhanced to demonstrate operation in
heterogeneous environments.
-- Prevent scontrol crash when operating on array and no-array jobs at once.
-- knl_cray plugin: Log incomplete capmc output for a node.
-- knl_cray plugin: Change capmc parsing of mcdram_pct from string to number.
-- When rebooting a node and using PrologFlags=alloc make sure the
prolog is run after the reboot.
-- node_features/knl_generic - If a node is rebooted for a pending job, but
fails to enter the desired NUMA and/or MCDRAM mode then drain the node and
requeue the job.
-- node_features/knl_generic disable mode change unless RebootProgram
configured.
-- Add new burst_buffer function bb_g_job_revoke_alloc() to be executed
if there was a failure after the initial resource allocation. Does not
release previously allocated resources.
-- Test if the node_bitmap on a job is NULL when testing if the job's nodes
are ready. This will be NULL if a job was revoked while beginning.
-- Fix incorrect lock levels when testing when job will run or updating a job.
-- Add missing locks to job_submit/pbs plugin when updating a job's
dependencies.
-- Add min_memory_per_node|cpu to the job_submit/lua plugin to deal with lua
not being able to deal with pn_min_memory being a uint64_t. Scripts are
urged to change to these new variables to avoid issues. If not set the
variables will be 'nil'.
-- Calculate priority correctly when 'nice' is given.
-- Fix minor typos in the documentation.
-- node_features/knl_cray: Preserve non-KNL active features if slurmctld
reconfigured while node boot in progress.
-- node_features/knl_generic: Do not repeatedly log errors when trying to read
KNL modes if not KNL system.
-- Add missing QOS read lock to backfill scheduler.
-- When doing a dlopen on liblua only attempt the version compiled against.
-- Fix null-dereference in sreport cluster utilization when configured with
memory-leak-debug.
-- Fix Partition info in 'scontrol show node'. Previously duplicate partition
names, or Partitions the node did not belong to could be displayed.
-- Fix it so the backup slurmdbd will take control correctly.
* Changes in Slurm 17.02.3
==========================
-- Increase --cpu_bind and --mem_bind field length limits.
-- Fix segfault when using AdminComment field with job arrays.
-- Clear Dependency field when all dependencies are satisfied.
-- Add --array-unique to squeue which will display one unique pending job
array element per line.
-- Reset backfill timers correctly without skipping over them in certain
circumstances.
-- When running the "scontrol top" command, make sure that all of the user's
jobs have a priority that is lower than the selected job. Previous logic
would permit other jobs with equal priority (no jobs with higher priority).
-- Fix perl api so we always get an allocation when calling Slurm::new().
-- Fix issue with cleaning up cpuset and devices cgroups when multiple steps
end at the same time.
-- Document that PriorityFlags option of DEPTH_OBLIVIOUS precludes the use of
FAIR_TREE.
-- Fix issue where a Slurm daemon/command may abort if an invalid message
comes in.
-- Make it impossible to use CR_CPU* along with CR_ONE_TASK_PER_CORE. The
options are mutually exclusive.
-- ALPS - Fix scheduling when ALPS doesn't agree with Slurm on what nodes
are free.
-- When removing a partition make sure it isn't part of a reservation.
-- Fix seg fault if attempting to load non-existent burstbuffer plugin.
-- Fix to backfill scheduling with respect to QOS and association limits. Jobs
submitted to multiple partitions are most likely to be affected.
-- sched/backfill: Improve assoc_limit_stop configuration parameter support.
-- sched/backfill: Fix bug related to advanced reservations and the need to
reboot nodes to change KNL mode.
-- Preempt plugins - fix check for 'preempt_youngest_first' option.
-- Preempt plugins - fix incorrect casts in preempt_youngest_first mode.
-- Preempt/job_prio - fix incorrect casts in sort function.
-- Fix to make task/affinity work with ldoms where there are more than 64
cpus on the node.
-- When using node_features/knl_generic make it so the slurmd doesn't segfault
when shutting down.
-- Fix potential double-xfree() when using job arrays that can lead to
slurmctld crashing.
-- Fix priority/multifactor priorities on a slurmctld restart if not using
accounting_storage/[mysql|slurmdbd].
-- Fix NULL dereference reported by CLANG.
-- Update proctrack documentation to strongly encourage use of
proctrack/cgroup.
-- Fix potential memory leak if job fails to begin after nodes have been
selected for a job.
-- Handle a job that made it out of the select plugin without a job_resrcs
pointer.
-- Fix potential race condition when persistent connections are being closed at
shutdown.
-- Fix incorrect lock levels when submitting a batch job or updating a job
in general.
-- CRAY - Move delay waiting for job cleanup to after we check once.
-- MYSQL - Fix memory leak when loading archived jobs into the database.
-- Fix potential race condition when starting the priority/multifactor plugin's
decay thread.
-- Sanity check to make sure we have started a job in acct_policy.c before we
clear it as started.
-- Allow reboot program to use arguments.
-- Message Aggr - Remove race condition on slurmd shutdown with respect to
destroying a mutex.
-- Fix updating job priority on multiple partitions to be correct.
-- Don't remove admin comment when updating a job.
-- Return error when bad separator is given for scontrol update job licenses.
* Changes in Slurm 17.02.2
==========================
-- Update hyperlink to LBNL Node Health Check program.
-- burst_buffer/cray - Add support for line continuation.
-- If a job is cancelled by the user while its allocated nodes are being
reconfigured (i.e. the capmc_resume program is rebooting nodes for the job)
and the node reconfiguration fails (i.e. the reboot fails), then don't
requeue the job but leave it in a cancelled state.
-- capmc_resume (Cray resume node script) - Do not disable changing a node's
active features if SyscfgPath is configured in the knl.conf file.
-- Improve the srun documentation for the --resv-ports option.
-- burst_buffer/cray - Fix parsing for discontinuous allocated nodes. A job
allocation of "20,22" must be expressed as "20\n22".
-- Fix rare segfault when shutting down slurmctld and still sending data to
the database.
-- Fix gres output of a job if it is updated while pending to be displayed
correctly with Slurm tools.
-- Fix missing unlock when job_list doesn't exist when starting priority/
multifactor.
-- Fix segfault if slurmctld is shutting down and the slurmdbd plugin was
in the middle of setting db_indexes.
-- Add ESLURM_JOB_SETTING_DB_INX to errno to note when a job can't be updated
because the dbd is setting a db_index.
-- Fix possible double insertion into database when a job is updated at the
moment the dbd is assigning a db_index.
-- Fix memory error when updating a job's licenses.
-- Fix seff to work correctly with non-standard perl installs.
-- Export missing slurmdbd_defs_[init|fini] needed for libslurmdb.so to work.
-- Prevent sacct from returning far more than requested when querying against
a job array task id.
-- Fix double read lock of tres when updating gres or licenses on a job.
-- Make sure locks are always in place when calling
assoc_mgr_make_tres_str_from_array.
-- Prevent slurmctld SEGV when creating reservation with duplicated name.
-- Consider QOS flags Partition[Min|Max]Nodes when doing backfill.
-- Fix slurmdbd_defs.c to not have half symbols go to libslurm.so and the
other half go to libslurmdb.so.
-- Fix 'scontrol show jobs' to remove an errant newline when 'Switches' is
printed.
-- Better code for handling memory required by a task on a heterogeneous
system.
-- Fix regression in 17.02.0 with respect to GrpTresMins on a QOS or
Association.
-- Schedule interactive jobs quicker.
-- Perl API - correct value of MEM_PER_CPU constant to correctly handle
memory values.
-- Fix 'flags' variable to be 32 bit from the old 16 bit value in the perl api.
-- Export sched_nodes for a job in the perl api.
-- Improve error output when updating a reservation that has already started.
-- Fix --ntasks-per-node issue with srun so DenyOnLimit would work correctly.
-- node_features/knl_cray plugin - Fix memory leak.
-- Fix wrong cpu_per_task count issue on heterogeneous system when dealing with
steps.
-- Fix double free issue when removing usage from an association with sacctmgr.
-- Fix issue with SPANK plugins attempting to set null values as environment
variables, which leads to the command segfaulting on newer glibc versions.
-- Fix race condition on slurmctld startup when plugins have not gone through
init() ahead of the rpc_manager processing incoming messages.
-- job_submit/lua - expose admin_comment field.
-- Allow AdminComment field to be set by the job_submit plugin.
-- Allow AdminComment field to be changed by any Administrator.
-- MYSQL - Streamline job flush sql when doing a clean start on the slurmctld.
-- Fix potential infinite loop when talking to the DBD when shutting down
the slurmctld.
-- Fix MCS filter.
-- Make it so pmix can be included in the plugin rpm without having to
specify --with-pmix.
-- MYSQL - Fix initial load when not using the DBD.
-- Fix scontrol top to not make jobs priority 0 (held).
-- Downgrade info message about exceeding partition time limit to a debug2.
* Changes in Slurm 17.02.1-2
============================
-- Replace clock_gettime with time(NULL) for very old systems without the call.
* Changes in Slurm 17.02.1
==========================
-- Modify pam module to work when configured NodeName and NodeHostname differ.
-- Update sbatch/srun man pages to explain the "filename pattern" more clearly.
-- Add %x to sbatch/srun filename pattern to represent the job name.
-- job_submit/lua - Add job "bitflags" field.
-- Update slurm.spec file to note obsolete RPMs.
-- Fix deadlock scenario when dumping configuration in the slurmctld.
-- Remove unneeded job lock when running assoc_mgr cache. This lock could
cause potential deadlock when/if TRES changed in the database and the
slurmctld wasn't made aware of the change. This would be very rare.
-- Fix missing locks in gres logic to avoid potential memory race.
-- If gres is NULL on a job don't try to process it when returning detailed
information about a job to scontrol.
-- Fix print of consumed energy in sstat when no energy is being collected.
-- Print formatted tres string when creating/updating a reservation.
-- Fix issues with QOS flags Partition[Min|Max]Nodes to work correctly.
-- Prevent manipulation of the cpu frequency and governor for batch or
extern steps. This addresses an issue where the batch step would
inadvertently set the cpu frequency maximum to the minimum value
supported on the node.
-- Convert a slurmctld power management data structure from array to list in
order to eliminate the possibility of zombie child suspend/resume
processes.
-- Burst_buffer/cray - Prevent slurmctld daemon abort if "paths" operation
fails. Now job will be held. Update job update time when held.
-- Refactor slurmctld agent logic to eliminate some pthreads.
-- Added "SyscfgTimeout" parameter to knl.conf configuration file.
-- Fix for CPU binding for job steps run under a batch job.
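As an illustration of the "%x" filename pattern entry in 17.02.1, the
following is a minimal Python sketch of how such a pattern might be expanded.
It is not Slurm's implementation: the function name expand_pattern and the
sample job values are hypothetical, and only "%x" (job name), "%j" (job ID)
and "%%" (literal percent sign) are handled.

```python
import re


def expand_pattern(pattern: str, job_id: int, job_name: str) -> str:
    """Expand a small subset of sbatch/srun filename specifiers (sketch only)."""
    replacements = {
        "%j": str(job_id),  # job ID
        "%x": job_name,     # job name (the specifier added in 17.02.1)
        "%%": "%",          # literal percent sign
    }
    # Scan left to right, two characters at a time, so "%%x" stays "%x".
    return re.sub(r"%[jx%]", lambda m: replacements[m.group(0)], pattern)


# Hypothetical job: ID 1234, name "align".
print(expand_pattern("%x-%j.out", 1234, "align"))  # align-1234.out
```

A single left-to-right regular-expression pass is used so that an escaped
"%%" is never mistaken for the start of another specifier.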
* Changes in Slurm 17.02.0
==========================
-- job_submit/lua - Make "immediate" parameter available.
-- Fix srun I/O race condition to eliminate an error message that might be
generated if the application exits with outstanding stdin.
-- Fix regression when purging/archiving jobs/events.
-- Add new job state JOB_OOM indicating Out Of Memory condition as detected
by task/cgroup plugin.
-- If a QOS has been added to the system, recalculate Deny/AllowQOS on
partitions.
-- Deny job with duplicate GRES requested.
-- Fix loading super old assoc_mgr usage without segfaulting.
-- CRAY systems: Restore TaskPlugins order of task/cray before task/cgroup.
-- Task/cray: Treat missing "mems" cgroup with "debug" messages rather than
"error" messages. The file may be missing at step termination due to a
change in how cgroups are released at job/step end.
-- Fix for job constraint specification with counts, --ntasks-per-node value,
and no node count.
-- Fix ordering of step task allocation to fill in a socket before going into
another one.
-- Fix configure to not require C++
-- job_submit/lua - Remove access to slurmctld internal reservation fields of
job_pend_cnt and job_run_cnt.
-- Prevent job_time_limit enforcement from blocking other internal operations
if a large number of jobs need to be cancelled.
-- Add 'preempt_youngest_order' option to preempt/partition_prio plugin.
-- Fix controller being able to talk to a pre-released DBD.
-- Added ability to override the invoking uid for "scontrol update job"
by specifying "--uid=<uid>|-u <uid>".
-- Changed file broadcast "offset" from 32 to 64 bits in order to support files
over 2 GB.
-- slurm.spec - do not install init scripts alongside systemd service files.
-- Add port info to 'sinfo' and 'scontrol show node'.
-- Fix errant definition of USE_64BIT_BITSTR which can lead to core dumps.
-- Move BatchScript to end of each job's information when using
"scontrol -dd show job" to make it more readable.
-- Add SchedulerParameters configuration parameter of "default_gbytes", which
treats numeric only (no suffix) value for memory and tmp disk space as being
in units of Gigabytes. Mostly for compatibility with LSF.
-- Fix race condition in srun/sattach logic which would prevent srun from
terminating.
-- Bitstring operations are now 64bit instead of 32bit.
-- Replace hweight() function in bitstring with faster version.
-- scancel would treat a non-numeric argument as the name of jobs to be
cancelled (a non-documented feature). Cancelling jobs by name now requires
the "--jobname=" command line argument.
-- scancel modified to note that no jobs satisfy the filter options when the
--verbose option is used along with one or more job filters (e.g. "--qos=").
-- Change _pack_cred to use pack_bit_str_hex instead of pack_bit_fmt for
better scalability and performance.
-- Add BootTime configuration parameter to knl.conf file to optimize resource
allocations with respect to required node reboots.
-- Add node_features_p_boot_time() to node_features plugin to optimize
scheduling with respect to node reboots.
-- Avoid allocating resources to a job in the event that its run time plus boot
time (if needed) extend into an advanced reservation.
-- Burst_buffer/cray - Avoid stage-out operation if job never started.
-- node_features/knl_cray - Add capability to detect Uncorrectable Memory
Errors (UME) and if detected then log the event in all job and step stderr
with a message of the form:
error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 ***
Similar logic added to node_features/knl_generic in version 17.02.0pre4.
-- If job is allocated nodes which are powered down, then reset job start time
when the nodes are ready and do not charge the job for power up time.
-- Add the ability to purge transactions from the database.
-- Add support for requeue'ing of federated jobs (BETA).
-- Add support for interactive federated jobs (BETA).
-- Add the ability to purge rolled up usage from the database.
-- Properly set SLURM_JOB_GPUS environment variable for Prolog.
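The "default_gbytes" entry in 17.02.0 changes how suffix-less memory and tmp
disk values are read. A minimal Python sketch of that rule follows. It is an
illustration, not Slurm's code: the function name memory_to_mb is
hypothetical, and it assumes the default unit without the option is
megabytes and that the K, M, G and T suffixes are powers of 1024.

```python
# Multipliers expressed in kilobytes so all arithmetic stays in integers.
KB_PER_UNIT = {"K": 1, "M": 1024, "G": 1024**2, "T": 1024**3}


def memory_to_mb(value: str, default_gbytes: bool = False) -> int:
    """Convert a size such as "16", "512M" or "2T" to megabytes (sketch)."""
    suffix = value[-1].upper()
    if suffix in KB_PER_UNIT:
        number = int(value[:-1])
    else:
        # No suffix: the unit depends on whether default_gbytes is set.
        number = int(value)
        suffix = "G" if default_gbytes else "M"
    return number * KB_PER_UNIT[suffix] // 1024


print(memory_to_mb("16"))                       # 16
print(memory_to_mb("16", default_gbytes=True))  # 16384
print(memory_to_mb("512M", default_gbytes=True))  # 512
```

Values that carry an explicit suffix are unaffected by the option; only bare
numbers change meaning.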
* Changes in Slurm 17.02.0pre4
==============================
-- Add support for per-partition OverTimeLimit configuration.
-- Add --mem_bind option of "sort" to run zonesort on KNL nodes at step start.
-- Add LaunchParameters=mem_sort option to configure running of zonesort
-- Add "FreeSpace" information for each pool to the "scontrol show burstbuffer"
output. Required changes to the burst_buffer_info_t data structure.
-- Add new node state flag of NODE_STATE_REBOOT for node reboots triggered by
"scontrol reboot" commands. Previous logic re-used NODE_STATE_MAINT flag,
which could lead to inconsistencies. Add "ASAP" option to "scontrol reboot"
command that will drain a node in order to reboot it as soon as possible,
then return it to service.
-- Allow unit conversion routine to convert 1024M to 1G.
-- switch/cray plugin - change legacy spool directory location.
-- Add new PriorityFlags option of INCR_ONLY, which prevents a job's priority
from being decremented.
-- Make it so we don't purge job start messages until after we purge step
messages. Hopefully this will reduce the number of messages lost when
filling up memory when the database/DBD is down.
-- Added SchedulerParameters option of "bf_job_part_count_reserve". Jobs below
the specified threshold will not have resources reserved for them.
-- If GRES are configured with file IDs, then "scontrol -d show node" will
not only identify the count of currently allocated GRES, but their specific
index numbers (e.g. "GresUsed=gpu:alpha:2(IDX:0,2),gpu:beta:0(IDX:N/A)").
Ditto for job information with "scontrol -d show job".
-- Add "GresEnforceBind=Yes" to "scontrol show job" output if so configured.
-- Add support for SALLOC_CONSTRAINT, SBATCH_CONSTRAINT and SLURM_CONSTRAINT
environment variables to set default constraints for salloc, sbatch and
srun commands respectively.
-- Provide limited support for the MemSpecLimit configuration parameter without
the task/cgroup plugin.
-- node_features/knl_generic - Add capability to detect Uncorrectable Memory
Errors (UME) and if detected then log the event in all job and step stderr
with a message of the form:
error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 ***
-- Add SLURM_JOB_GID to TaskProlog environment.
-- burst_buffer/cray - Remove leading zeros from node ID lists passed to
dw_wlm_cli program.
-- Add "Partitions" field to "scontrol show node" output.
-- Remove sched/wiki and sched/wiki2 plugins and associated code.
-- Remove SchedulerRootFilter option and slurm_get_root_filter() API call.
-- Add SchedulerParameters option of spec_cores_first to select specialized
cores from the lowest rather than highest number cores and sockets.
-- Add PrologFlags option of Serial to disable concurrent launch of
Prolog and Epilog scripts.
-- Fix security issue caused by insecure file path handling triggered by the
failure of a Prolog script. To exploit this a user needs to anticipate or
cause the Prolog to fail for their job. CVE-2016-10030.
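The "GresUsed" format shown in the 17.02.0pre4 entry for
"scontrol -d show node" can be picked apart with a short script. The Python
sketch below is one possible approach and not part of Slurm: the function
name parse_gres_used is hypothetical, and the handling of index ranges such
as "0-2" is an assumption that goes beyond the single example given in the
entry.

```python
import re

# name:type:count(IDX:indexes), e.g. "gpu:alpha:2(IDX:0,2)"
GRES_RE = re.compile(r"(\w+):(\w+):(\d+)\(IDX:([^)]*)\)")


def parse_gres_used(text: str) -> list:
    """Parse a GresUsed value into a list of dictionaries (sketch only)."""
    value = text.split("=", 1)[1] if "=" in text else text
    result = []
    for name, gres_type, count, idx in GRES_RE.findall(value):
        indexes = []
        if idx != "N/A":
            for part in idx.split(","):
                if "-" in part:
                    # Assumed range syntax "first-last", inclusive.
                    first, last = part.split("-")
                    indexes.extend(range(int(first), int(last) + 1))
                else:
                    indexes.append(int(part))
        result.append({"name": name, "type": gres_type,
                       "count": int(count), "indexes": indexes})
    return result


for entry in parse_gres_used(
        "GresUsed=gpu:alpha:2(IDX:0,2),gpu:beta:0(IDX:N/A)"):
    print(entry)
```

A regular expression is used rather than splitting on commas because the
index list inside the parentheses is itself comma-separated.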
* Changes in Slurm 17.02.0pre3
==============================
-- Add srun host & PID to job step data structures.
-- Avoid creating duplicate pending step records for the same srun command.
-- Rewrite srun's logic for pending steps for better efficiency (fewer RPCs).
-- Added new SchedulerParameters options step_retry_count and step_retry_time
to control scheduling behaviour of job steps waiting for resources.
-- Optimize resource allocation logic for --spread-job job option.
-- Modify cpu_bind and mem_bind map and mask options to accept a repetition
count to better support large task count. For example:
"mask_mem:0x0f*2,0xf0*2" is equivalent to "mask_mem:0x0f,0x0f,0xf0,0xf0".
-- Add support for --mem_bind=prefer option to prefer, but not restrict memory
-- Add mechanism to constrain kernel memory allocation using cgroups. New
cgroup.conf parameters added: ConstrainKmemSpace, MaxKmemPercent, and
MinKmemSpace.
-- Correct invocation of man2html, which previously could cause FreeBSD builds
to hang.
-- MYSQL - Unconditionally remove 'ignore' clause from 'alter ignore'.
-- Modify service files to not start Slurm daemons until after Munge has been
started.
NOTE: If you are not using Munge, but are using the "service" scripts to
start Slurm daemons, then you will need to remove this check from the
etc/slurm*service scripts.
-- Do not process SALLOC_HINT, SBATCH_HINT or SLURM_HINT environment variables
if any of the following salloc, sbatch or srun command line options are
specified: -B, --cpu_bind, --hint, --ntasks-per-core, or --threads-per-core.
-- burst_buffer/cray: Accept new jobs on backup slurmctld daemon without access
to dw_wlm_cli command. No burst buffer actions will take place.
-- Do not include SLURM_JOB_DERIVED_EC, SLURM_JOB_EXIT_CODE, or
SLURM_JOB_EXIT_CODE2 in PrologSlurmctld environment (not available yet).
-- Cray - set task plugin to fatal() if task/cgroup is not loaded after
task/cray in the TaskPlugin settings.
-- Remove separate slurm_blcr package. If Slurm is built with BLCR support,
the files will now be part of the main Slurm packages.
-- Replace sjstat, seff and sjobexit RPM packages with a single "contribs"
package.
-- Remove long since defunct slurmdb-direct scripts.
-- Add SbcastParameters configuration option to control default file
destination directory and compression algorithm.
-- Add new SchedulerParameter (max_array_tasks) to limit the maximum number of
tasks in a job array independently from the maximum task ID (MaxArraySize).
-- Fix issue where number of nodes is not properly allocated when sbatch and
salloc are requested with -n tasks < hosts from -w hostlist or from -N.
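The repetition count described in the 17.02.0pre3 entry for the cpu_bind and
mem_bind map and mask options can be illustrated with a small expansion
routine. This Python sketch only mirrors the equivalence stated in that
entry; the function name expand_bind_list is hypothetical and the code is not
taken from Slurm.

```python
def expand_bind_list(spec: str) -> str:
    """Expand "value*count" items in a bind list, e.g. "mask_mem:0x0f*2"."""
    kind, _, values = spec.partition(":")
    expanded = []
    for item in values.split(","):
        value, star, count = item.partition("*")
        # Without "*count" the value appears exactly once.
        expanded.extend([value] * (int(count) if star else 1))
    return f"{kind}:{','.join(expanded)}"


print(expand_bind_list("mask_mem:0x0f*2,0xf0*2"))
# mask_mem:0x0f,0x0f,0xf0,0xf0
```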
* Changes in Slurm 17.02.0pre2
==============================
-- Add new RPC (REQUEST_EVENT_LOG) so that slurmd and slurmstepd can log events
through the slurmctld daemon.
-- Remove sbatch --bb option. That option was never supported.
-- Automatically clean up task/cgroup cpuset and devices cgroups after steps
are completed.
-- Add federation read/write locks.
-- Limit job purge run time to 1 second at a time.
-- The database index for jobs is now 64 bits. If you happen to be close to
4 billion jobs in your database you will want to update your slurmctld at
the same time as your slurmdbd to prevent roll over of this variable as
it is 32 bit in previous versions of Slurm.
-- Optionally lock slurmstepd in memory for performance reasons and to avoid
possible SIGBUS if the daemon is paged out at the time of a Slurm upgrade
(changing plugins). Controlled via new LaunchParameters options of
slurmstepd_memlock and slurmstepd_memlock_all.
-- Add event trigger on burst buffer errors (see strigger man page,
--burst_buffer option).
-- Add job AdminComment field which can only be set by a Slurm administrator.
-- Add salloc, sbatch and srun option of --delay-boot=<time>, which will
temporarily delay booting nodes into the desired state for a job in the
hope of using nodes already in the proper state which will be available at
a later time.
-- Add job burst_buffer_state and delay_boot fields to scontrol and squeue
output. Also add ability to modify delay_boot from scontrol.
-- Fix for node's available TRES array getting filled in with configured GRES
-- Log if job --bb option contains any unrecognized content.
-- Display configured and allocated TRES for nodes in scontrol show nodes.
-- Change all memory values (in MB) to uint64_t to accommodate > 2TB per node.
-- Add MailDomain configuration parameter to qualify email addresses.
-- Refactor the persistent connections within the federation code to use
the same logic that was found in the slurmdbd. Now both functionalities
share the same code.
-- Remove BlueGene/L and BlueGene/P support.
-- Add "flag" field to launch_tasks_request_msg. Remove the following fields
(moved into flags): multi_prog, task_flags, user_managed_io, pty,
buffered_stdio, and labelio.
-- Add protocol version to slurmd startup communications for slurmstepd to
permit changes in the protocol.
* Changes in Slurm 17.02.0pre1
==============================
-- burst_buffer/cray - Add support for rounding up the size of a buffer request
if the DataWarp configuration "equalize_fragments" is used.
-- Rename "in" to "input" in slurm_step_io_fds data structure defined in
slurm.h. This is needed to avoid breaking Python by using one of its
keywords in a Slurm data structure.
-- Remove eligible_time from jobcomp/elasticsearch.
-- Enable the deletion of a QOS, even if no clusters have been added to the
database.
-- SlurmDBD - change all timestamps to bigint from int to solve Y2038 problem.
-- Add salloc/sbatch/srun --spread-job option to distribute tasks over as many
nodes as possible. This also treats the --ntasks-per-node option as a
maximum value.
-- Add ConstrainKmemSpace to cgroup.conf, defaulting to yes, to allow
cgroup Kmem enforcement to be disabled while still using ConstrainRAMSpace.
-- Add support for sbatch --bbf option to specify a burst buffer input file.
-- Added burst buffer support for job arrays. Add new SchedulerParameters
configuration parameter of bb_array_stage_cnt=# to indicate how many pending
tasks of a job array should be made available for burst buffer resource
allocation.
-- Fix small memory leak when a job fails to load from state save.
-- Fix invalid read when attempting to delete clusters from database with
running jobs.
-- Fix small memory leak when deleting clusters from database.
-- Add SLURM_ARRAY_TASK_COUNT environment variable. Total number of tasks in a
job array (e.g. "--array=2,4,8" will set SLURM_ARRAY_TASK_COUNT=3).
-- Add new sacctmgr commands: "shutdown" (shutdown the server), "list stats"
(get server statistics), and "clear stats" (clear server statistics).
-- Restructure job accounting query to use 'id_job in (1, 2, .. )' format
instead of logically equivalent 'id_job = 1 || id_job = 2 || ..' .
-- Added start_delay field to jobcomp/elasticsearch.
-- In order to support federated jobs, the MaxJobID configuration parameter
default value has been reduced from 2,147,418,112 to 67,043,328 and its
maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
have a job ID above the new range will continue to run and new jobs will get
job IDs in the new range.
-- Added infrastructure for setting up federations in database and establishing
connections between federation clusters.
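The SLURM_ARRAY_TASK_COUNT entry in 17.02.0pre1 gives the example
"--array=2,4,8" yielding a count of 3. The Python sketch below shows one way
to compute that count from an array specification. It is an illustration
only: the function name array_task_count is hypothetical, and support for
ranges ("0-15"), steps ("0-15:4") and a trailing "%N" concurrency limit is
included on the assumption that the usual sbatch --array syntax applies.

```python
def array_task_count(spec: str) -> int:
    """Count the distinct task IDs in a job array specification (sketch)."""
    spec = spec.split("%", 1)[0]  # a "%N" limit does not change the count
    task_ids = set()
    for part in spec.split(","):
        id_range, _, step = part.partition(":")
        first, dash, last = id_range.partition("-")
        start = int(first)
        stop = int(last) if dash else start
        task_ids.update(range(start, stop + 1, int(step) if step else 1))
    return len(task_ids)


print(array_task_count("2,4,8"))   # 3
print(array_task_count("0-15:4"))  # 4
```

A set is used so that an ID listed twice is counted once.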
* Changes in Slurm 16.05.11
===========================
-- burst_buffer/cray - Add support for line continuation.
-- If a job is cancelled by the user while its allocated nodes are being
reconfigured (i.e. the capmc_resume program is rebooting nodes for the job)
and the node reconfiguration fails (i.e. the reboot fails), then don't
requeue the job but leave it in a cancelled state.
-- capmc_resume (Cray resume node script) - Do not disable changing a node's
active features if SyscfgPath is configured in the knl.conf file.
-- Fix memory error when updating a job's licenses.
-- Fix double read lock of tres when updating gres or licenses on a job.
-- Fix regression in 16.05.10 with respect to GrpTresMins on a QOS or
Association.
-- ALPS - Fix scheduling when ALPS doesn't agree with Slurm on what nodes
are free.
-- Fix seg fault if attempting to load non-existent burstbuffer plugin.
-- Fix to backfill scheduling with respect to QOS and association limits. Jobs
submitted to multiple partitions are most likely to be affected.
* Changes in Slurm 16.05.10-2
=============================
-- Replace clock_gettime with time(NULL) for very old systems without the call.
* Changes in Slurm 16.05.10
===========================
-- Record job state as PREEMPTED instead of TIMEOUT when GraceTime is reached.
-- task/cgroup - print warnings to stderr when --cpu_bind=verbose is enabled
and the requested processor affinity cannot be set.
-- power/cray - Disable power cap get and set operations on DOWN nodes.
-- Jobs preempted with PreemptMode=REQUEUE were incorrectly recorded as
REQUEUED in the accounting.
-- PMIX - Use volatile specifier to avoid flag caching and lock the flag to
make sure it is protected.
-- PMIX/PMI2 - Make it possible to use %n or %h in a spool dir.
-- burst_buffer/cray - Support default pool which is not the first pool
reported by DataWarp and log in Slurm when pools are added to or removed
from DataWarp.
-- Ensure job does not start running before PrologSlurmctld is complete and
node is booted (all nodes for interactive job, at least first node for batch
job without burst buffers).
-- Fix minor memory leak in the slurmctld when removing a QOS.
-- burst_buffer/cray - Do not execute "pre_run" operation until after all nodes
are booted and ready for use.
-- scontrol - return an error when attempting to use the +=/-= syntax to
update a field where this is not appropriate.
-- Fix task/affinity to work correctly with --ntasks-per-socket.
-- Honor --ntasks-per-node and --ntasks option when used with job constraints
that contain node counts.
-- Prevent deadlocked slurmstepd processes due to unsafe use of regcomp with
older glibc versions.
-- Fix squeue when SLURM_BITSTR_LEN=0 is set in the user environment.
-- Fix comments in acct_policy.c to reflect actual variables instead of
old ones.
-- Use the correct variables when validating GrpTresMins on a QOS.
-- Better debug output when a job is being held because of a GrpTRES[Run]Min
limit.
-- Set the correct state reason when job can't run 'safely' because of an
association GrpWall limit.
-- Squeue always loads new data if user_id option specified
-- Fix for possible job ID parsing failure and abort.
-- If node boot in progress when slurmctld daemon is restarted, then allow
sufficient time for reboot to complete and not prematurely DOWN the node as
"Not responding".
-- For job resize, correct logic to build "resize" script with new values.
Previously the scripts were based upon the original job size.
-- Fix squeue to not limit the size of partition, burst_buffer, exec_host, or
reason to 32 chars.
-- Fix potential packing error when packing a NULL slurmdb_clus_res_rec_t.
-- Fix potential packing errors when packing a NULL slurmdb_reservation_cond_t.
-- Burst_buffer/cray - Prevent slurmctld daemon abort if "paths" operation
fails. Now job will be held. Update job update time when held.
-- Fix issues with QOS flags Partition[Min|Max]Nodes to work correctly.
-- Increase number of ResumePrograms that can be managed without leaving
zombie/orphan processes from 10 to 100.
-- Refactor slurmctld agent logic to eliminate some pthreads.
* Changes in Slurm 16.05.9
==========================
-- Fix parsing of SBCAST_COMPRESS environment variable in sbcast.
-- Change some debug messages to errors in task/cgroup plugin.
-- backfill scheduler: Stop trying to determine expected start time for a job
after 2 seconds of wall time. This can happen if there are many running jobs
and a pending job can not be started soon.
-- Improve performance of cr_sort_part_rows() in cons_res plugin.
-- CRAY - Fix deadlock issue when updating accounting in the slurmctld and
scheduling a Datawarp job.
-- Correct the job state accounting information for jobs requeued due to burst
buffer errors.
-- burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e.
just creating or deleting a persistent burst buffer).
-- Fix slurm.spec file support for BlueGene builds.
-- Fix missing TRES read lock in acct_policy_job_runnable_pre_select() code.
-- Fix debug2 message printing value using wrong array index in
_qos_job_runnable_post_select().
-- Prevent job timeout on node power up.
-- MYSQL - Fix minor memory leak when querying steps and the sql fails.
-- Make it so sacctmgr accepts column headers like MaxTRESPU and not MaxTRESP.
-- Only look at SLURM_STEP_KILLED_MSG_NODE_ID on startup, to avoid race
condition later when looking at a steps env.
-- Make backfill scheduler behave like regular scheduler in respect to
'assoc_limit_stop'.
-- Allow a lower version client command to talk to a higher version controller
using the multi-cluster options (e.g. squeue -M<cluster>).
-- slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld
daemon is running or node boot in progress.
-- MYSQL - Fix a few other minor memory leaks when uncommon failures occur.
-- burst_buffer/cray - Fix race condition that could cause multiple batch job
launch requests resulting in drained nodes.
-- Correct logic to purge old reservations.
-- Fix DBD cache restore from previous versions.
-- Fix to logic for getting expected start time of existing job ID with
explicit begin time that is in the past.
-- Clear job's reason of "BeginTime" in a more timely fashion and/or prevent
the job from being stuck in a PENDING state.
-- Make sure acct policy limits imposed on a job are correct after requeue.
* Changes in Slurm 16.05.8
==========================
-- Remove StoragePass from being printed out in the slurmdbd log at debug2
level.
-- Defer PATH search for task program until launch in slurmstepd.
-- Modify regression test1.89 to avoid leaving vestigial job. Also reduce
logging to reduce likelihood of Expect buffer overflow.
-- Do not PATH search for multi-prog launches if LaunchParameters=test_exec is
enabled.
-- Fix for possible infinite loop in select/cons_res plugin when trying to
satisfy a job's ntasks_per_core or socket specification.
-- If job is held for bad constraints make it so once updated the job doesn't
go into JobAdminHeld.
-- sched/backfill - Fix logic to reserve resources for jobs that require a
node reboot (i.e. to change KNL mode) in order to start.
-- When unpacking a node or front_end record from state and the protocol
version is lower than the min version, set it to the min.
-- Remove redundant lookup for part_ptr when updating a reservation's nodes.
-- Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
-- Do not allocate specialized cores to jobs using the --exclusive option.
-- Cancel interactive job if Prolog failure with "PrologFlags=contain" or
"PrologFlags=alloc" configured. Send new error prolog failure message to
the salloc or srun command as needed.
-- Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
-- Fix check for PluginDir within slurmctld to work with multiple directories.
-- Cancel interactive jobs automatically on communication error to launching
srun/salloc process.
-- Fix security issue caused by insecure file path handling triggered by the
failure of a Prolog script. To exploit this a user needs to anticipate or
cause the Prolog to fail for their job. CVE-2016-10030.
* Changes in Slurm 16.05.7
==========================
-- Fix issue in the priority/multifactor plugin where, on a slurmctld restart,
more time is accounted for than should be allowed.
-- cray/burst_buffer - If total_space in a pool decreases, reset used_space
rather than trying to account for buffer allocations in progress.
-- cray/burst_buffer - Fix for double counting of used_space at slurmctld
startup.
-- Fix regression in 16.05.6 where if you request multiple cpus per task (-c2)
and request --ntasks-per-core=1 and only 1 task on the node
the slurmd would abort on an infinite loop fatal.
-- cray/burst_buffer - Internally track both allocated and unusable space.
The reported UsedSpace in a pool is now the allocated space (previously was
unusable space). Base available space on whichever value leaves least free
space.
-- cray/burst_buffer - Preserve job ID and don't translate to job array ID.
-- cray/burst_buffer - Update "instance" parsing to match updated dw_wlm_cli
output.
-- sched/backfill - Ensure we don't try to start a job that was already started
and requeued by the main scheduling logic.
-- job_submit/lua - add access to the job features field in job_record.
-- select/linear plugin modified to better support heterogeneous clusters when
topology/none is also configured.
-- Permit cancellation of jobs in configuring state.
-- acct_gather_energy/rapl - prevent segfault in slurmd from race to gather
data at slurmd startup.
-- Integrate node_features/knl_generic with "hbm" GRES information.
-- Fix output routines to prevent rounding the TRES values for memory or BB.
-- switch/cray plugin - fix use after free error.
-- docs - elaborate on how to clear TRES limits in sacctmgr.
-- knl_cray plugin - Avoid abort from backup slurmctld at start time.
-- cgroup plugins - fix two minor memory leaks.
-- If a node is booting for some job, don't allocate additional jobs to the
node until the boot completes.
-- testsuite - fix job id output in test17.39.
-- Modify backfill algorithm to improve performance with large numbers of
running jobs. Group running jobs that end in a "similar" time frame using a
time window that grows exponentially rather than linearly. After one second
of wall time, simulate the termination of all remaining running jobs in
order to respond in a reasonable time frame.
-- Fix slurm_job_cpus_allocated_str_on_node_id() API call.
-- sched/backfill plugin: Make malloc match data type (defined as uint32_t and
allocated as int).
-- srun - prevent segfault when terminating job step before step has launched.
-- sacctmgr - prevent segfault when trying to reset usage for an invalid
account name.
-- Make the openssl crypto plugin compile with openssl >= 1.1.
-- Fix SuspendExcNodes and SuspendExcParts on slurmctld reconfiguration.
-- sbcast - prevent segfault in slurmd due to race condition between file
transfers from separate jobs using zlib compression.
-- cray/burst_buffer - Increase time to synchronize operations between threads
from 5 to 60 seconds ("setup" operation time observed over 17 seconds).
-- node_features/knl_cray - Fix possible race condition when changing node
state that could result in old KNL mode as an active feature.
-- Make sure if a job can't run because of resources we also check accounting
limits after the node selection to make sure it doesn't violate those limits
and if it does change the reason for waiting so we don't reserve resources
on jobs violating accounting limits.
-- NRT - Make it so a system running against IBM's PE will work with PE
version 1.3.
-- NRT - Make it so protocols pgas and test are allowed to be used.
-- NRT - Make it so you can have more than 1 protocol listed in MP_MSG_API.
-- cray/burst_buffer - If slurmctld daemon restarts with pending job and burst
buffer having unknown file stage-in status, teardown the buffer, defer the
job, and start stage-in over again.
-- On state restore in the slurmctld don't overwrite the mem_spec_limit given
from the slurm.conf when using FastSchedule=0.
-- Recognize a KNL's proper NUMA count (rather than setting it to the value
in slurm.conf) when using FastSchedule=0.
-- Fix parsing in regression test1.92 for some prompts.
-- sbcast - use slurmd's gid cache rather than a separate lookup.
-- slurmd - return error if setgroups() call fails in _drop_privileges().
-- Remove error messages about gres counts changing when a job is resized on
a slurmctld restart or reconfig, as they aren't really error messages.
-- Fix possible memory corruption if a job is using GRES and changing size.
-- jobcomp/elasticsearch - fix printf format for a value on 32-bit builds.
-- task/cgroup - Change error message if CPU binding can not take place to
better identify the root cause of the problem.
-- Fix issue where task/cgroup would not always honor --cpu_bind=threads.
-- Fix race condition with getgrouplist() in slurmd that can lead to
user accounts being granted access to incorrect group memberships during
job launch.
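
The backfill entry above (grouping running jobs that end in a "similar" time
frame using a window that grows exponentially) can be illustrated with a small
sketch. This is not Slurm's implementation: the function name, the 30-second
starting window and the growth factor of 2 are all assumptions chosen for
illustration, and the one-second wall-time cutoff described in the entry is
not modeled.

```python
def group_end_times(end_times, now, base_window=30, growth=2):
    """Bucket job end times into windows whose width grows exponentially.

    Hypothetical helper: base_window and growth are illustrative values,
    not Slurm defaults. Returns a list of (window_end, [end_times]) pairs.
    """
    groups = []
    window = base_window
    boundary = now + window          # end of the first window
    current = []
    for t in sorted(end_times):
        # Advance to the window that contains t, widening it each step.
        while t > boundary:
            if current:
                groups.append((boundary, current))
                current = []
            window *= growth
            boundary += window
        current.append(t)
    if current:
        groups.append((boundary, current))
    return groups


if __name__ == "__main__":
    for window_end, jobs in group_end_times([10, 20, 40, 100, 500], now=0):
        print(window_end, jobs)
```

With exponentially growing windows, jobs ending far in the future collapse
into a few coarse groups, so the number of simulated termination events grows
far more slowly than the number of running jobs.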
* Changes in Slurm 16.05.6
==========================
-- Docs - the correct default value for GroupUpdateForce is 0.
-- mpi/pmix - improve point to point communication performance.
-- SlurmDB - include pending jobs in search during 'sacctmgr show runawayjobs'.
-- Add client side out-of-range checks to --nice flag.
-- Fix support for sbatch "-W" option, previously needed to use "--wait".
-- node_features/knl_cray plugin and capmc_suspend/resume programs modified to
sleep and retry capmc operations if the Cray State Manager is down. Added
CapmcRetries configuration parameter to knl_cray.conf.
-- node_features/knl_cray plugin: Remove any KNL MCDRAM or NUMA features from
node's configuration if capmc does NOT report the node as being KNL.
-- node_features/knl_cray plugin: drain any node not reported by
"capmc node_status" on startup or reconfig.
-- node_features/knl_cray plugin: Substantially streamline and speed up logic
to load current node state on reconfigure failure or unexpected node boot.
-- node_features/knl_cray plugin: Add separate thread to interact with capmc
in response to unexpected node reboots.
-- node_features plugin - Add "mode" argument to node_features_p_node_xlate()
function to fix some bugs updating a node's features using the node update
RPC.
-- node_features/knl_cray plugin: If the reconfiguration of nodes for an
interactive job fails, kill the job (it can't be requeued like a batch job).
-- Testsuite - Added srun/salloc/sbatch tests with --use-min-nodes option.
-- Fix typo when an error occurs when discovering pmix version on
configure.
-- Fix configuring pmix support when you have your lib dir symlinked to lib64.
-- Fix waiting reason if a job is waiting for a specific limit instead of
always just AccountingPolicy.
-- Correct SchedulerParameters=bf_busy_nodes logic with respect to the job's
minimum node count. Previous logic would not decrement counter in some
locations and reject valid job request for not reaching minimum node count.
-- Fix FreeBSD-11 build by using llabs() function in place of abs().
-- Cray: The slurmd can manipulate the socket/core/thread values reported based
upon the configuration. The logic failed to consider select/cray with
SelectTypeParameters=other_cons_res as equivalent to select/cons_res.
-- If a node's socket or core count are changed at registration time (e.g. a
KNL node's NUMA mode is changed), change its board count to match.
-- Prevent possible divide by zero in select/cons_res if a node's board count
is higher than its socket count.
-- Allow an advanced reservation to contain a license count of zero.
-- Preserve non-KNL node features when updating the KNL node features for a
multi-node job in which the non-KNL node features vary by node.
-- task/affinity plugin: Honor a job's --ntasks-per-socket and
--ntasks-per-core options in task binding.
-- slurmd - do not print ClusterName when using 'slurmd -C'.
-- Correct a bitmap test function (used only by the select/bluegene plugin).
-- Do not propagate SLURM_UMASK environment variable to batch script.
-- Added node_features/knl_generic plugin for KNL support on non-Cray systems.
-- Cray: Prevent abort in backfill scheduling logic for requeued job that has
been cancelled while NHC is running.
-- Improve reported estimates of start and end times for pending jobs.
-- pbsnodes: Show OS value as "unknown" for down nodes.
-- BlueGene - correctly scale node counts when enforcing MaxNodes limit take 2.
-- Fix "sbatch --hold" to set JobHeldUser correctly instead of JobHeldAdmin.
-- Cray - print warning that task/cgroup is required, and must be after
task/cray in the TaskPlugin settings.
-- Document that node Weight takes precedence over load with LLN scheduling.
-- Fix issue where gang scheduling could happen even with OverSubscribe=NO.
-- Expose JOB_SHARED_* values to job_submit/lua plugin.
-- Fix issue where number of nodes is not properly allocated when srun is
requested with -n tasks < hosts from -w hostlist.
-- Update srun documentation for -N, -w and -m arbitrary.
-- Fix bug that was clearing MAINT mode on nodes scheduled for reboot (bug
introduced in version 16.05.5 to address bug in overlapping reservations).
-- Add logging of node reboot requests.
-- Docs - remove recommendation for ReleaseAgent setting in cgroup.conf.
-- Make sure a job cleans up completely if it has a node fail. Mostly an
issue with gang scheduling.
* Changes in Slurm 16.05.5
==========================
-- Fix accounting for jobs requeued after the previous job was finished.
-- slurmstepd modified to pre-load all relevant plugins at startup to avoid
the possibility of modified plugins later resulting in inconsistent API
or data structures and a failure of slurmstepd.
-- Export functions from parse_time.c in libslurm.so.
-- Export unit convert functions from slurm_protocol_api.c in libslurm.so.
-- Fix scancel to allow multiple steps from a job to be cancelled at once.
-- Update and expand upgrade guide (in Quick Start Administrator web page).
-- burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run
operation.
-- Ensure reported expected job start time is not in the past for pending jobs.
-- Add support for PMIx v2.
-- mpi/pmix: support for passing TMPDIR path through info key
-- Cray: update slurmconfgen_smw.py script to correctly identify service nodes
versus compute nodes.
-- FreeBSD - fix build issue in knl_cray plugin.
-- Corrections to gres.conf parsing logic.
-- Make partition State independent of EnforcePartLimits value.
-- Fix multipart srun submission with EnforcePartLimits=NO and job violating
the partition limits.
-- Fix problem updating job state_reason.
-- pmix - Provide HWLOC topology in the job-data if Slurm was configured
with hwloc.
-- Cray - Fix issue restoring jobs when blade count increases due to hardware
reconfiguration.
-- burst_buffer/cray - Hold job after 3 failed pre-run operations.
-- sched/backfill - Check that a user's QOS is allowed to use a partition
before trying to schedule resources on that partition for the job.
-- sacctmgr - Fix displaying nodenames when printing out events or
reservations.
-- Fix mpiexec wrapper to accept task count with more than one digit.
-- Add mpiexec man page to the script.
-- Add salloc_wait_nodes option to the SchedulerParameters parameter in the
slurm.conf file controlling when the salloc command returns in relation to
when nodes are ready for use (i.e. booted).
-- Handle case when slurmctld daemon restart while compute node reboot in
progress. Return node to service rather than setting DOWN.
-- Preserve node "RESERVATION" state when one of multiple overlapping
reservations ends.
-- Restructure srun command locking for task_exit processing logic for improved
parallelism.
-- Modify srun task completion handling to only build the task/node string for
logging purposes if it is needed. Modified for performance purposes.
-- Docs - update salloc/sbatch/srun man pages to mention corresponding
environment variables for --mem/--mem-per-cpu and allowed suffixes.
-- Silence srun warning when overriding the job ntasks-per-node count
with a lower task count for the step.
-- node_features/knl_cray: Fix bug where MCDRAM state could be taken from
capmc rather than cnselect.
-- node_features/knl_cray: If a node is rebooted outside of Slurm's direction,
update its active features with current MCDRAM and NUMA mode information.
-- Restore ability to manually power down nodes, broken in 15.08.12.
-- Don't log error for job end_time being zero if node health check is still
running.
-- When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode)
then pass to the ResumeProgram the job ID assigned to the nodes in the
SLURM_JOB_ID environment variable.
-- Allow a node's PowerUp state flag to be cleared using update_node RPC.
-- capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set
of nodes or reboot a set of nodes fails then just requeue the job and abort
the entire operation rather than trying to operate on individual nodes.
-- node_features/knl_cray plugin: Increase default CapmcTimeout parameter from
10 to 60 seconds.
-- Fix squeue filter by job license when a job has requested more than 1
license of a certain type.
-- Fix bug in PMIX_Ring in the pmi2 plugin so that it supports singleton mode.
It also updates the testpmixring.c test program so it can be used to check
singleton runs.
-- Automatically clean up task/cgroup cpuset and devices cgroups after steps are
completed.
-- Testsuite - Fix test1.83 to handle gaps in node names properly.
-- BlueGene - correctly scale node counts when enforcing MaxNodes limit.
-- Make sure no attempt is made to schedule a requeued job until all steps are
cleaned (Node Health Check completes for all steps on a Cray).
-- KNL: Correct task affinity logic for some NUMA modes.
-- Add salloc/sbatch/srun --priority option of "TOP" to set job priority to
the highest possible value. This option is only available to Slurm operators
and administrators.
-- Add salloc/sbatch/srun option --use-min-nodes to prefer smaller node counts
when a range of node counts is specified (e.g. "-N 2-4").
-- Validate salloc/sbatch --wait-all-nodes argument.
-- Add "sbatch_wait_nodes" to SchedulerParameters to control default sbatch
behaviour with respect to waiting for all allocated nodes to be ready for
use. Job can override the configuration option using the --wait-all-nodes=#
option.
-- Prevent partition group access updates from resetting last_part_update when
no changes have been made. Prevents backfill scheduler from restarting
mid-cycle unnecessarily.
-- Cray - add NHC_ABSOLUTELY_NO to never run NHC, even on certain edge cases
that it would otherwise be run on with NHC_NO.
-- Ignore GRES/QOS updates that maintain the same value as before.
-- mpi/pmix - prepare temp directory for application.
-- Fix display for the nice and priority values in sprio/scontrol/squeue.
* Changes in Slurm 16.05.4
==========================
-- Fix potential deadlock if running with message aggregation.
-- Streamline when schedule() is called on batch script completion when running
with message aggregation.
-- Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
-- Document that persistent burst buffers can not be created or destroyed using
the salloc or srun --bb options.
-- Set the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVATION
environment variables for the salloc command.
Document the same environment variables for the salloc, sbatch and srun
commands in their man pages.
-- Fix issue where sacctmgr load cluster.cfg wouldn't load associations
that had a partition in them.
-- Don't return the extern step from sstat by default.
-- In sstat print 'extern' instead of 4294967295 for the extern step.
-- Make advanced reservations work properly with core specialization.
-- Fix race condition in the account_gather plugin that could result in job
stuck in COMPLETING state.
-- Regression test fixes if SelectTypePlugin not managing memory and no node
memory size set (defaults to 1 MB per node).
-- Add missing partition write locks to _slurm_rpc_dump_nodes/node_single to
prevent a race condition leading to inconsistent sinfo results.
-- Fix task:CPU binding logic for some processors. This bug was introduced
in version 16.05.1 to address KNL binding problem.
-- Fix two minor memory leaks in slurmctld.
-- Improve partition-specific limit logging from slurmctld daemon.
-- Fix incorrect access check when using MaxNodes setting on the partition.
-- Fix issue with sacctmgr when specifying a list of clusters to query.
-- Fix issue when calculating future StartTime for a job.
-- Make EnforcePartLimit support logic work with any ordering of partitions
in job submit request.
-- Prevent restoration of wrong CPU governor and frequency when using
multiple task plugins.
-- Prevent slurmd abort if hwloc library fails to populate the "children"
arrays (observed with hwloc version "dev-333-g85ea6e4").
-- burst_buffer/cray: Add "--groupid" to DataWarp "setup" command.
-- Fix lustre profiling putting it in the Filesystem dataset instead of the
Network dataset.
-- Fix profiling documentation and code to be consistent with
Filesystem instead of Lustre.
-- Correct the way watts is calculated in the rapl plugin when using a poll
frequency other than AcctGatherNodeFreq.
-- Don't abort step launch if job reaches expected end time while node is
configuring/booting (NOTE: The job end time will be adjusted after node
becomes ready for use).
-- Fix several print routines to respect a custom output delimiter when
printing NO_VAL or INFINITE.
-- Correct documented configurations where --ntasks-per-core and
--ntasks-per-socket are supported.
-- task/affinity plugin buffer allocated too small, can corrupt memory.
* Changes in Slurm 16.05.3
==========================
-- Make it so the extern step uses a reverse tree when cleaning up.
-- If extern step doesn't get added into the proctrack plugin make sure the
sleep is killed.
-- Fix areas the slurmctld can segfault if an extern step is in the system
cleaning up on a restart.
-- Prevent possible incorrect counting of GRES of a given type if a node has
multiple "types" of a given GRES "name", which could over-subscribe
GRES of a given type.
-- Add web links to Slurm Diamond Collectors (from Harvard University) and
collectd (from EDF).
-- Add job_submit plugin for the "reboot" field.
-- Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to
job_submit/lua plugins.
-- Send in a -1 for a taskid into spank_task_post_fork for the extern_step.
-- MYSQL - Slightly better logic if a job completion comes in with an end time
of 0.
-- If task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set
soft memory limit to allocated memory limit (previously no soft limit was
set).
-- Document limitations in burst buffer use by the salloc command (possible
access problems from a login node).
-- Fix proctrack plugin to only add the pid of a process once
(regression in 16.05.2).
-- Fix for sstat to print correct info when requesting jobid.batch as part of
a comma-separated list.
-- CRAY - Fix issue if pid has already been added to another job container.
-- CRAY - Fix add of extern step to AELD.
-- burst_buffer/cray: avoid batch submit error condition if waiting for stagein.
-- CRAY - Fix for reporting steps lingering after they are already finished.
-- Testsuite - fix test1.29 / 17.15 for limits with values above 32-bits.
-- CRAY - Simplify when a NHC is called on a step that has unkillable
processes.
-- CRAY - If trying to kill a step and you have NHC_NO_STEPS set run NHC
anyway to attempt to log the backtraces of the potential
unkillable processes.
-- Fix gang scheduling and license release logic if single node job killed on
bad node.
-- Make scontrol show steps show the extern step correctly.
-- Do not schedule powered down nodes in FAILED state.
-- Do not start slurmctld power_save thread until partition information is read
in order to prevent race condition that can result in an invalid pointer when
trying to resolve configured SuspendExcParts.
-- Add SLURM_PENDING_STEP id so it won't be confused with SLURM_EXTERN_CONT.
-- Fix for core selection with job --gres-flags=enforce-binding option.
Previous logic would in some cases allocate a job zero cores, resulting in
slurmctld abort.
-- Minimize preempted jobs for configurations with multiple jobs per node.
-- Improve partition AllowGroups caching. Update the table of UIDs permitted to
use a partition based upon its AllowGroups configuration parameter as new
valid UIDs are found rather than looking up that user's group information
for every job they submit. If the user is not allowed to use the partition,
then do not check that user's group access again for 5 seconds.
-- Add routing queue information to Slurm FAQ web page.
-- Do not select_g_step_finish() a SLURM_PENDING_STEP step, as nothing has
been allocated for the step yet.
-- Fixed race condition in PMIx Fence logic.
-- Prevent slurmctld abort if job is killed or requeued while waiting for
reboot of its allocated compute nodes.
-- Treat invalid user ID in AllowUserBoot option of knl.conf file as error
rather than fatal (log and do not exit).
-- qsub - When doing the default output files for an array in qsub style
make them using the master job ID instead of the normal job ID.
-- Create the extern step while creating the job instead of waiting until the
end of the job to do it.
-- Always report a 0 exit code for the extern step instead of being canceled
or failed based on the signal that would always be killing it.
-- Fix to allow users to update QOS of pending jobs.
-- CRAY - Fix minor memory leak in switch plugin.
-- CRAY - Change slurmconfgen_smw.py to skip over disabled nodes.
-- Fix eligible_time for elasticsearch as well as add queue_wait
(difference between start of job and when it was eligible).
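
The AllowGroups caching entry above can be sketched roughly as follows. This
is an illustration only, not slurmctld code: the class, its method names and
the injected lookup callable are hypothetical, and the sketch assumes the
5-second rule applies to a user who was just denied access (so a failed group
check is not repeated for that user until 5 seconds have passed).

```python
import time


class PartitionAccessCache:
    """Hypothetical per-partition cache of UIDs permitted by AllowGroups."""

    FAIL_RETRY_SECS = 5  # assumed from the "5 seconds" in the entry above

    def __init__(self, lookup_groups, allow_groups, clock=time.monotonic):
        self._lookup = lookup_groups        # callable: uid -> set of group names
        self._allow_groups = set(allow_groups)
        self._clock = clock                 # injectable for testing
        self._allowed_uids = set()          # grows as valid UIDs are found
        self._last_fail = {}                # uid -> time of last denial

    def is_allowed(self, uid):
        # Known-good UIDs never trigger another group lookup.
        if uid in self._allowed_uids:
            return True
        now = self._clock()
        failed_at = self._last_fail.get(uid)
        # A recently denied UID is rejected without a new lookup.
        if failed_at is not None and now - failed_at < self.FAIL_RETRY_SECS:
            return False
        if self._lookup(uid) & self._allow_groups:
            self._allowed_uids.add(uid)
            self._last_fail.pop(uid, None)
            return True
        self._last_fail[uid] = now
        return False


if __name__ == "__main__":
    groups = {1000: {"hpc"}, 2000: {"guest"}}
    cache = PartitionAccessCache(lambda uid: groups.get(uid, set()), ["hpc"])
    print(cache.is_allowed(1000), cache.is_allowed(2000))
```

The expensive step is the group lookup, so the sketch performs it at most once
per permitted user and at most once every 5 seconds per denied user.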
* Changes in Slurm 16.05.2
==========================
-- CRAY - Fix issue where the proctrack plugin could hang if the container