This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.

* Changes in Slurm 17.02.0rc2
==============================
 -- job_submit/lua - Make "immediate" parameter available.
 -- Fix srun I/O race condition to eliminate an error message that might be
    generated if the application exits with outstanding stdin.
 -- Fix regression when purging/archiving jobs/events.
 -- Add new job state JOB_OOM indicating Out Of Memory condition as detected by
    task/cgroup plugin.
 -- If a QOS has been added to the system, recompute Deny/AllowQOS on
    partitions.
 -- Deny job with duplicate GRES requested.
 -- Fix loading of very old assoc_mgr usage so that it no longer segfaults.
 -- CRAY systems: Restore TaskPlugins order of task/cray before task/cgroup.
 -- Task/cray: Treat missing "mems" cgroup with "debug" messages rather than
    "error" messages. The file may be missing at step termination due to a
    change in how cgroups are released at job/step end.
 -- Fix for job constraint specification with counts, --ntasks-per-node value,
    and no node count.
 -- Fix ordering of step task allocation to fill in a socket before going into
    another one.
 -- Fix configure to not require C++.
 -- job_submit/lua - Remove access to slurmctld internal reservation fields of
    job_pend_cnt and job_run_cnt.
 -- Prevent job_time_limit enforcement from blocking other internal operations
    if a large number of jobs need to be cancelled.

* Changes in Slurm 17.02.0rc1
==============================
 -- Add port info to 'sinfo' and 'scontrol show node'.
 -- Fix errant definition of USE_64BIT_BITSTR which can lead to core dumps.
 -- Move BatchScript to end of each job's information when using
    "scontrol -dd show job" to make it more readable.
 -- Add SchedulerParameters configuration parameter of "default_gbytes", which
    treats numeric only (no suffix) value for memory and tmp disk space as
    being in units of Gigabytes. Mostly for compatibility with LSF.
 -- Fix race condition in srun/sattach logic which would prevent srun from
    terminating.
 -- Bitstring operations are now 64bit instead of 32bit.
 -- Replace hweight() function in bitstring with faster version.
 -- scancel would treat a non-numeric argument as the name of jobs to be
    cancelled (a non-documented feature). Cancelling jobs by name now requires
    the "--jobname=" command line argument.
 -- scancel modified to note that no jobs satisfy the filter options when the
    --verbose option is used along with one or more job filters (e.g.
    "--qos=").
 -- Change _pack_cred to use pack_bit_str_hex instead of pack_bit_fmt for
    better scalability and performance.
 -- Add BootTime configuration parameter to knl.conf file to optimize resource
    allocations with respect to required node reboots.
 -- Add node_features_p_boot_time() to node_features plugin to optimize
    scheduling with respect to node reboots.
 -- Avoid allocating resources to a job in the event that its run time plus
    boot time (if needed) extend into an advanced reservation.
 -- Burst_buffer/cray - Avoid stage-out operation if job never started.
 -- node_features/knl_cray - Add capability to detect Uncorrectable Memory
    Errors (UME) and if detected then log the event in all job and step stderr
    with a message of the form:
    error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 ***
    Similar logic added to node_features/knl_generic in version 17.02.0pre4.
 -- If job is allocated nodes which are powered down, then reset job start
    time when the nodes are ready and do not charge the job for power up time.
 -- Add the ability to purge transactions from the database.
 -- Add support for requeueing of federated jobs (BETA).
 -- Add support for interactive federated jobs (BETA).
 -- Add the ability to purge rolled up usage from the database.
 -- Properly set SLURM_JOB_GPUS environment variable for Prolog.
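    Illustration (not Slurm code): the effect of the "default_gbytes" option
    listed above under 17.02.0rc1 can be sketched as below. to_megabytes() is
    a hypothetical helper. The sketch assumes that a suffix-less value is
    otherwise read as megabytes, that suffixes are powers of 1024, and it
    handles only the M, G and T suffixes.

```python
# Hypothetical helper, not part of Slurm: shows how a memory value might be
# normalised to megabytes with and without default_gbytes.
_MB_PER_UNIT = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_megabytes(value: str, default_gbytes: bool = False) -> int:
    """Convert a value such as '4', '512M' or '2G' to megabytes."""
    text = value.strip().upper()
    if text and text[-1] in _MB_PER_UNIT:
        # An explicit suffix always wins over the configured default.
        number, unit = text[:-1], text[-1]
    else:
        # Assumption: no suffix means megabytes unless default_gbytes is set.
        number, unit = text, ("G" if default_gbytes else "M")
    return int(number) * _MB_PER_UNIT[unit]


if __name__ == "__main__":
    print(to_megabytes("4"))                       # suffix-less, default units
    print(to_megabytes("4", default_gbytes=True))  # suffix-less, gigabytes
    print(to_megabytes("512M", default_gbytes=True))
```

    Values with an explicit suffix are unaffected by the option in this
    sketch; only bare numbers change meaning.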
* Changes in Slurm 17.02.0pre4
==============================
 -- Add support for per-partition OverTimeLimit configuration.
 -- Add --mem_bind option of "sort" to run zonesort on KNL nodes at step start.
 -- Add LaunchParameters=mem_sort option to configure running of zonesort
    by default at step startup.
 -- Add "FreeSpace" information for each pool to the "scontrol show
    burstbuffer" output. Required changes to the burst_buffer_info_t data
    structure.
 -- Add new node state flag of NODE_STATE_REBOOT for node reboots triggered by
    "scontrol reboot" commands. Previous logic re-used NODE_STATE_MAINT flag,
    which could lead to inconsistencies. Add "ASAP" option to "scontrol reboot"
    command that will drain a node in order to reboot it as soon as possible,
    then return it to service.
 -- Allow unit conversion routine to convert 1024M to 1G.
 -- switch/cray plugin - change legacy spool directory location.
 -- Add new PriorityFlags option of INCR_ONLY, which prevents a job's priority
    from being decremented.
 -- Do not purge job start messages until after step messages have been
    purged. This should reduce the number of messages lost when memory fills
    up while the database/DBD is down.
 -- Add SchedulerParameters option of "bf_job_part_count_reserve". Jobs below
    the specified threshold will not have resources reserved for them.
 -- If GRES are configured with file IDs, then "scontrol -d show node" will
    not only identify the count of currently allocated GRES, but their
    specific index numbers (e.g.
    "GresUsed=gpu:alpha:2(IDX:0,2),gpu:beta:0(IDX:N/A)"). Ditto for job
    information with "scontrol -d show job".
 -- Add new mcs/account plugin.
 -- Add "GresEnforceBind=Yes" to "scontrol show job" output if so configured.
 -- Add support for SALLOC_CONSTRAINT, SBATCH_CONSTRAINT and SLURM_CONSTRAINT
    environment variables to set default constraints for salloc, sbatch and
    srun commands respectively.
 -- Provide limited support for the MemSpecLimit configuration parameter
    without the task/cgroup plugin.
 -- node_features/knl_generic - Add capability to detect Uncorrectable Memory
    Errors (UME) and if detected then log the event in all job and step stderr
    with a message of the form:
    error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 ***
 -- Add SLURM_JOB_GID to TaskProlog environment.
 -- burst_buffer/cray - Remove leading zeros from node ID lists passed to
    dw_wlm_cli program.
 -- Add "Partitions" field to "scontrol show node" output.
 -- Remove sched/wiki and sched/wiki2 plugins and associated code.
 -- Remove SchedulerRootFilter option and slurm_get_root_filter() API call.
 -- Add SchedulerParameters option of spec_cores_first to select specialized
    cores from the lowest rather than highest number cores and sockets.
 -- Add PrologFlags option of Serial to disable concurrent launch of Prolog
    and Epilog scripts.
 -- Fix security issue caused by insecure file path handling triggered by the
    failure of a Prolog script. To exploit this a user needs to anticipate or
    cause the Prolog to fail for their job. CVE-2016-10030.

* Changes in Slurm 17.02.0pre3
==============================
 -- Add srun host & PID to job step data structures.
 -- Avoid creating duplicate pending step records for the same srun command.
 -- Rewrite srun's logic for pending steps for better efficiency (fewer RPCs).
 -- Added new SchedulerParameters options step_retry_count and step_retry_time
    to control scheduling behaviour of job steps waiting for resources.
 -- Optimize resource allocation logic for --spread-job job option.
 -- Modify cpu_bind and mem_bind map and mask options to accept a repetition
    count to better support large task counts. For example:
    "mask_mem:0x0f*2,0xf0*2" is equivalent to "mask_mem:0x0f,0x0f,0xf0,0xf0".
 -- Add support for --mem_bind=prefer option to prefer, but not restrict
    memory use to the identified NUMA node.
 -- Add mechanism to constrain kernel memory allocation using cgroups. New
    cgroup.conf parameters added: ConstrainKmemSpace, MaxKmemPercent, and
    MinKmemSpace.
 -- Correct invocation of man2html, which previously could cause FreeBSD
    builds to hang.
 -- MYSQL - Unconditionally remove 'ignore' clause from 'alter ignore'.
 -- Modify service files to not start Slurm daemons until after Munge has been
    started.
    NOTE: If you are not using Munge, but are using the "service" scripts to
    start Slurm daemons, then you will need to remove this check from the
    etc/slurm*service scripts.
 -- Do not process SALLOC_HINT, SBATCH_HINT or SLURM_HINT environment
    variables if any of the following salloc, sbatch or srun command line
    options are specified: -B, --cpu_bind, --hint, --ntasks-per-core, or
    --threads-per-core.
 -- burst_buffer/cray: Accept new jobs on backup slurmctld daemon without
    access to dw_wlm_cli command. No burst buffer actions will take place.
 -- Do not include SLURM_JOB_DERIVED_EC, SLURM_JOB_EXIT_CODE, or
    SLURM_JOB_EXIT_CODE2 in PrologSlurmctld environment (not available yet).
 -- Cray - set task plugin to fatal() if task/cgroup is not loaded after
    task/cray in the TaskPlugin settings.
 -- Remove separate slurm_blcr package. If Slurm is built with BLCR support,
    the files will now be part of the main Slurm packages.
 -- Replace sjstat, seff and sjobexit RPM packages with a single "contribs"
    package.
 -- Remove long since defunct slurmdb-direct scripts.
 -- Add SbcastParameters configuration option to control default file
    destination directory and compression algorithm.
 -- Add new SchedulerParameter (max_array_tasks) to limit the maximum number
    of tasks in a job array independently from the maximum task ID
    (MaxArraySize).
 -- Fix issue where number of nodes is not properly allocated when sbatch and
    salloc are requested with -n tasks < hosts from -w hostlist or from -N.
 -- Add infrastructure for submitting federated jobs.
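    Illustration (not Slurm code): the repetition count accepted by the
    cpu_bind and mem_bind map and mask options, described above under
    17.02.0pre3, can be sketched as below. expand_bind_list() is a
    hypothetical helper that only mirrors the documented equivalence; it is
    not Slurm's parser and does no validation of the mask or map values
    themselves.

```python
# Hypothetical helper, not part of Slurm: expands "value*count" entries in a
# bind specification into the equivalent fully written-out list.
def expand_bind_list(spec: str) -> str:
    """Expand e.g. 'mask_mem:0x0f*2,0xf0*2' to 'mask_mem:0x0f,0x0f,0xf0,0xf0'."""
    kind, sep, values = spec.partition(":")
    if not sep:
        raise ValueError(f"expected '<type>:<list>', got {spec!r}")
    expanded = []
    for entry in values.split(","):
        # An entry is either 'value' or 'value*count'.
        value, star, count = entry.partition("*")
        repeat = int(count) if star else 1
        if repeat < 1:
            raise ValueError(f"repetition count must be positive: {entry!r}")
        expanded.extend([value] * repeat)
    return kind + ":" + ",".join(expanded)


if __name__ == "__main__":
    print(expand_bind_list("mask_mem:0x0f*2,0xf0*2"))
    # prints: mask_mem:0x0f,0x0f,0xf0,0xf0
```

    Entries without a repetition count pass through unchanged, so existing
    specifications keep their meaning in this sketch.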
* Changes in Slurm 17.02.0pre2
==============================
 -- Add new RPC (REQUEST_EVENT_LOG) so that slurmd and slurmstepd can log
    events through the slurmctld daemon.
 -- Remove sbatch --bb option. That option was never supported.
 -- Automatically clean up task/cgroup cpuset and devices cgroups after steps
    are completed.
 -- Add federation read/write locks.
 -- Limit job purge run time to 1 second at a time.
 -- The database index for jobs is now 64 bits. If you happen to be close to
    4 billion jobs in your database you will want to update your slurmctld at
    the same time as your slurmdbd to prevent rollover of this variable, as
    it is 32 bits in previous versions of Slurm.
 -- Optionally lock slurmstepd in memory for performance reasons and to avoid
    possible SIGBUS if the daemon is paged out at the time of a Slurm upgrade
    (changing plugins). Controlled via new LaunchParameters options of
    slurmstepd_memlock and slurmstepd_memlock_all.
 -- Add event trigger on burst buffer errors (see strigger man page,
    --burst_buffer option).
 -- Add job AdminComment field which can only be set by a Slurm administrator.
 -- Add salloc, sbatch and srun option of --delay-boot=