This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.

* Changes in SLURM 2.2.0.pre0
=============================
 -- Added RunTime field to "scontrol show job" report.
 -- Added SLURM_VERSION_NUMBER and removed SLURM_API_VERSION from slurm/slurm.h.
 -- Added support to handle communication with SLURM 2.1 clusters. Jobs should not be lost in the future when upgrading to higher versions of SLURM.
 -- Added withdeleted options for listing clusters, users, and accounts.
 -- Removed PLPA task affinity functions because that package is deprecated.
 -- Preserve current partition state information rather than using the contents of the slurm.conf file after slurmctld restart or reconfiguration.
 -- Preserve current node Feature state information rather than using the contents of the slurm.conf file after slurmctld restart or reconfiguration.
 -- Modify SLURM's PMI library (for MPICH2) to properly execute an executable program stand-alone (single MPI task launched without srun).
 -- Made GrpCPUs and MaxCPUs limits work for select/cons_res.
 -- Moved all SQL-dependent plugins into a separate rpm, slurm-sql. This rpm should be needed only where a connection to a database is needed (i.e. where the slurmdbd is running).

* Changes in SLURM 2.1.2
=============================
 -- Added nodelist to sview for jobs on non-bluegene systems.
 -- Correction in value of batch job environment variable SLURM_TASKS_PER_NODE under some conditions.
 -- When a node that is already drained/down silently fails, the reason for draining the node is not changed.
 -- Srun will ignore the SLURM_NNODES environment variable and use the count of currently allocated nodes if that count changes during the job's lifetime (e.g. the job allocation uses the --no-kill option and a node goes DOWN; previously the job step would always fail).
 -- Made it so sacctmgr can't add a blank user or account. The MySQL plugin will also reject such requests.
 -- Revert libpmi.so version for compatibility with SLURM version 2.0 and earlier to avoid forcing applications using a specific libpmi.so version to rebuild unnecessarily (revert from libpmi.so.21.0.0 to libpmi.so.0.0.0).
 -- Restore support for a pending job's constraints (required node features) when slurmctld is restarted (internal structure needed to be rebuilt).
 -- Removed checkpoint_blcr.so from the plugin rpm in the slurm.spec since it is also in the blcr rpm.

* Changes in SLURM 2.1.1
=============================
 -- Fix for case-sensitive databases when a slurmctld has a mixed-case cluster name: lower-case the string to ease comparisons.
 -- Fix squeue so that, if a job is completing and failed, it prints the remaining nodes instead of a failed message.
 -- Fix sview core dump when searching for partitions by state.
 -- Fixed setting the start time when querying in sacct to the beginning of the day if not set previously.
 -- Defined slurm_free_reservation_info_msg and slurm_free_topo_info_msg in common/slurm_protocol_defs.h.
 -- Avoid generating an error when a job step includes a memory specification and memory is not configured as a consumable resource.
 -- Patch for small memory leak in src/common/plugstack.c.
 -- Fix sview search on node state.
 -- Fix bug in which an improperly formed job dependency specification can cause slurmctld to abort.
 -- Fixed issue where slurmctld wouldn't always get a message to send cluster information when registering for the first time with the slurmdbd.
 -- Add slurm_*_trigger.3 man pages for event trigger APIs.
 -- Fix bug in job preemption logic that would free allocated memory twice.
 -- Fix spelling issues (from Gennaro Oliva).
 -- Fix issue where, when changing the parent of an account in accounting, all children weren't always sent to their respective slurmctlds until a restart.
 -- Restore support for srun/salloc/sbatch option --hint=nomultithread to bind tasks to cores rather than threads (broken in slurm v2.1.0-pre5).
 -- Fix issue where a 2.0 sacct could not talk correctly to a 2.1 slurmdbd.
 -- BLUEGENE - When no partitions have any nodes assigned to them, alert the user that no blocks can be created.
 -- BLUEGENE - Fix smap to display BGP images when using -Dc on a Blue Gene/P system.
 -- Set SLURM_SUBMIT_DIR environment variable for srun and salloc commands to match behavior of sbatch command.
 -- Report WorkDir from "scontrol show job" command for jobs launched using salloc and srun.
 -- Correctly update the wckey when changing it on a pending job.
 -- Set wckeyid correctly in accounting when cancelling a pending job.
 -- BLUEGENE - Critical fix where jobs would be killed incorrectly.
 -- BLUEGENE - Fix for sview putting multiple ionodes onto nodelists when viewing the jobs tab.

* Changes in SLURM 2.1.0
=============================
 -- Improve sview layout of blocks in use.
 -- A user can now change the dimensions of the grid in sview.
 -- BLUEGENE - Improved startup speed further for large numbers of defined blocks.
 -- Fix to _get_job_min_nodes() in wiki2/get_jobs.c suggested by Michal Novotny.
 -- BLUEGENE - Fixed issues when updating a pending job when a node count was incorrect for the requested connection type.
 -- BLUEGENE - Fixed issue when combining blocks that are in ready states to make a larger block, or making multiple smaller blocks by splitting the larger block. Previously this would only work with blocks in a free state.
 -- Fix bug in wiki(2) plugins so that if HostFormat=2 and the task list has more than 64 entries, it is no longer truncated. Previously, sending a truncated task list during a get jobs request would confuse Moab.
 -- Added option to update the slurmctld debug level to sview when in admin mode.
 -- Added logic so that, when a memory limit is enforced using the jobacct_gather plugin, a user can no longer turn off enforcement of the limit.
 -- Replaced many calls to getpwuid() with reentrant uid_to_string().
 -- The slurmstepd will now refresh its log file handle on a reconfig. Previously, if a log was rolled, any output from the stepd was lost.

* Changes in SLURM 2.1.0-pre9
=============================
 -- Added "scontrol update SlurmctldDebug" as the preferred alternative to the "scontrol setdebug" command.
 -- BLUEGENE - Made it so when removing a block in an error state the nodes in the block are set correctly in accounting as not in error.
 -- Fixed issue so that, if the slurmdbd is not up, QOS values are set up correctly for associations from the cache.
 -- scontrol, squeue, and sview all display the correct node and CPU counts, along with the correct corresponding nodelist, for completing jobs.
 -- Patch (Mark Grondona) fixes serious security vulnerability in SLURM in the spank_job_env functionality.
 -- Improve spank_job_env interface and documentation.
 -- Add ESPANK_NOT_LOCAL error code to spank_err_t.
 -- Made the #define DECAY_INTERVAL used in the priority/multifactor plugin a slurm.conf variable (PriorityCalcPeriod).
 -- Added new macro SLURM_VERSION for use in autoconf scripts to determine the current version of slurm installed on the system when building against the API.
 -- Patch from Matthieu Hautreux that adds an entry into the error file when a job or step receives a TERM or KILL signal.
 -- Make it so the env var SLURM_SRUN_COMM_HOST is overwritten if it already exists in the slurmd.

* Changes in SLURM 2.1.0-pre8
=============================
 -- Rearranged the "scontrol show job" output into functional groupings.
 -- Change the salloc/sbatch/srun -P option to -d (dependency).
 -- Removed the srun -d option; must use srun --slurmd-debug instead.
 -- When running the mysql plugin natively, MUNGE errors are now eliminated when sending updates to slurmctlds.
 -- Check to make sure we have a default account before looking to fill in the default association.
 -- Accounting - Slurmctld and slurmdbd will now set uids of users which were created after the start of the daemons on reconfig. Slurmdbd will attempt to set previously non-existent uids every hour.
 -- Patch from Aaron Knister and Mark Grondona to correctly parse quoted #SBATCH options in a batch script.
 -- job_desc_msg_t - in, out, err have been changed to std_in, std_out, and std_err respectively. Needed for PySLURM, since Python sees (in) as a keyword.
 -- Changed the type of addr to struct sockaddr_in in _message_socket_accept() in sattach.c, step_launch.c, and allocate_msg.c, and moved the function into a common place for all the calls since the code was very similar.
 -- proctrack/lua support has been added; see contribs/lua/proctrack.lua.
 -- Replaced local gtk m4 test with AM_PATH_GTK_2_0.
 -- Changed AC_CHECK_LIB to AC_SEARCH_LIBS to avoid extra libs in compile lines.
 -- Patch from Matthieu Hautreux to improve error message in slurmd/req.c.
 -- Added support for split groups (from Matthieu Hautreux, CEA).
 -- Patch from Mark Grondona to move blcr scripts into pkglibexecdir.
 -- Patch from Doug Parisek to calculate a job's projected start time under the builtin scheduler.
 -- Removed most global variables from src/common/jobacct_common.h.

* Changes in SLURM 2.1.0-pre7
=============================
 -- BLUEGENE - Make 2.1 run correctly on a real bluegene cluster.
 -- sacctmgr - Display better debug output when an admin specifies a non-existent parent account when changing parent accounts.
 -- Added a mechanism to the slurmd to defer the epilog from starting until after a running prolog has finished.
 -- If a node reboots in between status checks, the node is marked down unless ReturnToService=2.
 -- Added -R option to slurmctld to also recover partition state when restarting or reconfiguring.

* Changes in SLURM 2.1.0-pre6
=============================
 -- When getting information about nodes in hidden partitions, return a node name of NULL rather than returning no information about the node, so that node index information is still valid.
 -- When querying the database for jobs in a certain state and a time period is given, only jobs in that state during the period will be returned. Previously, if a time period was given in sacct, jobs eligible to run or running would be displayed; this is still the default if no states are requested.
 -- One can now query jobs based on size (nodes and/or cpus) (mysql plugin only).
 -- Applied patch from Mark Grondona that tests for a missing config file before any other processing in spank_init(). This now prevents fatal errors from being mistakenly treated as recoverable.
 -- --enable-debug no longer has to be stated at configure time to have the slurmctld or slurmstepd dump core on a seg fault.
 -- Moved the errant slurm_job_node_ready() declaration from job_info.h to slurm.h and deleted job_info.h.
 -- Added the slurm_job_cpus_allocated_on_node_id() and slurm_job_cpus_allocated_on_node() APIs for working with the job_resources_t structure.
 -- BLUEGENE - Speed up startup for systems that have many blocks (100+) configured on the system.

* Changes in SLURM 2.1.0-pre5
=============================
 -- Add squeue option "--start" to report expected start time of pending jobs.
 -- Sched/backfill plugin modified to set expected start time of pending jobs.
 -- Add SchedulerParameters option of "max_job_bf=#" to control how far down the queue of pending jobs SLURM searches in an attempt to backfill schedule them. The default value is 50 jobs.
 -- Fixed cause of squeue -o "%C" seg fault.
 -- Add -"-signal=@