This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.

* Changes in SLURM 2.2.0.pre0
=============================
 -- Added RunTime field to scontrol show job report
 -- Added SLURM_VERSION_NUMBER and removed SLURM_API_VERSION from
    slurm/slurm.h.
 -- Added support to handle communication with SLURM 2.1 clusters. Jobs
    should not be lost in the future when upgrading to higher versions of
    SLURM.
 -- Added withdeleted options for listing clusters, users, and accounts
 -- Remove PLPA task affinity functions due to that package being deprecated.
 -- Preserve current partition state information rather than use contents of
    slurm.conf file after slurmctld restart or reconfiguration.

* Changes in SLURM 2.1.1
=============================
 -- Fix for case sensitive databases when a slurmctld has a mixed case
    clustername: lower case the string to ease comparisons.
 -- Fix squeue if job is completing and failed to print remaining nodes
    instead of failed message.
 -- Fix sview core when searching for partitions by state.
 -- Fixed setting the start time when querying in sacct to the beginning of
    the day if not set previously.
 -- Defined slurm_free_reservation_info_msg and slurm_free_topo_info_msg in
    common/slurm_protocol_defs.h
 -- Avoid generating error when a job step includes a memory specification
    and memory is not configured as a consumable resource.
 -- Patch for small memory leak in src/common/plugstack.c
 -- Fix sview search on node state.
 -- Fix bug in which improperly formed job dependency specification can cause
    slurmctld to abort.
 -- Fixed issue where slurmctld wouldn't always get a message to send cluster
    information when registering for the first time with the slurmdbd.

* Changes in SLURM 2.1.0
=============================
 -- Improve sview layout of blocks in use.
 -- A user can now change the dimensions of the grid in sview.
 -- BLUEGENE - improved startup speed further for large numbers of defined
    blocks
 -- Fix to _get_job_min_nodes() in wiki2/get_jobs.c suggested by Michal
    Novotny
 -- BLUEGENE - fixed issues when updating a pending job when a node count was
    incorrect for the asked for connection type.
 -- BLUEGENE - fixed issue when combining blocks that are in ready states to
    make a larger block from those or make multiple smaller blocks by
    splitting the larger block. Previously this would only work with blocks
    in a free state.
 -- Fix bug in wiki(2) plugins where if HostFormat=2 and the task list is
    greater than 64 we don't truncate. Previously this would mess up Moab by
    sending a truncated task list when doing a get jobs.
 -- Added update slurmctld debug level to sview when in admin mode.
 -- Added logic to make sure if enforcing a memory limit when using the
    jobacct_gather plugin a user can no longer turn off the logic to enforce
    the limit.
 -- Replaced many calls to getpwuid() with reentrant uid_to_string()
 -- The slurmstepd will now refresh its log file handle on a reconfig.
    Previously, if a log was rolled any output from the stepd was lost.

* Changes in SLURM 2.1.0-pre9
=============================
 -- Added the "scontrol update SlurmctldDebug" as the preferred alternative to
    the "scontrol setdebug" command.
 -- BLUEGENE - made it so when removing a block in an error state the nodes in
    the block are set correctly in accounting as not in error.
 -- Fixed issue where if slurmdbd is not up qos' are set up correctly for
    associations off of cache.
 -- scontrol, squeue, sview all display the correct node, cpu count along with
    correct corresponding nodelist on completing jobs.
 -- Patch (Mark Grondona) fixes serious security vulnerability in SLURM in
    the spank_job_env functionality.
 -- Improve spank_job_env interface and documentation
 -- Add ESPANK_NOT_LOCAL error code to spank_err_t
 -- Made the #define DECAY_INTERVAL used in the priority/multifactor plugin a
    slurm.conf variable (PriorityCalcPeriod)
 -- Added new macro SLURM_VERSION for use in autoconf scripts to determine
    current version of slurm installed on system when building against the
    api.
 -- Patch from Matthieu Hautreux that adds an entry into the error file when
    a job or step receives a TERM or KILL signal.
 -- Make it so env var SLURM_SRUN_COMM_HOST is overwritten if already in
    existence in the slurmd.

* Changes in SLURM 2.1.0-pre8
=============================
 -- Rearranged the "scontrol show job" output into functional groupings
 -- Change the salloc/sbatch/srun -P option to -d (dependency)
 -- Removed the srun -d option; must use srun --slurmd-debug instead
 -- When running the mysql plugin natively MUNGE errors are now eliminated
    when sending updates to slurmctlds.
 -- Check to make sure we have a default account before looking to fill in
    default association.
 -- Accounting - Slurmctld and slurmdbd will now set uids of users which were
    created after the start of the daemons on reconfig. Slurmdbd will
    attempt to set previously nonexistent uids every hour.
 -- Patch from Aaron Knister and Mark Grondona, to parse correctly quoted
    #SBATCH options in a batch script.
 -- job_desc_msg_t - in, out, err have been changed to std_in, std_out, and
    std_err respectively. Needed for PySLURM, since Python sees (in) as a
    keyword.
 -- Changed the type of addr to struct sockaddr_in in
    _message_socket_accept() in sattach.c, step_launch.c, and allocate_msg.c,
    and moved the function into a common place for all the calls since the
    code was very similar.
 -- proctrack/lua support has been added; see contribs/lua/protrack.lua
 -- replaced local gtk m4 test with AM_PATH_GTK_2_0
 -- changed AC_CHECK_LIB to AC_SEARCH_LIBS to avoid extra libs in compile
    lines.
 -- Patch from Matthieu Hautreux to improve error message in slurmd/req.c
 -- Added support for split groups (from Matthieu Hautreux, CEA)
 -- Patch from Mark Grondona to move blcr scripts into pkglibexecdir
 -- Patch from Doug Parisek to calculate a job's projected start time under
    the builtin scheduler.
 -- Removed most global variables out of src/common/jobacct_common.h

* Changes in SLURM 2.1.0-pre7
=============================
 -- BLUEGENE - make 2.1 run correctly on a real bluegene cluster
 -- sacctmgr - Display better debug for when an admin specifies a
    nonexistent parent account when changing parent accounts.
 -- Added a mechanism to the slurmd to defer the epilog from starting until
    after a running prolog has finished.
 -- If a node reboots in between checking status the node is marked down
    unless ReturnToService=2
 -- Added -R option to slurmctld to recover partition state also when
    restarting or reconfiguring.

* Changes in SLURM 2.1.0-pre6
=============================
 -- When getting information about nodes in hidden partitions, return a node
    name of NULL rather than returning no information about the node so that
    node index information is still valid.
 -- When querying database for jobs in certain state and a time period is
    given only jobs in that state during the period will be returned.
    Previously, if a time period was given in sacct, jobs eligible to run or
    running would be displayed, which is still the default if no states are
    requested.
 -- One can now query jobs based on size (nodes and/or cpus) (mysql plugin
    only)
 -- Applied patch from Mark Grondona that tests for a missing config file
    before any other processing in spank_init(). This now prevents fatal
    errors from being mistakenly treated as recoverable.
 -- --enable-debug no longer has to be stated at configure time to have the
    slurmctld or slurmstepd dump core on a seg fault.
 -- Moved the errant slurm_job_node_ready() declaration from job_info.h to
    slurm.h and deleted job_info.h.
 -- Added the slurm_job_cpus_allocated_on_node_id() and
    slurm_job_cpus_allocated_on_node() API for working with the
    job_resources_t structure.
 -- BLUEGENE - speed up start up for systems that have many blocks (100+)
    configured on the system.

* Changes in SLURM 2.1.0-pre5
=============================
 -- Add squeue option "--start" to report expected start time of pending
    jobs.
 -- Sched/backfill plugin modified to set expected start time of pending
    jobs.
 -- Add SchedulerParameters option of "max_job_bf=#" to control how far down
    the queue of pending jobs that SLURM searches in an attempt to backfill
    schedule them. The default value is 50 jobs.
 -- Fixed cause of squeue -o "%C" seg fault.
 -- Add -"-signal=@