This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.

* Changes in SLURM 2.2.0.pre2
=============================
 -- Add support for spank_get_item() to get S_STEP_ALLOC_CORES and
    S_STEP_ALLOC_MEM. Support will remain for S_JOB_ALLOC_CORES and
    S_JOB_ALLOC_MEM.
 -- Kill individual job steps that exceed their memory limit rather than
    killing an entire job if one step exceeds its memory limit.
 -- Added configuration parameter VSizeFactor to enforce virtual memory limits
    for jobs and job steps as a percentage of their real memory allocation.
 -- Add scontrol ability to update job step's time limits.
 -- Add scontrol ability to update job's NumCPUs count.

* Changes in SLURM 2.2.0.pre1
=============================
 -- Added RunTime field to scontrol show job report
 -- Added SLURM_VERSION_NUMBER and removed SLURM_API_VERSION from
    slurm/slurm.h.
 -- Added support to handle communication with SLURM 2.1 clusters. Jobs
    should not be lost in the future when upgrading to higher versions of
    SLURM.
 -- Added withdeleted options for listing clusters, users, and accounts
 -- Remove PLPA task affinity functions due to that package being deprecated.
 -- Preserve current partition state information and node Feature and Weight
    information rather than use contents of slurm.conf file after slurmctld
    restart with -R option or SIGHUP. Replace information with contents of
    slurm.conf after slurmctld restart without -R or "scontrol reconfigure".
    See RELEASE_NOTES file for more details.
 -- Modify SLURM's PMI library (for MPICH2) to properly execute an executable
    program stand-alone (single MPI task launched without srun).
 -- Made GrpCPUs and MaxCPUs limits work for select/cons_res.
 -- Moved all SQL dependent plugins into a separate rpm slurm-sql. This
    should be needed only where a connection to a database is needed (i.e.
    where the slurmdbd is running)
 -- Add command line option "no_sys_info" to PAM module to suppress system
    logging of "access granted for user ..."; access denied and other errors
    will still be logged.
 -- sinfo -R now has the user and timestamp in separate fields from the
    reason.
 -- Much functionality has been added to account_storage/pgsql. The plugin
    is still in a very beta state. It is still highly advised to use the
    mysql plugin, but if you feel like living on the edge or just really
    like postgres over mysql for some reason, here you go. (Work done
    primarily by Hongjia Cao, NUDT.)

* Changes in SLURM 2.1.3
=============================
 -- BLUEGENE - Fix issues on static/overlap systems where if a midplane
    was drained you would not be able to create new blocks on it.
 -- In sched/wiki2 (for Moab): Add excluded host list to job information
    using new keyword "EXCLUDE_HOSTLIST".
 -- Correct slurmd reporting of incorrect socket/core/thread counts.
 -- For sched/wiki2 (Moab): Do not extend a job's end time for suspend/resume
    or startup delay due to node boot time. A job's end time will always be
    its start time plus time limit.
 -- Added build-time option (to configure program) of --with-pam_dir to
    specify the directory into which PAM modules get installed, although it
    should pick the proper directory by default. "make install" and
    "rpmbuild" should now put the pam_slurm.so file in the proper directory.
 -- Modify PAM module to link against SLURM API shared library and use
    exported slurm_hostlist functions.
 -- Do not block new jobs with --immediate option while another job is in the
    process of being requeued (which can take a long time for some node
    failure modes).
 -- For topology/tree, log invalid hostnames in a single hostlist expression
    rather than one per line.
 -- A job step's default time limit will be UNLIMITED rather than partition's
    default time limit. The step will automatically be cancelled as part of
    the job termination logic when the job's time limit is reached.
 -- sacct - fixed bug when checking jobs against a reservation
 -- In select/cons_res, fix support for job allocation with --ntasks-per-node
    option. Previously could allocate too few CPUs on some nodes.
 -- Adjustment made to init message to the slurmdbd to allow backwards
    compatibility with future 2.2 release. YOU NEED TO UPGRADE SLURMDBD
    BEFORE ANYTHING ELSE.

* Changes in SLURM 2.1.2
=============================
 -- Added nodelist to sview for jobs on non-bluegene systems
 -- Correction in value of batch job environment variable
    SLURM_TASKS_PER_NODE under some conditions.
 -- When a node that is already drained/down silently fails, its reason for
    draining is not changed.
 -- Srun will ignore SLURM_NNODES environment variable and use the count of
    currently allocated nodes if that count changes during the job's lifetime
    (e.g. job allocation uses the --no-kill option and a node goes DOWN, job
    step would previously always fail).
 -- Made it so sacctmgr can't add blank user or account. The MySQL plugin
    will also reject such requests.
 -- Revert libpmi.so version for compatibility with SLURM version 2.0 and
    earlier to avoid forcing applications using a specific libpmi.so version
    to rebuild unnecessarily (revert from libpmi.so.21.0.0 to
    libpmi.so.0.0.0).
 -- Restore support for a pending job's constraints (required node features)
    when slurmctld is restarted (internal structure needed to be rebuilt).
 -- Removed checkpoint_blcr.so from the plugin rpm in the slurm.spec since
    it is also in the blcr rpm.
 -- Fixed issue in sview where you were unable to edit the count of jobs to
    share resources.
 -- BLUEGENE - Fixed issue where tasks on steps weren't being displayed
    correctly with scontrol and sview.
 -- BLUEGENE - fixed wiki2 plugin to report correct task count for pending
    jobs.
 -- BLUEGENE - Added /etc/ld.so.conf.d/slurm.conf to point to the directory
    holding libsched_if64.so when building rpms.
 -- Adjust get_wckeys call in slurmdbd to allow operators to list wckeys.
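
The SLURM_TASKS_PER_NODE variable named in the 2.1.2 changes above holds a
comma-separated list of per-node task counts, where a count followed by
"(x#)" repeats for # consecutive nodes (for example "2(x3),1"). The sketch
below shows one way a batch script helper might expand that compact form.
The function name expand_tasks_per_node is hypothetical and is not part of
SLURM; this is an illustration of the format, not SLURM's own parsing code.

```python
import re


def expand_tasks_per_node(value):
    """Expand a SLURM_TASKS_PER_NODE string into one task count per node.

    Hypothetical helper for illustration only; not a SLURM API.
    """
    counts = []
    for item in value.split(","):
        # Each item is either "N" or "N(xR)": N tasks on R consecutive nodes.
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        tasks = int(match.group(1))
        repeat = int(match.group(2)) if match.group(2) else 1
        counts.extend([tasks] * repeat)
    return counts
```

In a batch job the input would typically come from
os.environ["SLURM_TASKS_PER_NODE"], and the length of the returned list can
be compared against the number of allocated nodes.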
* Changes in SLURM 2.1.1
=============================
 -- Fix for case sensitive databases when a slurmctld has a mixed case
    clustername: lower case the string to ease comparisons.
 -- Fix squeue to print remaining nodes instead of a failed message if a job
    is completing and failed.
 -- Fix sview core dump when searching for partitions by state.
 -- Fixed setting the start time when querying in sacct to the beginning of
    the day if not set previously.
 -- Defined slurm_free_reservation_info_msg and slurm_free_topo_info_msg
    in common/slurm_protocol_defs.h
 -- Avoid generating error when a job step includes a memory specification
    and memory is not configured as a consumable resource.
 -- Patch for small memory leak in src/common/plugstack.c
 -- Fix sview search on node state.
 -- Fix bug in which improperly formed job dependency specification can cause
    slurmctld to abort.
 -- Fixed issue where slurmctld wouldn't always get a message to send cluster
    information when registering for the first time with the slurmdbd.
 -- Add slurm_*_trigger.3 man pages for event trigger APIs.
 -- Fix bug in job preemption logic that would free allocated memory twice.
 -- Fix spelling issues (from Gennaro Oliva)
 -- Fix issue when changing parents of an account in accounting: all children
    weren't always sent to their respective slurmctlds until a restart.
 -- Restore support for srun/salloc/sbatch option --hint=nomultithread to
    bind tasks to cores rather than threads (broken in slurm v2.1.0-pre5).
 -- Fix issue where a 2.0 sacct could not talk correctly to a 2.1 slurmdbd.
 -- BLUEGENE - Fix issue where no partitions have any nodes assigned to them:
    alert user that no blocks can be created.
 -- BLUEGENE - Fix smap to put BGP images when using -Dc on a Blue Gene/P
    system.
 -- Set SLURM_SUBMIT_DIR environment variable for srun and salloc commands to
    match behavior of sbatch command.
 -- Report WorkDir from "scontrol show job" command for jobs launched using
    salloc and srun.
 -- Update correctly the wckey when changing it on a pending job.
 -- Set wckeyid correctly in accounting when cancelling a pending job.
 -- BLUEGENE - critical fix where jobs would be killed incorrectly.
 -- BLUEGENE - fix for sview putting multiple ionodes on to nodelists when
    viewing the jobs tab.

* Changes in SLURM 2.1.0
=============================
 -- Improve sview layout of blocks in use.
 -- A user can now change the dimensions of the grid in sview.
 -- BLUEGENE - improved startup speed further for large numbers of defined
    blocks
 -- Fix to _get_job_min_nodes() in wiki2/get_jobs.c suggested by Michal
    Novotny
 -- BLUEGENE - fixed issues when updating a pending job when a node count
    was incorrect for the requested connection type.
 -- BLUEGENE - fixed issue when combining blocks that are in ready states to
    make a larger block from those or make multiple smaller blocks by
    splitting the larger block. Previously this would only work with blocks
    in a free state.
 -- Fix bug in wiki(2) plugins where if HostFormat=2 and the task list is
    greater than 64 we don't truncate. Previously this would mess up Moab
    by sending a truncated task list when doing a get jobs.
 -- Added update slurmctld debug level to sview when in admin mode.
 -- Added logic to make sure that, if enforcing a memory limit when using the
    jobacct_gather plugin, a user can no longer turn off the logic to enforce
    the limit.
 -- Replaced many calls to getpwuid() with reentrant uid_to_string()
 -- The slurmstepd will now refresh its log file handle on a reconfig;
    previously, if a log was rolled any output from the stepd was lost.

* Changes in SLURM 2.1.0-pre9
=============================
 -- Added the "scontrol update SlurmctldDebug" as the preferred alternative
    to the "scontrol setdebug" command.
 -- BLUEGENE - made it so when removing a block in an error state the nodes
    in the block are set correctly in accounting as not in error.
 -- Fixed issue where, if slurmdbd is not up, qos' are set up correctly for
    associations off of cache.
 -- scontrol, squeue, sview all display the correct node, cpu count along
    with correct corresponding nodelist on completing jobs.
 -- Patch (Mark Grondona) fixes serious security vulnerability in SLURM in
    the spank_job_env functionality.
 -- Improve spank_job_env interface and documentation
 -- Add ESPANK_NOT_LOCAL error code to spank_err_t
 -- Made the #define DECAY_INTERVAL used in the priority/multifactor plugin
    a slurm.conf variable (PriorityCalcPeriod)
 -- Added new macro SLURM_VERSION for use in autoconf scripts to determine
    current version of slurm installed on system when building against the
    api.
 -- Patch from Matthieu Hautreux that adds an entry into the error file when
    a job or step receives a TERM or KILL signal.
 -- Make it so env var SLURM_SRUN_COMM_HOST is overwritten if already in
    existence in the slurmd.

* Changes in SLURM 2.1.0-pre8
=============================
 -- Rearranged the "scontrol show job" output into functional groupings
 -- Change the salloc/sbatch/srun -P option to -d (dependency)
 -- Removed the srun -d option; must use srun --slurmd-debug instead
 -- When running the mysql plugin natively MUNGE errors are now eliminated
    when sending updates to slurmctlds.
 -- Check to make sure we have a default account before looking to
    fill in default association.
 -- Accounting - Slurmctld and slurmdbd will now set uids of users which were
    created after the start of the daemons on reconfig. Slurmdbd will
    attempt to set previously non-existent uids every hour.
 -- Patch from Aaron Knister and Mark Grondona, to parse correctly quoted
    #SBATCH options in a batch script.
 -- job_desc_msg_t - in, out, err have been changed to std_in, std_out,
    and std_err respectively. Needed for PySLURM, since Python sees (in)
    as a keyword.
 -- Changed the type of addr to struct sockaddr_in in
    _message_socket_accept() in sattach.c, step_launch.c, and
    allocate_msg.c, and moved the function into a common place for all the
    calls since the code was very similar.
 -- proctrack/lua support has been added; see contribs/lua/protrack.lua
 -- replaced local gtk m4 test with AM_PATH_GTK_2_0
 -- changed AC_CHECK_LIB to AC_SEARCH_LIBS to avoid extra libs in
    compile lines.
 -- Patch from Matthieu Hautreux to improve error message in slurmd/req.c
 -- Added support for split groups (from Matthieu Hautreux, CEA)
 -- Patch from Mark Grondona to move blcr scripts into pkglibexecdir
 -- Patch from Doug Parisek to calculate a job's projected start time under
    the builtin scheduler.
 -- Removed most global variables out of src/common/jobacct_common.h

* Changes in SLURM 2.1.0-pre7
=============================
 -- BLUEGENE - make 2.1 run correctly on a real bluegene cluster
 -- sacctmgr - Display better debug for when an admin specifies a
    non-existent parent account when changing parent accounts.
 -- Added a mechanism to the slurmd to defer the epilog from starting until
    after a running prolog has finished.
 -- If a node reboots in between checking status the node is marked down
    unless ReturnToService=2
 -- Added -R option to slurmctld to recover partition state also when
    restarting or reconfiguring.

* Changes in SLURM 2.1.0-pre6
=============================
 -- When getting information about nodes in hidden partitions, return a node
    name of NULL rather than returning no information about the node so that
    node index information is still valid.
 -- When querying database for jobs in certain state and a time period is
    given, only jobs in that state during the period will be returned.
    Previously, if a time period was given in sacct, jobs eligible to run or
    running would be displayed, which is still the default if no states are
    requested.
 -- One can now query jobs based on size (nodes and/or cpus) (mysql plugin
    only)
 -- Applied patch from Mark Grondona that tests for a missing config file
    before any other processing in spank_init(). This now prevents fatal
    errors from being mistakenly treated as recoverable.
 -- --enable-debug no longer has to be stated at configure time to have
    the slurmctld or slurmstepd dump core on a seg fault.
 -- Moved the errant slurm_job_node_ready() declaration from job_info.h to
    slurm.h and deleted job_info.h.
 -- Added the slurm_job_cpus_allocated_on_node_id() and
    slurm_job_cpus_allocated_on_node() API for working with the
    job_resources_t structure.
 -- BLUEGENE - speed up start up for systems that have many blocks (100+)
    configured on the system.

* Changes in SLURM 2.1.0-pre5
=============================
 -- Add squeue option "--start" to report expected start time of pending
    jobs.
 -- Sched/backfill plugin modified to set expected start time of pending
    jobs.
 -- Add SchedulerParameters option of "max_job_bf=#" to control how far down
    the queue of pending jobs that SLURM searches in an attempt to backfill
    schedule them. The default value is 50 jobs.
 -- Fixed cause of squeue -o "%C" seg fault.
 -- Add -"-signal=@