This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.

* Changes in SLURM 0.3.0-pre4
=============================
 -- Fix bug where early launch failures (such as invalid UID/GID) resulted
    in jobs not terminating properly.
 -- Initial support for BNR committed (not yet functional).
 -- QsNet: SLURM now uses /etc/elanhosts exclusively for converting
    hostnames to ElanIDs.

* Changes in SLURM 0.3.0-pre3
=============================
 -- Fixes for reported problems:
    - slurm/328: Slurmd was restarting with a new shared memory segment and
      losing track of jobs
    - slurm/329: Job processing may be left running when one task dies
    - slurm/333: Slurmd fails to launch a job and deletes a step, due to
      a race condition in shared memory management
    - slurm/334: Slurmd was getting a segv due to a race condition in shared
      memory management
    - slurm/342: Properly handle nodes being removed from configuration
      even when there are partitions, nodes, or job steps still associated
      with them
 -- Srun properly terminates jobs/steps upon node failure (used to hang
    waiting for I/O completion)
 -- Job time limits enforced even if InactiveLimit configured as zero
 -- Support the sending of an arbitrary signal to a batch script (but not
    the processes in its job steps)
 -- Re-read slurm configuration file whenever changed, needed by users
    of SLURM APIs
 -- Scancel was generating an assert failure
 -- Slurmctld sends a launch response message upon scheduling of a queued
    job (for immediate srun response)
 -- Maui scheduler plugin added
 -- Backfill scheduler plugin added
 -- Batch scripts can now have arguments that are propagated
 -- MPICH support added (via patch, not in SLURM CVS)
 -- New SLURM environment variables added, SLURM_CPUS_ON_NODE and
    SLURM_LAUNCH_NODE_IPADDR; these provide support needed for LAM/MPI
    (version 7.0.4+)
 -- The TMPDIR directory is created as needed before job launch
 -- Do not create duplicate SLURM environment
    variables with the same name
 -- Ensure proper enforcement of node sharing by job
 -- Treat lack of SpoolDir or StateSaveDir as a fatal error
 -- Quickstart.html guide expanded
 -- Increase maximum job steps per node from 16 to 64
 -- Delete correct shared memory segment on slurmd -c (clean start)

* Changes in SLURM 0.3.0-pre2
=============================
 -- Fixes for reported problems:
    - slurm/326: Properly clean-up jobs terminating on non-responding nodes
 -- Move all configuration data structure into common/read_config, scontrol
    now always shows default values if not specified in slurm.conf file
 -- Remove the unused "Prioritize" configuration parameter

* Changes in SLURM 0.3.0-pre1
=============================
 -- Fixes for reported problems:
    - slurm/252: "jobs left orphaned when using TotalView:" SLURM controller
      now pings srun and kills defunct jobs.
    - slurm/253: "srun fails to accept new IO connection."
    - slurm/317: "Lack of default partition in config file causes errors."
    - slurm/319: Socket errors on multiple simultaneous job launches fixed
    - slurm/321: slurmd shared memory synchronization error.
 -- Removed slurm_tv_clean daemon which has been obsoleted by slurm/252 fix.
 -- New scontrol command ``delete'' and RPC added to delete a partition
 -- Squeue can now print and sort by group id/name
 -- Scancel has new option -q,--quiet to not report an error if a job
    is already complete
 -- Add the excluded node list to job information reported.
 -- RPC version mismatch now properly handled
 -- New job completion plugin interface added for logging completed jobs.
 -- Fixed lost digit in scontrol job priority specification.
 -- Remove restriction in the number of consecutive node sets (no longer
    needed after DPCS upgrade)
 -- Incomplete state save write now properly handled.
 -- Modified slurmd setrlimit error for greater clarity.
 -- Slurmctld performs load-leveling across shared nodes.
 -- New user function added slurm_get_end_time for user jobs.
 -- Always compile srun with stabs debug section when TotalView support
    is requested.

* Changes in SLURM 0.2.21
=========================
 -- Fixes for reported problems:
    - slurm/253: Try using different port if connect() fails (was rarely
      failing when an existing defunct connection was in TIME_WAIT state)
    - slurm/300: Possibly killing wrong job on slurmd restart
    - slurm/312: Freeing non-allocated memory and killing slurmd
 -- Assorted changes to support RedHat Enterprise Linux 3.0 and IA64
 -- Initial Elan4 and libelanctrl support (--with-elan).
 -- Slurmctld was sometimes inappropriately setting a job's priority
    to 1 when a node was down (even if up nodes could be used for the
    job when a running job completes)
 -- Convert all user commands from use of popt library to getopt_long()
 -- If TotalView support is requested, srun exports "totalview_jobid"
    variable for `%J' expansion in TV bulk launch string.
 -- Fix several locking bugs in slurmd IO layer.
 -- Throttle back repetitious error messages in slurmd to avoid filling
    log files.

* Changes in SLURM 0.2.20
=========================
 -- Fixes for reported problems:
    - slurm/298: Elan initialization error (Invalid vp 2147483674).
    - slurm/299: srun fails to exit with multiple ^C's.
 -- Temporarily prevent DPCS from allocating jobs with more than eight
    sets of consecutive nodes. This was likely causing user applications
    to fail with libelan errors. This will be removed after DPCS is updated.
 -- Fix bug in popt use, was failing in some versions of Linux.
 -- Resend KILL_JOB messages as needed to clear COMPLETING jobs.
 -- Install dummy SIGCHLD handler in slurmd to fix problem on NPTL systems
    where slurmd was not notified of terminated tasks.

* Changes in SLURM 0.2.19
=========================
 -- Memory corruption bug fixed, it was causing slurmctld to seg-fault

* Changes in SLURM 0.2.18
=========================
 -- Fixes for reported problems:
    - slurm/287: slurm protocol timeouts when using TotalView.
    - slurm/291: srun fails using ``-n 1'' under multi-node allocation.
    - slurm/294: srun IO buffer reports ENOSPC.
 -- Memory corruption bug fixed, it was causing slurmctld to seg-fault
 -- Non-responding nodes now go from DRAINING to DRAINED state when
    jobs complete
 -- Do not schedule pending jobs while any job is actively COMPLETING
    unless the submitted job specifically identifies its nodes (like DPCS)
 -- Reset priority of jobs with priority==1 when a non-responding node
    starts to respond again
 -- Ignore jobs with priority==1 when establishing new baseline upon
    slurmctld restart
 -- Make slurmctld/message retry be timer based rather than queue based
    for better scalability
 -- Slurmctld logging is more concise, using hostlists more
 -- srun --no-allocate now uses a special job_id range to avoid conflicts
    or premature job termination (purging by slurmctld)
 -- New --jobid=id option in srun to initiate job step under an existing
    allocation.
 -- Support in srun for TotalView bulk launch.

* Changes in SLURM 0.2.17
=========================
 -- Fixes for reported problems:
    - slurm/279: Hold jobs that can't execute due to DOWN or DRAINED
      nodes and release when nodes are returned to service.
    - slurm/285: "srun killed due to SIGPIPE"
 -- Support for running job steps on nodes relative to current
    allocation via srun -r, --relative=n option.
 -- SIGKILL no longer broadcast to job via srun on task failure
    unless --no-allocate option is used.
 -- Re-enabled "chkconfig --add" in default RPMs.
 -- Backup controller setting proper PID into slurmctld.pid file.
 -- Backup controller restores QSW state each time it assumes control
 -- Backup controller purges old job records before assuming control
    to avoid resurrecting defunct jobs.
 -- Kill jobs on non-responding DRAINING nodes and make their state
    DRAINED.
 -- Save state upon completion of a job's last EPILOG_COMPLETION to
    reduce possibility of inconsistent job and node records when the
    controller is transitioning between primary and backup.
 -- Change logging level of detailed communication errors to not print
    them unless detailed debugging is requested.
 -- Increase number of concurrent controller server threads from 20
    to 50 and restructure code to handle backlogs more efficiently.
 -- Partition state at controller startup is based upon slurm.conf
    rather than previously saved state. Additional improvements to
    avoid inconsistent job/node/partition states at restart. Job state
    information is used to arbitrate conflicts.
 -- Orphaned file descriptors eliminated.

* Changes in SLURM 0.2.16
=========================
 -- Fixes for reported problems:
    - slurm/265: Early termination of srun could cause job to remain in
      queue.
    - slurm/268: Slurmctld could deadlock if there was a delay in the
      termination of a large node-count job. An EPILOG_COMPLETE RPC was
      added so that slurmd could notify slurmctld whenever the job
      termination was completed.
    - slurm/270: Segfault in sinfo if a configured node lacked a partition.
    - slurm/278: Exit code in scontrol did not indicate failure.
 -- Fixed bug in slurmd that caused the daemon to occasionally kill itself.
 -- Fixed bug in srun when running with --no-allocate and >1 process
    per node.
 -- Small fixes and updates for srun manual.

* Changes in SLURM 0.2.15
=========================
 -- Fixes for reported problems:
    - slurm/265: Job was orphaned when allocation response message could
      not be sent. Job is now killed on allocation response message
      transmit failure and socket error details are logged.
    - Fix for slurm/267: "Job epilog may run multiple times."
 -- Squeue job TIMELIMIT format changed from "h:mm" to "d:h:mm:ss".
 -- DPCS initiated jobs have steps execute properly without explicit
    specification of node count.

* Changes in SLURM 0.2.14
=========================
 -- Fixes for reported problems:
    - slurm/194: "srun doesn't handle most options when run under an
      allocation."
    - slurm/244: "REQ: squeue shows requested size of pending jobs."
 -- SLURM_NODELIST environment variable now exported to all jobs, not
    only batch jobs.
 -- Nodelist displayed in squeue for completing jobs is now restricted to
    completing nodes.
 -- Node "reason" field properly displayed in sinfo even with filtering.
 -- ``slurm_tv_clean'' daemon now supports a log file.
 -- Batch jobs are now re-queued on launch failure.
 -- Controller confirms job scripts for batch jobs are still running on
    node zero at node registration.
 -- Default RPMs no longer stop/start SLURM daemons on upgrade or install.

* Changes in SLURM 0.2.13
=========================
 -- Fixes for reported problems:
    - Fixed bug in slurmctld where "drained" nodes would go back into
      the "idle" state under some conditions (slurm/228).
    - Added possible fix for slurm/229: "slurmd occasionally fails to reap
      all children."
 -- Fixed memory leak in auth_munge plugin.
 -- Added fix to slurmctld to allow arbitrarily large job specifications
    to be saved and recovered in the state file.
 -- Allow "updates" in the configuration file of previously defined
    node state and reason.
 -- On "forceful termination" of a running job step, srun now exits
    unconditionally, instead of waiting for all I/O.
 -- Slurmctld now uses pidfile to kill old daemon when a new one is started.
 -- Addition of new daemon "slurm_tv_clean" used to clean up jobs orphaned
    due to use of the TotalView parallel debugger.

* Changes in SLURM 0.2.12
=========================
 -- Fixes for reported problems:
    - Fix for "waitpid: No child processes" when using TotalView (slurm/217).
    - Implemented temporary workaround for slurm/223: "Munge decode failed:
      Munged communication error."
    - Temporary fix for slurm/222: "elan3_create(0): Invalid argument."
 -- Fixed memory leaks in slurmctld (mostly due to reconfigure).
 -- More squeue/sinfo interface changes (see squeue(1), sinfo(1)).
 -- Sinfo now accepts list of node states to -t,--state option.
 -- Node "reason" field now available via sinfo command (see sinfo(1)).
 -- Wrapper source for srun (srun.wrapper.c) now installed and available
    for TotalView support.
 -- Improved retry logic in user commands for periods when slurmctld
    primary is down and backup has not yet taken over.

* Changes in SLURM 0.2.11
=========================
 -- Changes in srun:
    - Fixed bug in signal handling that occasionally resulted in orphaned
      jobs when using Ctrl-C.
    - Return non-zero exit code when remote tasks are killed by a signal.
    - SIGALRM is now blocked by default.
 -- Added ``reason'' string for down, drained, or draining nodes.
 -- Added -V,--version option to squeue and sinfo.
 -- Improved some error messages from user utilities.

* Changes in SLURM 0.2.10
=========================
 -- New slurm.conf configuration parameters:
    - WaitTime:    Default for srun -w,--wait parameter.
    - MaxJobCount: Maximum number of jobs SLURM can handle at one time.
    - MinJobAge:   Minimum time since completing before job is purged from
                   slurmctld memory.
 -- Block user defined signals USR1 and USR2 in slurmd session manager.
 -- More squeue cleanup.
 -- Support for passing options to sinfo via environment variables.
 -- Added option to scontrol to find intersection of completing jobs and
    nodes.
 -- Added fix in auth_munge to prevent "Munged communication error" message.

* Changes in SLURM 0.2.9
========================
 -- Fixes for reported problems:
    - Argument to srun `-n' option was taken as octal if preceded with a `0'.
 -- New format for Elan hosts config file (/etc/elanhosts; see README)
 -- Various fixes for managing COMPLETING jobs.
 -- Support for passing options to squeue via environment variables
    (see squeue(1))

* Changes in SLURM 0.2.8
=========================
 -- Fix for bug in slurmd that could make debug messages appear in job output.
 -- Fix for bug in slurmctld retry count computation.
 -- Srun now times out slow launch threads.
 -- "Time Used" output in squeue now includes seconds.

* Changes in SLURM 0.2.7
=========================
 -- Fix for bug in Elan module that results in slurmd hang.
 -- Added completing job state to default list of states to print with
    squeue.

* Changes in SLURM 0.2.6
=========================
 -- More fixes for handling cleanup of slow terminating jobs.
 -- Fixed bug in srun that might leave nodes allocated after a Ctrl-C.

* Changes in SLURM 0.2.5
=========================
 -- Various fixes for cleanup of slow terminating or unkillable jobs.
 -- Fixed some small memory leaks in communications code.
 -- Added hack for synchronized exit of jobs on large node count.
 -- Long lists of nodes are no longer truncated in sinfo.
 -- Print more descriptive error message when tasks exit with nonzero status.
 -- Fixed bug in srun where unsuccessful launch attempts weren't detected.
 -- Elan network error resolver thread now runs from elan module in slurmd.
 -- Slurmctld uses consecutive Elan context and program description numbers
    instead of choosing them randomly.

* Changes in SLURM 0.2.4
==========================
 -- Fix for file descriptor leak in slurmctld.
 -- auth_munge plugin now prints credential info on decode failure.
 -- Minor changes to scancel interface.
 -- Filename format option "%J" now works again for srun --output and --error.

* Changes in SLURM 0.2.3
==========================
 -- Fix bug in srun when using per-task files for stderr.
 -- Better error reporting on failure to open per-task input/output files.
 -- Update auth_munge plugin for munge 0.1.
 -- Minor changes to squeue interface.
 -- New srun option `--hold' to submit job in "held" state.

* Changes in SLURM 0.2.2
==========================
 -- Fixes for reported problems:
    - Execution of script allocate mode fails in some cases. (gnats:161)
    - Errors using per-task input files with Elan support. (gnats:162)
    - srun doesn't handle all environment variables properly. (gnats:164)
 -- Parallel job is now terminated if a task is killed by a signal.
 -- Exit status of srun is set based on exit codes of tasks.
 -- Redesign of sinfo interface and options.
 -- Shutdown of slurmctld no longer propagates shutdown to all nodes.

* Changes in SLURM 0.2.1
===========================
 -- Fix bug where reconfigure request to slurmctld killed the daemon.

* Changes in SLURM 0.2.0
============================
 -- SlurmdTimeout of 0 means never set a non-responding node to DOWN.
 -- New srun option, -u,--unbuffered, for unbuffered stdout.
 -- Enhancements for sinfo
    - Non-responding nodes show "*" character appended instead of "NoResp+".
    - Node states show abbreviated variant by default
 -- Enhancements for scontrol.
    - Added "ping" command to show current state of SLURM controllers.
    - Job dump in scontrol shows user name as well as UID.
    - Node state of DRAIN is appropriately mapped to DRAINING or DRAINED.
 -- Fix for bug where request for task count greater than partition limit
    was queued anyway.
 -- Fix for bugs in job end time handling.
 -- Modifications for error free builds on 64 bit architectures.
 -- Job cancel immediately deallocates nodes instead of waiting on srun.
 -- Attempt to create slurmd spool if it does not exist.
 -- Fixed signal handling bug in srun allocate mode.
 -- Earlier error detection in slurmd startup.
 -- "fatal: _shm_unlock: Numerical result out of range" bug fixed in slurmd.
 -- Config file parsing is now case insensitive.
 -- SLURM_NODELIST environment variable now set in allocate mode.

* Changes in SLURM 0.2.0-pre2
=============================
 -- Fix for reconfigure when public/private key path is changed.
 -- Shared memory fixes in slurmd.
    - fix for infinite semaphore incrementation bug.
 -- Semaphore fixes in slurmctld.
 -- Slurmctld now remembers which nodes have registered after recover.
 -- Fixed reattach bug when tasks have exited.
 -- Change directory to /tmp in slurmd if daemonizing.
 -- Logfiles are reopened on reconfigure.

$Id$