This file describes changes in recent versions of SLURM. It primarily
documents those changes that are of interest to users and admins.
* Changes in SLURM 2.4.3
========================
 -- Accounting - Fix so that full 32-bit numbers can be entered for a
    priority.
 -- cgroups - Fix so that if the initial directory does not exist, SLURM
    creates it correctly. Previously the errno value was not checked correctly.
 -- BGQ - Fixed srun, when only a task count and not a node count is
    requested, to operate the same way salloc and sbatch do and assign a task
    per CPU by default instead of a task per node.
* Changes in SLURM 2.4.2
========================
 -- BLUEGENE - Correct potential deadlock issue when hardware goes bad and
    there are jobs running on that hardware.
 -- If a job is submitted to more than one partition, its partition pointer
    can be set to an invalid value. This can result in the count of CPUs
    allocated on a node being bad, resulting in over- or under-allocation of
    its CPUs. Patch by Carles Fenoy, BSC.
 -- Fix bug in task layout with select/cons_res plugin and --ntasks-per-node
    option. Patch by Martin Perry, Bull.
 -- BLUEGENE - Remove race condition where, if a block was removed while
    waiting for a job on it to finish, the number of unused CPUs wasn't
    updated correctly.
 -- BGQ - make sure we have a valid block when creating or finishing a step
    allocation.
 -- BLUEGENE - If a large block (> 1 midplane) is in error and underlying
    hardware is marked bad, remove the larger block and create a block over
    just the bad hardware, making the other hardware available to run on.
 -- BLUEGENE - Handle job completion correctly if an admin removes a block
    where other blocks on an overlapping midplane are running jobs.
 -- BLUEGENE - correctly remove running jobs when freeing a block.
 -- BGQ - correct logic to place multiple (< 1 midplane) steps inside a
    multi midplane block allocation.
 -- BGQ - Make it possible for a multi midplane allocation to run on more
    than 1 midplane but not the entire allocation.
 -- BGL - Fix for syncing users on block. Patch from Tim Wickberg.
 -- Fix initialization of protocol_version for some messages to make sure it
    is always set when sending or receiving a message.
 -- Reset backfilled job counter only when explicitly cleared using scontrol.
    Patch from Alejandro Lucero Palau, BSC.
 -- BLUEGENE - Fix for handling blocks when a larger block will not free and,
    while it is attempting to free, underlying hardware is marked in error,
    creating small blocks that overlap the freeing block. This only applies
    to dynamic layout mode.
 -- Cray and BlueGene - Do not treat lack of usable front-end nodes when the
    slurmctld daemon starts as a fatal error. Also preserve the correct
    front-end node for jobs when there is more than one front-end node and the
    slurmctld daemon restarts.
 -- Correct parsing of srun/sbatch input/output/error file names so that only
    the name "none" is mapped to /dev/null and not any file name starting
    with "none" (e.g. "none.o").
 -- BGQ - added version string to the load of the runjob_mux plugin to verify
    the current plugin has been loaded when using runjob_mux_refresh_config
 -- CGROUPS - Use system mount/umount function calls instead of doing a
    fork/exec of the mount/umount commands. Patch from Janne Blomqvist.
 -- BLUEGENE - Correct start time setup when no jobs are blocking the way.
    Patch from Mark Nelson.
 -- Fixed sacct --state=S query to return information about suspended jobs,
    both current and past.
 -- FRONTEND - Made error warning more apparent if a frontend node isn't
    configured correctly.
 -- BGQ - update documentation about runjob_mux_refresh_config which works
    correctly as of IBM driver V1R1M1 efix 008.
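The corrected srun/sbatch file-name rule above (only the exact name "none" maps to /dev/null) can be illustrated with a small sketch. It is written in Python rather than SLURM's own C, and `resolve_io_path` is a hypothetical name, not a SLURM function.

```python
def resolve_io_path(name: str) -> str:
    """Map an --input/--output/--error file name to the path to open.

    Hypothetical helper illustrating the fixed rule: only the exact
    name "none" means /dev/null. The old behavior effectively acted
    like name.startswith("none"), which also caught "none.o".
    """
    if name == "none":
        return "/dev/null"
    return name


print(resolve_io_path("none"))    # /dev/null
print(resolve_io_path("none.o"))  # none.o
```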
* Changes in SLURM 2.4.1
========================
 -- Fix bug in job state conversion from 2.3 -> 2.4; job state can now be
    preserved correctly when transitioning. This also applies to
    2.4.0 -> 2.4.1; no state will be lost. (Thanks to Carles Fenoy)
* Changes in SLURM 2.4.0
========================
 -- Cray - Improve support for zero compute node resource allocations.
    Partition used can now be configured with no nodes.
 -- BGQ - make it so srun -i<taskid> works correctly.
 -- Fix parse_uint32/16 to complain if a non-digit is given.
 -- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
    Bringhurst (LANL).
 -- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
    compiling with --enable-debug
 -- Modify scontrol to require "-dd" option to report batch job's script. Patch
    from Don Albert, Bull.
 -- Modify SchedulerParameters option to match documentation: "bf_res="
    changed to "bf_resolution=". Patch from Rod Schultz, Bull.
 -- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
 -- In etc/init.d/slurm move check for scontrol after sourcing
    /etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
 -- Fix in scheduling logic that can delay jobs with min/max node counts.
 -- BGQ - Fix issue so that if a step uses the entire allocation and the next
    step in the allocation uses only part of it, that step gets the correct
    cnodes.
 -- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1; the
    previous function didn't always work correctly.
 -- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
    to make a larger small block and are running with sub-blocks.
 -- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
 -- BGQ - When using an old IBM driver, cnodes that go into error because of
    a job kill timeout aren't always reported to the system. This is now
    handled by the runjob_mux plugin.
 -- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
 -- Improve memory consumption on step layouts with high task count.
 -- BGQ - Quieter debug output when the real time server comes back but there
    are still messages found when polling that haven't been given back to the
    real time server yet.
 -- BGQ - Fix for when a request comes in smaller than the smallest block and
    we must use a small block instead of a shared midplane block.
 -- Fix issues on large jobs (>64k tasks) to have the correct counter type when
    packing the step layout structure.
 -- BGQ - Fix issue so that if a user asks for tasks and ntasks-per-node but
    not a node count, the node count is correctly figured out.
 -- Move logic to always use the 1st alphanumeric node as the batch host for
    batch jobs.
 -- BLUEGENE - Fix race condition when a nodeboard/card goes down at the same
    time a block is destroyed and that block happens to be the smallest block
    overlapping the bad hardware.
 -- Fix bug when querying accounting looking for a job node size.
 -- BLUEGENE - fix possible race condition if cleaning up a block and the
    removal of the job on the block failed.
 -- BLUEGENE - If a cable is in an error state, make it possible to check
    whether a block would still be makable if the cable weren't in error.
 -- Put node names in alphabetic order in node table.
 -- If a preempted job should have a grace time and the preempt mode is not
    cancel, but the job is going to be canceled because it is interactive or
    for another reason, it now receives the grace time.
 -- BGQ - Modified documents to explain new plugin_flags needed in bg.properties
    in order for the runjob_mux to run correctly.
 -- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.
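The parse_uint32/16 change above can be pictured with the following Python sketch of strict unsigned parsing. The function `parse_uint` is hypothetical, and the bit-width range check is an added assumption; the entry itself only states that non-digit input is now rejected.

```python
def parse_uint(text: str, bits: int = 32) -> int:
    """Parse an unsigned integer of `bits` width, rejecting bad input.

    Hypothetical analogue of a strict parse_uint32/parse_uint16: any
    non-digit character (sign, space, trailing letters) is an error.
    The range check against 2**bits - 1 is an added assumption.
    """
    # isascii() keeps non-ASCII Unicode digits from slipping through.
    if not text or not (text.isascii() and text.isdigit()):
        raise ValueError(f"invalid unsigned value: {text!r}")
    value = int(text)
    if value > (1 << bits) - 1:
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    return value


print(parse_uint("4294967295"))  # 4294967295 (largest 32-bit value)
print(parse_uint("65535", 16))   # 65535
```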
* Changes in SLURM 2.4.0.rc1
=============================
 -- Improve task binding logic by making fuller use of HWLOC library,
    especially with respect to Opteron 6000 series processors. Work contributed
    by Komoto Masahiro.
 -- Add new configuration parameter PriorityFlags, based upon work by
    Carles Fenoy (Barcelona Supercomputer Center).
 -- Modify the step completion RPC between slurmd and slurmstepd in order to
    eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
 -- Change the owner of slurmctld and slurmdbd log files to the appropriate
    user. Without this change the files will be created by and owned by the
    user starting the daemons (likely user root).
 -- Reorganize the slurmstepd logic in order to better support NFS and
    Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
 -- Fix bug in allocating GRES that are associated with specific CPUs. In some
    cases the code allocated first available GRES to job instead of allocating
    GRES accessible to the specific CPUs allocated to the job.
 -- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit}
    and job epilog/prolog: slurm_spank_job_{prolog,epilog}
 -- spank: Add spank_option_getopt() function to API
 -- Change resolution of switch wait time from minutes to seconds.
 -- Added GrpCPUMins to the output of sshare -l for those using hard limit
    accounting.  Work contributed by Mark Nelson.
 -- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
    additional resources for newly launched tasks. Contributed by Hongjia Cao,
    NUDT.
 -- BGQ - fixed issue where if a user asked for a specific node count and more
    tasks than possible without overcommit the request would be allowed on more
    nodes than requested.
 -- Add support for new SchedulerParameters of bf_max_job_user, maximum number
    of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
    University of Oslo.
 -- BLUEGENE - Fixed issue where the MaxNodes limit on a partition only
    limited jobs larger than a midplane.
 -- Added cpu_run_min to the output of sshare --long.  Work contributed by
    Mark Nelson.
 -- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
 -- Add sinfo output format option of "%R" for partition name without "*"
    appended for default partition.
 -- Cray - Add support for zero compute node resource allocation to run batch
    script on front-end node with no ALPS reservation. Useful for pre- or post-
    processing.
 -- Support for cyclic distribution of cpus in task/cgroup plugin from Martin
    Perry, Bull.
 -- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge
    Mevik, University of Oslo.
 -- Various performance improvements for up to 500% higher throughput depending
    upon configuration. Work supported by the Oak Ridge National Laboratory
    Extreme Scale Systems Center.
 -- Added jobacct_gather/cgroup plugin.  It is not advised to use this in
    production as it isn't currently complete and doesn't provide an equivalent
    substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.
* Changes in SLURM 2.4.0.pre4
=============================
 -- Add logic to cache GPU file information (bitmap index mapping to device
    file number) in the slurmd daemon and transfer that information to the
    slurmstepd whenever a job step is initiated. This is needed to set the
    appropriate CUDA_VISIBLE_DEVICES environment variable value when the
    devices are not in strict numeric order (e.g. some GPUs are skipped).
    Based upon work by Nicolas Bigaouette.
 -- BGQ - Remove ability to make a sub-block with a geometry with one or more
    of its dimensions of length 3. There is a limitation in the IBM I/O
    subsystem that is problematic with multiple sub-blocks with a dimension
    of length 3, so we will disallow them from being created. This means
    that if you ask the system for an allocation of 12 c-nodes you will be
    given 16. If this is ever fixed in BGQ, this patch can be removed.
 -- BLUEGENE - Better handling of blocks that go into error state or
    deallocate while jobs are running on them.
 -- BGQ - fix for handling mix of steps running at same time some of which
    are full allocation jobs, and others that are smaller.
 -- BGQ - fix for core dump after running multiple sub-block jobs on static
    blocks.
 -- BGQ - Fixed sync issue so that if a job finishes in SLURM but not in mmcs
    for a long time after the SLURM job has been flushed from the system, we
    don't have to worry about rebooting the block to sync the system.
 -- BGQ - In scontrol/sview node counts are now displayed as
    CnodeCount/CnodeErrCount to point out that there are cnodes in an error
    state on the block. Draining the block and having it reboot when all jobs
    are gone will clear up the cnodes in Software Failure.
 -- Change default SchedulerParameters max_switch_wait field value from 60 to
    300 seconds.
 -- BGQ - catch errors from the kill option of the runjob client.
 -- BLUEGENE - Make it so the epilog runs until slurmctld tells it the job is
    gone. Previously it had a time limit, which has proven not to be the right
    approach.
 -- FRONTEND - Fix issue where, if a compute node was in a down state and an
    admin updated the node to idle/resume, the compute nodes would go
    instantly to idle instead of idle* (which means no response).
 -- Fix regression in 2.4.0.pre3 where number of submitted jobs limit wasn't
    being honored for QOS.
 -- Cray - Enable logging of BASIL communications with environment variables.
    Set XML_LOG to enable logging. Set XML_LOG_LOC to specify path to log file
    or "SLURM" to write to SlurmctldLogFile or unset for "slurm_basil_xml.log".
    Patch from Steve Trofinoff, CSCS.
 -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
    mark front end node down.
 -- FRONTEND - don't down a front end node if you have an epilog error
 -- BLUEGENE - if a job has an epilog error don't down the midplane it was
    running on.
 -- BGQ - added new DebugFlag (NoRealTime) for only printing debug from
    state change while the realtime server is running.
 -- Fix multi-cluster mode with sview starting on a non-bluegene cluster going
    to a bluegene cluster.
 -- BLUEGENE - ability to show Rack Midplane name of midplanes in sview and
    scontrol.
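A rough sketch of the GPU bitmap-index to device-number mapping described in the first 2.4.0.pre4 entry, assuming device files named like /dev/nvidiaN. The helper `cuda_visible_devices` is hypothetical and only illustrates why a bitmap index cannot be used directly as the device number when some GPUs are skipped.

```python
import re


def cuda_visible_devices(device_files: list[str], allocated: list[int]) -> str:
    """Translate allocated GRES bitmap indices into device numbers.

    Hypothetical helper. `device_files` lists the configured GPU device
    files in bitmap-index order; `allocated` holds the bitmap indices
    given to the step. Each index is mapped to the trailing number of
    its device file instead of being used as the device number itself.
    """
    numbers = []
    for index in allocated:
        match = re.search(r"(\d+)$", device_files[index])
        if match is None:
            raise ValueError(f"no device number in {device_files[index]!r}")
        numbers.append(match.group(1))
    return ",".join(numbers)


# /dev/nvidia1 is skipped, so bitmap indices 1 and 2 are devices 2 and 3.
files = ["/dev/nvidia0", "/dev/nvidia2", "/dev/nvidia3"]
print(cuda_visible_devices(files, [1, 2]))  # 2,3
```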
* Changes in SLURM 2.4.0.pre3
=============================
 -- Let a job be submitted even if it exceeds a QOS limit. Job will be left
    in a pending state until the QOS limit or job parameters change. Patch by
    Phil Eckert, LLNL.
 -- Add sacct support for the option "--name". Work by Yuri D'Elia, Center for
    Biomedicine, EURAC Research, Italy.
 -- BGQ - handle preemption.
 -- Add an srun shepherd process to cancel a job and/or step if the srun
    process is killed abnormally (e.g. SIGKILL).
 -- BGQ - handle deadlock issue when a nodeboard goes into an error state.
 -- BGQ - more thorough handling of blocks with multiple jobs running on them.
 -- Fix man2html process to compile in the build directory instead of the
    source dir.
 -- Behavior of srun --multi-prog modified so that any program arguments
    specified on the command line will be appended to the program arguments
    specified in the program configuration file.
 -- Add new command, sdiag, which reports a variety of job scheduling
    statistics. Based upon work by Alejandro Lucero Palau, BSC.
 -- BLUEGENE - Added DefaultConnType to the bluegene.conf file.  This makes it
    so you can specify any connection type you would like (TORUS or MESH) as
    the default in dynamic mode.  Previously it always defaulted to TORUS.
 -- Made squeue -n and -w options more consistent with salloc, sbatch, srun,
    and scancel. Patch by Don Lipari, LLNL.
 -- Have sacctmgr remove user records when no associations exist for that user.
 -- Several header file changes for clean build with NetBSD. Patches from
    Aleksej Saushev.
 -- Fix for possible deadlock in accounting logic: Avoid calling
    jobacct_gather_g_getinfo() until there is data to read from the socket.
 -- Fix race condition that could generate "job_cnt_comp underflow" errors on
    front-end architectures.
 -- BGQ - Fix issue where a system with missing cables could cause core dump.
* Changes in SLURM 2.4.0.pre2
=============================
 -- CRAY - Add support for GPU memory allocation using SLURM GRES (Generic
    RESource) support. Work by Steve Trofinoff, CSCS.
 -- Add support for job allocations with multiple job constraint counts. For
    example: salloc -C "[rack1*2&rack2*4]" ... will allocate the job 2 nodes
    from rack1 and 4 nodes from rack2. Support for only a single constraint
    name has been added to job step support.
 -- BGQ - Remove old method for marking cnodes down.
 -- BGQ - Remove BGP images from view in sview.
 -- BGQ - print out failed cnodes in scontrol show nodes.
 -- BGQ - Add srun option of "--runjob-opts" to pass options to the runjob
    command.
 -- FRONTEND - handle step launch failure better.
 -- BGQ - Added a mutex to protect the now changing ba_system pointers.
 -- BGQ - added new functionality for sub-block allocations - no preemption
    for this yet though.
 -- Add --name option to squeue to filter output by job name. Patch from Yuri
    D'Elia.
 -- BGQ - Added linking to the runjob client library, which enables TotalView
    to use srun instead of runjob.
 -- Add numeric range checks to scontrol update options. Patch from Phil
    Eckert, LLNL.
 -- Add ReconfigFlags configuration option to control actions of "scontrol
    reconfig". Patch from Don Albert, Bull.
 -- BGQ - handle reboots with multiple jobs running on a block.
 -- BGQ - Add message handler thread to forward signals to runjob process.
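To make the bracketed count syntax in the salloc -C example concrete, here is a simplified Python parser. `parse_constraint_counts` is hypothetical and handles only "name*count" terms joined by "&" inside one pair of brackets, not the full constraint grammar SLURM accepts.

```python
def parse_constraint_counts(spec: str) -> dict[str, int]:
    """Parse a bracketed count expression such as "[rack1*2&rack2*4]".

    Hypothetical, simplified parser: returns {feature: node_count} for
    "name*count" terms joined by "&" inside one pair of brackets.
    """
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError(f"expected [...]: {spec!r}")
    counts: dict[str, int] = {}
    for term in spec[1:-1].split("&"):
        name, sep, count = term.partition("*")
        if not sep or not name or not (count.isascii() and count.isdigit()):
            raise ValueError(f"malformed term: {term!r}")
        counts[name] = int(count)
    return counts


request = parse_constraint_counts("[rack1*2&rack2*4]")
print(request)                # {'rack1': 2, 'rack2': 4}
print(sum(request.values()))  # 6 nodes in total
```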
* Changes in SLURM 2.4.0.pre1
=============================
 -- BGQ - Use the ba_geo_tables to figure out the blocks instead of the old
    algorithm. This improves timing in the worst cases and simplifies the code
    greatly.
 -- BLUEGENE - Change tool output labels from BP to Midplane
    (i.e. BP List -> MidplaneList).
 -- BLUEGENE - read MPs and BPs from the bluegene.conf
 -- Modify srun's SIGINT handling logic timer (two SIGINTs within one second)
    to be based on a microsecond rather than a second timer.
 -- Modify advance reservation to accept multiple specific block sizes rather
    than a single node count.
 -- Permit administrator to change a job's QOS to any value without validating
    the job's owner has permission to use that QOS. Based upon patch by Phil
    Eckert (LLNL).
 -- Add trigger flag for a permanent trigger. The trigger will NOT be purged
    after an event occurs, but only when explicitly deleted.
 -- Interpret a reservation with Nodes=ALL and a Partition specification as
    reserving all nodes within the specified partition rather than all nodes
    on the system. Based upon patch by Phil Eckert (LLNL).
 -- Add the ability to reboot all compute nodes after they become idle. The
    RebootProgram configuration parameter must be set and an authorized user
    must execute the command "scontrol reboot_nodes". Patch from Andriy
    Grytsenko (Massive Solutions Limited).
 -- Modify slurmdbd.conf parsing to accept DebugLevel strings (quiet, fatal,
    info, etc.) in addition to numeric values. The parsing of slurm.conf was
    modified in the same fashion for SlurmctldDebug and SlurmdDebug values.
    The output of sview and "scontrol show config" was also modified to report
    those values as strings rather than numeric values.
 -- Changed default value of StateSaveLocation configuration parameter from
    /tmp to /var/spool.
 -- Prevent associations from being deleted if they have any jobs in running,
    pending or suspended state. Previous code prevented this only for running
    jobs.
 -- If a job can not run due to QOS or association limits, then do not cancel
    the job, but leave it pending in a system held state (priority = 1). The
    job will run when its limits or the QOS/association limits change. Based
    upon a patch by Phil Eckert (LLNL).
 -- BGQ - Added logic to keep track of cnodes in an error state inside of a
    booted block.
 -- Added the ability to update a node's NodeAddr and NodeHostName with
    scontrol. Also enable setting a node's state to "future" using scontrol.
 -- Add a node state flag of CLOUD and save/restore NodeAddr and NodeHostName
    information for nodes with a flag of CLOUD.
 -- Cray: Add support for job reservations with node IDs that are not in
    numeric order. Fix for Bugzilla #5.
 -- BGQ - Fix issue with smap -R
 -- Fix association limit support for jobs queued for multiple partitions.
 -- BLUEGENE - fix issue for sub-midplane systems to create a full system
    block correctly.
 -- BLUEGENE - Added option to the bluegene.conf to indicate that you are
    running on a sub midplane system.
 -- Added the UserID of the user issuing the RPC to the job_submit/lua
    functions.
 -- Fixed issue where, if a job ended with ESLURMD_UID_NOT_FOUND or
    ESLURMD_GID_NOT_FOUND, SLURM would be overzealous in treating a missing
    GID or UID as a fatal error.
 -- If job time limit exceeds partition maximum, but job's minimum time limit
    does not, set job's time limit to partition maximum at allocation time.
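The srun double-SIGINT check (two SIGINTs within one second) with a microsecond-based timer might look like the following sketch. The class `SigintTimer` and its injectable clock are hypothetical; srun's real signal handling is C code and is not shown here.

```python
import time


class SigintTimer:
    """Report whether a SIGINT arrives within one second of the previous one.

    Hypothetical sketch: timestamps are kept in microseconds, so the
    one-second window is measured precisely. Rounding to whole seconds
    could make a 0.2 s gap (0.9 s -> 1.1 s) and a 1.8 s gap
    (0.1 s -> 1.9 s) look identical, since both differ by one second.
    """

    WINDOW_US = 1_000_000

    def __init__(self, clock=None):
        # clock() must return the current time in microseconds.
        self._clock = clock or (lambda: time.monotonic_ns() // 1000)
        self._last_us = None

    def is_double(self) -> bool:
        now = self._clock()
        double = (self._last_us is not None
                  and now - self._last_us < self.WINDOW_US)
        self._last_us = now
        return double


# Fake clock: SIGINTs at 0 s, 0.4 s and 2.0 s.
ticks = iter([0, 400_000, 2_000_000])
timer = SigintTimer(clock=lambda: next(ticks))
print(timer.is_double(), timer.is_double(), timer.is_double())  # False True False
```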
* Changes in SLURM 2.3.6
========================
 -- Fix DefMemPerCPU for partition definitions.
 -- Fix to create a reservation with licenses and no nodes.
 -- Fix issue with assoc_mgr if a bad state file is given and the database
    isn't up at the time the slurmctld starts, not running the
    priority/multifactor plugin, and then the database is started up later.
 -- Gres: If a gres has a count of one and an associated file then when doing
    a reconfiguration, the node's bitmap was not cleared resulting in an
    underflow upon job termination or removal from scheduling matrix by the
    backfill scheduler.
* Changes in SLURM 2.3.5
========================
 -- Improve support for overlapping advanced reservations. Patch from
    Bill Brophy, Bull.
 -- Modify Makefiles for support of Debian hardening flags. Patch from
    Simon Ruderich.
 -- CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
    node that is DOWN in ALPS as DOWN in SLURM).
 -- Fixed the setting of SLURM_SUBMIT_DIR for jobs submitted by Moab (BZ#1467).
    Patch by Don Lipari, LLNL.
 -- Correction to init.d/slurmdbd exit code for status option. Patch by Bill
    Brophy, Bull.
 -- When the optional max_time is not specified for --switches=count, the site
    max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
    Based on patch from Rod Schultz.
 -- Fix bug in select/cons_res plugin when used with topology/tree and a node
    range count in job allocation request.
 -- Fixed moab_2_slurmdb.pl script to correctly work for end records.
 -- Add support for new SchedulerParameters of max_depend_depth defining the
    maximum number of jobs to test for circular dependencies (i.e. job A waits
    for job B to start and job B waits for job A to start). Default value is
    10 jobs.
 -- Fix potential race condition if MinJobAge is very low (i.e. 1) and using
    slurmdbd accounting and running large numbers of jobs (>50 sec). Job
    information could be corrupted before it had a chance to reach the DBD.
 -- Fix state restore of job limit set from admin value for min_cpus.
 -- Fix clearing of limit values if an admin removes the limit for max cpus
    and time limit where it was previously set by an admin.
 -- Fix issue where log message is more than 256 chars and then has a format.
 -- Fix sched/wiki2 to support job account name, gres, partition name, wckey,
    or working directory that contains "#" (a job record separator). Also fix
    for wckey or working directory that contains a double quote '\"'.
 -- CRAY - fix for handling memory requests from user for an allocation.
 -- Add support for switches parameter to the job_submit/lua plugin. Work by
    Par Andersson, NSC.
 -- Fix to job preemption logic to preempt multiple jobs at the same time.
 -- Fix minor issue where uid and gid were switched in sview for submitting
    batch jobs.
 -- Fix possible illegal memory reference in slurmctld for job step with
    relative option. Work by Matthieu Hautreux (CEA).
 -- Reset priority of system held jobs when dependency is satisfied. Work by
    Don Lipari, LLNL.
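The bounded circular-dependency test controlled by max_depend_depth can be sketched as a depth-limited graph search. `has_circular_dependency` is a hypothetical helper, and counting one "examined job" per dependency expansion is an assumption about how the limit is applied.

```python
def has_circular_dependency(job_id, depends_on, max_depth=10):
    """Return True if job_id's dependency chain leads back to job_id.

    Hypothetical sketch. `depends_on` maps a job ID to the job IDs it
    waits for. At most `max_depth` jobs have their dependencies
    expanded (the entry above gives a default of 10), so cycles that
    need more than max_depth other jobs to close are not detected.
    """
    stack = list(depends_on.get(job_id, []))
    seen = set()
    examined = 0
    while stack:
        current = stack.pop()
        if current == job_id:
            return True  # followed dependencies back to the start
        if current in seen:
            continue
        seen.add(current)
        examined += 1
        if examined > max_depth:
            return False  # give up: depth limit reached
        stack.extend(depends_on.get(current, []))
    return False


# Job A waits for job B and job B waits for job A.
print(has_circular_dependency("A", {"A": ["B"], "B": ["A"]}))  # True
```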
* Changes in SLURM 2.3.4
========================
 -- Set DEFAULT flag in partition structure when slurmctld reads the
    configuration file. Patch from Rémi Palancher.
 -- Fix for possible deadlock in accounting logic: Avoid calling
    jobacct_gather_g_getinfo() until there is data to read from the socket.
 -- Fix typo in accounting when using reservations. Patch from Alejandro
    Lucero Palau.
 -- Fix to the multifactor priority plugin to calculate effective usage earlier
    to give a correct priority on the first decay cycle after a restart of the
    slurmctld. Patch from Martin Perry, Bull.
 -- Permit user root to run a job step for any job as any user. Patch from
    Didier Gazen, Laboratoire d'Aerologie.
 -- BLUEGENE - fix for not allowing jobs if all midplanes are drained and all
    blocks are in an error state.
 -- Avoid slurmctld abort due to bad pointer when setting an advanced
    reservation MAINT flag if it contains no nodes (only licenses).
 -- Fix bug when a requeued batch job is scheduled to run on a different node
    zero, but attempts job launch on the old node zero.
 -- Fix bug in step task distribution when nodes are not configured in numeric
    order. Patch from Hongjia Cao, NUDT.
 -- Fix for srun running within an existing allocation with the --exclude
    option and a --nnodes count small enough to remove more nodes. Patch from
    Phil Eckert, LLNL.
 -- Work around to handle certain combinations of glibc/kernel
    (i.e. glibc-2.14/Linux-3.1) to correctly open the pty of the slurmstepd
    as the job user. Patch from Mark Grondona, LLNL.
 -- Modify linking to include "-ldl" only when needed. Patch from Aleksej
    Saushev.
 -- Fix smap regression to display nodes that are drained or down correctly.
 -- Several bug fixes and performance improvements related to batch
    scripts containing very large numbers of arguments. Patches from Par
    Andersson, NSC.
 -- Fixed extremely hard to reproduce threading issue in assoc_mgr.
 -- Correct "scontrol show daemons" output if there is more than one
    ControlMachine configured.
 -- Add node read lock where needed in slurmctld/agent code.
 -- Added test for LUA library named "liblua5.1.so.0" in addition to
    "liblua5.1.so" as needed by Debian. Patch by Remi Palancher.
 -- Added partition default_time field to job_submit LUA plugin. Patch by
    Remi Palancher.
 -- Fix bug in cray/srun wrapper stdin/out/err file handling.
 -- In cray/srun wrapper, only include aprun "-q" option when srun "--quiet"
    option is used.
 -- BLUEGENE - fix issue where if a small block was in error it could hold up
    the queue when trying to place a larger than midplane job.
 -- CRAY - ignore all interactive nodes and jobs on interactive nodes.
 -- Add new job state reason of "FrontEndDown" which applies only to Cray and
    IBM BlueGene systems.
 -- Cray - Enable configure option of "--enable-salloc-background" to permit
    the srun and salloc commands to be executed in the background. This does
    NOT remove the ALPS limitation that only one job reservation can be created
    for each Linux session ID.
 -- Cray - For srun wrapper when creating a job allocation, set the default job
    name to the executable file's name.
 -- Add support for Cray ALPS 5.0.0
 -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
    mark front end node down.
 -- FRONTEND - don't down a front end node if you have an epilog error.
 -- Cray - fix for if a frontend slurmd was started after the slurmctld had
    already pinged it on startup the unresponding flag would be removed from
    the frontend node.
 -- Cray - Fix issue on smap not displaying grid correctly.
 -- Fixed minor memory leak in sview.
* Changes in SLURM 2.3.3
========================
 -- Fix task/cgroup plugin error when used with GRES. Patch by Alexander
    Bersenev (Institute of Mathematics and Mechanics, Russia).
 -- Permit pending job exceeding a partition limit to run if its QOS flag is
    modified to permit the partition limit to be exceeded. Patch from Bill
    Brophy, Bull.
 -- BLUEGENE - Fixed preemption issue.
 -- sacct search for jobs using filtering was ignoring wckey filter.
 -- Fixed issue with QOS preemption when adding new QOS.
 -- Fixed issue with comment field being used in a job finishing before it
    starts in accounting.
 -- Add slashes in front of derived exit code when modifying a job.
 -- Handle numeric suffix of "T" for terabyte units. Patch from John Thiltges,
    University of Nebraska-Lincoln.
 -- Prevent resetting a held job's priority when updating other job parameters.
    Patch from Alejandro Lucero Palau, BSC.
 -- Improve logic to import a user's environment. Needed with --get-user-env
    option used with Moab. Patch from Mark Grondona, LLNL.
 -- Fix bug in sview layout if node count less than configured grid_x_width.
 -- Modify PAM module to prefer to use SLURM library with same major release
    number that it was built with.
 -- Permit gres count configuration of zero.
 -- Fix race condition where sbcast command can result in deadlock of slurmd
    daemon. Patch by Don Albert, Bull.
 -- Fix bug in srun --multi-prog configuration file to avoid printing duplicate
    record error when "*" is used at the end of the file for the task ID.
 -- Let operators see reservation data even if "PrivateData=reservations" flag
    is set in slurm.conf. Patch from Don Albert, Bull.
 -- Added new sbatch option "--export-file" as needed for latest version of
    Moab. Patch from Phil Eckert, LLNL.
 -- Fix for sacct printing CPUTime(RAW) when the value is greater than a
    32-bit number.
 -- Fix bug in --switch option with topology resulting in bad switch count use.
    Patch from Alejandro Lucero Palau (Barcelona Supercomputer Center).
 -- Fix PrivateFlags bug when using Priority Multifactor plugin.  If using sprio
    all jobs would be returned even if the flag was set.
    Patch from Bill Brophy, Bull.
 -- Fix for possible invalid memory reference in slurmctld in job dependency
    logic. Patch from Carles Fenoy (Barcelona Supercomputer Center).
* Changes in SLURM 2.3.2
========================
 -- Add configure option of "--without-rpath" which builds SLURM tools without
    the rpath option, which will work if Munge and BlueGene libraries are in
    the default library search path and make system updates easier.
 -- Fixed issue where, if a job ended with ESLURMD_UID_NOT_FOUND or
    ESLURMD_GID_NOT_FOUND, SLURM would be overzealous in treating a missing
    GID or UID as a fatal error.
 -- Backfill scheduling - Add SchedulerParameters configuration parameter of
    "bf_res" to control the resolution in the backfill scheduler's data about
    when jobs begin and end. Default value is 60 seconds (used to be 1 second).
 -- Cray - Remove the "family" specification from the GPU reservation request.
 -- Updated set_oomadj.c, replacing deprecated oom_adj reference with
    oom_score_adj
 -- Fix resource allocation bug, generic resources allocation was ignoring the
    job's ntasks_per_node and cpus_per_task parameters. Patch from Carles
    Fenoy, BSC.
 -- Avoid orphan job step if slurmctld is down when a job step completes.
 -- Fix Lua link order, patch from Pär Andersson, NSC.
 -- Set SLURM_CPUS_PER_TASK=1 when user specifies --cpus-per-task=1.
 -- Fix for fatal error managing GRES. Patch by Carles Fenoy, BSC.
 -- Fixed race condition when using the DBD for accounting where, if a job
    wasn't started at the time the eligible message was sent but started
    before the db_index was returned, information like start time would be lost.
 -- Fix issue in accounting where normalized shares could be updated
    incorrectly when getting fairshare from the parent.
 -- Fixed so that, when associations are not enforced but QOS support is
    wanted, the default QOS on the cluster is filled in correctly.
 -- Fix in select/cons_res for "fatal: cons_res: sync loop not progressing"
    with some configurations and job option combinations.
 -- BLUEGENE - Fixed issue with handling HTC modes and rebooting.
* Changes in SLURM 2.3.1
========================
 -- Do not remove the backup slurmctld's pid file when it assumes control, only
    when it actually shuts down. Patch from Andriy Grytsenko (Massive Solutions
    Limited).
 -- Avoid clearing a job's reason from JobHeldAdmin or JobHeldUser when it is
    otherwise updated using scontrol or sview commands. Patch based upon work
    by Phil Eckert (LLNL).
 -- BLUEGENE - Fix for when the blocks defined in bluegene.conf are changed
    while jobs happen to be running on blocks not in the new configuration.
 -- Many cosmetic modifications to eliminate warning messages from the GCC
    version 4.6 compiler.
 -- Fix for sview reservation tab when finding correct reservation.
 -- Fix for handling QOS limits per user on a reconfig of the slurmctld.
 -- Do not treat the absence of a gres.conf file as a fatal error on systems
    configured with GRES, but set GRES counts to zero.
 -- BLUEGENE - Correctly update the state in a block's reason if an admin sets
    the state to error.
 -- BLUEGENE - handle reason of blocks in error more correctly between
    restarts of the slurmctld.
 -- BLUEGENE - Fix minor potential memory leak when setting block error reason.
 -- BLUEGENE - Fix so that, if running in Static/Overlap mode and the full
    system block is in an error state, jobs are not denied.
 -- Fix for accounting where your cluster isn't numbered in counting order
    (i.e. 1-9,0 instead of 0-9).  The bug would cause 'sacct -N nodename' to
    not give correct results on these systems.
 -- Fix to GRES allocation logic when resources are associated with specific
    CPUs on a node. Patch from Steve Trofinoff, CSCS.
 -- Fix bugs in sched/backfill with respect to QOS reservation support and job
    time limits. Patch from Alejandro Lucero Palau (Barcelona Supercomputer
    Center).
 -- BGQ - fix to set up corner correctly for sub block jobs.
 -- Major re-write of the CPU Management User and Administrator Guide (web
    page) by Martin Perry, Bull.
 -- BLUEGENE - When removing blocks that once existed from the system, cleanup
    of the old block now happens correctly.
 -- Prevent slurmctld crashing with configuration of MaxMemPerCPU=0.
 -- Prevent a hold placed by an operator or account coordinator on his own job
    from being an Administrator Hold rather than a User Hold by default.
 -- Cray - Fix for srun.pl parsing to avoid adding spaces between option and
    argument (e.g. "-N2" parsed properly without changing to "-N 2").
 -- Major updates to cgroup support by Mark Grondona (LLNL), Matthieu
    Hautreux (CEA) and Sam Lang. Fixes timing problems with respect to the
    task_epilog. Allows cgroup mount point to be configurable. Added new
    configuration parameters MaxRAMPercent and MaxSwapPercent. Allow cgroup
    configuration parameters that are percentages to be floating point.
 -- Fixed issue where sview wasn't displaying correct nice value for jobs.
 -- Fixed issue where sview wasn't displaying correct min memory per node/cpu
    value for jobs.
 -- Disable some SelectTypeParameters for select/linear that aren't compatible.
 -- Move slurm_select_init to proper place to avoid loading multiple select
    plugins in the slurmd.
 -- BGQ - Include runjob_plugin.so in the bluegene rpm.
 -- Report correct job "Reason" if needed nodes are DOWN, DRAINED, or
    NOT_RESPONDING: "Resources" rather than "PartitionNodeLimit".
 -- BLUEGENE - Fixed issues with running on a sub-midplane system.
 -- Added some missing calls to allow older versions of SLURM to talk to newer.
 -- BGQ - allow steps to be run.
 -- Do not attempt to run HealthCheckProgram on powered down nodes. Patch from
    Ramiro Alba, Centre Tecnològic de Transferència de Calor, Spain.
* Changes in SLURM 2.3.0-2
==========================
 -- Fix for memory issue inside sview.
 -- Fix issue where, if a job was pending and the slurmctld was restarted, a
    variable wasn't initialized in the job structure, so that the job
    wouldn't run.
* Changes in SLURM 2.3.0
========================
 -- BLUEGENE - make sure we only set the jobinfo_select start_loc on a job
    when we are on a small block, not a regular one.
 -- BGQ - fix issue where the correct amount of memory was not being copied.
 -- BLUEGENE - fix clean start if jobs were running when the slurmctld was
    shut down and then the system size changed.  This would probably only happen
    if you were emulating a system.
 -- Fix sview for calling a cray system from a non-cray system to get the
    correct geometry of the system.
 -- BLUEGENE - fix to correctly import previous version of block state file.
 -- BLUEGENE - handle loading better when doing a clean start with static
    blocks.
 -- Add sinfo format and sort option "%n" for NodeHostName and "%o" for
    NodeAddr.
 -- If a job is deferred due to partition limits, then re-test those limits
    after a partition is modified. Patch from Don Lipari.
 -- Fix bug which would crash slurmctld if job's owner (not root) tries to clear
    a job's licenses by setting value to "".
 -- Cosmetic fix for printing out debug info in the priority plugin.
 -- In sview when switching from a bluegene machine to a regular linux cluster
    and vice versa the node->base partition lists will be displayed if setup
    in your .slurm/sviewrc file.
 -- BLUEGENE - Fix for creating full system static block on a BGQ system.
 -- BLUEGENE - Fix deadlock issue if toggling between Dynamic and Static block
    allocation with jobs running on blocks that don't exist in the static
    setup.
 -- BLUEGENE - Modify code to only give HTC states to BGP systems and not
    allow them on Q systems.
 -- BLUEGENE - Make it possible for an admin to define multiple dimension
    conn_types in a block definition.
 -- BGQ - Alter tools to output multiple dimensional conn_type.
* Changes in SLURM 2.3.0.rc2
============================
 -- With sched/wiki or sched/wiki2 (Maui or Moab scheduler), ensure that a
    requeued job's priority is reset to zero.
 -- BLUEGENE - fix to run steps correctly in a BGL/P emulated system.
 -- Fixed issue where, if there was a network issue between the slurmctld and
    the DBD in which both remained up but were disconnected, the slurmctld
    would get registered again with the DBD.
 -- Fixed issue where, if the DBD connection from the slurmctld goes away
    because of a POLLERR, the dbd_fail callback is called.
 -- BLUEGENE - Fix to smap command-line mode display.
 -- Change in GRES behavior for job steps: A job step's default generic
    resource allocation will be set to that of the job. If a job step's --gres
    value is set to "none" then none of the generic resources which have been
    allocated to the job will be allocated to the job step.
 -- Add srun environment value of SLURM_STEP_GRES to set default --gres value
    for a job step.
 -- Require SchedulerTimeSlice configuration parameter to be at least 5 seconds
    to avoid thrashing slurmd daemon.
 -- Cray - Fix to make node state in accounting consistent with state set by
    ALPS.
 -- Cray - A node DOWN to ALPS will be marked DOWN to SLURM only after reaching
    SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This
    change makes SLURM's handling of the node DOWN state more consistent with
    ALPS. This change affects only Cray systems.
 -- Cray - Fix to work with 4.0.* instead of just 4.0.0.
 -- Cray - Modify srun/aprun wrapper to map --exclusive to -F exclusive and
    --share to -F share. Note this does not consider the partition's Shared
    configuration, so it is an imperfect mapping of options.
 -- BLUEGENE - Added notice in the print config to tell if you are emulated
    or not.
 -- BLUEGENE - Fix job step scalability issue with large task count.
 -- BGQ - Improved c-node selection when asked for a sub-block job that
    cannot fit into the available shape.
 -- BLUEGENE - Modify "scontrol show step" to show  I/O nodes (BGL and BGP) or
    c-nodes (BGQ) allocated to each step. Change field name from "Nodes=" to
    "BP_List=".
 -- Code cleanup on step request to get the correct select_jobinfo.
 -- Memory leak fixed for rolling up accounting with down clusters.
 -- BGQ - fix issue where, if the first job step is the entire block and the
    next parallel step is run on a sub block, SLURM won't over subscribe cnodes.
 -- Treat duplicate switch name in topology.conf as fatal error. Patch from Rod
    Schultz, Bull.
 -- Minor update to documentation describing the AllowGroups option for a
    partition in the slurm.conf.
 -- Fix problem with _job_create() when not using qos's.  It makes
    _job_create() consistent with similar logic in select_nodes().
 -- GrpCPURunMins in a QOS flushed out.
 -- Fix for squeue -t "CONFIGURING" to actually work.
 -- CRAY - Add cray.conf parameter of SyncTimeout, maximum time to defer job
    scheduling if SLURM node or job state are out of synchronization with ALPS.
 -- If salloc was run as interactive, with job control, reset the foreground
    process group of the terminal to the process group of the parent pid before
    exiting. Patch from Don Albert, Bull.
 -- BGQ - set up the corner of a sub block correctly based on a relative
    position in the block instead of absolute.
 -- BGQ - make sure the recently added select_jobinfo of a step launch request
    isn't sent to the slurmd where environment variables would be overwritten
    incorrectly.

* Changes in SLURM 2.3.0.rc1
============================
 -- NOTE THERE HAVE BEEN NEW FIELDS ADDED TO THE JOB AND PARTITION STATE SAVE
    FILES AND RPCS. PENDING AND RUNNING JOBS WILL BE LOST WHEN UPGRADING FROM
    EARLIER VERSION 2.3 PRE-RELEASES AND RPCS WILL NOT WORK WITH EARLIER
    VERSIONS.
 -- select/cray: Add support for Accelerator information including model and
    memory options.
 -- Cray systems: Add support to suspend/resume salloc command to ensure that
    aprun does not get initiated when the job is suspended. Processes suspended
    and resumed are determined by using process group ID and parent process ID,
    so some processes may be missed. Since salloc runs as a normal user, its
    ability to identify processes associated with a job is limited.
 -- Cray systems: Modify smap and sview to display all nodes even if multiple
    nodes exist at each coordinate.
 -- Improve efficiency of select/linear plugin with topology/tree plugin
    configured. Patch by Andriy Grytsenko (Massive Solutions Limited).
 -- For front-end architectures on which job steps are run (emulated Cray and
    BlueGene systems only), fix bug that would free memory still in use.
 -- Add squeue support to display a job's license information. Patch by Andy
    Roosen (University of Delaware).
 -- Add flag to the select APIs for job suspend/resume indicating if the action
    is for gang scheduling or an explicit job suspend/resume by the user. Only
    an explicit job suspend/resume will reset the job's priority and make
    resources exclusively held by the job available to other jobs.
 -- Fix possible invalid memory reference in sched/backfill. Patch by Andriy
    Grytsenko (Massive Solutions Limited).
 -- Add select_jobinfo to the task launch RPC. Based upon patch by Andriy
    Grytsenko (Massive Solutions Limited).
 -- Add DefMemPerCPU/Node and MaxMemPerCPU/Node to partition configuration.
    This improves flexibility when gang scheduling only specific partitions.
 -- Added new enums to print out when a job is held by a QOS instead of an
    association limit.
 -- Enhancements to sched/backfill performance with select/cons_res plugin.
    Patch from Bjørn-Helge Mevik, University of Oslo.
 -- Correct job run time reported by smap for suspended jobs.
 -- Improve job preemption logic to avoid preempting more jobs than needed.
 -- Add contribs/arrayrun tool providing support for job arrays. Contributed by
    Bjørn-Helge Mevik, University of Oslo. NOTE: Not currently packaged as RPM
    and manual file editing is required.
 -- When suspending a job, wait 2 seconds instead of 1 second between sending
    SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
    1 second delay.
 -- Add support for managing devices based upon Linux cgroup container. Based
    upon patch by Yiannis Georgiou, Bull.
 -- Fix memory buffering bug if an AllowGroups parameter of a partition has 100
    or more users. Patch by Andriy Grytsenko (Massive Solutions Limited).
 -- Fix bug in generic resource tracking of gres associated with specific CPUs.
    Resources were being over-allocated.
 -- On systems with front-end nodes (IBM BlueGene and Cray) limit batch jobs to
    only one CPU of these shared resources.
 -- Set SLURM_MEM_PER_CPU or SLURM_MEM_PER_NODE environment variables for both
    interactive (salloc) and batch jobs if the job has a memory limit. For Cray
    systems also set CRAY_AUTO_APRUN_OPTIONS environment variable with the
    memory limit.
 -- Fix bug in select/cons_res task distribution logic when tasks-per-node=0.
    Patch from Rod Schultz, Bull.
 -- Restore node configuration information (CPUs, memory, etc.) for powered
    down nodes when slurmctld daemon restarts rather than waiting for the node
    to be restored to service and getting the information from the node (NOTE:
    Only relevant if FastSchedule=0).
 -- For Cray systems with the srun2aprun wrapper, rebuild the srun man page
    identifying the srun options which are valid on that system.
 -- BlueGene: Permit users to specify a separate connection type for each
    dimension (e.g. "--conn-type=torus,mesh,torus").
 -- Add the ability for a user to limit the number of leaf switches in a job's
    allocation using the --switch option of salloc, sbatch and srun. There is
    also a new SchedulerParameters value of max_switch_wait, which a SLURM
    administrator can use to set a maximum job delay and prevent a user job
    from blocking lower priority jobs for too long. Based on work by Rod
    Schultz, Bull.
* Changes in SLURM 2.3.0.pre6
=============================
 -- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE CONFIGURATION RESPONSE RPC
    AS SHOWN BY "SCONTROL SHOW CONFIG". THIS FUNCTION WILL ONLY WORK WHEN THE
    SERVER AND CLIENT ARE BOTH RUNNING SLURM VERSION 2.3.0.pre6
 -- Modify job expansion logic to support licenses, generic resources, and
    currently running job steps.
 -- Added an rpath if using the --with-munge option of configure.
 -- Add support for multiple sets of DEFAULT node, partition, and frontend
    specifications in slurm.conf so that default values can be changed multiple
    times as the configuration file is read.
 -- BLUEGENE - Improved logic to place small blocks in free space before freeing
    larger blocks.
 -- Add optional argument to srun's --kill-on-bad-exit so that user can set
    its value to zero and override a SLURM configuration parameter of
    KillOnBadExit.
 -- Fix bug in GraceTime support for preempted jobs that prevented proper
    operation when more than one job was being preempted. Based on patch from
    Bill Brophy, Bull.
 -- Fix for running sview from a non-bluegene cluster to a bluegene cluster.
    Regression from pre5.
 -- If job's TMPDIR environment is not set or is not usable, reset to "/tmp".
    Patch from Andriy Grytsenko (Massive Solutions Limited).
 -- Remove logic for defunct RPC: DBD_GET_JOBS.
 -- Propagate DebugFlag changes by scontrol to the plugins.
 -- Improve accuracy of REQUEST_JOB_WILL_RUN start time with respect to higher
    priority pending jobs.
 -- Add -R/--reservation option to squeue command as a job filter.
 -- Add scancel support for --clusters option.
 -- Note that scontrol and sprio can only support a single cluster at one time.
 -- Add support to salloc for a new environment variable SALLOC_KILL_CMD.
 -- Add scontrol ability to increment or decrement a job or step time limit.
 -- Add support for SLURM_TIME_FORMAT environment variable to control time
    stamp output format. Work by Gerrit Renker, CSCS.
 -- Fix error handling in mvapich plugin that could cause srun to enter an
    infinite loop under rare circumstances.
 -- Add support for multiple task plugins. Patch from Andriy Grytsenko (Massive
    Solutions Limited).
 -- Addition of per-user node/cpu limits for QOS's. Patch from Aaron Knister,
    UMBC.
 -- Fix logic for multiple job resize operations.
 -- BLUEGENE - many fixes to make things work correctly on a L/P system.
 -- Fix bug in layout of job step with --nodelist option plus node count. Old
    code could allocate too few nodes.
* Changes in SLURM 2.3.0.pre5
=============================
 -- NOTE: THERE HAS BEEN A NEW FIELD ADDED TO THE JOB STATE FILE. UPGRADES FROM
    VERSION 2.3.0-PRE4 WILL RESULT IN LOST JOBS UNLESS THE "orig_dependency"
    FIELD IS REMOVED FROM JOB STATE SAVE/RESTORE LOGIC. ON CRAY SYSTEMS A NEW
    "confirm_cookie" FIELD WAS ADDED AND HAS THE SAME EFFECT OF DISABLING JOB
    STATE RESTORE.
 -- BLUEGENE - Improve speed of start up when removing blocks at the beginning.
 -- Correct init.d/slurm status to have non-zero exit code if ANY Slurm
    daemon that should be running on the node is not running. Patch from Rod
    Schultz, Bull.
 -- Improve accuracy of response to "srun --test-only jobid=#".
 -- Fix bug in front-end configurations which reports job_cnt_comp underflow
    errors after slurmctld restarts.
 -- Eliminate "error from _trigger_slurmctld_event in backup.c" due to lack of
    event triggers.
 -- Fix logic in BackupController to properly recover front-end node state and
    avoid purging active jobs.
 -- Added man pages to html pages and the new cpu_management.html page.
    Submitted by Martin Perry / Rod Schultz, Bull.
 -- Job dependency information will only show the currently active dependencies
    rather than the original dependencies. From Dan Rusak, Bull.
 -- Add RPCs to get the SPANK environment variables from the slurmctld daemon.
    Patch from Andrej N. Gritsenko.
 -- Updated plugins/task/cgroup/task_cgroup_cpuset.c to support newer
    HWLOC_API_VERSION.
 -- Do not build select/bluegene plugin if C++ compiler is not installed.
 -- Add new configure option --with-srun2aprun to build an srun command
    which is a wrapper over Cray's aprun command and supports many srun
    options. Without this option, the srun command will advise the user
    to use the aprun command.
 -- Change container ID supported by proctrack plugin from 32-bit to 64-bit.
 -- Added contribs/cray/libalps_test_programs.tar.gz with tools to validate
    SLURM's logic used to support Cray systems.
 -- Create RPM for srun command that is a wrapper for the Cray/ALPS aprun
    command. Dependent upon .rpmmacros parameter of "%_with_srun2aprun".
 -- Add configuration parameter MaxStepCount to limit effect of bad batch
    scripts.
 -- Moving to github
 -- Fix for handling a 2.3 system talking to a 2.2 slurmctld.
 -- Add contribs/lua/job_submit.license.lua script. Update job_submit and Lua
    related documentation.
 -- Test if _make_batch_script() is called with a NULL script.
 -- Increase hostlist support from 24k to 64k nodes.
 -- Renamed the Accounting Storage database's "DerivedExitString" job field to
    "Comment".  Provided backward compatible support for "DerivedExitString" in
    the sacctmgr tool.
 -- Added the ability to save the job's comment field to the Accounting
    Storage db (to the formerly named, "DerivedExitString" job field).  This
    behavior is enabled by a new slurm.conf parameter:
    AccountingStoreJobComment.
 -- Fix srun to handle signals correctly when waiting for a step creation.
 -- Preserve the last job ID across slurmctld daemon restarts even if the job
    state file can not be fully recovered.
 -- Made the hostlist functions able to handle any number of dimensions,
    regardless of how many dimensions the cluster has.
* Changes in SLURM 2.3.0.pre4
=============================
 -- Add GraceTime to Partition and QOS data structures. Preempted jobs will be
    given this time interval before termination. Work by Bill Brophy, Bull.
 -- Add the ability for scontrol and sview to modify slurmctld DebugFlags
    values.
 -- Various Cray-specific patches:
    - Fix a bug in distinguishing XT from XE.
    - Avoids problems with empty nodenames on Cray.
    - Check whether ALPS is hanging on to nodes, which happens if ALPS has not
      yet cleaned up the node partition.
    - Stops select/cray from clobbering node_ptr->reason.
    - Perform 'safe' release of ALPS reservations using inventory and apkill.
    - Compile-time sanity check for the apbasil and apkill files.
    - Changes error handling in do_basil_release() (called by
      select_g_job_fini()).
    - Warn that salloc --no-shell option is not supported on Cray systems.
 -- Add a reservation flag of "License_Only". If set, then jobs using the
    reservation may use the licenses associated with it plus any compute nodes.
    Otherwise the job is limited to the compute nodes associated with the
    reservation.
 -- Change slurm.conf node configuration parameter from "Procs" to "CPUs".
    Both parameters will be supported for now.
 -- BLUEGENE - fix for when user requests only midplane names with no count at
    job submission time to process the node count correctly.
 -- Fix job step resource allocation problem when both node and tasks counts
    are specified. New logic selects nodes with larger CPU counts as needed.
 -- BGQ - make it so srun wraps runjob (still under construction, but works
    for most cases).
 -- Permit a job's QOS and Comment field to both change in a single RPC. This
    was previously disabled since Moab stored the QOS within the Comment field.
 -- Add support for jobs to expand in size. Submit additional batch job with
    the option "--dependency=expand:<jobid>". See web page "faq.html#job_size"
    for details. Restrictions to be removed in the future.
 -- Added --with-alps-emulation to configure, and also an optional cray.conf
    to set up ALPS location and database information.
 -- Modify PMI data types from 16-bits to 32-bits in order to support MPICH2
    jobs with more than 65,536 tasks. Patch from Hongjia Cao, NUDT.
 -- Set slurmd's soft process CPU limit equal to its hard limit and notify the
    user if the limit is not infinite.
 -- Added proctrack/cgroup and task/cgroup plugins from Matthieu Hautreux, CEA.
 -- Fix slurmctld restart logic that could leave nodes in UNKNOWN state for a
    longer time than necessary after restart.
* Changes in SLURM 2.3.0.pre3
=============================
 -- BGQ - Appears to work correctly in emulation mode, no sub blocks just yet.
 -- Minor typos fixed
 -- Various bug fixes for Cray systems.
 -- Fix bug where setting a compute node to idle state failed to set the
    system's up_node_bitmap.
 -- BLUEGENE - code reorder
 -- BLUEGENE - Now only one select plugin for all Bluegene systems.
 -- Modify srun to set the SLURM_JOB_NAME environment variable when srun is
    used to create a new job allocation. Not set when srun is used to create a
    job step within an existing job allocation.
 -- Modify init.d/slurm script to start multiple slurmd daemons per compute
    node if so configured. Patch from Matthieu Hautreux, CEA.
 -- Change license data structure counters from uint16_t to uint32_t to support
    larger license counts.
* Changes in SLURM 2.3.0.pre2
=============================
 -- Log a job's requeue or cancellation due to preemption to that job's stderr:
    "*** JOB 65547 CANCELLED AT 2011-01-21T12:59:33 DUE TO PREEMPTION ***".
 -- Added new job termination state of JOB_PREEMPTED, "PR" or "PREEMPTED" to
    indicate job termination was due to preemption.
 -- Optimize advanced reservations resource selection for computer topology.
    The logic has been added to select/linear and select/cons_res, but will
    not be enabled until the other select plugins are modified.
 -- Remove checkpoint/xlch plugin.
 -- Disable deletion of partitions that have unfinished jobs (pending,
    running or suspended states). Patch from Martin Perry, BULL.
 -- In sview, disable the sorting of node records by name at startup for
    clusters over 1000 nodes. Users can enable this by selecting the "Name"
    tab. This change dramatically improves scalability of sview.
 -- Report error when trying to change a node's state from scontrol on Cray
    systems.
 -- Do not attempt to read the batch script for non-batch jobs. This patch
    eliminates some inappropriate error messages.
 -- Preserve NodeHostName when reordering nodes due to system topology.
 -- On Cray/ALPS systems  do node inventory before scheduling jobs.
 -- Disable some salloc options on Cray systems.
 -- Disable scontrol's wait_job command on Cray systems.
 -- Disable srun command on native Cray/ALPS systems.
 -- Updated configure option "--enable-cray-emulation" (still under
    development) to emulate a Cray XT/XE system, and auto-detect a real
    Cray XT/XE system (removed no longer needed --enable-cray configure
    option).  Building on native Cray systems requires the
    cray-MySQL-devel-enterprise rpm and expat XML parser library/headers.
* Changes in SLURM 2.3.0.pre1
=============================
 -- Added that when a slurmctld closes the connection to the database, its
    registered host and port are removed.
 -- Added flag to slurmdbd.conf TrackSlurmctldDown where if set will mark idle
    resources as down on a cluster when a slurmctld disconnects or is no
    longer reachable.
 -- Added support for more than one front-end node to run slurmd on
    architectures where the slurmd does not execute on the compute nodes
    (e.g. BlueGene). New configuration parameters FrontendNode and FrontendAddr
    added. See "man slurm.conf" for more information.
 -- With the scontrol show job command when using the --details option, show
    a batch job's script.
 -- Add ability to create reservations or partitions and submit batch jobs
    using sview. Also add the ability to delete reservations and partitions.
 -- Added new configuration parameter MaxJobId. Once reached, restart job ID
    values at FirstJobId.
 -- When restarting slurmctld with priority/basic, increment all job priorities
    so the highest job priority becomes TOP_PRIORITY.
* Changes in SLURM 2.2.8