 -- BGQ - Able to handle when midplanes go into Hardware::SoftwareFailure
 -- GRES - Correct tracking of specific resources used after slurmctld restart.
    Counts would previously go negative as jobs terminate and decrement from
    a base value of zero.
 -- Fix for priority/multifactor2 plugin to not assert when configured with
    --enable-debug.
 -- Select/cons_res - If the job request specified --ntasks-per-socket and the
    allocation is by core, then pack the tasks onto the sockets up to the
    specified value.
 -- BGQ - If a cnode goes into an 'error' state and the block containing the
    cnode does not have a job running on it do not resume the block.
 -- BGQ - Handle blocks that don't free themselves in a reasonable time better.
 -- BGQ - Fix for signaling steps when allocation ends before step.
 -- Fix for backfill scheduling logic with job preemption; starts more jobs.
 -- xcgroup - remove bugs with EINTR management in write calls
 -- jobacct_gather - fix total values to not always == the max values.
 -- Fix for handling node registration messages from older versions without
    energy data.
 -- BGQ - Allow user to request full dimensional mesh.
 -- sdiag command - Correction to jobs started value reported.
 -- Prevent slurmctld assert when invalid change to reservation with running
    jobs is made.
 -- BGQ - If signal is NODE_FAIL allow forward even if job is completing
    and timeout in the runjob_mux trying to send in this situation.
 -- BGQ - More robust checking for correct node, task, and ntasks-per-node
    options in srun, and push that logic to salloc and sbatch.
 -- GRES topology bug in core selection logic fixed.
 -- Fix to handle init.d script for querying status and not return 1 on
    success.
* Changes in SLURM 2.5.3
========================
 -- Gres/gpu plugin - If no GPUs requested, set CUDA_VISIBLE_DEVICES=NoDevFiles.
    This bug was introduced in 2.5.2 for the case where a GPU count was
    configured, but without device files.
 -- task/affinity plugin - Fix bug in CPU masks for some processors.
 -- Modify sacct command to get format from SACCT_FORMAT environment variable.
 -- BGQ - Changed order of library inclusions and fixed incorrect declaration
    to compile correctly on newer compilers
 -- Fix for not building sview if glib exists on a system but not the gtk libs.
 -- BGQ - Fix for handling a job cleanup on a small block if the job has long
    since left the system.
 -- Fix race condition in job dependency logic which can result in invalid
    memory reference.
* Changes in SLURM 2.5.2
========================
 -- Fix advanced reservation recovery logic when upgrading from version 2.4.
 -- BLUEGENE - fix for QOS/Association node limits.
 -- Add missing "safe" flag from print of AccountStorageEnforce option.
 -- Fix logic to optimize GRES topology with respect to allocated CPUs.
 -- Add job_submit/all_partitions plugin to set a job's default partition
    to ALL available partitions in the cluster.
 -- Modify switch/nrt logic to permit build without libnrt.so library.
 -- Handle srun task launch failure without duplicate error messages or abort.
 -- Fix bug in QoS limits enforcement when slurmctld restarts and user not yet
    added to the QOS list.
 -- Fix issue where sjstat and sjobexitmod were installed in 2 different RPMs.
 -- Fix for job request of multiple partitions in which some partitions lack
    nodes with required features.
 -- Permit a job to use a QOS they do not have access to if an administrator
    manually set the job's QOS (previously the job would be rejected).
 -- Make more variables available to job_submit/lua plugin: slurm.MEM_PER_CPU,
    slurm.NO_VAL, etc.
 -- Fix topology/tree logic when nodes defined in slurm.conf get re-ordered.
 -- In select/cons_res, correct logic to allocate whole sockets to jobs. Work
    by Magnus Jonsson, Umea University.
 -- In select/cons_res, correct logic when job removed from only some nodes.
 -- Avoid apparent kernel bug in 2.6.32 which apparently is solved in
    at least 3.5.0.  This avoids a stack overflow when running jobs on
    more than 120k nodes.
 -- BLUEGENE - If we made a block that isn't runnable because of an overlapping
    block, destroy it correctly.
 -- Switch/nrt - Dynamically load libnrt.so from within the plugin as needed.
    This eliminates the need for libnrt.so on the head node.
 -- BLUEGENE - Fix in reservation logic that could cause abort.
* Changes in SLURM 2.5.1
========================
 -- Correction to hostlist sorting for hostnames that contain two numeric
    components and the first numeric component has various sizes (e.g.
    "rack9blade1" should come before "rack10blade1")
 -- BGQ - Only poll on initialized blocks instead of calling getBlocks on
    each block independently.
 -- Fix of task/affinity plugin logic for Power7 processors having hyper-
    threading disabled (cpu mask has gaps).
 -- Fix of job priority ordering with sched/builtin and priority/multifactor.
    Patch from Chris Read.
 -- CRAY - Fix for setting up the aprun for a large job (+2000 nodes).
 -- Fix for race condition related to compute node boot resulting in node being
    set down with reason of "Node <name> unexpectedly rebooted"
 -- RAPL - Fix for handling errors when opening msr files.
 -- BGQ - Fix for salloc/sbatch to do the correct allocation when asking for
    -N1 -n#.
 -- BGQ - in emulation make it so we can pretend to run large jobs (>64k nodes)
 -- BLUEGENE - Correct method to update conn_type of a job.
 -- BLUEGENE - Fix issue with preemption when needing to preempt multiple jobs
    to make one job run.
 -- Fixed issue where if an srun dies inside of an allocation abnormally it
    would have also killed the allocation.
 -- FRONTEND - fixed issue where if a system's nodes weren't defined in the
    slurm.conf with NodeAddr's, signals going to a step could be handled
    incorrectly.
 -- If sched/backfill starts a job with a QOS having NO_RESERVE and no job
    time limit, start it with the partition time limit (or one year if the
    partition has no time limit) rather than NO_VAL (140 year time limit).
 -- Alter hostlist logic to allocate large grid dynamically instead of on
    stack.
 -- Change RPC version checks to support version 2.5 slurmctld with version 2.4
    slurmd daemons.
 -- Correct core reservation logic for use with select/serial plugin.
 -- Exit scontrol command on stdin EOF.
 -- Disable job --exclusive option with select/serial plugin.
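
The hostlist sorting correction in this release (numeric components compared
by value, so "rack9blade1" comes before "rack10blade1") can be illustrated
with a small sketch. This is not SLURM's hostlist code; the function name
hostname_sort_key and the sample host names other than the two quoted above
are assumptions made for illustration only.

```python
import re


def hostname_sort_key(name):
    """Build a sort key in which digit runs compare as integers.

    Hypothetical helper for illustration; not part of SLURM.
    """
    # Split into alternating text and digit runs, dropping empty strings.
    parts = [p for p in re.split(r"(\d+)", name) if p != ""]
    # Tag each part so text and numbers are never compared with each other.
    return [(1, int(p)) if p.isdigit() else (0, p) for p in parts]


hosts = ["rack10blade1", "rack9blade2", "rack9blade1", "rack10blade10",
         "rack10blade2"]
ordered = sorted(hosts, key=hostname_sort_key)
print(ordered)
```

A plain string sort of the same list would place "rack10blade1" ahead of
"rack9blade1", which is the ordering the correction addresses.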
* Changes in SLURM 2.5.0
========================
 -- Add DenyOnLimit flag for QOS to deny jobs at submission time if they
    request resources that reach a 'Max' limit.
 -- Permit SlurmUser or operator to change QOS of non-pending jobs (e.g.
    running jobs).
 -- BGQ - move initial poll to beginning of realtime interaction, which will
    also cause it to run if the realtime server ever goes away.
* Changes in SLURM 2.5.0-rc2
============================
 -- Modify sbcast logic to survive slurmd daemon restart while a file
    transmission is in progress.
 -- Add retry logic to munge encode/decode calls. This is needed if the munge
    daemon is under very heavy load (e.g. with 1000 slurmd daemons per compute
    node).
 -- Add launch and acct_gather_energy plugins to RPMs.
 -- Restore support for srun "--mpi=list" option.
 -- CRAY - Introduce step accounting for a Cray.
 -- Modify srun to abandon I/O 60 seconds after the last task ends. Otherwise
    an aborted slurmstepd can cause the srun process to hang indefinitely.
 -- ENERGY - RAPL - alter code to close open files (and only open them once
    where needed)
 -- If the PrologSlurmctld fails, then requeue the job an indefinite number
    of times instead of only one time.
* Changes in SLURM 2.5.0-rc1
============================
 -- Added Prolog and Epilog Guide (web page). Based upon work by Jason Sollom,
    Cray Inc. and used by permission.
 -- Restore gang scheduling functionality. Preemptor was not being scheduled.
    Fix for bugzilla #3.
 -- Add "cpu_load" to node information. Populate CPULOAD in node information
    reported to Moab cluster manager.
 -- Preempt jobs only when insufficient idle resources exist to start job,
    regardless of the node weight.
 -- Added priority/multifactor2 plugin based upon ticket distribution system.
    Work by Janne Blomqvist, Aalto University.
 -- Add SLURM_NODELIST to environment variables available to Prolog and Epilog.
 -- Permit reservations to allow or deny access by account and/or user.
 -- Add ReconfigFlags value of KeepPartState. See "man slurm.conf" for details.
 -- Modify the task/cgroup plugin adding a task_pre_launch_priv function and
    move slurmstepd outside of the step's cgroup. Work by Matthieu Hautreux.
 -- Intel MIC processor support added using gres/mic plugin. BIG thanks to
    Olli-Pekka Lehto, CSC-IT Center for Science Ltd.
 -- Accounting - Change empty jobacctinfo structs to not actually be used.
    Instead of putting 0's into the database, we put NO_VALS and have sacct
    figure out jobacct_gather wasn't used.
 -- Cray - Prevent calling basil_confirm more than once per job using a flag.
 -- Fix bug with topology/tree and job with min-max node count. Now try to
    get max node count rather than minimizing leaf switches used.
 -- Add AccountingStorageEnforce=safe option to provide method to avoid jobs
    launching that wouldn't be able to run to completion because of a
    GrpCPUMins limit.
 -- Add support for RFC 5424 timestamps in logfiles. Disable with configuration
    option of "--disable-rfc5424time". By Janne Blomqvist, Aalto University.
 -- CRAY - Replace srun.pl with launch/aprun plugin to use srun to wrap the
    aprun process instead of a perl script.
 -- srun - Rename --runjob-opts to --launcher-opts to be used on systems other
    than BGQ.
 -- Added new DebugFlags - Energy for AcctGatherEnergy plugins.
 -- start deprecation of sacct --dump --fdump
 -- BGQ - added --verbose=OFF when srun --quiet is used
 -- Added acct_gather_energy/rapl plugin to record power consumption by job.
    Work by Yiannis Georgiou, Martin Perry, et al., Bull

* Changes in SLURM 2.5.0.pre3
=============================
 -- Add Google search to all web pages.
 -- Add sinfo -T option to print reservation information. Work by Bill Brophy,
    Bull.
 -- Force slurmd exit after 2 minute wait, even if threads are hung.
 -- Change node_req field in struct job_resources from 8 to 32 bits so we can
    run more than 256 jobs per node.
 -- sched/backfill: Improve accuracy of expected job start with respect to
    reservations.
 -- sinfo partition field size will be set to the length of the longest
    partition name by default.
 -- Make it so the parse_time will return a valid 0 if given epoch time and
    set errno == ESLURM_INVALID_TIME_VALUE on error instead.
 -- Correct srun --no-alloc logic when node count exceeds node list or task
    count is not a multiple of the node count. Work by Hongjia Cao, NUDT.
 -- Completed integration with IBM Parallel Environment including POE and IBM's
    NRT switch library.

* Changes in SLURM 2.5.0.pre2
=============================
 -- When running with multiple slurmd daemons per node, enable specifying a
    range of ports on a single line of the node configuration in slurm.conf.
 -- Add reservation flag of Part_Nodes to allocate all nodes in a partition to
    a reservation and automatically change the reservation when nodes are
    added to or removed from the reservation. Based upon work by
    Bill Brophy, Bull.
 -- Add support for advanced reservation for specific cores rather than whole
    nodes. Current limitations: homogeneous cluster, nodes idle when reservation
    created, and no more than one reservation per node. Code is still under
    development. Work by Alejandro Lucero Palau, et al., BSC.
 -- Add DebugFlag of Switch to log switch plugin details.
 -- Correct job node_cnt value in job completion plugin when job fails due to
    down node. Previously was too low by one.
 -- Add new srun option --cpu-freq to enable user control over the job's CPU
    frequency and thus its power consumption. NOTE: cpu frequency is not
    currently preserved for jobs being suspended and later resumed. Work by
    Don Albert, Bull.
 -- Add node configuration information about "boards" and optimize task
    placement on minimum number of boards. Work by Rod Schultz, Bull.
* Changes in SLURM 2.5.0.pre1
=============================
 -- Add new output to "scontrol show configuration" of LicensesUsed. Output is
    "name:used/total"
 -- Changed jobacct_gather plugin infrastructure to be cleaner and easier to
    maintain.
 -- Change license option count separator from "*" to ":" for consistency with
    the gres option (e.g. "--licenses=foo:2 --gres=gpu:2"). The "*" will still
    be accepted, but is no longer documented.
 -- Permit more than 100 jobs to be scheduled per node (new limit is 250
 -- Restructure of srun code to allow outside programs to utilize existing
    logic.
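
The license count separator change noted for this release ("foo:2" is the
documented form, "foo*2" is still accepted) can be sketched as below. This is
an illustrative parser, not SLURM's implementation; the function name
parse_license_request, the comma-separated list form, and the default count
of 1 when no count is given are assumptions of this sketch.

```python
def parse_license_request(spec):
    """Parse a license request such as "foo:2,bar" into {name: count}.

    Hypothetical helper for illustration; not part of SLURM.
    Accepts ":" (documented) and "*" (legacy) as the count separator.
    """
    licenses = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        for sep in (":", "*"):
            if sep in item:
                name, count = item.split(sep, 1)
                licenses[name] = int(count)
                break
        else:
            # Assumption: a bare license name requests a single license.
            licenses[item] = 1
    return licenses


print(parse_license_request("foo:2"))
print(parse_license_request("foo*2,bar"))
```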
* Changes in SLURM 2.4.6
========================
 -- Correct WillRun authentication logic when issued for non-job owner.
 -- BGQ - fix memory leak
 -- BGQ - Fix to check block for action 'D' if it also has nodes in error.
* Changes in SLURM 2.4.5
========================
 -- Cray - On job kill request, send SIGCONT, SIGTERM, wait KillWait and send
    SIGKILL. Previously just sent SIGKILL to tasks.
 -- BGQ - Fix issue when running srun outside of an allocation and only
    specifying the number of tasks and not the number of nodes.
 -- BGQ - validate correct ntasks_per_node
 -- BGQ - when srun -Q is given make runjob be quiet
 -- Modify use of OOM (out of memory protection) for Linux 2.6.36 kernel
    or later. NOTE: If you were setting the environment variable
    SLURMSTEPD_OOM_ADJ=-17, it should be set to -1000 for Linux 2.6.36 kernel
    or later.
 -- BGQ - Fix to make job step timeout actually happen when done from within
    an allocation.
 -- Reset node MAINT state flag when a reservation's nodes or flags change.
 -- Accounting - Fix issue where QOS usage was being zeroed out on a
    slurmctld restart.
 -- BGQ - Add 64 tasks per node as a valid option for srun when used with
    overcommit.
 -- BLUEGENE - With Dynamic layout mode - Fix issue where if a larger block
    was already in error and isn't deallocating and underlying hardware goes
    bad one could get overlapping blocks in error making the code assert when
    a new job request comes in.
 -- BGQ - handle pending actions on a block better when trying to deallocate it.
 -- Accounting - Fixed issue where if nodenames have changed on a system and
    you query against that with -N and -E you will get all jobs during that
    time instead of only the ones running on -N.
 -- BGP - Fix for HTC mode
 -- Accounting - If a job start message fails to the SlurmDBD reset the db_inx
    so it gets sent again.  This isn't a major problem since the start will
    happen when the job ends, but this does make things cleaner.
 -- If an salloc is waiting for an allocation to happen and is canceled by the
    user mark the state canceled instead of completed.
 -- Fix issue in accounting if a user puts a '\' in their job name.
 -- Accounting - Fix for if asking for users or accounts that were deleted
    with associations get the deleted associations as well.
 -- BGQ - Handle shared blocks that need to be removed and have jobs running
    on them.  This should only happen in extreme conditions.
 -- Fix inconsistency for hostlists that have more than 1 range.
 -- BGQ - Add mutex around recovery for the Real Time server to avoid hitting
    DB2 so hard.
 -- BGQ - If an allocation exists on a block that has a 'D' action on it fail
    job on future step creation attempts.
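
The SLURMSTEPD_OOM_ADJ note in this release (use -1000 rather than -17 on
Linux 2.6.36 or later) can be sketched as a version check. This is not code
from SLURM; the function name oom_disable_value and the way the kernel
release string is parsed are assumptions made for illustration.

```python
import re


def oom_disable_value(kernel_release):
    """Return -1000 for Linux 2.6.36 or later, otherwise -17.

    Hypothetical helper for illustration; not part of SLURM.
    """
    fields = []
    # Keep only the "major.minor.patch" prefix, e.g. "2.6.32" of "2.6.32-279.el6".
    for part in kernel_release.split("-")[0].split(".")[:3]:
        match = re.match(r"\d+", part)
        fields.append(int(match.group()) if match else 0)
    while len(fields) < 3:
        fields.append(0)
    return -1000 if tuple(fields) >= (2, 6, 36) else -17


for release in ("2.6.32-279.el6", "2.6.36", "3.5.0"):
    print(release, oom_disable_value(release))
```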
* Changes in SLURM 2.4.4
========================
 -- BGQ - minor fix to make build work in emulated mode.
 -- BGQ - Fix if large block goes into error and the next highest priority jobs
    are planning on using the block.  Previously it would fail those jobs
    erroneously.
 -- BGQ - Fix issue when a cnode going to an error (not SoftwareError) state
    with a job running or trying to run on it.
 -- Execute slurm_spank_job_epilog when there is no system Epilog configured.
 -- Fix for srun --test-only to work correctly with timelimits
 -- BGQ - If a job goes away while still trying to free it up in the
    database, and the job is running on a small block make sure we free up
    the correct node count.
 -- BGQ - Logic added to make sure a job has finished on a block before it is
    purged from the system if its front-end node goes down.
 -- Modify strigger so that a filter option of "--user=0" is supported.
 -- Correct --mem-per-cpu logic for core or socket allocations with multiple
    threads per core.
 -- Fix for older < glibc 2.4 systems to use euidaccess() instead of eaccess().
 -- BLUEGENE - Do not alter a pending job's node count when changing its
 -- BGQ - Add functionality to make it so we track the actions on a block.
    This is needed for when a free request is added to a block but there are
    jobs finishing up so we don't start new jobs on the block since they will
    fail on start.
 -- BGQ - Fixed InactiveLimit to work correctly to avoid scenarios where a
    user's pending allocation was started with srun and then for some reason
    the slurmctld was brought down and while it was down the srun was removed.
 -- Fixed InactiveLimit math to work correctly
 -- BGQ - Add logic to make it so blocks can't use a midplane with a nodeboard
    in error for passthrough.
 -- BGQ - Make it so if a nodeboard goes in error any block using that midplane
    for passthrough gets removed on a dynamic system.
 -- BGQ - Fix for printing realtime server debug correctly.
 -- BGQ - Cleaner handling of cnode failures when reported through the runjob
    interface instead of through the normal method.
 -- smap - spread node information across multiple lines for larger systems.
 -- Cray - Defer salloc until after PrologSlurmctld completes.
 -- Correction to slurmdbd communications failure handling logic, incorrect
    error codes returned in some cases.
* Changes in SLURM 2.4.3
========================
 -- Accounting - Fix so complete 32 bit numbers can be put in for a priority.
 -- cgroups - fix if initial directory is non-existent SLURM creates it
    correctly.  Before the errno wasn't being checked correctly
 -- BGQ - fixed srun when only requesting a task count and not a node count
    to operate the same way salloc or sbatch did and assign a task per cpu
    by default instead of task per node.
 -- Fix salloc --gid to work correctly.  Reported by Brian Gilmer
 -- BGQ - fix smap to set the correct default MloaderImage
 -- BLUEGENE - updated documentation.
 -- Close the batch job's environment file when it contains no data to avoid
    leaking file descriptors.
 -- Fix sbcast's credential to last till the end of a job instead of the
    previous 20 minute time limit.  The previous behavior would fail for
    large files 20 minutes into the transfer.
 -- Return ESLURM_NODES_BUSY rather than ESLURM_NODE_NOT_AVAIL error on job
    submit when required nodes are up, but completing a job or in exclusive
    job allocation.
 -- Add HWLOC_FLAGS so linking to libslurm works correctly
 -- BGQ - If using backfill and a shared block is running at least one job
    and a job comes through backfill and can fit on the block without ending
    jobs don't set an end_time for the running jobs since they don't need to
    end to start the job.
 -- Initialize bind_verbose when using task/cgroup.
 -- BGQ - Fix for handling backfill much better when sharing blocks.
 -- BGQ - Fix for making small blocks on first pass if not sharing blocks.
 -- BLUEGENE - Remove force of default conn_type instead of leaving NAV
    when none are requested.  The Block allocator sets it up temporarily so
    this isn't needed.
 -- BLUEGENE - Fix deadlock issue when dealing with bad hardware if using
    static blocks.
 -- Fix to mysql plugin during rollup to only query suspended table when jobs
    reported some suspended time.
 -- Fix compile with glibc 2.16 (Kacper Kowalik)
 -- BGQ - fix for deadlock where a block has error on it and all jobs
    running on it are preemptable by scheduling job.
 -- proctrack/cgroup: Exclude internal threads from "scontrol list pids".
    Patch from Matthieu Hautreux, CEA.
 -- Memory leak fixed for select/linear when preempting jobs.
 -- Fix if updating begin time of a job to update the eligible time in
    accounting as well.
 -- BGQ - make it so you can signal steps when signaling the job allocation.
 -- BGQ - Remove extra overhead if a large block has many cnode failures.
 -- Priority/Multifactor - Fix issue with age factor when a job is estimated to
    start in the future but is able to run now.
 -- CRAY - update to work with ALPS 5.1
 -- BGQ - Handle issue of speed and mutexes when polling instead of using the
    realtime server.
 -- BGQ - Fix minor sorting issue with sview when sorting by midplanes.
 -- Accounting - Fix for handling per user max node/cpus limits on a QOS
    correctly for current job.
 -- Update documentation for -/+= when updating a reservation's
    users/accounts/flags
 -- Update pam module to work if using aliases on nodes instead of actual
    host names.
 -- Correction to task layout logic in select/cons_res for job with minimum
    and maximum node count.
 -- BGQ - Put final poll after realtime comes back into service to avoid
    having the realtime server go down over and over again while waiting
    for the poll to finish.
 -- task/cgroup/memory - ensure that ConstrainSwapSpace=no is correctly
    handled. Work by Matthieu Hautreux, CEA.
 -- CRAY - Fix for sacct -N option to work correctly
 -- CRAY - Update documentation to describe installation from rpm instead
    of previous piecemeal method.
 -- Fix sacct to work with QOS' that have previously been deleted.
 -- Added all available limits to the output of sacctmgr list qos
* Changes in SLURM 2.4.2
========================
 -- BLUEGENE - Correct potential deadlock issue when hardware goes bad and
    there are jobs running on that hardware.
 -- If job is submitted to more than one partition, its partition pointer can
    be set to an invalid value. This can result in the count of CPUs allocated
    on a node being bad, resulting in over- or under-allocation of its CPUs.
    Patch by Carles Fenoy, BSC.
 -- Fix bug in task layout with select/cons_res plugin and --ntasks-per-node
    option. Patch by Martin Perry, Bull.
 -- BLUEGENE - remove race condition where if a block is removed while waiting
    for a job to finish on it the number of unused cpus wasn't updated
    correctly.
 -- BGQ - make sure we have a valid block when creating or finishing a step
    allocation.
 -- BLUEGENE - If a large block (> 1 midplane) is in error and underlying
    hardware is marked bad remove the larger block and create a block over
    just the bad hardware making the other hardware available to run on.
 -- BLUEGENE - Handle job completion correctly if an admin removes a block
    where other blocks on an overlapping midplane are running jobs.
 -- BLUEGENE - correctly remove running jobs when freeing a block.
 -- BGQ - correct logic to place multiple (< 1 midplane) steps inside a
    multi midplane block allocation.
 -- BGQ - Make it possible for a multi midplane allocation to run on more
    than 1 midplane but not the entire allocation.
 -- BGL - Fix for syncing users on block from Tim Wickberg
 -- Fix initialization of protocol_version for some messages to make sure it
    is always set when sending or receiving a message.
 -- Reset backfilled job counter only when explicitly cleared using scontrol.
    Patch from Alejandro Lucero Palau, BSC.
 -- BLUEGENE - Fix for handling blocks when a larger block will not free and
    while it is attempting to free underlying hardware is marked in error
    making small blocks overlapping with the freeing block.  This only
    applies to dynamic layout mode.
 -- Cray and BlueGene - Do not treat lack of usable front-end nodes when
    slurmctld daemon starts as a fatal error. Also preserve correct front-end
    node for jobs when there is more than one front-end node and the slurmctld
    daemon restarts.
 -- Correct parsing of srun/sbatch input/output/error file names so that only
    the name "none" is mapped to /dev/null and not any file name starting
    with "none" (e.g. "none.o").
 -- BGQ - added version string to the load of the runjob_mux plugin to verify
    the current plugin has been loaded when using runjob_mux_refresh_config
 -- CGROUPS - Use system mount/umount function calls instead of doing fork
    exec of mount/umount from Janne Blomqvist.
 -- BLUEGENE - correct start time setup when no jobs are blocking the way
    from Mark Nelson
 -- Fixed sacct --state=S query to return information about suspended jobs
    current or in the past.
 -- FRONTEND - Made error warning more apparent if a frontend node isn't
    configured correctly.
 -- BGQ - update documentation about runjob_mux_refresh_config which works
    correctly as of IBM driver V1R1M1 efix 008.
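
The file name parsing correction in this release (only the exact name "none"
maps to /dev/null, not names that merely start with "none") can be sketched
as below. This is not SLURM's parsing code; both function names are
hypothetical, and the case-sensitive comparison is an assumption of this
sketch.

```python
def resolve_io_path_prefix_match(name):
    """Prefix matching: the behavior described as incorrect (illustrative)."""
    return "/dev/null" if name.startswith("none") else name


def resolve_io_path(name):
    """Exact matching: only the name "none" maps to /dev/null (illustrative)."""
    return "/dev/null" if name == "none" else name


for candidate in ("none", "none.o", "job.out"):
    print(candidate, "->", resolve_io_path(candidate))
```

With prefix matching, a file named "none.o" would be silently redirected to
/dev/null; with exact matching it is kept as an ordinary file name.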
* Changes in SLURM 2.4.1
========================
 -- Fix bug for job state change from 2.3 -> 2.4 job state can now be preserved
    correctly when transitioning.  This also applies for 2.4.0 -> 2.4.1, no
    state will be lost. (Thanks to Carles Fenoy)

* Changes in SLURM 2.4.0
========================
 -- Cray - Improve support for zero compute node resource allocations.
    Partition used can now be configured with no nodes.
 -- BGQ - make it so srun -i<taskid> works correctly.
 -- Fix parse_uint32/16 to complain if a non-digit is given.
 -- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
    Bringhurst (LANL).
 -- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
    compiling with --enable-debug
 -- Modify scontrol to require "-dd" option to report batch job's script. Patch
    from Don Albert, Bull.
 -- Modify SchedulerParameters option to match documentation: "bf_res="
    changed to "bf_resolution=". Patch from Rod Schultz, Bull.
 -- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
 -- In etc/init.d/slurm move check for scontrol after sourcing
    /etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
 -- Fix in scheduling logic that can delay jobs with min/max node counts.
 -- BGQ - fix issue where if a step uses the entire allocation and then
    the next step in the allocation only uses part of the allocation it gets
    the correct cnodes.
 -- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1 previous
    function didn't always work correctly.
 -- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
    to make a larger small block and are running with sub-blocks.
 -- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
 -- BGQ - When using an old IBM driver cnodes that go into error because of
    a job kill timeout aren't always reported to the system.  This is now
    handled by the runjob_mux plugin.
 -- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
 -- Improve memory consumption on step layouts with high task count.
 -- BGQ - quieter debug when the real time server comes back but there are
    still messages we find when we poll but haven't given it back to the real
    time yet.
 -- BGQ - fix for if a request comes in smaller than the smallest block and
    we must use a small block instead of a shared midplane block.
 -- Fix issues on large jobs (>64k tasks) to have the correct counter type when
    packing the step layout structure.
 -- BGQ - fix issue where if a user was asking for tasks and ntasks-per-node
    but not node count the node count is correctly figured out.
 -- Move logic to always use the 1st alphanumeric node as the batch host for
    batch jobs.
 -- BLUEGENE - fix race condition where if a nodeboard/card goes down at the
    same time a block is destroyed and that block just happens to be the
    smallest overlapping block over the bad hardware.
 -- Fix bug when querying accounting looking for a job node size.
 -- BLUEGENE - fix possible race condition if cleaning up a block and the
    removal of the job on the block failed.
 -- BLUEGENE - fix issue if a cable was in an error state make it so we can
    check if a block is still makable if the cable wasn't in error.
 -- Put nodes names in alphabetic order in node table.
 -- If preempted job should have a grace time and preempt mode is not cancel
    but job is going to be canceled because it is interactive or other reason
    it now receives the grace time.
 -- BGQ - Modified documents to explain new plugin_flags needed in bg.properties
    in order for the runjob_mux to run correctly.
 -- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.

* Changes in SLURM 2.4.0.rc1
=============================
 -- Improve task binding logic by making fuller use of HWLOC library,
    especially with respect to Opteron 6000 series processors. Work contributed
    by Komoto Masahiro.
 -- Add new configuration parameter PriorityFlags, based upon work by
    Carles Fenoy (Barcelona Supercomputer Center).
 -- Modify the step completion RPC between slurmd and slurmstepd in order to
    eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
 -- Change the owner of slurmctld and slurmdbd log files to the appropriate
    user. Without this change the files will be created by and owned by the
    user starting the daemons (likely user root).
 -- Reorganize the slurmstepd logic in order to better support NFS and
    Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
 -- Fix bug in allocating GRES that are associated with specific CPUs. In some
    cases the code allocated first available GRES to job instead of allocating
    GRES accessible to the specific CPUs allocated to the job.
 -- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit}
    and job epilog/prolog: slurm_spank_job_{prolog,epilog}
 -- spank: Add spank_option_getopt() function to api
 -- Change resolution of switch wait time from minutes to seconds.
 -- Added GrpCPUMins to the output of sshare -l for those using hard limit
    accounting.  Work contributed by Mark Nelson.
 -- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
    additional resources for newly launched tasks. Contributed by Hongjia Cao,
    NUDT.
 -- BGQ - fixed issue where if a user asked for a specific node count and more
    tasks than possible without overcommit the request would be allowed on more
    nodes than requested.
 -- Add support for new SchedulerParameters of bf_max_job_user, maximum number
    of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
    University of Oslo.
 -- BLUEGENE - fixed issue where MaxNodes limit on a partition only limited
    larger than midplane jobs.
 -- Added cpu_run_min to the output of sshare --long.  Work contributed by
    Mark Nelson.
 -- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
 -- Add sinfo output format option of "%R" for partition name without "*"
    appended for default partition.
 -- Cray - Add support for zero compute node resource allocation to run batch
    script on front-end node with no ALPS reservation. Useful for pre- or post-
    processing.
 -- Support for cyclic distribution of cpus in task/cgroup plugin from Martin
    Perry, Bull.
 -- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge
    Mevik, University of Oslo.
 -- Various performance improvements for up to 500% higher throughput depending
    upon configuration. Work supported by the Oak Ridge National Laboratory
    Extreme Scale Systems Center.
 -- Added jobacct_gather/cgroup plugin.  It is not advised to use this in
    production as it isn't currently complete and doesn't provide an equivalent
    substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.
* Changes in SLURM 2.4.0.pre4
=============================
 -- Add logic to cache GPU file information (bitmap index mapping to device
    file number) in the slurmd daemon and transfer that information to the
    slurmstepd whenever a job step is initiated. This is needed to set the
    appropriate CUDA_VISIBLE_DEVICES environment variable value when the
    devices are not in strict numeric order (e.g. some GPUs are skipped).
    Based upon work by Nicolas Bigaouette.
 -- BGQ - Remove ability to make a sub-block with a geometry with one or more
    of its dimensions of length 3.  There is a limitation in the IBM I/O
    subsystem that is problematic with multiple sub-blocks with a dimension
    of length 3, so we disallow their creation.  This means that if you ask
    the system for an allocation of 12 c-nodes you will be given 16.  If this
    is ever fixed in BGQ you can remove this patch.
 -- BLUEGENE - Better handling of blocks that go into error state or deallocate
    while jobs are running on them.
 -- BGQ - fix for handling mix of steps running at same time some of which
    are full allocation jobs, and others that are smaller.
 -- BGQ - fix for core dump after running multiple sub-block jobs on static
    blocks.
 -- BGQ - fixed sync issue where if a job finishes in SLURM but not in mmcs
    for a long time after the SLURM job has been flushed from the system
    we don't have to worry about rebooting the block to sync the system.
 -- BGQ - In scontrol/sview node counts are now displayed with
    CnodeCount/CnodeErrCount to point out that there are cnodes in an error
    state on the block.  Draining the block and having it reboot when all jobs
    are gone will clear up the cnodes in Software Failure.
 -- Change default SchedulerParameters max_switch_wait field value from 60 to
    300 seconds.
 -- BGQ - catch errors from the kill option of the runjob client.
 -- BLUEGENE - make it so the epilog runs until slurmctld tells it the job is
    gone.  Previously it had a timelimit which has proven to not be the right
    thing.
 -- FRONTEND - fix issue where if a compute node was in a down state and
    an admin updates the node to idle/resume the compute nodes will go
    instantly to idle instead of idle* which means no response.
 -- Fix regression in 2.4.0.pre3 where number of submitted jobs limit wasn't
    being honored for QOS.
 -- Cray - Enable logging of BASIL communications with environment variables.
    Set XML_LOG to enable logging. Set XML_LOG_LOC to specify path to log file
    or "SLURM" to write to SlurmctldLogFile or unset for "slurm_basil_xml.log".
    Patch from Steve Trofinoff, CSCS.
 -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
    mark front end node down.
 -- FRONTEND - don't down a front end node if you have an epilog error
 -- BLUEGENE - if a job has an epilog error don't down the midplane it was
    running on.
 -- BGQ - added new DebugFlag (NoRealTime) for only printing debug from
    state change while the realtime server is running.
 -- Fix multi-cluster mode with sview starting on a non-bluegene cluster going
    to a bluegene cluster.
 -- BLUEGENE - ability to show Rack Midplane name of midplanes in sview and
    scontrol.
* Changes in SLURM 2.4.0.pre3
=============================
 -- Let a job be submitted even if it exceeds a QOS limit. Job will be left
    in a pending state until the QOS limit or job parameters change. Patch by
    Phil Eckert, LLNL.
 -- Add sacct support for the option "--name". Work by Yuri D'Elia, Center for
    Biomedicine, EURAC Research, Italy.
 -- BGQ - handle preemption.
 -- Add an srun shepherd process to cancel a job and/or step if the srun
    process is killed abnormally (e.g. SIGKILL).
 -- BGQ - handle deadlock issue when a nodeboard goes into an error state.
 -- BGQ - more thorough handling of blocks with multiple jobs running on them.
 -- Fix man2html process to compile in the build directory instead of the
    source dir.
 -- Behavior of srun --multi-prog modified so that any program arguments
    specified on the command line will be appended to the program arguments
    specified in the program configuration file.
 -- Add new command, sdiag, which reports a variety of job scheduling
    statistics. Based upon work by Alejandro Lucero Palau, BSC.
 -- BLUEGENE - Added DefaultConnType to the bluegene.conf file.  This makes it
    so you can specify any connection type you would like (TORUS or MESH) as
    the default in dynamic mode.  Previously it always defaulted to TORUS.
 -- Made squeue -n and -w options more consistent with salloc, sbatch, srun,
    and scancel. Patch by Don Lipari, LLNL.
 -- Have sacctmgr remove user records when no associations exist for that user.
 -- Several header file changes for clean build with NetBSD. Patches from
    Aleksej Saushev.
 -- Fix for possible deadlock in accounting logic: Avoid calling
    jobacct_gather_g_getinfo() until there is data to read from the socket.
 -- Fix race condition that could generate "job_cnt_comp underflow" errors on
    front-end architectures.
 -- BGQ - Fix issue where a system with missing cables could cause core dump.
* Changes in SLURM 2.4.0.pre2
=============================
 -- CRAY - Add support for GPU memory allocation using SLURM GRES (Generic
    RESource) support. Work by Steve Trofinoff, CSCS.
 -- Add support for job allocations with multiple job constraint counts. For
    example: salloc -C "[rack1*2&rack2*4]" ... will allocate the job 2 nodes
    from rack1 and 4 nodes from rack2. Support for only a single constraint
    name has been added to job step support.
 -- BGQ - Remove old method for marking cnodes down.
 -- BGQ - Remove BGP images from view in sview.
 -- BGQ - print out failed cnodes in scontrol show nodes.
 -- BGQ - Add srun option of "--runjob-opts" to pass options to the runjob
    command.
 -- FRONTEND - handle step launch failure better.
 -- BGQ - Added a mutex to protect the now changing ba_system pointers.
 -- BGQ - added new functionality for sub-block allocations - no preemption
    for this yet though.
 -- Add --name option to squeue to filter output by job name. Patch from Yuri
    D'Elia.
 -- BGQ - Added linking to runjob client library which gives support to
    totalview to use srun instead of runjob.
 -- Add numeric range checks to scontrol update options. Patch from Phil
    Eckert, LLNL.
 -- Add ReconfigFlags configuration option to control actions of "scontrol
    reconfig". Patch from Don Albert, Bull.
 -- BGQ - handle reboots with multiple jobs running on a block.
 -- BGQ - Add message handler thread to forward signals to runjob process.
* Changes in SLURM 2.4.0.pre1
=============================
 -- BGQ - use the ba_geo_tables to figure out the blocks instead of the old
    algorithm.  This improves timing in the worst cases and simplifies the code
    greatly.
 -- BLUEGENE - Change to output tools labels from BP to Midplane
    (i.e. BP List -> MidplaneList).
 -- BLUEGENE - read MPs and BPs from the bluegene.conf
 -- Modify srun's SIGINT handling logic timer (two SIGINTs within one second) to
    be based on a microsecond rather than a second timer.
 -- Modify advance reservation to accept multiple specific block sizes rather
    than a single node count.
 -- Permit administrator to change a job's QOS to any value without validating
    the job's owner has permission to use that QOS. Based upon patch by Phil
    Eckert (LLNL).
 -- Add trigger flag for a permanent trigger. The trigger will NOT be purged
    after an event occurs, but only when explicitly deleted.
 -- Interpret a reservation with Nodes=ALL and a Partition specification as
    reserving all nodes within the specified partition rather than all nodes
    on the system. Based upon patch by Phil Eckert (LLNL).
 -- Add the ability to reboot all compute nodes after they become idle. The
    RebootProgram configuration parameter must be set and an authorized user
    must execute the command "scontrol reboot_nodes". Patch from Andriy
    Grytsenko (Massive Solutions Limited).
 -- Modify slurmdbd.conf parsing to accept DebugLevel strings (quiet, fatal,
    info, etc.) in addition to numeric values. The parsing of slurm.conf was
    modified in the same fashion for SlurmctldDebug and SlurmdDebug values.
    The output of sview and "scontrol show config" was also modified to report
    those values as strings rather than numeric values.
 -- Changed default value of StateSaveLocation configuration parameter from
    /tmp to /var/spool.
 -- Prevent an association from being deleted if it has any jobs in running,
    pending or suspended state. Previous code prevented this only for running
    jobs.
 -- If a job can not run due to QOS or association limits, then do not cancel
    the job, but leave it pending in a system held state (priority = 1). The
    job will run when its limits or the QOS/association limits change. Based
    upon a patch by Phil Eckert (LLNL).
 -- BGQ - Added logic to keep track of cnodes in an error state inside of a
    booted block.
 -- Added the ability to update a node's NodeAddr and NodeHostName with
    scontrol. Also enable setting a node's state to "future" using scontrol.
 -- Add a node state flag of CLOUD and save/restore NodeAddr and NodeHostName
    information for nodes with a flag of CLOUD.
 -- Cray: Add support for job reservations with node IDs that are not in
    numeric order. Fix for Bugzilla #5.
 -- BGQ - Fix issue with smap -R
 -- Fix association limit support for jobs queued for multiple partitions.
 -- BLUEGENE - fix issue for sub-midplane systems to create a full system
    block correctly.
 -- BLUEGENE - Added option to the bluegene.conf to indicate that you are
    running on a sub midplane system.
 -- Added the UserID of the user issuing the RPC to the job_submit/lua
    functions.
 -- Fixed issue where, if a job ended with ESLURMD_UID_NOT_FOUND and
    ESLURMD_GID_NOT_FOUND, slurm would be a little overzealous in treating
    a missing GID or UID as a fatal error.
 -- If job time limit exceeds partition maximum, but job's minimum time limit
    does not, set job's time limit to partition maximum at allocation time.
* Changes in SLURM 2.3.6
========================
 -- Fix DefMemPerCPU for partition definitions.
 -- Fix to create a reservation with licenses and no nodes.
 -- Fix issue with assoc_mgr if a bad state file is given and the database
    isn't up at the time the slurmctld starts, not running the
    priority/multifactor plugin, and then the database is started up later.
 -- Gres: If a gres has a count of one and an associated file then when doing
    a reconfiguration, the node's bitmap was not cleared resulting in an
    underflow upon job termination or removal from scheduling matrix by the
    backfill scheduler.
 -- Fix race condition in job dependency logic which can result in invalid
    memory reference.
* Changes in SLURM 2.3.5
========================
 -- Improve support for overlapping advanced reservations. Patch from
    Bill Brophy, Bull.
 -- Modify Makefiles for support of Debian hardening flags. Patch from
    Simon Ruderich.
 -- CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
    node that is DOWN in ALPS as DOWN in SLURM).
 -- Fixed the setting of SLURM_SUBMIT_DIR for jobs submitted by Moab (BZ#1467).
    Patch by Don Lipari, LLNL.
 -- Correction to init.d/slurmdbd exit code for status option. Patch by Bill
    Brophy, Bull.
 -- When the optional max_time is not specified for --switches=count, the site
    max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
    Based on patch from Rod Schultz.
 -- Fix bug in select/cons_res plugin when used with topology/tree and a node
    range count in job allocation request.
 -- Fixed moab_2_slurmdb.pl script to correctly work for end records.
 -- Add support for new SchedulerParameters of max_depend_depth defining the
    maximum number of jobs to test for circular dependencies (i.e. job A waits
    for job B to start and job B waits for job A to start). Default value is
    10 jobs.
 -- Fix potential race condition if MinJobAge is very low (i.e. 1) and using
    slurmdbd accounting and running large numbers of jobs (>50 sec).  Job
    information could be corrupted before it had a chance to reach the DBD.
 -- Fix state restore of job limit set from admin value for min_cpus.
 -- Fix clearing of limit values if an admin removes the limit for max cpus
    and time limit where it was previously set by an admin.
 -- Fix issue where log message is more than 256 chars and then has a format.
 -- Fix sched/wiki2 to support job account name, gres, partition name, wckey,
    or working directory that contains "#" (a job record separator). Also fix
    for wckey or working directory that contains a double quote '\"'.
 -- CRAY - fix for handling memory requests from user for an allocation.
 -- Add support for switches parameter to the job_submit/lua plugin. Work by
    Par Andersson, NSC.
 -- Fix to job preemption logic to preempt multiple jobs at the same time.
 -- Fix minor issue where uid and gid were switched in sview for submitting
    batch jobs.
 -- Fix possible illegal memory reference in slurmctld for job step with
    relative option. Work by Matthieu Hautreux (CEA).
 -- Reset priority of system held jobs when dependency is satisfied. Work by
    Don Lipari, LLNL.
* Changes in SLURM 2.3.4
========================
 -- Set DEFAULT flag in partition structure when slurmctld reads the
    configuration file. Patch from Rémi Palancher.
 -- Fix for possible deadlock in accounting logic: Avoid calling
    jobacct_gather_g_getinfo() until there is data to read from the socket.
 -- Fix typo in accounting when using reservations. Patch from Alejandro
    Lucero Palau.
 -- Fix to the multifactor priority plugin to calculate effective usage earlier
    to give a correct priority on the first decay cycle after a restart of the
    slurmctld. Patch from Martin Perry, Bull.
 -- Permit user root to run a job step for any job as any user. Patch from
    Didier Gazen, Laboratoire d'Aerologie.
 -- BLUEGENE - fix for not allowing jobs if all midplanes are drained and all
    blocks are in an error state.
 -- Avoid slurmctld abort due to bad pointer when setting an advanced
    reservation MAINT flag if it contains no nodes (only licenses).
 -- Fix bug when requeued batch job is scheduled to run on a different node
    zero, but attempts job launch on old node zero.
 -- Fix bug in step task distribution when nodes are not configured in numeric
    order. Patch from Hongjia Cao, NUDT.
 -- Fix for srun allocating running within existing allocation with --exclude
    option and --nnodes count small enough to remove more nodes. Patch from
    Phil Eckert, LLNL.
 -- Work around to handle certain combinations of glibc/kernel
    (i.e. glibc-2.14/Linux-3.1) to correctly open the pty of the slurmstepd
    as the job user. Patch from Mark Grondona, LLNL.
 -- Modify linking to include "-ldl" only when needed. Patch from Aleksej
    Saushev.
 -- Fix smap regression to display nodes that are drained or down correctly.
 -- Several bug fixes and performance improvements related to batch
    scripts containing very large numbers of arguments. Patches from Par
    Andersson, NSC.
 -- Fixed extremely hard to reproduce threading issue in assoc_mgr.
 -- Correct "scontrol show daemons" output if there is more than one
    ControlMachine configured.
 -- Add node read lock where needed in slurmctld/agent code.
 -- Added test for LUA library named "liblua5.1.so.0" in addition to
    "liblua5.1.so" as needed by Debian. Patch by Remi Palancher.
 -- Added partition default_time field to job_submit LUA plugin. Patch by
    Remi Palancher.
 -- Fix bug in cray/srun wrapper stdin/out/err file handling.
 -- In cray/srun wrapper, only include aprun "-q" option when srun "--quiet"
    option is used.
 -- BLUEGENE - fix issue where if a small block was in error it could hold up
    the queue when trying to place a larger than midplane job.
 -- CRAY - ignore all interactive nodes and jobs on interactive nodes.
 -- Add new job state reason of "FrontEndDown" which applies only to Cray and
    IBM BlueGene systems.
 -- Cray - Enable configure option of "--enable-salloc-background" to permit
    the srun and salloc commands to be executed in the background. This does
    NOT remove the ALPS limitation that only one job reservation can be created
    for each Linux session ID.
 -- Cray - For srun wrapper when creating a job allocation, set the default job
    name to the executable file's name.
 -- Add support for Cray ALPS 5.0.0
 -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
    mark front end node down.
 -- FRONTEND - don't down a front end node if you have an epilog error.
 -- Cray - fix for if a frontend slurmd was started after the slurmctld had
    already pinged it on startup the unresponding flag would be removed from
    the frontend node.
 -- Cray - Fix issue on smap not displaying grid correctly.
 -- Fixed minor memory leak in sview.

* Changes in SLURM 2.3.3
========================
 -- Fix task/cgroup plugin error when used with GRES. Patch by Alexander
    Bersenev (Institute of Mathematics and Mechanics, Russia).
 -- Permit pending job exceeding a partition limit to run if its QOS flag is
    modified to permit the partition limit to be exceeded. Patch from Bill
    Brophy, Bull.
 -- BLUEGENE - Fixed preemption issue.
 -- sacct search for jobs using filtering was ignoring wckey filter.
 -- Fixed issue with QOS preemption when adding new QOS.
 -- Fixed issue with comment field being used in a job finishing before it
    starts in accounting.
 -- Add slashes in front of derived exit code when modifying a job.
 -- Handle numeric suffix of "T" for terabyte units. Patch from John Thiltges,
    University of Nebraska-Lincoln.
 -- Prevent resetting a held job's priority when updating other job parameters.
    Patch from Alejandro Lucero Palau, BSC.
 -- Improve logic to import a user's environment. Needed with --get-user-env
    option used with Moab. Patch from Mark Grondona, LLNL.
 -- Fix bug in sview layout if node count less than configured grid_x_width.
 -- Modify PAM module to prefer to use SLURM library with same major release
    number that it was built with.
 -- Permit gres count configuration of zero.
 -- Fix race condition where sbcast command can result in deadlock of slurmd
    daemon. Patch by Don Albert, Bull.
 -- Fix bug in srun --multi-prog configuration file to avoid printing duplicate
    record error when "*" is used at the end of the file for the task ID.
 -- Let operators see reservation data even if "PrivateData=reservations" flag
    is set in slurm.conf. Patch from Don Albert, Bull.
 -- Added new sbatch option "--export-file" as needed for latest version of
    Moab. Patch from Phil Eckert, LLNL.
 -- Fix for sacct printing CPUTime(RAW) where the value is greater than a
    32 bit number.
 -- Fix bug in --switch option with topology resulting in bad switch count use.
    Patch from Alejandro Lucero Palau (Barcelona Supercomputer Center).
 -- Fix PrivateFlags bug when using Priority Multifactor plugin.  If using sprio
    all jobs would be returned even if the flag was set.
    Patch from Bill Brophy, Bull.
 -- Fix for possible invalid memory reference in slurmctld in job dependency
    logic. Patch from Carles Fenoy (Barcelona Supercomputer Center).
* Changes in SLURM 2.3.2
========================
 -- Add configure option of "--without-rpath" which builds SLURM tools without
    the rpath option, which will work if Munge and BlueGene libraries are in
    the default library search path and make system updates easier.
 -- Fixed issue where, if a job ended with ESLURMD_UID_NOT_FOUND and
    ESLURMD_GID_NOT_FOUND, slurm would be a little overzealous in treating
    a missing GID or UID as a fatal error.
 -- Backfill scheduling - Add SchedulerParameters configuration parameter of
    "bf_res" to control the resolution in the backfill scheduler's data about
    when jobs begin and end. Default value is 60 seconds (used to be 1 second).
 -- Cray - Remove the "family" specification from the GPU reservation request.
 -- Updated set_oomadj.c, replacing deprecated oom_adj reference with
    oom_score_adj
 -- Fix resource allocation bug, generic resources allocation was ignoring the
    job's ntasks_per_node and cpus_per_task parameters. Patch from Carles
    Fenoy, BSC.
 -- Avoid orphan job step if slurmctld is down when a job step completes.
 -- Fix Lua link order, patch from Pär Andersson, NSC.
 -- Set SLURM_CPUS_PER_TASK=1 when user specifies --cpus-per-task=1.
 -- Fix for fatal error managing GRES. Patch by Carles Fenoy, BSC.
 -- Fixed race condition when using the DBD in accounting where if a job
    wasn't started at the time the eligible message was sent but started
    before the db_index was returned information like start time would be lost.
 -- Fix issue in accounting where normalized shares could be updated
    incorrectly when getting fairshare from the parent.
 -- Fixed case where associations are not enforced but QOS support is wanted,
    so that the default qos on the cluster is filled in correctly.
 -- Fix in select/cons_res for "fatal: cons_res: sync loop not progressing"
    with some configurations and job option combinations.
 -- BLUEGENE - Fixed issue with handling HTC modes and rebooting.
* Changes in SLURM 2.3.1
========================
 -- Do not remove the backup slurmctld's pid file when it assumes control, only
    when it actually shuts down. Patch from Andriy Grytsenko (Massive Solutions
    Limited).
 -- Avoid clearing a job's reason from JobHeldAdmin or JobHeldUser when it is
    otherwise updated using scontrol or sview commands. Patch based upon work
    by Phil Eckert (LLNL).
 -- BLUEGENE - Fix for if changing the defined blocks in the bluegene.conf and
    jobs happen to be running on blocks not in the new config.
 -- Many cosmetic modifications to eliminate warning message from GCC version
    4.6 compiler.
 -- Fix for sview reservation tab when finding correct reservation.
 -- Fix for handling QOS limits per user on a reconfig of the slurmctld.
 -- Do not treat the absence of a gres.conf file as a fatal error on systems
    configured with GRES, but set GRES counts to zero.
 -- BLUEGENE - Update correctly the state in the reason of a block if an
    admin sets the state to error.
 -- BLUEGENE - handle reason of blocks in error more correctly between
    restarts of the slurmctld.
 -- BLUEGENE - Fix minor potential memory leak when setting block error reason.
 -- BLUEGENE - Fix if running in Static/Overlap mode and full system block
    is in an error state, won't deny jobs.
 -- Fix for accounting where your cluster isn't numbered in counting order
    (i.e. 1-9,0 instead of 0-9).  The bug would cause 'sacct -N nodename' to
    not give correct results on these systems.
 -- Fix to GRES allocation logic when resources are associated with specific
    CPUs on a node. Patch from Steve Trofinoff, CSCS.
 -- Fix bugs in sched/backfill with respect to QOS reservation support and job
    time limits. Patch from Alejandro Lucero Palau (Barcelona Supercomputer
    Center).
 -- BGQ - fix to set up corner correctly for sub block jobs.
 -- Major re-write of the CPU Management User and Administrator Guide (web
    page) by Martin Perry, Bull.
 -- BLUEGENE - If removing blocks from system that once existed cleanup of old
    block happens correctly now.
 -- Prevent slurmctld crashing with configuration of MaxMemPerCPU=0.
 -- Prevent job hold by operator or account coordinator of his own job from
    being an Administrator Hold rather than User Hold by default.
 -- Cray - Fix for srun.pl parsing to avoid adding spaces between option and
    argument (e.g. "-N2" parsed properly without changing to "-N 2").
 -- Major updates to cgroup support by Mark Grondona (LLNL) and Matthieu
    Hautreux (CEA) and Sam Lang. Fixes timing problems with respect to the
    task_epilog. Allows cgroup mount point to be configurable. Added new
    configuration parameters MaxRAMPercent and MaxSwapPercent. Allow cgroup
    configuration parameters that are percentages to be floating point.
 -- Fixed issue where sview wasn't displaying correct nice value for jobs.
 -- Fixed issue where sview wasn't displaying correct min memory per node/cpu
    value for jobs.
 -- Disable some SelectTypeParameters for select/linear that aren't compatible.
 -- Move slurm_select_init to proper place to avoid loading multiple select
    plugins in the slurmd.
 -- BGQ - Include runjob_plugin.so in the bluegene rpm.
 -- Report correct job "Reason" if needed nodes are DOWN, DRAINED, or
    NOT_RESPONDING, "Resources" rather than "PartitionNodeLimit".
 -- BLUEGENE - Fixed issues with running on a sub-midplane system.
 -- Added some missing calls to allow older versions of SLURM to talk to newer.
 -- BGQ - allow steps to be run.
 -- Do not attempt to run HealthCheckProgram on powered down nodes. Patch from
    Ramiro Alba, Centre Tecnològic de Transferència de Calor, Spain.
* Changes in SLURM 2.3.0-2
==========================