- 08 Mar, 2013 4 commits
-
-
Morris Jette authored
-
jette authored
This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate, and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR that should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
-
Danny Auble authored
success
-
Stephen Trofinoff authored
-
- 07 Mar, 2013 1 commit
-
-
jette authored
This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate, and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR that should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
-
- 06 Mar, 2013 2 commits
-
-
Danny Auble authored
options in srun, and push that logic to salloc and sbatch. Bug 201
-
Danny Auble authored
and timeout in the runjob_mux trying to send in this situation. Bug 223
-
- 04 Mar, 2013 4 commits
-
-
Danny Auble authored
-
Magnus Jonsson authored
Jobs are not backfilled because the backfill scheduler does not finish the complete backlog of jobs in the queue before it is interrupted and starts all over again. With many jobs of various sizes and users in the queue, short jobs will not start even when nodes are idle. I have made a patch for backfill with a configuration option (bf_continue) to let backfill continue.
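In slurm.conf, bf_continue is added to SchedulerParameters; the accompanying interval value here is illustrative, not taken from the commit:

```
SchedulerParameters=bf_continue,bf_interval=60
```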
-
Morris Jette authored
The original reservation data structure is deleted and its backup added to the reservation list, but jobs can retain a pointer to the original (now invalid) reservation data structure. Bug 250
-
Alejandro Lucero Palau authored
-
- 01 Mar, 2013 1 commit
-
-
Danny Auble authored
-
- 28 Feb, 2013 1 commit
-
-
Danny Auble authored
energy data.
-
- 27 Feb, 2013 2 commits
-
-
Danny Auble authored
-
Matthieu Hautreux authored
-
- 26 Feb, 2013 3 commits
-
-
Morris Jette authored
Without this fix, jobs that should be initiated by the backfill scheduler based upon the preemption of other jobs will not be started.
-
Danny Auble authored
-
Danny Auble authored
-
- 25 Feb, 2013 1 commit
-
-
Danny Auble authored
cnode does not have a job running on it do not resume the block.
-
- 22 Feb, 2013 3 commits
-
-
Morris Jette authored
Select/cons_res - If the job request specified --ntasks-per-socket and the allocation is using cores, then pack the tasks onto the sockets up to the specified value. Previously it would ignore the ntasks-per-socket parameter and distribute tasks across sockets.
-
Danny Auble authored
--enable-debug.
-
Morris Jette authored
Counts would previously go negative as jobs terminate and decrement from a base value of zero
-
- 21 Feb, 2013 2 commits
-
-
Danny Auble authored
-
Matthieu Hautreux authored
to EINTR when something went wrong between the open call and its return. By ensuring that Slurm retries on such errors, we can better tolerate network file system errors at launch time.
-
- 20 Feb, 2013 1 commit
-
-
Danny Auble authored
(>5000) and using the SchedulerParameters option bf_max_job_user. NEWS note for last few commits
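bf_max_job_user caps how many jobs per user the backfill scheduler considers in each cycle, which bounds its work on very deep queues; an illustrative slurm.conf setting (the value is an assumption):

```
SchedulerParameters=bf_max_job_user=20
```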
-
- 19 Feb, 2013 2 commits
-
-
Alejandro Lucero Palau authored
-
David Bigagli authored
-
- 15 Feb, 2013 1 commit
-
-
Morris Jette authored
-
- 13 Feb, 2013 2 commits
-
-
Danny Auble authored
midplane.
-
Hongjia Cao authored
Handle the situation where a receiving/forwarding host can't unpack the header of the sender (incompatible version).
-
- 12 Feb, 2013 6 commits
-
-
Puenlap Lee authored
-
Morris Jette authored
The logic is only an example and not meant for actual use.
-
Danny Auble authored
built.
-
Morris Jette authored
(within the existing select_nodeinfo field of the node_info_t data structure). Added Allocated Memory to node information displayed by sview and scontrol commands. Bug 229
-
Morris Jette authored
This makes the configuration parameter names consistent within a partition and system-wide
-
Morris Jette authored
Added new field to partition_info data structure. Break up some long lines and minor format changes. Move some definitions and statements into alphabetical order.
-
- 11 Feb, 2013 2 commits
-
-
Morris Jette authored
1. Removed the job_submit and job_modify functions from the plugin; they are not required for the "slurmctld" plugin type.
2. Renamed the new parameter from "JobSubmitDynAllocPort" to "DynAllocPort" and renamed the variable (you need to change this in your slurm.conf file).
3. Added logic so you can see the DynAllocPort value using "scontrol show config" or "sview".
4. Made some minor formatting changes, mostly for lines that were too long.
5. Added #ifdef to the msg.h header file.
6. Changed the #ifdef variables in the header files to start with "DYNALLOC_"; perhaps not needed, but it should be safer, especially with some common names like "INFO_H".
7. Rewrote much of info.c. There was no need to get a copy of the node information and process the copy; we can work directly with the data structures.
-
Jimmy Cao authored
These provide support for MapReduce+
-
- 08 Feb, 2013 2 commits
-
-
Danny Auble authored
of the current day.
-
David Bigagli authored
the user commands.
-