Commits · 046689e94a143da44707f81be4fd9bbe9aca912b · Manuel G. Marciani / ces_slurm_simulator

26 Aug, 2013 1 commit

Add new job_state of JOB_BOOT_FAIL · ba54c8b4

Morris Jette authored Aug 26, 2013

Used for job terminations due to a failure to boot its allocated nodes
or BlueGene block.
bug 213

ba54c8b4

24 Aug, 2013 1 commit
- If running jobacct_gather/none fix issue on unpacking step completion. · 33ff8dbc
  Danny Auble authored Aug 23, 2013
  
  33ff8dbc
23 Aug, 2013 1 commit

Correct value of min_nodes returned by loading job info · 98e24b0d

Morris Jette authored Aug 23, 2013

This is a correction of a bug introduced in commit
https://github.com/SchedMD/slurm/commit/ac44db862c8d1f460e55ad09017d058942ff6499
That commit eliminated the need to read the node state information
from squeue for performance reasons (mostly for large parallel systems
in which the Prolog ran squeue, which generates a lot of simultaneous
RPCs, slowing down the job launch process). It also assumed 1 CPU per
node. If a pending job specified a node count of 1 and a task count
larger than one, squeue was reporting the node count of the job as
the same as the task count. This patch moves that same calculation
of a pending job's minimum node count into slurmctld, so squeue
still does not need to read the node information, but can report the
correct node count for pending jobs with minimal overhead.

98e24b0d
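
The effect of the "1 CPU per node" assumption described above can be illustrated with a small sketch. The function names and the `cpus_per_node` parameter are hypothetical and are not taken from the Slurm source; the actual calculation in slurmctld may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Slurm source code. */

/* Estimate made without node information: assumes 1 CPU per node,
 * so a job's task count is treated as a lower bound on its node count. */
uint32_t ex_min_nodes_one_cpu_per_node(uint32_t req_nodes, uint32_t num_tasks)
{
	return (num_tasks > req_nodes) ? num_tasks : req_nodes;
}

/* Estimate made where the CPU count per node is known (as it is
 * inside the controller): round the task count up to whole nodes. */
uint32_t ex_min_nodes_with_cpu_count(uint32_t req_nodes, uint32_t num_tasks,
				     uint32_t cpus_per_node)
{
	uint32_t by_tasks;

	assert(cpus_per_node > 0);
	by_tasks = (num_tasks + cpus_per_node - 1) / cpus_per_node;
	return (by_tasks > req_nodes) ? by_tasks : req_nodes;
}
```

For a pending job that asks for 1 node and 4 tasks, the first function reports 4 nodes, while the second reports 1 node when each node is assumed to have at least 4 CPUs.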

22 Aug, 2013 5 commits
- BackupController - Make sure we have a connection to the DBD first thing · b62e729d
  Danny Auble authored Aug 22, 2013
```
to avoid it thinking we don't have a cluster name.
```
  b62e729d
- Add stdin/out/err to sview job output. · 81ff404e
  Nathan Yee authored Aug 22, 2013
  
  81ff404e
- BackupController - Make sure we have a connection to the DBD first thing · 8e3ab25f
  Danny Auble authored Aug 22, 2013
```
to avoid it thinking we don't have a cluster name.
```
  8e3ab25f
- Add squeue output format options for job command and working directory · 5cdfb028
  Nathan Yee authored Aug 22, 2013
```
%o and %Z respectively
```
  5cdfb028
- News for last update · 7da8e149
  Danny Auble authored Aug 21, 2013
  
  7da8e149
21 Aug, 2013 1 commit

Fix of wrong node/job state problem after reconfig · d80c8667

Hongjia Cao authored Aug 21, 2013

If there are completing jobs, a reconfigure sets the wrong job/node
state: all nodes of the completing job are set to allocated, and the
job is not removed even if the completing nodes are released. The
state can only be restored by restarting slurmctld after the completing
nodes are released.

d80c8667

20 Aug, 2013 2 commits
- Fix issue with reconfig and GrpCPURunMins · 6d793189
  Danny Auble authored Aug 20, 2013
  
  6d793189
- Added fields to "scontrol show job" output · 47e300a6
  Morris Jette authored Aug 20, 2013
```
Added boards_per_node, sockets_per_board, ntasks_per_node,
ntasks_per_board, ntasks_per_socket, ntasks_per_core, and nice.
```
  47e300a6
19 Aug, 2013 5 commits
- Added "JobAcctGatherParams" configuration parameter. · 22f253b3
  Chris Read authored Aug 19, 2013
  
  22f253b3
- Updated NEWS and RELEASE_NOTES with the -I sh5util option. · 4fb32559
  David Bigagli authored Aug 19, 2013
  
  4fb32559
- Clarify a change in NEWS · 7b650ddf
  Morris Jette authored Aug 19, 2013
  
  7b650ddf
- begin NEWS for v13.12.0-pre2 · 7682f4de
  Morris Jette authored Aug 16, 2013
  
  7682f4de
- Start NEWS for v2.6.2 · cb6b9ddd
  Morris Jette authored Aug 16, 2013
  
  cb6b9ddd
17 Aug, 2013 1 commit
- Start NEWS for v2.6.2 · 9f334c91
  Morris Jette authored Aug 16, 2013
  
  9f334c91
16 Aug, 2013 2 commits
- Sched/backfill - Change default max_job_bf parameter from 50 to 100. · d3cdbf56
  Morris Jette authored Aug 16, 2013
```
This makes it consistent with the value of default_queue_depth.
The backfill scheduler should be able to easily handle this value
(or much higher for pretty much any configuration).
```
  d3cdbf56
- Fix issue with a 2.5 slurmstepd locking up when talking to a 2.6 slurmd. · e804c9bb
  Danny Auble authored Aug 15, 2013
  
  e804c9bb
15 Aug, 2013 5 commits
- CRAY - fix issue with accelerators on a Cray when parsing BASIL 1.3 XML. · c30fe1b3
  Danny Auble authored Aug 15, 2013
  
  c30fe1b3
- Fix issue with potentially referencing past an array in parse_time() · 2833c19a
  Danny Auble authored Aug 15, 2013
  
  2833c19a
- Fix in accounting_storage/filetxt to correct start times which sometimes · 9eba4384
  Danny Auble authored Aug 15, 2013
```
could end up before the job started. Bug 371
```
  9eba4384
- Proctrack/pgid - Add support for proctrack_p_plugin_get_pids() · 188df55c
  Morris Jette authored Aug 14, 2013
```
This function can now be called to test for processes which are
dumping in order to avoid sending them a SIGKILL until dump
completes. Change in logic required for job_container/cray.
```
  188df55c
- Fix CPURunMins if a job is requeued from a failed launch. · 8aaa817e
  Danny Auble authored Aug 14, 2013
  
  8aaa817e
14 Aug, 2013 5 commits

Make jobacct_gather/cgroup work correctly and also make all jobacct_gather · 2eba8d7f
Danny Auble authored Aug 14, 2013
```
plugins more maintainable.
```
2eba8d7f
Validate a job's accounting frequency at submission time · 26560fa5
Morris Jette authored Aug 14, 2013
```
This avoids waiting for the job's initiation to fail.
```
26560fa5

Fix job state recovery logic for accounting frequency · 6d878aa7

Morris Jette authored Aug 14, 2013

Fix job state recovery logic in which a job's accounting frequency was
not set. This would result in a value of 65534 seconds being used (the
equivalent of NO_VAL in uint16_t), which could result in the job being
requeued or aborted.

6d878aa7
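
The 65534 figure above is what a 32-bit "no value" sentinel of 0xfffffffe becomes when truncated to `uint16_t`. The sketch below assumes that sentinel value (it is consistent with the number quoted in the commit message) and uses hypothetical `EX_`-prefixed names; the fallback helper is an illustration only, not the recovery logic from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only. The 32-bit sentinel value is an assumption
 * that matches the 65534 quoted in the commit message. */
#define EX_NO_VAL   0xfffffffeU
#define EX_NO_VAL16 ((uint16_t) EX_NO_VAL)	/* truncates to 0xfffe */

/* Hypothetical helper: treat the sentinel as "not set" and fall back
 * to a default instead of using it as an interval in seconds. */
uint16_t ex_effective_acct_freq(uint16_t job_freq, uint16_t default_freq)
{
	assert(default_freq != EX_NO_VAL16);
	return (job_freq == EX_NO_VAL16) ? default_freq : job_freq;
}
```

Without a check of this kind, an unset frequency would be read as an interval of 65534 seconds.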

Fix pack and unpack between 2.6 and 2.5 · 585e7947
David Bigagli authored Aug 13, 2013

585e7947

Fix infinite loop for one byte config file · 3820cf2e

Morris Jette authored Aug 13, 2013

Problem reported by BYU. slurm.conf included a file one byte in
length. Logic created a buffer one byte long and used fgets()
to read the file. fgets() reads one byte less than the buffer
size to include a trailing '\0', so it fails to read the file.

3820cf2e
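
The `fgets()` behavior described above can be reproduced with a short sketch. Both helper names are hypothetical and this is not the code from the commit: the first function measures how many bytes one `fgets()` call consumes for a given buffer size, and the second shows the usual remedy of sizing the buffer one byte larger than the file.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: number of bytes a single fgets() call consumes
 * from fp when given a buffer of buf_size bytes, or -1 on allocation
 * failure. */
long ex_fgets_consumed(FILE *fp, int buf_size)
{
	char *buf;
	long before, after;

	assert(fp != NULL && buf_size > 0);
	buf = malloc((size_t) buf_size);
	if (buf == NULL)
		return -1;
	before = ftell(fp);
	if (fgets(buf, buf_size, fp) == NULL) {
		/* nothing stored; still report the offset change below */
	}
	after = ftell(fp);
	free(buf);
	return after - before;
}

/* Hypothetical helper: read the first line of a file of known size.
 * The buffer is file_size + 1 bytes so the trailing '\0' always fits.
 * Returns a malloc'd string the caller must free, or NULL on error. */
char *ex_read_first_line(FILE *fp, size_t file_size)
{
	char *buf;

	assert(fp != NULL && file_size < (size_t) INT_MAX);
	buf = malloc(file_size + 1);
	if (buf == NULL)
		return NULL;
	buf[0] = '\0';
	if (file_size > 0 && fgets(buf, (int) (file_size + 1), fp) == NULL) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

With a one-byte buffer, `fgets()` may store at most zero characters plus the terminator, so the file position never advances; a loop that waits for the one-byte file to be consumed would therefore never finish.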

13 Aug, 2013 3 commits

Update NEWS to reflect several recent code changes · 1c9c3be2
Morris Jette authored Aug 13, 2013

1c9c3be2

select/cons_res - Avoid extraneous "oversubscribe" error messages · 302d8b3f

jette authored Aug 13, 2013

This problem was reported by Harvard University and could be
reproduced with a command line of "srun -N1 --tasks-per-node=2 -O id".
With other job types, the error message could be logged many times
for each job. This change logs the error once per job and only if
the job request does not include the -O/--overcommit option.

302d8b3f
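
The "log once per job, and only without -O/--overcommit" rule described above can be sketched with a per-job flag. The structure, its field names, and the message text are hypothetical and are not taken from the select/cons_res source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical job record; field names are illustrative only. */
struct ex_job {
	unsigned int job_id;
	bool overcommit;	/* job was submitted with -O/--overcommit */
	bool oversub_logged;	/* message already emitted for this job */
};

/* Log the oversubscribe message at most once per job, and never for
 * jobs that asked to overcommit. Returns true if a message was logged
 * by this call. */
bool ex_log_oversubscribe(struct ex_job *job)
{
	assert(job != NULL);
	if (job->overcommit || job->oversub_logged)
		return false;
	fprintf(stderr, "job %u: oversubscribe\n", job->job_id);
	job->oversub_logged = true;
	return true;
}
```

Keeping the flag in the job record means repeated scheduling attempts for the same job stay quiet after the first message.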

MYSQL - fix issue when rolling up usage and events happened when a cluster · 9c09a71b
Danny Auble authored Aug 12, 2013
```
was down (slurmctld not running) during that time period.
```
9c09a71b

09 Aug, 2013 2 commits
- PGSQL - Notes about Postgres functionality being removed in the next · f0c534b7
  Danny Auble authored Aug 09, 2013
```
version of Slurm.
```
  f0c534b7
- Remove Postgres plugins. Anyone wanting this functionality back can use · 9deabb79
  Danny Auble authored Aug 09, 2013
```
the reverse of this commit to return the code to the mix.
```
  9deabb79
08 Aug, 2013 3 commits
- Set a job's RLIMIT_AS limit based upon its memory limit and VsizeFactor · 4454bc40
  Mark Nelson authored Aug 08, 2013
  
  4454bc40
- Permit Slurm administrator to submit a batch job as any user · 5adfad4b
  Morris Jette authored Aug 08, 2013
  
  5adfad4b
- Add StdIn, StdOut, and StdErr paths to job information dumped · fd52b8a4
  Morris Jette authored Aug 07, 2013
```
Visible only using "scontrol show job" today and these fields are
only relevant for batch jobs.
```
  fd52b8a4
07 Aug, 2013 3 commits
- Add mechanism for job_submit plugin to generate error msg for command · 2ecab57b
  Morris Jette authored Aug 07, 2013
```
Add mechanism for job_submit plugin to generate error message for srun,
salloc, or sbatch to log. New argument added to job_submit function in
the plugin.
bug 278
```
  2ecab57b
- sview - Add missing debug_flag options. · 4f33c49e
  Danny Auble authored Aug 06, 2013
  
  4f33c49e
- Added sinfo and squeue format option of "%all" to print all fields · 3f67a89d
  Morris Jette authored Aug 06, 2013
```
with a vertical bar separating each field.
```
  3f67a89d