Commits · c2b326e796c864c2fe18707754eefaeec7b0688e · Manuel G. Marciani / ces_slurm_simulator

27 Sep, 2013 3 commits
- BLUEGENE - Don't ignore a conn-type request from the user. · 871d9f2f
  Danny Auble authored Sep 27, 2013
  
  871d9f2f
- Update news for last commit · 19ddb83f
  Morris Jette authored Sep 27, 2013
  
  19ddb83f
- Add SchedulerParameters value of "bf_max_job_part" · 7e417a3b
  Morris Jette authored Sep 27, 2013
```
This specifies the maximum depth the backfill scheduler should go
in any single partition.
```
  7e417a3b
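
To make the idea of a per-partition depth limit concrete, here is a minimal Python sketch. It is not Slurm's backfill code: the function `select_backfill_candidates`, the `(job_id, partition)` tuples, and the convention that a limit of 0 means "no limit" are all assumptions made for illustration. Only the notion of capping how many jobs are considered in any single partition comes from the commit message.

```python
from collections import defaultdict


def select_backfill_candidates(pending_jobs, max_job_part):
    """Return the job ids one scheduling pass would consider.

    pending_jobs: iterable of (job_id, partition), assumed to be in priority order.
    max_job_part: maximum number of jobs to consider per partition
                  (0 means unlimited in this sketch; that is an assumption).
    """
    seen = defaultdict(int)  # jobs already considered, per partition
    candidates = []
    for job_id, partition in pending_jobs:
        if max_job_part and seen[partition] >= max_job_part:
            continue  # depth limit reached for this partition
        seen[partition] += 1
        candidates.append(job_id)
    return candidates


queue = [(1, "debug"), (2, "debug"), (3, "batch"), (4, "debug"), (5, "batch")]
print(select_backfill_candidates(queue, 2))  # [1, 2, 3, 5]
```

With a limit of 2, the third job in the `debug` partition is skipped while both `batch` jobs are still considered.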
25 Sep, 2013 1 commit
- Fix typo in NEWS · db73f1f5
  Morris Jette authored Sep 25, 2013
  
  db73f1f5
24 Sep, 2013 4 commits
- Fix issues where memory or node count of a srun job is altered while the · b30139e5
  Danny Auble authored Sep 24, 2013
```
srun is pending.  The step creation would use the old values and possibly
hang srun since the step wouldn't be able to be created in the modified
allocation.
```
  b30139e5
- Scontrol - Enable changing a job's stdout file · 99e40d91
  Morris Jette authored Sep 24, 2013
  
  99e40d91
- Fix issue when a user has held a job and then sets the begin time · 09498479
  Danny Auble authored Sep 23, 2013
```
into the future.
```
  09498479
- Close file descriptors on exec of prolog, epilog, etc. · 29094e33
  Morris Jette authored Sep 23, 2013
  
  29094e33
23 Sep, 2013 2 commits
- Fix issue with step accounting if a job is requeued. · beb970e9
  Danny Auble authored Sep 23, 2013
  
  beb970e9
- Reorder get config logic to avoid deadlock. · 262374a8
  Morris Jette authored Sep 23, 2013
```
bug 428
```
  262374a8
17 Sep, 2013 2 commits
- scontrol parsing fix · 7912c05b
  Morris Jette authored Sep 17, 2013
```
for setdebugflags command, avoid parsing "-flagname" as
an scontrol command line option.
```
  7912c05b
- This fixes the MaxCPUsPerNode partition constraint · 308b7432
  Armin Größlinger authored Sep 16, 2013
```
for CR_Socket.
```
  308b7432
14 Sep, 2013 2 commits
- job_submit/pbs - extend dependency support · 4da7696f
  Morris Jette authored Sep 13, 2013
```
Add support for "on" and "before*" options
```
  4da7696f
- Backported sh5util from master to 2.6 as there are some important · 421f55a5
  David Bigagli authored Sep 13, 2013
```
bugfixes and the new item extraction feature.
```
  421f55a5
13 Sep, 2013 4 commits
- Add spank PBS plugin to set a bunch of env vars · c1e3bbeb
  Morris Jette authored Sep 13, 2013
  
  c1e3bbeb
- job_submit/pbs - Add some env var support · de62a336
  Morris Jette authored Sep 13, 2013
```
Set PBS_ACCOUNT, PBS_ENVIRONMENT, and PBS_QUEUE only for batch
jobs and only if the user submission sets the account and partition.
```
  de62a336
- Fix qsub/sbatch support for PBS dependency · 5f97e1fa
  Morris Jette authored Sep 13, 2013
  
  5f97e1fa
- Add job_submit/pbs plugin to translate PBS job dependency options · 755bde6b
  Morris Jette authored Sep 12, 2013
```
No support for PBS "before" options
```
  755bde6b
12 Sep, 2013 1 commit

- Add qsub support for some more options: · 454ee59b
  Morris Jette authored Sep 11, 2013

  -l accelerator=true|false  (GPU use)
  -l mpiprocs=#              (processors per node)
  -l naccelerators=#         (GPU count)
  -l select=#                (node count)
  -l ncpus=#                 (task count)
  -v key=value               (environment variable)
  -W umask=#                 (set job's umask)
  Note: the -v option does NOT support quoted commas.

  454ee59b
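
The note about quoted commas can be illustrated with a short Python sketch. It assumes that `-v` takes a comma-separated list of `key=value` pairs and that the list is split on every comma. The function name `parse_v_option` is hypothetical, and this is not the actual qsub wrapper code.

```python
def parse_v_option(value):
    """Split a '-v' argument into environment variables.

    Naive approach (an assumption of this sketch): split on every comma,
    so a comma inside quotes is still treated as a separator.
    """
    env = {}
    for item in value.split(","):
        key, _, val = item.partition("=")
        if key:
            env[key] = val
    return env


print(parse_v_option("A=1,B=two"))    # {'A': '1', 'B': 'two'}
print(parse_v_option('A=1,B="x,y"'))  # {'A': '1', 'B': '"x', 'y"': ''}
```

The second call shows the limitation: the quoted value `"x,y"` is cut in two, which is the kind of input the note warns against.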

11 Sep, 2013 2 commits

- Add support for some new #PBS options in sbatch scripts · 95971e58
  Morris Jette authored Sep 11, 2013

  -l accelerator=true|false  (GPU use)
  -l mpiprocs=#              (processors per node)
  -l naccelerators=#         (GPU count)
  -l select=#                (node count)
  -l ncpus=#                 (task count)
  -v key=value               (environment variable)
  -W umask=#                 (set job's umask)

  95971e58
- Expand NEWS explanation of a change · 39504ced
  Morris Jette authored Sep 10, 2013
  
  39504ced

10 Sep, 2013 3 commits
- Start NEWS for v2.6.3 · 6c79e0b7
  Morris Jette authored Sep 10, 2013
  
  6c79e0b7
- If the OverTimeLimit is defined do not declare failed those jobs · 03455f57
  David Bigagli authored Sep 10, 2013
```
that ended in the OverTimeLimit interval.
```
  03455f57
- Update NEWS file. · fa352eb6
  David Bigagli authored Sep 10, 2013
  
  fa352eb6
09 Sep, 2013 2 commits
- Fix segfault if submitting to multiple partitions and holding the job. · bc188e8c
  Danny Auble authored Sep 09, 2013
  
  bc188e8c
- CRAY - Make Slurm work with CLE 5.1.1 · 3b5539bd
  Danny Auble authored Sep 09, 2013
  
  3b5539bd
06 Sep, 2013 1 commit
- Switch/nrt - Prevent invalid memory reference · 97dac70e
  Morris Jette authored Sep 05, 2013
```
Caused by allocating single adapter per node of specific adapter type.
```
  97dac70e
04 Sep, 2013 1 commit

- Improve GRES support for CPU topology · 6f50943c
  Morris Jette authored Sep 04, 2013

  Previous logic would pick CPUs then
  reject jobs that can not match GRES to the allocated CPUs. New logic first
  filters out CPUs that can not use the GRES, next picks CPUs for the job,
  and finally picks the GRES that best match those CPUs.
  bug 410

  6f50943c
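
The three steps described in this commit (filter CPUs, pick CPUs, pick GRES) can be sketched in Python as follows. This is a toy model, not the Slurm implementation: the function `allocate`, the representation of each GRES device as a set of CPU ids that can use it, the "lowest CPU ids first" selection, and the overlap-based ranking are all assumptions made for illustration.

```python
def allocate(cpus, gres_devices, cpus_needed, gres_needed):
    """Return (picked_cpus, picked_gres), or None if the request cannot be met.

    cpus:         iterable of available CPU ids
    gres_devices: dict mapping a device name to the set of CPU ids that can use it
    """
    # Step 1: filter out CPUs that cannot use any GRES (only if GRES is requested).
    if gres_needed == 0:
        usable = list(cpus)
    else:
        usable = [c for c in cpus
                  if any(c in bound for bound in gres_devices.values())]
    if len(usable) < cpus_needed or len(gres_devices) < gres_needed:
        return None

    # Step 2: pick CPUs for the job (lowest ids first in this sketch).
    picked = usable[:cpus_needed]
    picked_set = set(picked)

    # Step 3: pick the GRES that best match those CPUs (largest overlap first).
    ranked = sorted(gres_devices,
                    key=lambda name: (-len(gres_devices[name] & picked_set), name))
    return picked, ranked[:gres_needed]


gres = {"gpu0": {4, 5}, "gpu1": {6, 7}}
print(allocate(range(8), gres, cpus_needed=2, gres_needed=1))  # ([4, 5], ['gpu0'])
```

Because CPUs 0 to 3 are filtered out before selection, the job is never given CPUs that cannot reach a GPU, which is the rejection case the previous ordering could run into.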

30 Aug, 2013 1 commit
- Validate permissions of key directories at slurmctld startup · 368671b5
  Morris Jette authored Aug 29, 2013
```
Report anything that is world writable.
```
  368671b5
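
A world-writable check of this kind can be sketched in Python with the standard `os` and `stat` modules. The function name `world_writable` and the temporary directory used in the demo are assumptions for illustration. The commit does not say which directories slurmctld inspects or how it reports them.

```python
import os
import stat
import tempfile


def world_writable(paths):
    """Return the paths whose permission bits allow writing by 'other' users."""
    flagged = []
    for path in paths:
        mode = os.stat(path).st_mode
        if mode & stat.S_IWOTH:  # the "other write" permission bit
            flagged.append(path)
    return flagged


# Demo on a throwaway directory (behaviour shown is for POSIX systems).
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o777)
    print(world_writable([demo_dir]) == [demo_dir])
    os.chmod(demo_dir, 0o755)
    print(world_writable([demo_dir]))
```

On a POSIX system the directory is flagged while its mode is `0o777` and no longer flagged once it is changed to `0o755`.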
29 Aug, 2013 3 commits
- remove last comment, it is documented in job_mgr.c as this... · ab64e75b
  Danny Auble authored Aug 29, 2013
```
/* Current code (<= 2.1) has it so we start the new
 * job with the next step id.  This could be used
 * when restarting to figure out which step the
 * previous run of this job stopped on. */
```
  ab64e75b
- When a job is requeued reset the step id's back to 0. · 7e7edfca
  Danny Auble authored Aug 29, 2013
  
  7e7edfca
- Enforce --ntasks-per-socket=1 job option when allocating by socket · 58dd480a
  Magnus Jonsson authored Aug 29, 2013
```
See
https://groups.google.com/forum/#!topic/slurm-devel/j4izr0L4w8w
```
  58dd480a
28 Aug, 2013 2 commits

- Fix for invalid memory reference · caa69594
  Morris Jette authored Aug 28, 2013

  due to multiple free calls caused by job arrays submitted to
  multiple partitions. The root cause is the job priority array
  of the original job being re-used by the subsequent job array
  entries. A similar problem that could be induced by the user
  specifying a job accounting frequency when submitting a job
  array is also fixed.
  bug 401

  caa69594
- Make sure GrpCPURunMins is added when creating a user, account or QOS with · 2806f6d9
  Danny Auble authored Aug 28, 2013
```
sacctmgr.
```
  2806f6d9

27 Aug, 2013 1 commit

- Reservation with CoreCnt: Avoid possible invalid memory reference · e0541f93
  Morris Jette authored Aug 27, 2013

  If reservation create request included a CoreCnt value and more
  nodes are required than configured, the logic in select/cons_res
  could go off the end of the core_cnt array. This patch adds a
  check for a zero value in the core_cnt array, which terminates
  the user-specified array.
  Back-port from master of commit 211c224b

  e0541f93
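
The zero-terminator check can be illustrated with a small Python sketch. In C, indexing past the end of the array is an invalid memory reference; Python would raise `IndexError` instead, so the list here only models the idea. The function `cores_for_nodes` and its stop-at-terminator behaviour are assumptions, not the select/cons_res code.

```python
def cores_for_nodes(core_cnt, nodes_needed):
    """Collect per-node core counts from a zero-terminated list.

    core_cnt must end with a 0 sentinel, mirroring the user-specified
    array described in the commit. The loop stops at the sentinel even
    when nodes_needed is larger than the number of entries.
    """
    result = []
    for i in range(nodes_needed):
        if core_cnt[i] == 0:  # terminator reached: do not read any further
            break
        result.append(core_cnt[i])
    return result


# Two entries were specified, but five nodes are required.
print(cores_for_nodes([4, 4, 0], 5))  # [4, 4]
```

Without the sentinel check, the loop in this example would index positions 3 and 4, which do not exist.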

24 Aug, 2013 1 commit
- If running jobacct_gather/none fix issue on unpacking step completion. · 33ff8dbc
  Danny Auble authored Aug 23, 2013
  
  33ff8dbc
23 Aug, 2013 1 commit

- Correct value of min_nodes returned by loading job info · 98e24b0d
  Morris Jette authored Aug 23, 2013

  This is a correction of a bug introduced in commit
  https://github.com/SchedMD/slurm/commit/ac44db862c8d1f460e55ad09017d058942ff6499
  That commit eliminated the need for squeue to read the node state
  information, for performance reasons (mostly for large parallel systems
  in which the Prolog ran squeue, which generates a lot of simultaneous
  RPCs, slowing down the job launch process). It also assumed 1 CPU per
  node. If a pending job specified a node count of 1 and a task count
  larger than one, squeue was reporting the node count of the job as
  the same as the task count. This patch moves that same calculation
  of a pending job's minimum node count into slurmctld, so squeue
  still does not need to read the node information, but can report the
  correct node count for pending jobs with minimal overhead.

  98e24b0d
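
One way to see why assuming 1 CPU per node inflates the reported node count is the toy Python model below. It is not the calculation used in slurmctld or squeue: the function `pending_min_nodes`, the formula, and the one-CPU-per-task simplification are assumptions made purely to reproduce the symptom described in the commit.

```python
import math


def pending_min_nodes(requested_nodes, num_tasks, cpus_per_node):
    """Estimate a pending job's minimum node count (hypothetical formula).

    Assumes one CPU per task: the job needs enough nodes to hold all
    tasks, and never fewer nodes than the user requested.
    """
    return max(requested_nodes, math.ceil(num_tasks / cpus_per_node))


# A job asking for 1 node and 8 tasks.
print(pending_min_nodes(1, 8, cpus_per_node=1))   # 8
print(pending_min_nodes(1, 8, cpus_per_node=16))  # 1
```

With the 1-CPU-per-node assumption the estimate equals the task count, which matches the reported symptom. Using the real CPU count per node, which in this model only the controller is assumed to know, gives the expected value of 1.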

22 Aug, 2013 2 commits
- BackupController - Make sure we have a connection to the DBD first thing · 8e3ab25f
  Danny Auble authored Aug 22, 2013
```
to avoid it thinking we don't have a cluster name.
```
  8e3ab25f
- News for last update · 7da8e149
  Danny Auble authored Aug 21, 2013
  
  7da8e149
21 Aug, 2013 1 commit

- Fix of wrong node/job state problem after reconfig · d80c8667
  Hongjia Cao authored Aug 21, 2013

  If there are completing jobs, a reconfigure will set the wrong job/node
  state: all nodes of the completing job will be set to allocated, and the
  job will not be removed even if the completing nodes are released. The
  state can only be restored by restarting slurmctld after the completing
  nodes are released.

  d80c8667