- 05 Jun, 2014 1 commit
-
-
David Bigagli authored
when specified escaped.
-
- 04 Jun, 2014 4 commits
-
-
Morris Jette authored
Attempt to create duplicate event trigger now generates ESLURM_TRIGGER_DUP ("Duplicate event trigger").
-
Morris Jette authored
Modify strigger to accept arguments to the program to execute when an event trigger occurs.
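A hedged sketch of the new usage; the script path, its arguments, and the trigger condition are illustrative, not taken from the commit:
    # Arguments after the program path are now passed through to the trigger program
    strigger --set --node --down \
             --program="/usr/local/sbin/notify_admin --channel=ops --severity=high"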
-
Morris Jette authored
Added strigger option of -N, --noheader to not print the header when displaying a list of triggers.
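For example, an illustrative listing without the header line:
    # -N / --noheader suppresses the header row when listing triggers
    strigger --get --noheader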
-
Morris Jette authored
batch jobs have cpus_per_task set to zero, which resulted in an error of "task/cgroup: task[0] unable to set taskset '0x0'"
-
- 03 Jun, 2014 4 commits
-
-
David Bigagli authored
requeue, requeuehold and release operations.
-
Morris Jette authored
Do not purge the script and environment files for completed jobs on slurmctld reconfiguration or restart (they might later be requeued). Purge the files only when the job record is purged. bug 834
-
Morris Jette authored
If a job --mem-per-cpu limit exceeds the partition or system limit, then scale the job's memory limit and CPUs per task to satisfy the limit. bug 848
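A minimal sketch of the scaling, assuming a partition limit of MaxMemPerCPU=2048 (all numbers are illustrative):
    # The requested 4096 MB per CPU exceeds the 2048 MB per-CPU partition limit...
    sbatch --mem-per-cpu=4096 --cpus-per-task=1 job.sh
    # ...so the request is scaled to satisfy the limit, roughly equivalent to:
    #   sbatch --mem-per-cpu=2048 --cpus-per-task=2 job.sh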
-
David Bigagli authored
not finished yet, otherwise if requeued the job may enter an invalid COMPLETING state.
-
- 29 May, 2014 1 commit
-
-
Morris Jette authored
select/cons_res plugin: Fix memory leak related to job preemption. bug 837
-
- 28 May, 2014 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
This gives system administrators the option on AMD Opteron 6000 series processors of either considering each NUMA node on a socket as a separate socket (resulting in some incorrect logging of socket count information) or not (resulting in sub-optimal job allocations, since each core in the socket will be considered equivalent even if on different NUMA nodes within the socket). bug 838
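As a hedged illustration of the two choices (the node name and core counts are assumptions for a two-socket part with two NUMA nodes of 8 cores per socket, not details from the commit), the slurm.conf node definition could be written either way:
    # Treat each NUMA node as a socket (socket count may be logged incorrectly):
    NodeName=opteron01 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 State=UNKNOWN
    # Or describe the physical sockets (allocations may ignore NUMA locality):
    NodeName=opteron01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN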
-
- 23 May, 2014 3 commits
-
-
David Bigagli authored
-
David Bigagli authored
-
Danny Auble authored
not able to be separated into multiple patches. If EnforcePartLimits=Yes and the QOS the job is using can override limits, allow it. Fix issues when a partition allows or denies accounts or QOSes and either list is not set. If a job requests a partition that does not allow the job's QOS or account, the job now pends unless EnforcePartLimits=Yes. Before, the job would always be killed at submit.
-
- 21 May, 2014 5 commits
-
-
David Bigagli authored
-
Danny Auble authored
based on the mask given.
-
Danny Auble authored
task/affinity.
-
Danny Auble authored
thread in a core.
-
Danny Auble authored
it can bind cyclically across sockets.
-
- 20 May, 2014 3 commits
-
-
Morris Jette authored
cpus-per-task support: Try to pack all CPUs of each task onto one socket. Previous logic could spread a task's CPUs across multiple sockets.
-
Danny Auble authored
This reverts commit b22268d8.
-
Danny Auble authored
-
- 19 May, 2014 1 commit
-
-
Morris Jette authored
Properly enforce job --requeue and --norequeue options. Previous logic failed to do so in three places (either ignoring the value, ANDing it with the JobRequeue configuration option, or using the JobRequeue configuration option by itself). bug 821
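For reference, a sketch of the interacting settings (values are illustrative); note that sbatch spells the second flag --no-requeue:
    # Per-job policy, which should now be honored as specified:
    sbatch --requeue job.sh
    sbatch --no-requeue job.sh
    # Cluster-wide default in slurm.conf, applied only when the job does not specify one:
    #   JobRequeue=1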
-
- 15 May, 2014 2 commits
-
-
Morris Jette authored
Add SelectTypeParameters option of CR_PACK_NODES to pack a job's tasks tightly on its allocated nodes rather than distributing them evenly across the allocated nodes. bug 819
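A hedged slurm.conf sketch; combining the new option with CR_Core_Memory is an assumption, not part of the commit:
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_PACK_NODES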
-
Danny Auble authored
something you also get a signal which would produce deadlock. Fix Bug 601.
-
- 14 May, 2014 2 commits
-
-
Morris Jette authored
Run EpilogSlurmctld for a job that is killed during slurmctld reconfiguration. bug 806
-
Morris Jette authored
Only if ALL of its partitions are hidden will a job be hidden by default. bug 812
-
- 13 May, 2014 4 commits
-
-
Morris Jette authored
Correct SelectTypeParameters=CR_LLN with job selection of specific nodes. Previous logic would in most instances allocate resources on all nodes to the job.
-
Morris Jette authored
Correct squeue's job node and CPU counts for requeued jobs. Previously, when a job was requeued, its CPU count reported was that of the previous execution. When combined with the --ntasks-per-node option, squeue would compute the expected node count. If the --exclusive option is also used, the node count reported by squeue could be off by a large margin (e.g. "sbatch --exclusive --ntasks-per-node=1 -N1 .." on requeue would use the number of CPUs on the allocated node to recompute the expected node count). bug 756
-
Danny Auble authored
jobacct_gather/cgroup.
-
Morris Jette authored
Support SLURM_CONF path which does not have "slurm.conf" as the file name. bug 803
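For example (the path is illustrative):
    # The configuration file no longer needs to be named slurm.conf
    export SLURM_CONF=/etc/slurm/cluster_a.conf
    sinfo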
-
- 12 May, 2014 4 commits
-
-
Morris Jette authored
If a job has a non-responding node, retry job step creation rather than returning a DOWN node error. bug 734
-
Morris Jette authored
-
Puenlap Lee authored
Also correct related documentation
-
Hongjia Cao authored
Completing nodes are removed when calling _try_sched() for a job, which is the case in select_nodes(). If _try_sched() thinks the job can run now but select_nodes() returns ESLURM_NODES_BUSY, the backfill loop will be ended.
-
- 09 May, 2014 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 08 May, 2014 2 commits
-
-
Morris Jette authored
Fix sinfo -R to print each down/drained node once, rather than once per partition. This was broken in the sinfo change to process each partition's information in a separate pthread.
-
Morris Jette authored
Correct sinfo --sort fields to match documentation: E -> Reason, H -> Reason Time (new), R -> Partition Name, u/U -> Reason user (new)
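An illustrative invocation using the corrected sort letters:
    # Sort down/drained node reasons by reason time, then by reason user
    sinfo -R --sort=H,u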
-