- 02 Apr, 2013 2 commits
-
-
Morris Jette authored
A complete fix for this problem will require more study. The current code triggers an xassert when an attempt to start a job results in it not being started by sched/backfill due to the partition time limit.
-
Morris Jette authored
Fix sched/backfill logic to initiate jobs whose maximum time limit exceeds the partition limit, but whose minimum time limit permits them to start. Related to bug 251
-
- 01 Apr, 2013 1 commit
-
-
Morris Jette authored
Fix for bug 224
-
- 29 Mar, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 27 Mar, 2013 3 commits
-
-
Jason Bacon authored
-
Morris Jette authored
Without this patch, when the slurmd cold starts or slurmstepd terminates abnormally, the job script file can be left around. bug 243
-
Morris Jette authored
Previously such a job submitted to a DOWN partition would be queued. bug 187
-
- 26 Mar, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
a reservation when it has the "Ignore_Jobs" flag set. Since jobs could run on the reservation's nodes but outside of the reservation, without this you could have double time.
-
- 25 Mar, 2013 2 commits
-
-
Morris Jette authored
This is not applicable with launch/aprun
-
Morris Jette authored
-
- 22 Mar, 2013 2 commits
-
-
Morris Jette authored
These changes are required so that select/cray can load select/linear, which is a bit more complex than the other select plugin structures. Export plugin_context_create and plugin_context_destroy symbols from libslurm.so. Correct a typo in the exported hostlist_sort symbol name. Define some functions in select/cray to avoid undefined symbols if the plugin is loaded via libslurm rather than from a Slurm command (which has all of the required symbols).
-
Morris Jette authored
-
- 20 Mar, 2013 3 commits
-
-
Luis Cabellos authored
-
Hongjia Cao authored
-
Danny Auble authored
cluster.
-
- 19 Mar, 2013 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 14 Mar, 2013 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Add milliseconds to the default log message header (both RFC 5424 and ISO 8601 time formats). Millisecond logging can be disabled with the configure parameter "--disable-log-time-msec". The default time format changes to ISO 8601 (without time zone information). Specify "--enable-rfc5424time" to restore the time zone information.
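The configure flags above are mutually independent, so they can be combined at build time; a hypothetical invocation, assuming a standard Slurm source tree:

```
./configure --disable-log-time-msec --enable-rfc5424time
```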
-
- 13 Mar, 2013 2 commits
-
-
Morris Jette authored
Add milliseconds to the default log message header using the (default) RFC 5424 time format. Millisecond logging can be disabled with the configure parameter "--enable-rfc5424time-secs". A sample time stamp in this format is: "2013-03-13T14:28:17.767-07:00".
-
Morris Jette authored
If a step requests more CPUs than are possible within the specified node count of the job allocation, return ESLURM_TOO_MANY_REQUESTED_CPUS rather than returning ESLURM_NODES_BUSY and retrying.
-
- 12 Mar, 2013 1 commit
-
-
Morris Jette authored
-
- 11 Mar, 2013 3 commits
-
-
Nathan Yee authored
Without this change, when the sbatch --export option is used, many Slurm environment variables are not set unless explicitly exported.
-
Danny Auble authored
-
Morris Jette authored
-
- 08 Mar, 2013 4 commits
-
-
Morris Jette authored
-
jette authored
This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR which should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
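The OR-versus-AND distinction can be illustrated with a minimal sketch: sets stand in for the CPU bitmaps, and the function names are illustrative, not Slurm's actual data structures.

```python
# Sketch of the bitmap bug: combining the usable-CPU masks of several
# GRES types with OR instead of AND. Each mask is the set of CPUs wired
# to one GRES (e.g. a GPU or a NIC).

def usable_cpus(gres_cpu_masks, use_and=True):
    """Combine per-GRES CPU masks into the set of CPUs usable by a job."""
    combined = None
    for mask in gres_cpu_masks:
        if combined is None:
            combined = set(mask)
        elif use_and:
            combined &= mask   # correct: a CPU must serve every GRES
        else:
            combined |= mask   # buggy: a CPU only needs to serve some GRES
    return combined

# Suppose GPU 0 is wired to CPUs 0-3 and NIC 0 to CPUs 2-5.
masks = [{0, 1, 2, 3}, {2, 3, 4, 5}]
print(sorted(usable_cpus(masks, use_and=True)))   # [2, 3]
print(sorted(usable_cpus(masks, use_and=False)))  # [0, 1, 2, 3, 4, 5]
```

With OR, CPUs 0-1 and 4-5 are reported as usable even though they cannot reach both GRES, which matches the symptom described above.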
-
Danny Auble authored
success
-
Stephen Trofinoff authored
-
- 07 Mar, 2013 1 commit
-
-
jette authored
This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR which should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
-
- 06 Mar, 2013 2 commits
-
-
Danny Auble authored
options in srun, and push that logic to salloc and sbatch. Bug 201
-
Danny Auble authored
and timeout in the runjob_mux trying to send in this situation. Bug 223
-
- 04 Mar, 2013 3 commits
-
-
Danny Auble authored
-
Magnus Jonsson authored
Jobs are not backfilled because the backfill scheduler does not work through the complete backlog of queued jobs before it is interrupted and starts over from the beginning. We sometimes have many jobs of various sizes and users in the queue, and even with idle nodes, short jobs will not start because of this. I have made a patch for backfill with a configuration option (bf_continue) to let backfill continue where it left off.
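The bf_continue option is set through SchedulerParameters in slurm.conf; a minimal fragment:

```
# slurm.conf: let the backfill scheduler resume scanning the job queue
# where it left off after an interruption, instead of restarting from
# the head of the queue.
SchedulerType=sched/backfill
SchedulerParameters=bf_continue
```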
-
Morris Jette authored
The original reservation data structure is deleted and its backup added to the reservation list, but jobs can retain a pointer to the original (now invalid) reservation data structure. Bug 250
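The stale-pointer hazard described above can be sketched in a few lines; the class and field names here are illustrative, not Slurm's actual structures.

```python
# Sketch of the stale-reference hazard: a job caches a direct reference
# to a reservation record, so replacing the record in the reservation
# list leaves the job pointing at the old, now-orphaned copy.

class Reservation:
    def __init__(self, name, nodes):
        self.name, self.nodes = name, nodes

resv_list = [Reservation("maint", "node[1-4]")]
job_resv = resv_list[0]            # the job caches a reference

# A failed update restores from backup by *replacing* the record ...
resv_list[0] = Reservation("maint", "node[1-4]")

# ... so the job's cached reference no longer matches the live list.
print(job_resv is resv_list[0])    # False: the job holds a stale record
```

The fix has to either update the list entry in place or repoint every job at the restored record, rather than swapping in a new structure.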
-