1. 10 Apr, 2011 14 commits
    • api: remove unreferenced and undocumented function · 22ece52e
      Moe Jette authored
      This removes the function "slurm_pack_msg_no_header", which is referenced nowhere
      in the src tree and is not listed in any of the slurm man pages.
      
      As far as I understand the documentation, each slurm message needs to have a
      header; this function may thus stem from very old or initial code.
    • scontrol: refactor if/else statement · 367c71ba
      Moe Jette authored
    • protocol_defs: remove duplicate/identical test · 5c13acad
      Moe Jette authored
      This removes a duplicated test statement which appears identically twice.
    • sprio: add support for the SLURM_CLUSTERS environment variable · 0a0efdf2
      Moe Jette authored
      This adds support for the SLURM_CLUSTERS environment variable to sprio as well.
      It also makes the test for the priority plugin type dependent on whether sprio
      is running with multiple-cluster support.
    • scontrol: add support for the SLURM_CLUSTERS environment variable · c7045c83
      Moe Jette authored
      On our frontend host we support multiple clusters (Cray and non-Cray) by
      setting the SLURM_CLUSTERS environment variable accordingly.
      
      In order to use scontrol (e.g. for hold/release of a user job) from a
      frontend host to control jobs on a remote Cray system, we need support for
      the SLURM_CLUSTERS environment variable in scontrol as well.
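The intended lookup can be sketched as a small helper; `clusters_from_env` and the precedence rule (an explicit -M/--clusters value winning over the environment) are illustrative assumptions, not the actual scontrol/sprio code.

```c
#include <stdlib.h>

/* Hypothetical sketch: resolve the cluster list the way the commit
 * describes, with an explicit -M/--clusters option taking precedence
 * over the SLURM_CLUSTERS environment variable. */
static const char *clusters_from_env(const char *cli_value)
{
    if (cli_value)                       /* command-line option wins */
        return cli_value;
    return getenv("SLURM_CLUSTERS");     /* may be NULL if unset */
}
```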
    • slurmctld: keep original nice value when putting job on hold · b414712e
      Moe Jette authored
      The current code erases the old nice value (both negative and positive) when a job is
      put on hold so that the job has a 0 nice component upon release.
      
      This interaction causes difficulties if the nice value set at submission time had been
      set there for a reason, for instance when
       * a system administrator has allowed a negative nice value to be set;
       * the user wanted to keep this as a low-priority job and wants his/her other jobs
         to go first (independent of the hold option);
       * the nice value is used for other semantics - at our site, for instance, we use it
         for "base priority values" computed from how much of its quota a given group
         has already (over)used.
      
      Here is an example which illustrates the loss of original nice values:
      
        [2011-03-31T09:47:53] sched: update_job: setting priority to 0 for job_id 55
        [2011-03-31T09:47:53] sched: update_job: setting priority to 0 for job_id 66
        [2011-03-31T09:47:53] sched: update_job: setting priority to 0 for job_id 77
        [2011-03-31T09:47:54] sched: update_job: setting priority to 0 for job_id 88
        [2011-03-31T09:47:54] sched: update_job: setting priority to 0 for job_id 99
        [2011-03-31T09:47:54] sched: update_job: setting priority to 0 for job_id 110
      
      This is from user 'kraused' whose project 's310' is within the allocated quota and thus
      has an initial nice value of -542 (set via the job_submit/lua plugin).
      
      However, by putting his jobs on hold, he has lost this advantage:
      
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
           55  kraused      15181        153          0       5028      10000          0      0
           66  kraused      15181        153          0       5028      10000          0      0
           77  kraused      15181        153          0       5028      10000          0      0
           88  kraused      15178        150          0       5028      10000          0      0
           99  kraused      15178        150          0       5028      10000          0      0
          110  kraused      15178        150          0       5028      10000          0      0
      
      I believe that resetting the nice value has been there for a reason; the patch therefore
      keeps the reset, but skips it when the operation is a user/administrator hold.
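The intended behaviour can be sketched as a small decision function; the function name is an illustrative stand-in for the actual slurmctld code, which works on detail_ptr->nice (stored with an offset so that nice 0 is NICE_OFFSET).

```c
#include <stdbool.h>
#include <stdint.h>

#define NICE_OFFSET 10000   /* a nice value of 0 is stored as this offset */

/* Illustrative sketch: when a priority-0 update is a user/administrator
 * hold, keep the submit-time nice value; otherwise reset it as before. */
static uint16_t nice_after_hold(uint16_t current_nice, bool user_or_admin_hold)
{
    if (user_or_admin_hold)
        return current_nice;    /* e.g. keep the -542 set by job_submit/lua */
    return NICE_OFFSET;         /* previous behaviour: reset to nice == 0 */
}
```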
    • slurmctld: test job_specs->min_nodes before altering the value via partition setting · f8ea48bb
      Moe Jette authored
      This fixes a problem when trying to move a pending job from one partition to another
      without supplying any other parameters:
       * if a partition value is present, the job is pending, and no min_nodes are supplied,
         job_specs->min_nodes gets set from the detail_ptr value;
       * this causes subsequent tests for job_specs->min_nodes ==/!= NO_VAL to fail.
      
      The following illustrates the behaviour; the example is taken from our system:
        palu2:0 ~>scontrol update jobid=3944 partition=night
        slurm_update error: Requested operation not supported on this system
      
        slurmctld.log
        [2011-04-06T14:39:51] update_job: setting partition to night for job_id 3944
        [2011-04-06T14:39:51] Change of size for job 3944 not supported
        [2011-04-06T14:39:51] updating accounting
        [2011-04-06T14:39:51] _slurm_rpc_update_job JobId=3944 uid=21215: Requested operation not supported on this system
      
      ==> The 'Change of size for job 3944' reveals that the !select_g_job_expand_allow() case was triggered,
          after setting the job_specs->min_nodes due to supplying a job_specs->partition.
      
      Fix:
      ====
       Since the test for select_g_job_expand_allow() does not depend on the job state, it has
       been moved up, before the test for job_specs->partition. At the same time, the equality
       test for INFINITE/NO_VAL min_nodes values has been moved to the same place.
       The tests for job_specs->min_nodes below the job_specs->partition setting depend on the
       job state:
       - the 'Reset min and max node counts as needed, insure consistency' block requires
         pending state;
       - the other remaining test applies only to IS_JOB_RUNNING/SUSPENDED jobs.
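The ordering can be illustrated with a stripped-down sketch; the struct, the stand-in for the expand test, and the function names are simplified assumptions, not the literal update_job() code.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_VAL ((uint32_t)0xfffffffe)   /* "not supplied" marker, as in slurm.h */

struct specs { uint32_t min_nodes; const char *partition; };

/* Illustrative sketch of the reordering: evaluate the size-change test
 * while min_nodes still reflects what the user actually supplied, and
 * only afterwards let the partition update borrow the stored value. */
static int update_sketch(struct specs *s, uint32_t stored_min_nodes,
                         bool expand_allowed)
{
    if (s->min_nodes != NO_VAL && !expand_allowed)
        return -1;                        /* "Change of size ... not supported" */

    if (s->partition && s->min_nodes == NO_VAL)
        s->min_nodes = stored_min_nodes;  /* default taken from detail_ptr */

    return 0;
}
```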
    • slurmctld: case of authorized operator releasing user hold · 1895a10a
      Moe Jette authored
      This patch fixes a case where the priority is not recalculated on 'scontrol release',
      which happens when an authorized operator releases a job, or when the job is
      released via e.g. the job_submit plugin.
      
      The patch reorders the tests in update_job() to 
       * test first if the job has been held by the user and, only if not,
       * test whether an authorized operator changed the priority or
         the updated priority is being reduced.
      
      Due to earlier permission checks, we have either
       * job_ptr->user_id == uid or 
       * authorized,
      where in both cases the release-user-hold operation is authorized.
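The reordered decision can be sketched as follows; the enum and function are illustrative stand-ins, not the literal update_job() code.

```c
#include <stdbool.h>
#include <stdint.h>

enum prio_action { RECALC_PRIORITY, SET_PRIORITY, REJECT_UPDATE };

/* Illustrative sketch: test the release-of-user-hold case first, and
 * only then fall through to the operator/priority-reduction path. */
static enum prio_action on_priority_update(bool releasing_user_hold,
                                           bool authorized,
                                           uint32_t old_prio, uint32_t new_prio)
{
    if (releasing_user_hold)
        return RECALC_PRIORITY;       /* let the priority plugin recompute */
    if (authorized || new_prio < old_prio)
        return SET_PRIORITY;          /* explicit priority change */
    return REJECT_UPDATE;
}
```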
    • scontrol: set uid when releasing a job · 6353467b
      Moe Jette authored
      This is related to an earlier fix; the problem was observed when trying to 'scontrol release'
      a job previously submitted via 'sbatch --hold' by the same user.
      
      Within the job_submit/lua plugin, the user automatically gets assigned a partition. Hence,
      even though no submitter uid checks are normally expected, a part_check can be performed
      in the process of releasing a job.
      
      In this case, the error message was
      
      [2011-03-30T18:37:17] _part_access_check: uid 4294967294 access to partition usup denied, bad group
      [2011-03-30T18:37:17] error: _slurm_rpc_update_job JobId=12856 uid=21215: User's group not permitted to use this partition
      
      and, as before (in scontrol_update_job()), it was fixed by supplying the UID of the requesting user.
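A minimal sketch of the fix, with a stand-in struct for the relevant field of the job-update RPC (the real message type has many more fields):

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative stand-in for the job-update message. */
struct update_msg { uint32_t user_id; };

/* Sketch: fill in the requester's uid, as scontrol_update_job() already
 * did, so slurmctld's partition access check does not see the "nobody"
 * value 4294967294 seen in the error log above. */
static void set_requester_uid(struct update_msg *msg)
{
    msg->user_id = (uint32_t) getuid();
}
```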
    • add function args to header · a269b6f4
      Moe Jette authored
    • slurmstepd: avoid coredump in case of NULL job · e0d92b8a
      Moe Jette authored
      We build slurm with --enable-memory-leak-debug and twice encountered the same core
      dump when user 'root' was trying to run jobs during a maintenance session.
      
      The root user is not in the accounting database, which explains the errors seen
      below. The gdb session shows that in this invocation the job pointer passed to
      _step_cleanup() was NULL.
      
      palu7:0 log>stat /var/crash/palu7-slurmstepd-6602.core 
      ...
      Modify: 2011-04-04 19:34:44.000000000 +0200
      
      slurmctld.log
      [2011-04-04T19:34:44] _slurm_rpc_submit_batch_job JobId=3254 usec=1773
      [2011-04-04T19:34:44] ALPS RESERVATION #5, JobId 3254: BASIL -n 1920 -N 0 -d 1 -m 1333
      [2011-04-04T19:34:44] sched: Allocate JobId=3254 NodeList=nid000[03-13,18-29,32-88] #CPUs=1920
      [2011-04-04T19:34:44] error: slurmd error 4005 running JobId=3254 on front_end=palu7: User not found on host
      [2011-04-04T19:34:44] update_front_end: set state of palu7 to DRAINING
      [2011-04-04T19:34:44] completing job 3254
      [2011-04-04T19:34:44] Requeue JobId=3254 due to node failure
      [2011-04-04T19:34:44] sched: job_complete for JobId=3254 successful
      [2011-04-04T19:34:44] requeue batch job 3254
      [2011-04-04T20:28:43] sched: Cancel of JobId=3254 by UID=0, usec=57285
      
      (gdb) core-file palu7-slurmstepd-6602.core 
      [New Thread 6604]
      Core was generated by `/opt/slurm/2.3.0/sbin/slurmstepd'.
      Program terminated with signal 11, Segmentation fault.
      #0  main (argc=1, argv=0x7fffd65a1fd8) at slurmstepd.c:413
      413             jobacct_gather_g_destroy(job->jobacct);
      (gdb) print job
      $1 = (slurmd_job_t *) 0x0
      (gdb) list
      408
      409     #ifdef MEMORY_LEAK_DEBUG
      410     static void
      411     _step_cleanup(slurmd_job_t *job, slurm_msg_t *msg, int rc)
      412     {
      413             jobacct_gather_g_destroy(job->jobacct);
      414             if (!job->batch)
      415                     job_destroy(job);
      416             /*
      417              * The message cannot be freed until the jobstep is complete
      (gdb) print msg
      $2 = (slurm_msg_t *) 0x916008
      (gdb) print rc
      $3 = -1
      (gdb) 
      
      The patch adds a NULL check on the job argument before the calls that need to dereference the job pointer.
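The shape of the guard can be sketched as follows; the struct and the stub functions are stand-ins so the sketch is self-contained, while the real code is _step_cleanup() in slurmstepd.c with the slurmd types.

```c
#include <stddef.h>

/* Stand-in type and stubs; the real calls are jobacct_gather_g_destroy()
 * and job_destroy() on a slurmd_job_t. */
struct slurmd_job { void *jobacct; int batch; };

static int destroy_calls;
static void jobacct_destroy_stub(void *acct) { (void)acct; destroy_calls++; }
static void job_destroy_stub(struct slurmd_job *job) { (void)job; }

/* The fix: dereference 'job' only after checking it is non-NULL; the
 * pointer can legitimately be NULL when step setup failed (here the
 * "User not found on host" error). */
static void step_cleanup_sketch(struct slurmd_job *job)
{
    if (job != NULL) {
        jobacct_destroy_stub(job->jobacct);
        if (!job->batch)
            job_destroy_stub(job);
    }
    /* message cleanup that does not touch 'job' would still run here */
}
```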
    • select/cray: zero reservation ID is not an error · 03f984aa
      Moe Jette authored
      This avoids meaningless error messages that warn about a zero reservation ID:
      
       [2011-04-07T15:31:26] _slurm_rpc_submit_batch_job JobId=2870 usec=33390
                             ... a minute later the user decides to scancel the queued job:
       [2011-04-07T15:32:34] error: JobId=2870 has invalid (ZERO) resId
       [2011-04-07T15:32:34] sched: Cancel of JobId=2870 by UID=21770, usec=230
      
      To keep things simple, that test has been removed.
      
      (The patch is in particular also necessary since now job_signal() may trigger
       a basil_release() of a pending job which has no ALPS reservation yet.)
    • select/cray: release ALPS reservation on termination signals · 12772a3a
      Moe Jette authored
      On rosa we experienced severe problems when jobs got killed via scancel or
      as a result of job timeout. Job cleanup took several minutes and created stray
      processes that consumed resources on the slurmd node, leaving the system
      unable to schedule for long spans.
      
      This problem did not show up on the smaller 2-cabinet XE system (which also
      runs a more recent ALPS version). The fix is to keep new script lines from
      starting: apkill is sent only after the reservation has been formally
      released.
      
      For all signals whose default disposition is to terminate or to dump core,
      the reservation is released before signalling the aprun job steps. This
      prevents a race condition where further aprun lines get executed while the
      apkill of the current aprun line in the job script is in progress.
      
      We did a before/after test on rosa under full load and the problem disappeared.
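The ordering can be sketched as follows; the stubs and the signal subset are illustrative assumptions (the real code releases the ALPS/BASIL reservation and then apkills the aprun steps).

```c
#include <signal.h>
#include <stdbool.h>

/* Illustrative stubs that record the call order for demonstration. */
static const char *call_order[2];
static int ncalls;
static void release_reservation_stub(void) { call_order[ncalls++] = "release"; }
static void signal_steps_stub(int sig)     { (void)sig; call_order[ncalls++] = "signal"; }

/* Representative subset of signals whose default disposition is to
 * terminate or dump core. */
static bool is_terminating(int sig)
{
    switch (sig) {
    case SIGHUP: case SIGINT: case SIGQUIT: case SIGABRT:
    case SIGSEGV: case SIGTERM:
        return true;
    default:
        return false;
    }
}

/* The fix: release the reservation first, so no further aprun line can
 * start while the apkill of the current one is still in progress. */
static void signal_job_sketch(int sig)
{
    if (is_terminating(sig))
        release_reservation_stub();
    signal_steps_stub(sig);
}
```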
    • add testimonial from CSCS · 44bec602
      Moe Jette authored
  2. 09 Apr, 2011 4 commits
  3. 08 Apr, 2011 5 commits
  4. 07 Apr, 2011 12 commits
  5. 06 Apr, 2011 5 commits