1. 25 May, 2012 1 commit
    • Modify scontrol show job to require -dd option to print batch script. · 8ed1b303
      Don Albert authored
      I have implemented the changes as you suggested: a "-dd" option indicates that display of the script is wanted, and it sets both the "SHOW_DETAIL" flag and a new "SHOW_DETAIL2" flag.
      
      Since "scontrol" can be run interactively as well,  I added a new "script" option to indicate that display of both the script and the details is wanted if the job is a batch job.
      
      Here are the man page updates for "man scontrol". For the "-d, --details" option:
      
             -d, --details
                    Causes the show command to provide additional details where available. Repeating the option more
                    than once (e.g., "-dd") will cause the show job command to also list the batch script, if the job
                    was a batch job.
      
      For the interactive "details" option:
      
             details
                    Causes the show command to provide additional details where available. Job information will
                    include CPUs and NUMA memory allocated on each node. Note that on computers with hyperthreading
                    enabled and SLURM configured to allocate cores, each listed CPU represents one physical core.
                    Each hyperthread on that core can be allocated a separate task, so a job's CPU count and task
                    count may differ. See the --cpu_bind and --mem_bind option descriptions in srun man pages for
                    more information. The details option is currently only supported for the show job command. To
                    also list the batch script for batch jobs, in addition to the details, use the script option
                    described below instead of this option.
      
      And for the new interactive "script" option:
      
             script Causes the show job command to list the batch script for batch jobs in addition to the detail
                    information described under the details option above.
      
      Attached are the patch file for the changes and a text file with the results of the tests I ran to verify the changes. The patches are against SLURM 2.4.0-rc1. A rough standalone sketch of the option handling appears after this message.
      
              -Don Albert-
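      For illustration only, here is a minimal standalone sketch (not the actual scontrol source) of how a repeated "-d" option could be counted and mapped to two detail flags. The flag names and values below are placeholders modeled on SHOW_DETAIL/SHOW_DETAIL2, not SLURM's real definitions.

        /* Standalone illustration: count repeated "-d" options and set two
         * hypothetical flag bits; not the real scontrol implementation. */
        #include <stdio.h>
        #include <stdint.h>
        #include <unistd.h>

        #define MY_SHOW_DETAIL  0x0001   /* placeholder: "-d"  -> show details     */
        #define MY_SHOW_DETAIL2 0x0002   /* placeholder: "-dd" -> also show script */

        int main(int argc, char **argv)
        {
            int opt, detail_cnt = 0;
            uint16_t show_flags = 0;

            while ((opt = getopt(argc, argv, "d")) != -1) {
                if (opt == 'd')
                    detail_cnt++;            /* each "-d" raises the detail level */
            }
            if (detail_cnt >= 1)
                show_flags |= MY_SHOW_DETAIL;
            if (detail_cnt >= 2)             /* "-dd": also request the batch script */
                show_flags |= MY_SHOW_DETAIL2;

            printf("show_flags = 0x%04x\n", show_flags);
            return 0;
        }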
  2. 24 May, 2012 3 commits
  3. 23 May, 2012 3 commits
  4. 22 May, 2012 1 commit
  5. 16 May, 2012 4 commits
  6. 11 May, 2012 1 commit
  7. 10 May, 2012 1 commit
  8. 09 May, 2012 2 commits
    • Reset priority of system held jobs when dependency is satisfied · 9e9298b1
      Don Lipari authored
      The symptom is that SLURM schedules lower-priority jobs to run when higher-priority dependent jobs have their dependencies satisfied. This happens because dependent jobs still have a priority of 1 when the job queue is sorted in the schedule() function. The proposed fix forces jobs to have their priority updated when their dependencies are satisfied.
    • Reset priority of system held jobs when dependency is satisfied · bf9f2452
      Don Lipari authored
      The symptom is that SLURM schedules lower-priority jobs to run when higher-priority dependent jobs have their dependencies satisfied. This happens because dependent jobs still have a priority of 1 when the job queue is sorted in the schedule() function. The proposed fix forces jobs to have their priority updated when their dependencies are satisfied (a minimal sketch of the idea follows below).
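      Below is a hedged sketch of that idea, using a hypothetical job structure rather than SLURM's real job_record: once the last dependency is satisfied, the placeholder priority of 1 is replaced by a recomputed value so the scheduler's sort in schedule() sees the job's true priority.

        /* Hypothetical illustration of the fix's intent; not slurmctld code. */
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct fake_job {                /* stand-in for the real job record */
            uint32_t priority;           /* 1 while dependencies are pending */
            bool     depends_pending;
        };

        /* Placeholder for whatever priority calculation the site uses. */
        static uint32_t recompute_priority(const struct fake_job *job)
        {
            (void) job;
            return 1000;                 /* illustrative value only */
        }

        /* Called when the last dependency of a job is satisfied. */
        static void dependency_satisfied(struct fake_job *job)
        {
            job->depends_pending = false;
            if (job->priority == 1)      /* still carrying the "held" priority */
                job->priority = recompute_priority(job);
        }

        int main(void)
        {
            struct fake_job job = { .priority = 1, .depends_pending = true };
            dependency_satisfied(&job);
            printf("priority after dependency satisfied: %u\n", job.priority);
            return 0;
        }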
  9. 04 May, 2012 1 commit
  10. 03 May, 2012 1 commit
    • Fix segv in slurmctld for job step with relative option · 9bb178c3
      Matthieu Hautreux authored
      Here is the way to reproduce it (an illustrative bounds check follows the transcript):
      [root@cuzco27 georgioy]# salloc -n64 -N4 --exclusive
      salloc: Granted job allocation 8
      [root@cuzco27 georgioy]# srun -r 0 -n 30 -N 2 sleep 300&
      [root@cuzco27 georgioy]# srun -r 1 -n 40 -N 3 sleep 300&
      [root@cuzco27 georgioy]# srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
      srun: error: Unable to create job step: Zero Bytes were transmitted or received
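      Purely as an illustration of the kind of validation involved (this is not the actual slurmctld fix), the sketch below checks a step's --relative offset and node count against the size of the allocation before any per-node data would be used:

        /* Hypothetical bounds check; not taken from the slurmctld patch. */
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Does a step wanting step_nodes nodes, starting at rel_offset,
         * fit inside an allocation of alloc_nodes nodes? */
        static bool step_fits(uint32_t alloc_nodes, uint32_t rel_offset,
                              uint32_t step_nodes)
        {
            if (rel_offset >= alloc_nodes)
                return false;
            return step_nodes <= alloc_nodes - rel_offset;
        }

        int main(void)
        {
            /* Mirrors the 4-node allocation above: "-r 1 -N 3" fits,
             * while "-r 2 -N 3" would not. */
            printf("%d\n", step_fits(4, 1, 3));   /* prints 1 */
            printf("%d\n", step_fits(4, 2, 3));   /* prints 0 */
            return 0;
        }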
  11. 02 May, 2012 1 commit
  12. 27 Apr, 2012 2 commits
  13. 26 Apr, 2012 2 commits
  14. 24 Apr, 2012 1 commit
  15. 23 Apr, 2012 2 commits
  16. 20 Apr, 2012 1 commit
  17. 18 Apr, 2012 1 commit
  18. 17 Apr, 2012 3 commits
  19. 12 Apr, 2012 1 commit
  20. 10 Apr, 2012 4 commits
  21. 09 Apr, 2012 1 commit
  22. 03 Apr, 2012 2 commits
  23. 02 Apr, 2012 1 commit
    • Fix in select/cons_res+topology+job with node range count · cd84134c
      Morris Jette authored
      The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize spreading the job across multiple switches. As you note, this problem only happens when a range of node counts is specified together with the select/cons_res plugin and the topology/tree plugin, and even then it is not easy to reproduce (you included all of the details below). A small illustrative sketch of the min_nodes/req_nodes distinction follows the quoted report.
      
      Quoting Martin.Perry@Bull.com:
      
      > Certain combinations of topology configuration and srun -N option produce
      > spurious job rejection with "Requested node configuration is not
      > available" with select/cons_res. The following example illustrates the
      > problem.
      >
      > [sulu] (slurm) etc> cat slurm.conf
      > ...
      > TopologyPlugin=topology/tree
      > SelectType=select/cons_res
      > SelectTypeParameters=CR_Core
      > ...
      >
      > [sulu] (slurm) etc> cat topology.conf
      > SwitchName=s1 Nodes=xna[13-26]
      > SwitchName=s2 Nodes=xna[41-45]
      > SwitchName=s3 Switches=s[1-2]
      >
      > [sulu] (slurm) etc> sinfo
      > PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
      > ...
      > jkob         up   infinite      4   idle xna[14,19-20,41]
      > ...
      >
      > [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
      > srun: Force Terminated job 79
      > srun: error: Unable to allocate resources: Requested node configuration is
      > not available
      >
      > The problem does not occur with select/linear, or topology/none, or if -N
      > is omitted, or for certain other values for -N (for example, -N 4-4 and -N
      > 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
      > src/plugins/select/cons_res/job_test.c. The srun man page states that when
      > -N is used, "the job will be allocated as many nodes as possible within
      > the range specified and without delaying the initiation of the job."
      > Consistent with this description, the requested number of nodes in the
      > above example is 4 (req_nodes=4).  However, the code that selects the
      > best-fit topology switches appears to make the selection based on the
      > minimum required number of nodes (min_nodes=2). It therefore selects
      > switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
      > than req_nodes the job is rejected with the "node configuration" error.
      >
      > I'm not sure where the code is going wrong.  It could be in the
      > calculation of the number of needed nodes in function _enough_nodes.  Or
      > it could be in the code that initializes/updates req_nodes or rem_nodes. I
      > don't feel confident that I understand the logic well enough to propose a
      > fix without introducing a regression.
      >
      > Regards,
      > Martin
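      The sketch below is a hypothetical illustration of that distinction, not the real _eval_nodes_topo() code: picking the smallest switch that satisfies only min_nodes selects s1 (3 usable nodes), so the 4-node request fails, whereas picking by req_nodes selects the top-level switch s3.

        /* Hypothetical best-fit switch selection; not the cons_res source. */
        #include <stdint.h>
        #include <stdio.h>

        struct sw {                      /* per-switch summary (illustrative) */
            const char *name;
            uint32_t    usable_nodes;    /* idle nodes in the target partition */
        };

        /* Return the index of the first switch with at least `needed` usable
         * nodes, or -1 if none qualifies. */
        static int pick_switch(const struct sw *sws, int cnt, uint32_t needed)
        {
            for (int i = 0; i < cnt; i++) {
                if (sws[i].usable_nodes >= needed)
                    return i;
            }
            return -1;
        }

        int main(void)
        {
            /* Mirrors the report: s1 spans 3 usable nodes, s2 spans 1,
             * and the top switch s3 spans all 4. */
            struct sw sws[] = { {"s1", 3}, {"s2", 1}, {"s3", 4} };
            uint32_t min_nodes = 2, req_nodes = 4;

            printf("by min_nodes: %s\n", sws[pick_switch(sws, 3, min_nodes)].name);
            printf("by req_nodes: %s\n", sws[pick_switch(sws, 3, req_nodes)].name);
            return 0;
        }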