- 11 May, 2012 10 commits
-
-
Danny Auble authored
QOS, or any combination of those, the correct thing happens. If the job is using a QOS or partition that only works inside a reservation, deny the update if it would only remove the reservation.
-
Danny Auble authored
we will read the old reservation used, so if the job uses a partition or QOS that can only be used inside a reservation we won't fail.
-
Danny Auble authored
updating a job, so as to make sure the reservation is set correctly when doing the other checks
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
previous versions of SLURM
-
Danny Auble authored
-
Bill Brophy authored
Original Patch from Bill Brophy (Group Bull)
-
Bill Brophy authored
if a reservation is also requested in the job. Original Patch from Bill Brophy (Group Bull)
-
- 10 May, 2012 1 commit
-
-
Morris Jette authored
-
- 09 May, 2012 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Don Lipari authored
The symptom is that SLURM schedules lower priority jobs to run when higher priority, dependent jobs have their dependencies satisfied. This happens because dependent jobs still have a priority of 1 when the job queue is sorted in the schedule() function. The proposed fix forces jobs to have their priority updated when their dependencies are satisfied.
-
Danny Auble authored
-
Morris Jette authored
-
- 07 May, 2012 4 commits
-
-
Morris Jette authored
Job priority of 1 is no longer used as a special case in SLURM v2.4
-
Morris Jette authored
-
Morris Jette authored
-
Don Lipari authored
The commit 8b14f388 on Jan 19, 2011 is causing problems with Moab cluster-scheduled machines. In this case, Moab hands every submitted job off immediately to SLURM, where it gets a zero priority. Once Moab schedules the job, Moab raises the job's priority to 10,000,000 and the job runs.

When you happen to restart the slurmctld under such conditions, the sync_job_priorities() function runs, which attempts to raise job priorities into a higher range if they are getting too close to zero. The problem as I see it is that you include the "boost" for zero-priority jobs. Hence the problem we are seeing is that once the slurmctld is restarted, a bunch of zero-priority jobs are suddenly eligible. So there becomes a disconnect between the top-priority job Moab is trying to start and the top-priority job SLURM sees.

I believe the fix is simple:

    diff job_mgr.c~ job_mgr.c
    6328,6329c6328,6331
    < 	while ((job_ptr = (struct job_record *) list_next(job_iterator)))
    < 		job_ptr->priority += prio_boost;
    ---
    > 	while ((job_ptr = (struct job_record *) list_next(job_iterator))) {
    > 		if (job_ptr->priority)
    > 			job_ptr->priority += prio_boost;
    > 	}

Do you agree?
Don
-
- 04 May, 2012 4 commits
-
-
Nathan Yee authored
-
Danny Auble authored
developments.
-
Bjørn-Helge Mevik authored
from Bjørn-Helge Mevik
-
Danny Auble authored
-
- 03 May, 2012 5 commits
-
-
Morris Jette authored
-
Matthieu Hautreux authored
-
Matthieu Hautreux authored
Here is the way to reproduce it:

    [root@cuzco27 georgioy]# salloc -n64 -N4 --exclusive
    salloc: Granted job allocation 8
    [root@cuzco27 georgioy]# srun -r 0 -n 30 -N 2 sleep 300 &
    [root@cuzco27 georgioy]# srun -r 1 -n 40 -N 3 sleep 300 &
    [root@cuzco27 georgioy]# srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
    srun: error: Unable to create job step: Zero Bytes were transmitted or received
-
Morris Jette authored
-
Danny Auble authored
honored correctly. I also put in notes where the values are not to be altered.
-
- 02 May, 2012 10 commits
-
-
Morris Jette authored
* Specify MinNodes via "scontrol update partition".
* Whenever the zero-node allocation ends, the frontend node is left in a COMPLETING state until "scontrol reconfigure" is issued (this doesn't appear to impact the performance of the frontend node, as other jobs can still be submitted, including other zero-node jobs).
-
Danny Auble authored
system of different size than actual hardware.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
handled.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Martin Perry authored
cpus in task/cgroup plugin
-