- 14 May, 2012 7 commits
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Original code by Damien François <damien.francois@uclouvain.be>.
-
Danny Auble authored
plugin fails to load.
-
Danny Auble authored
place)
-
Danny Auble authored
plugin
-
Danny Auble authored
possible. Otherwise a failure would only occur in the slurmstepd and might be missed by failing silently. It would also mess up the packing/unpacking, so it should be obvious if it fails on the slurmctld.
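A failure at controller start-up is easier to notice than one buried in a step daemon. Below is a small, self-contained C sketch of that fail-fast idea; load_plugin(), plugin_handle_t, and the plugin name are illustrative stand-ins, not SLURM's real plugin API.

    /* Illustrative sketch only: the loader and handle type below are
     * stand-ins. The point is the fail-fast behaviour: abort in the
     * controller, where a bad plugin is obvious, rather than letting it
     * fail silently later in a step daemon. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { const char *name; int loaded; } plugin_handle_t;

    /* Pretend loader: returns 0 on success, -1 when the plugin cannot be loaded. */
    static int load_plugin(plugin_handle_t *h, const char *name)
    {
        h->name = name;
        h->loaded = (name != NULL && name[0] != '\0');
        return h->loaded ? 0 : -1;
    }

    int main(void)
    {
        plugin_handle_t h;

        /* Fail loudly at controller start-up instead of letting a missing
         * plugin surface later as mismatched pack/unpack data. */
        if (load_plugin(&h, "jobacct_gather/linux") != 0) {
            fprintf(stderr, "fatal: cannot load jobacct_gather plugin\n");
            exit(1);
        }
        printf("loaded plugin %s\n", h.name);
        return 0;
    }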
-
- 11 May, 2012 22 commits
-
Danny Auble authored
-
Danny Auble authored
slurmctld when starting up the jobacct_gather plugin. It isn't needed and causes errors in the slurmctld if not running as root.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Martin Perry authored
Original patch from Martin Perry (Bull)
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
QOS, or any combination of those, the correct thing happens. If the job is using a QOS or partition that only works inside a reservation, deny the update if it only removes the reservation.
-
Danny Auble authored
we will read the old reservation used, so if the job is using a partition or QOS that is only allowed inside a reservation we won't fail.
-
Danny Auble authored
updating a job, so as to make sure the reservation is set correctly when doing the other checks.
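The three commits above describe the same validation rule from different angles: a job whose QOS or partition is only usable inside a reservation must not have its reservation silently removed. A minimal, self-contained C sketch of that check follows; the struct and flags are hypothetical, not SLURM's real job_record or QOS/partition structures.

    /* Hypothetical sketch of the check described above. If the job's QOS or
     * partition may only be used inside a reservation, an update that merely
     * removes the reservation is denied. */
    #include <stdbool.h>
    #include <stdio.h>

    struct job_update {
        const char *old_resv;    /* reservation currently on the job, or NULL  */
        const char *new_resv;    /* reservation after the update, or NULL      */
        bool qos_requires_resv;  /* QOS only usable inside a reservation       */
        bool part_requires_resv; /* partition only usable inside a reservation */
    };

    /* Return true when the update is allowed. */
    static bool update_allowed(const struct job_update *u)
    {
        bool removing_resv = (u->old_resv != NULL) && (u->new_resv == NULL);
        bool needs_resv = u->qos_requires_resv || u->part_requires_resv;

        if (removing_resv && needs_resv)
            return false;   /* would leave the job in an invalid state */
        return true;
    }

    int main(void)
    {
        struct job_update u = { "maint_resv", NULL, true, false };
        printf("update %s\n", update_allowed(&u) ? "allowed" : "denied");
        return 0;
    }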
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
previous versions of SLURM
-
Danny Auble authored
-
Bill Brophy authored
Original patch from Bill Brophy (Group Bull).
-
Bill Brophy authored
if a reservation is also requested in the job. Original patch from Bill Brophy (Group Bull).
-
- 10 May, 2012 1 commit
-
Morris Jette authored
-
- 09 May, 2012 6 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Don Lipari authored
The symptom is that SLURM schedules lower priority jobs to run when higher priority, dependent jobs have their dependencies satisfied. This happens because dependent jobs still have a priority of 1 when the job queue is sorted in the schedule() function. The proposed fix forces jobs to have their priority updated when their dependencies are satisfied.
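A rough sketch of the idea behind that fix, as a stand-alone C program; struct job, compute_priority(), and the placeholder priority of 1 mirror the description above but are illustrative, not SLURM's actual job_record or priority plugin.

    /* Illustrative only: a dependent job keeps a placeholder priority of 1
     * until its dependency clears; once it does, the priority is recomputed
     * so the sorted queue ranks it correctly against lower-priority jobs. */
    #include <stdbool.h>
    #include <stdio.h>

    struct job {
        int id;
        bool dependency_satisfied;
        unsigned int priority;      /* 1 == "still waiting on a dependency" */
    };

    /* Stand-in for a real priority calculation (age, fair-share, etc.). */
    static unsigned int compute_priority(const struct job *j)
    {
        return 1000 + (unsigned int) j->id;
    }

    static void refresh_priorities(struct job *jobs, int njobs)
    {
        for (int i = 0; i < njobs; i++) {
            /* Only touch jobs parked at the placeholder priority. */
            if (jobs[i].priority == 1 && jobs[i].dependency_satisfied)
                jobs[i].priority = compute_priority(&jobs[i]);
        }
    }

    int main(void)
    {
        struct job jobs[] = {
            { 101, true,  1 },      /* dependency just satisfied */
            { 102, false, 1 },      /* still waiting             */
        };
        refresh_priorities(jobs, 2);
        for (int i = 0; i < 2; i++)
            printf("job %d priority %u\n", jobs[i].id, jobs[i].priority);
        return 0;
    }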
-
Danny Auble authored
-
Morris Jette authored
-
- 07 May, 2012 4 commits
-
Morris Jette authored
Job priority of 1 is no longer used as a special case in SLURM v2.4.
-
Morris Jette authored
-
Morris Jette authored
-
Don Lipari authored
The commit 8b14f388 on Jan 19, 2011 is causing problems with Moab cluster-scheduled machines. In this case, Moab hands off every job submitted immediately to SLURM, which gets a zero priority. Once Moab schedules the job, Moab raises the job's priority to 10,000,000 and the job runs. When you happen to restart the slurmctld under such conditions, the sync_job_priorities() function runs, which attempts to raise job priorities into a higher range if they are getting too close to zero. The problem as I see it is that you include the "boost" for zero-priority jobs. Hence the problem we are seeing is that once the slurmctld is restarted, a bunch of zero-priority jobs are suddenly eligible. So there is a disconnect between the top priority job Moab is trying to start and the top priority job SLURM sees. I believe the fix is simple:

    diff job_mgr.c~ job_mgr.c
    6328,6329c6328,6331
    <     while ((job_ptr = (struct job_record *) list_next(job_iterator)))
    <         job_ptr->priority += prio_boost;
    ---
    >     while ((job_ptr = (struct job_record *) list_next(job_iterator))) {
    >         if (job_ptr->priority)
    >             job_ptr->priority += prio_boost;
    >     }

Do you agree? Don
-