- 06 Jul, 2012 3 commits
-
-
Morris Jette authored
This move reduces the risk of srun failing horribly due to code that is inconsistent with its plugins when srun is running during a SLURM upgrade, especially a major upgrade in which the plugin function arguments can change.
-
Morris Jette authored
Conflicts: src/slurmctld/job_scheduler.c
-
Carles Fenoy authored
If a job is submitted to more than one partition, its partition pointer can be set to an invalid value. This can result in a bad count of CPUs allocated on a node, and thus over- or under-allocation of its CPUs. Patch by Carles Fenoy, BSC.

Hi all,

After a tough day I have finally found the problem and a solution for 2.4.1. I was able to reproduce the explained behavior by submitting jobs to 2 partitions. The job is allocated in one partition, but in the schedule function the job's partition is changed to the NON-allocated one, so the resources cannot be freed at the end of the job.

I have solved this by changing the IS_PENDING test some lines above in the schedule function (job_scheduler.c). The code below is from the git HEAD (line 801). As this file has changed a lot since 2.4.x I have not made a patch, but I am describing the solution here. I moved the if (!IS_JOB_PENDING) test to after the 2nd line (part_ptr ...). This prevents the job's partition from being changed if it is already starting in another partition.

    job_ptr  = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    job_ptr->part_ptr = part_ptr;
    xfree(job_queue_rec);
    if (!IS_JOB_PENDING(job_ptr))
        continue;   /* started in other partition */

Hope this is enough information to solve it. I have just realized (while writing this mail) that my solution has a memory leak, as job_queue_rec is not freed.

Regards,
Carles Fenoy
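A minimal sketch of the reordering Carles describes, with the memory leak he notes at the end fixed by freeing job_queue_rec before the continue (that adjustment is ours, not part of the original mail):

    job_ptr  = job_queue_rec->job_ptr;
    part_ptr = job_queue_rec->part_ptr;
    xfree(job_queue_rec);           /* freed on every path: no leak */
    if (!IS_JOB_PENDING(job_ptr))
        continue;                   /* started in another partition */
    job_ptr->part_ptr = part_ptr;   /* only update a still-pending job */

With the test placed before the pointer assignment, a job that has already started elsewhere keeps its original part_ptr, so its CPUs are counted against the correct partition when it completes.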
-
- 05 Jul, 2012 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
This code change is completely different from IBM's example code, but eliminates memory leaks that exist in IBM's sample code.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 04 Jul, 2012 2 commits
-
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
-
- 03 Jul, 2012 13 commits
-
-
Morris Jette authored
-
Nathan Yee authored
-
Morris Jette authored
-
Danny Auble authored
there are jobs running on that hardware.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Conflicts: META, NEWS
-
Morris Jette authored
-
Lipari, Don authored
-
Tim Wickberg authored
-
Alejandro Lucero Palau authored
Add support for advanced reservations of specific cores rather than whole nodes. Current limitations: homogeneous cluster, nodes must be idle when the reservation is created, and no more than one reservation per node. Code is still under development. Work by Alejandro Lucero Palau, et al., BSC.
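As a rough illustration only, a core-level reservation might eventually be requested along these lines (the CoreCnt option and exact syntax are assumptions, not confirmed by this commit, which notes the code is still under development):

    scontrol create reservation ReservationName=core_resv \
        StartTime=now Duration=60 Users=alice \
        Nodes=tux[0-1] CoreCnt=4,4

That is, two nodes each contribute four cores to the reservation instead of being reserved whole.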
-
- 02 Jul, 2012 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Carles Fenoy authored
correctly when transitioning. This also applies to 2.4.0 -> 2.4.1; no state will be lost. (Thanks to Carles Fenoy)
-
Morris Jette authored
-
Morris Jette authored
-
- 29 Jun, 2012 6 commits
-
-
Bill Brophy authored
Add reservation flag of Part_Nodes to allocate all nodes in a partition to a reservation and automatically change the reservation when nodes are added to or removed from the partition. Based upon work by Bill Brophy, Bull.
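For illustration, a partition-wide reservation of this kind might be created roughly as follows (the surrounding options are assumptions based on typical reservation usage, not part of this commit):

    scontrol create reservation ReservationName=part_maint \
        Flags=PART_NODES,IGNORE_JOBS PartitionName=batch \
        StartTime=now Duration=infinite Users=root

With Part_Nodes set, the reservation tracks the partition's node list rather than a fixed set of nodes.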
-
Morris Jette authored
-
Morris Jette authored
Conflicts: META, NEWS
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
When running with multiple slurmd daemons per node, enable specifying a range of ports on a single line of the node configuration in slurm.conf, for example:

    NodeName=tux[0-999] NodeAddr=localhost Port=9000-9999 ...

The ports are matched to the node names in order, so here tux0 would listen on port 9000, tux1 on 9001, and so on.
-
- 28 Jun, 2012 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-