Commits · a03cbbb0c467c09a2e02adcf0faf1dc4f21c07e0 · Manuel G. Marciani / ces_slurm_simulator

14 Feb, 2014 2 commits
- Update srun.1 man page documenting the PMI2 support. · 2c83c36e
  David Bigagli authored Feb 14, 2014
  
  2c83c36e
- Fix issue where if using munge and munge wasn't running and a slurmd · ddc0b5c3
  Danny Auble authored Feb 13, 2014
```
needed to forward a message the slurmd would core dump.
```
  ddc0b5c3
13 Feb, 2014 2 commits
- Run job scheduler when node enters service · 0d388a80
  Morris Jette authored Feb 13, 2014
  
  0d388a80
- Correct the slurm.conf man pages and checkpoint_blcr.html page · b5a79c9f
  David Bigagli authored Feb 12, 2014
```
describing that jobs must be drained from cluster before deploying
any checkpoint plugin.
```
  b5a79c9f
12 Feb, 2014 2 commits
- Fix typos in srun_cr man page. · ba39e394
  David Bigagli authored Feb 12, 2014
  
  ba39e394
- enforce cpus-per-task with mem-per-cpu option · cf367bb0
  Morris Jette authored Feb 12, 2014
```
Properly enforce a job's cpus-per-task option when a job's allocation is
constrained on some nodes by the mem-per-cpu option.
bug 590
```
  cf367bb0
11 Feb, 2014 1 commit
- Note group logic change went into v14.03-pre6 · e37ed154
  Morris Jette authored Feb 11, 2014
  
  e37ed154
10 Feb, 2014 5 commits
- Update the NEWS file. · 34c37f07
  David Bigagli authored Feb 10, 2014
  
  34c37f07
- Add SchedulerParameters option partition_job_depth · d798ea47
  Morris Jette authored Feb 10, 2014
```
limit scheduling logic depth by partition.
```
  d798ea47
- Start NEWS for v14.03-pre7 · 18acf289
  Morris Jette authored Feb 10, 2014
  
  18acf289
- Correction to merge · 8e26fd28
  Morris Jette authored Feb 10, 2014
  
  8e26fd28
- Updates for start of v2.6.7 work · 2c9b35c3
  Morris Jette authored Feb 10, 2014
  
  2c9b35c3
09 Feb, 2014 1 commit
- CRAY - fix memory leak when using accelerators · ac64f883
  Moe Jette authored Feb 08, 2014
  
  ac64f883
08 Feb, 2014 2 commits
- replace old commit note erroneously taken out · 099f00ab
  Danny Auble authored Feb 07, 2014
  
  099f00ab
- CRAY - fix issue with using CR_ONE_TASK_PER_CORE · fbb37db1
  Danny Auble authored Feb 07, 2014
  
  fbb37db1
07 Feb, 2014 2 commits
- Properly enforce GrpSubmit limit for job arrays. · 9469053d
  Morris Jette authored Feb 07, 2014
```
bug 586
```
  9469053d
- Set SLURM_JOB_PARTITION env var for Prolog · 3cc5f9e1
  Morris Jette authored Feb 06, 2014
```
Partial response to bug 521
```
  3cc5f9e1
06 Feb, 2014 3 commits
- Change set env SLURM_PARTITION to SLURM_JOB_PARTITION · b3b9bf17
  Morris Jette authored Feb 06, 2014
```
No change in logic, just change name of recently added env var
```
  b3b9bf17
- Set SLURM_PARTITION env var for all job types · a9e237e2
  Morris Jette authored Feb 06, 2014
```
Set the environment variable SLURM_PARTITION to the partition in
which a job is running. Set for salloc, sbatch and srun.
```
  a9e237e2
- Note about Cray fix · c05f38b3
  Danny Auble authored Feb 06, 2014
  
  c05f38b3
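
The 06 Feb and 07 Feb commits above export the partition name to jobs and to the Prolog as `SLURM_JOB_PARTITION`. A script could read it roughly as follows. This is a minimal sketch that relies only on the variable name taken from those commits; the helper name `job_partition` and the `"unknown"` fallback are hypothetical and not part of Slurm.

```python
import os


def job_partition(env=None, default="unknown"):
    """Return the partition named in SLURM_JOB_PARTITION.

    `env` is any mapping of environment variables (defaults to os.environ).
    `default` is a hypothetical fallback for when the variable is unset,
    for example when the script runs outside a Slurm job.
    """
    env = os.environ if env is None else env
    return env.get("SLURM_JOB_PARTITION", default)


if __name__ == "__main__":
    print(job_partition())
```

Passing the environment as a parameter keeps the helper easy to exercise outside a real allocation.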
05 Feb, 2014 5 commits
- Update NEWS file and remove white spaces. · f28e1f3f
  David Bigagli authored Feb 05, 2014
  
  f28e1f3f
- Fix the bug where --cpu_bind=map_cpu is interpreted as mask_cpu. · bc4e6b85
  Martin Perry authored Feb 05, 2014
  
  bc4e6b85
- take back 2.6.6 · 9f97c2e9
  Danny Auble authored Feb 05, 2014
  
  9f97c2e9
- Added support for selecting AMD GP · f728ee8e
  Dominik Bartkiewicz authored Feb 05, 2014
```
Set GPU_DEVICE_ORDINAL environment variable.
```
  f728ee8e
- new news file · c59fa258
  Danny Auble authored Feb 04, 2014
  
  c59fa258
04 Feb, 2014 4 commits
- Fix to reserving all nodes in partition · c4c462a2
  Morris Jette authored Feb 04, 2014
```
Previous logic would try to pick a specific node count and on a
heterogeneous system, this would cause a problem. This change
largely reverts commit a270417b
```
  c4c462a2
- Modify the srun --slurmd-debug option to accept debug string tags · f1d7f295
  David Bigagli authored Feb 04, 2014
```
in addition to numerical values.
```
  f1d7f295
- Retry task exit message from slurmstepd to srun on message timeout. · 2ccda7f2
  Danny Auble authored Feb 04, 2014
  
  2ccda7f2
- Enable gang scheduling with core specialization · 1ba4f07c
  Morris Jette authored Feb 04, 2014
```
Added whole_node field to job_resources structure
Enable gang scheduling for jobs with core specialization and other
jobs allocated whole nodes.
```
  1ba4f07c
03 Feb, 2014 1 commit
- Update documentation about QOS limits · f9cfa21a
  Danny Auble authored Feb 03, 2014
  
  f9cfa21a
31 Jan, 2014 3 commits
- Removed obsolete slurm_terminate_job() API. · 31d409b7
  David Bigagli authored Jan 31, 2014

  31d409b7
- Make sure node limits get assessed if no node count was given in request. · 5b0f9c39
  Danny Auble authored Jan 31, 2014

  i.e. salloc -n32 doesn't request the number of nodes and with the previous
  code if this request used 4 nodes and only 1 was left in GrpNodes it
  would just run with no issue since we were checking things before we
  selected how many nodes it ran on.

  Now we check this afterwards so we always check the limits on how many
  nodes, cpus and how much memory is to be used.

  5b0f9c39
- Fix step allocation failure due to memory use · 8b76b93c
  Morris Jette authored Jan 31, 2014

  Fix step allocation when some CPUs are not available due to memory limits.
  This happens when one step is active and using memory that blocks the
  scheduling of another step on a portion of the CPUs needed. The new step
  is now delayed rather than aborting with "Requested node configuration is
  not available".
  bug 577

  8b76b93c
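
The ordering problem described in commit 5b0f9c39 above can be illustrated with a toy model. This is a sketch under stated assumptions, not Slurm's implementation: every function name is hypothetical, and the figure of 8 CPUs per node is chosen only so that `-n32` lands on 4 nodes as in the commit message.

```python
def nodes_needed(tasks, cpus_per_node):
    """Nodes required to place `tasks` single-CPU tasks (ceiling division)."""
    return -(-tasks // cpus_per_node)


def check_before_selection(requested_nodes, grp_nodes_left):
    """Model of the old ordering: the limit is tested before node selection.

    When the request carries no node count (None), there is nothing to
    compare against the limit, so the request passes.
    """
    if requested_nodes is None:
        return True
    return requested_nodes <= grp_nodes_left


def check_after_selection(selected_nodes, grp_nodes_left):
    """Model of the new ordering: the limit is tested on the selected count."""
    return selected_nodes <= grp_nodes_left


if __name__ == "__main__":
    grp_nodes_left = 1                        # only 1 node left in GrpNodes
    selected = nodes_needed(32, 8)            # 32 tasks on assumed 8-CPU nodes
    print(check_before_selection(None, grp_nodes_left))    # old ordering
    print(check_after_selection(selected, grp_nodes_left)) # new ordering
```

In this model the old ordering lets the request through because no node count was given, while the new ordering rejects it once the selected node count is known.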

29 Jan, 2014 1 commit
- Fix the format of SLURM_STEP_RESV_PORTS. It was generated · c3c49196
  David Bigagli authored Jan 28, 2014
```
incorrectly when using the hostlist_push_host function and
input surrounded by [].
```
  c3c49196
28 Jan, 2014 1 commit
- BLUEGENE - If IONodesPerMP changes in bluegene.conf recalculate bitmaps · ee3844aa
  Danny Auble authored Jan 28, 2014
```
based on ionode count correctly on slurmctld restart.
```
  ee3844aa
25 Jan, 2014 2 commits
- Fix a couple of typos in NEWS · d1081a6d
  jette authored Jan 25, 2014

  d1081a6d
- Split job "shared" field · f21d21d6
  Morris Jette authored Jan 24, 2014

  Split a slurmctld's job record "shared" field into "share_res"
  (share resource) and "whole_node" fields. Needed to better manage
  allocation of whole nodes for core specialization without disabling
  gang scheduling of such jobs.

  f21d21d6

23 Jan, 2014 3 commits
- Allow scontrol suspend/resume to accept jobid in the format jobid_taskid · e220f1e0
  David Bigagli authored Jan 23, 2014
```
to suspend/resume array elements.
```
  e220f1e0
- Revert "Try to start mysql before starting slurmdbd." · 2edc0da8
  Danny Auble authored Jan 23, 2014
```
This reverts commit 34fd501c.
```
  2edc0da8
- MYSQL - If starting the plugin and the database isn't up attempt to · 9ef64da7
  Danny Auble authored Jan 23, 2014
```
connect in a loop instead of producing a fatal.
```
  9ef64da7
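
Commit e220f1e0 above lets `scontrol suspend/resume` take a job ID in the form `jobid_taskid` to address a single array element. The format can be split as in the sketch below. It is illustrative only and does not reproduce Slurm's parser; the function name `parse_job_spec` is hypothetical.

```python
def parse_job_spec(spec):
    """Split 'jobid' or 'jobid_taskid' into (job_id, task_id).

    task_id is None when no array element is given. Raises ValueError if
    either part is not a decimal integer.
    """
    job_part, sep, task_part = spec.partition("_")
    job_id = int(job_part)
    task_id = int(task_part) if sep else None
    return job_id, task_id


if __name__ == "__main__":
    print(parse_job_spec("1234_5"))  # job array element
    print(parse_job_spec("1234"))    # plain job ID
```

The job IDs here are made-up values. With the change in that commit, an identifier such as `1234_5` would be passed to `scontrol suspend` to target one array element.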