- 09 Jul, 2015 1 commit
-
-
Morris Jette authored
Changed spaces to tabs at start of lines. Minor changes to some formatting. Added the new files to the RPM (slurm.spec file).
Prevent memory leak of "l_name" variable if which_power_layout() function is called more than once.
Initialize "cpufreq" variable in powercap_get_cpufreq() function.
Array "tmp_max_watts_dvfs" could be NULL and used if "max_watts_dvfs" variable is NULL in powercap_get_node_bitmap_maxwatts_dvfs().
Variable "tmp_pcap_cpu_freq" could be used with an uninitialized value in function _get_req_features().
Variable "tmp_max_watts" could be used with an uninitialized value in function _get_req_features().
Array "tmp_max_watts_dvfs" could be used with an uninitialized value in function _get_req_features().
Array "allowed_freqs" could be NULL and used if "node_record_count" variable is zero in powercap_get_job_nodes_numfreq().
Overwriting a memory buffer header (especially with different data types) is just asking for something bad to happen. This code is from function powercap_get_job_nodes_numfreq():
    allowed_freqs = xmalloc(sizeof(int)*((int)num_freq+2));
    allowed_freqs[-1] = (int) num_freq;
Clean up memory on slurmctld shutdown.
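A minimal standalone C sketch of one safer alternative to the allowed_freqs[-1] pattern quoted above (storing the count in element 0 is an assumption for illustration, not necessarily the actual fix applied in Slurm):

    #include <stdlib.h>

    /* Sketch: keep the frequency count inside the allocated buffer
     * (element 0), with the frequencies at indexes 1..num_freq, instead
     * of writing it at index -1, i.e. before the start of the buffer. */
    static int *build_allowed_freqs(int num_freq)
    {
        int *allowed_freqs = calloc(num_freq + 2, sizeof(int));
        if (!allowed_freqs)
            return NULL;
        allowed_freqs[0] = num_freq;    /* count lives inside the buffer */
        /* allowed_freqs[1..num_freq] would be filled in by the caller */
        return allowed_freqs;
    }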
-
- 08 Jul, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
-
- 07 Jul, 2015 6 commits
-
-
Trey Dockendorf authored
-
Trey Dockendorf authored
Add job record qos field and partition record allow_qos field.
-
Trey Dockendorf authored
-
Trey Dockendorf authored
This patch moves the QOS update of an existing job so that it happens before the partition update. This ensures the new QOS value is the one used when validating against things like a partition's AllowQOS and DenyQOS. Currently, if two partitions have AllowQOS values that do not share any QOS, the order of updates prevents a job from being moved from one partition to another using something like the following:
    scontrol update job=<jobID> partition=<new part> qos=<new qos>
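A rough, self-contained C sketch of the ordering described above (these structures and the qos_allowed() helper are simplified stand-ins, not the real slurmctld code): the QOS field is applied first, so the partition's AllowQOS check sees the requested QOS rather than the old one.

    #include <string.h>

    /* Illustrative stand-ins only; the real slurmctld structures differ. */
    struct partition { const char *name; const char *allow_qos; /* NULL = allow all */ };
    struct job       { const char *qos;  const struct partition *part; };

    static int qos_allowed(const struct partition *p, const char *qos)
    {
        /* naive substring match, good enough for a sketch */
        return (p->allow_qos == NULL) || (strstr(p->allow_qos, qos) != NULL);
    }

    /* Apply the QOS update before the partition update, so the partition's
     * AllowQOS/DenyQOS check sees the new QOS value rather than the old one. */
    static int update_job(struct job *job, const char *new_qos,
                          const struct partition *new_part)
    {
        if (new_qos)
            job->qos = new_qos;          /* step 1: QOS first */
        if (new_part) {
            if (!qos_allowed(new_part, job->qos))
                return -1;               /* AllowQOS rejects the move */
            job->part = new_part;        /* step 2: then the partition */
        }
        return 0;
    }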
-
David Bigagli authored
-
Morris Jette authored
Correct task layout with CR_Pack_Node option and more than 1 CPU per task. Previous logic would place one task per CPU and launch too few tasks. bug 1781
-
- 06 Jul, 2015 2 commits
-
-
Morris Jette authored
Backfill scheduler now considers the OverTimeLimit and KillWait configuration parameters to estimate when running jobs will exit. Initially the job's end time is estimated based upon its time limit. After the time limit is reached, the end time estimate is based upon the OverTimeLimit and KillWait configuration parameters. bug 1774
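A small C sketch of the estimate described above (the helper itself is hypothetical; only the parameter names and units mirror the slurm.conf options): before the limit is reached the job is assumed to end at start_time plus its time limit, and once the limit has passed the estimate allows for OverTimeLimit plus KillWait.

    #include <time.h>

    /* Hypothetical helper, not the actual backfill code.  time_limit and
     * over_time_limit are in minutes, kill_wait in seconds, matching the
     * units of the corresponding slurm.conf parameters. */
    static time_t job_end_estimate(time_t start_time, unsigned time_limit,
                                   unsigned over_time_limit, unsigned kill_wait)
    {
        time_t now = time(NULL);
        time_t limit_end = start_time + (time_t) time_limit * 60;

        if (now < limit_end)
            return limit_end;            /* still within the time limit */

        /* Past the limit: allow for OverTimeLimit plus KillWait before the
         * job is expected to be killed; never return a time in the past. */
        time_t est = limit_end + (time_t) over_time_limit * 60 + kill_wait;
        return (est > now) ? est : now;
    }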
-
Morris Jette authored
Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all jobs have been tested. bug 1774
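A sketch of the loop behaviour described above (illustrative C only, not the actual backfill plugin code): the scheduler notes its start time and, once backfill_interval seconds have elapsed, stops testing jobs so a fresh queue can be built on the next pass.

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative stand-ins for the real backfill internals. */
    struct job_queue { int count; };

    static bool try_backfill_job(struct job_queue *q, int idx)
    {
        (void) q; (void) idx;
        return true;    /* placeholder: real code would try to start the job */
    }

    /* Spend at most backfill_interval seconds testing jobs; the caller then
     * rebuilds the job queue and starts over on the next scheduling pass. */
    static void backfill_pass(struct job_queue *q, int backfill_interval)
    {
        time_t start = time(NULL);

        for (int i = 0; i < q->count; i++) {
            if (difftime(time(NULL), start) >= backfill_interval)
                break;      /* maximum run time reached */
            (void) try_backfill_job(q, i);
        }
    }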
-
- 02 Jul, 2015 2 commits
-
-
Morris Jette authored
The original patch from LLNL assumed Munge was installed, which would result in a build error if the Munge development package was not installed.
-
Morris Jette authored
Add association usage information to "scontrol show cache" command output.
-
- 01 Jul, 2015 2 commits
-
-
Brian Christiansen authored
When submitting a job with srun -n<#>, the job may be allocated more CPUs than requested because it was given whole cores or sockets (e.g. CR_CORE, CR_SOCKET). sacct showed only what the step used and not the allocation. This commit shows both the job and the step if the job and step CPU counts differ.
-
Morris Jette authored
Major re-write of the sreport command to support the --tres job option and permit users to select specific trackable resources to generate reports for. For most reports, each TRES is listed on a separate line of output with its name. The default TRES type is "cpu" to minimize changes to output.
-
- 30 Jun, 2015 2 commits
-
-
Thomas Cadeau authored
Bug 1745
-
Brian Christiansen authored
This reverts commit 3f91f4b2.
-
- 29 Jun, 2015 2 commits
-
-
Nathan Yee authored
Bug 1745
-
David Bigagli authored
-
- 26 Jun, 2015 2 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
Bug 1746
-
- 25 Jun, 2015 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
- 24 Jun, 2015 2 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
- 23 Jun, 2015 1 commit
-
-
David Bigagli authored
-
- 22 Jun, 2015 3 commits
-
-
Morris Jette authored
Updates of existing bluegene advanced reservations did not work at all. Some multi-core configurations resulted in an abort due to creating core_bitmaps for the reservation that had only one bit per node rather than one bit per core. These bugs were introduced in commit 5f258072.
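A standalone C sketch of the sizing issue described above (the helper and plain-byte bitmap are illustrative assumptions; Slurm's own bitstring API is not shown): the reservation's core bitmap must be sized by the total core count across the selected nodes, not by the node count.

    #include <stdlib.h>

    /* Illustrative only: a reservation core bitmap needs one bit per core on
     * every selected node, not one bit per node as the buggy sizing produced. */
    static unsigned char *alloc_core_bitmap(const int *cores_per_node, int node_cnt)
    {
        int total_cores = 0;
        for (int i = 0; i < node_cnt; i++)
            total_cores += cores_per_node[i];    /* one bit per core */

        return calloc((total_cores + 7) / 8, 1); /* round up to whole bytes */
    }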
-
David Bigagli authored
-
David Bigagli authored
-
- 19 Jun, 2015 1 commit
-
-
David Bigagli authored
-
- 15 Jun, 2015 1 commit
-
-
Morris Jette authored
The logic assumed the reservation had a node bitmap, which was used to check for overlapping jobs. If there is no node bitmap (e.g. a licenses-only reservation), an abort would result.
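A minimal C sketch of the guard implied above (the types and the bitmaps_overlap() helper are simplified stand-ins, not the real slurmctld code): check for a missing node bitmap before using it, instead of dereferencing NULL.

    #include <stdbool.h>
    #include <stddef.h>

    /* Simplified stand-ins; the real slurmctld types and bitmap API differ. */
    typedef struct { void *node_bitmap; /* NULL for a licenses-only reservation */ } resv_t;
    typedef struct { void *node_bitmap; } job_t;

    static bool bitmaps_overlap(void *a, void *b)
    {
        (void) a; (void) b;
        return false;   /* placeholder for a real bitmap intersection test */
    }

    /* Guard against reservations with no node bitmap (e.g. licenses only)
     * before checking for overlapping jobs. */
    static bool resv_overlaps_job(const resv_t *resv, const job_t *job)
    {
        if (!resv->node_bitmap || !job->node_bitmap)
            return false;   /* nothing to compare node-wise */
        return bitmaps_overlap(resv->node_bitmap, job->node_bitmap);
    }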
-
- 12 Jun, 2015 3 commits
-
-
Brian Christiansen authored
Bug 1739
-
Brian Christiansen authored
Bug 1743
-
Brian Christiansen authored
Bug 1743
-
- 11 Jun, 2015 1 commit
-
-
Brian Christiansen authored
Bug 1733
-
- 10 Jun, 2015 1 commit
-
-
Morris Jette authored
-
- 09 Jun, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
1. I submit a first job that uses 1 GPU:
    $ srun --gres gpu:1 --pty bash
    $ echo $CUDA_VISIBLE_DEVICES
    0
2. While the first one is still running, a 2-GPU job asking for 1 task per node waits (and I don't really understand why):
    $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
    srun: job 2390816 queued and waiting for resources
3. Whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually gets GPUs allocated from two different sockets!
    $ srun -n 1 --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
    $ echo $CUDA_VISIBLE_DEVICES
    1,2
With this change, #2 works the same way as #3. bug 1725
-
Brian Christiansen authored
Bug 1572
-