- 09 Aug, 2012 11 commits
-
Stephen Trofinoff authored
-
Stephen Trofinoff authored
I performed more testing today and for the most part things worked, except for one glitch: in the case of batch jobs, the GRES req field was being lost. It was not being uploaded into the DB and it was missing from "scontrol show job ...". This didn't occur with interactive jobs.

After looking at this, it appears that we enter select_nodes twice for batch jobs and only once for interactive jobs. Consequently, we call _fill_in_gres_fields twice for batch jobs. On the first entry, job_ptr->node_cnt is 0, so we don't perform the computation and append the string to gres. Because we are now using the same gres string field, this means we tokenize the gres string and then don't rebuild it, so it becomes blank.

One solution is simply to comment out the second clause of the if-statement where it checks whether node_cnt > 0. This works because it doesn't matter if on the first pass there is something like "gpu:0" in the gres string (the 0 being due to an initial node_cnt of 0), because we only need the type names from this string; we use them to extract the actual requested value from the gres_list. As long as we have at least rebuilt the string to contain the type names, then on the subsequent entry into _fill_in_gres_fields, when we do have the correct node_cnt value, the correct string will be built.

A better solution is to skip the entire block of code where we tokenize the gres string when node_cnt is 0. That way we don't even tokenize the string on the first entry and avoid some double work. Since we enter twice for batch jobs, I also placed a condition around the gres_alloc-building part of the function so that we only attempt to build that string when there isn't already a string there (again avoiding double duty).
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
Close the batch job's environment file when it contains no data to avoid leaking file descriptors.
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Martin Perry authored
-
- 08 Aug, 2012 16 commits
-
Morris Jette authored
This applies to switch windows, CAU, immediate blocks and RDMA
-
Morris Jette authored
-
Danny Auble authored
Conflicts: NEWS
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
remove it when running the job so it is left as an argument for the job
-
- 07 Aug, 2012 9 commits
-
Brian Gilmer authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
to operate the same way salloc or sbatch did and assign a task per cpu by default instead of task per node.
-
Danny Auble authored
correctly. Previously, errno was not being checked correctly.
-
- 06 Aug, 2012 3 commits
-
Danny Auble authored
to operate the same way salloc or sbatch did and assign a task per cpu by default instead of task per node.
-
Morris Jette authored
switch/nrt: add test for ability of job to be suspended. Adds most of the infrastructure to the switch plugin for the suspend APIs; still need to flesh out the functions for switch/nrt and add support for the new RPC in slurmd.
switch/nrt: add un/pack and use of switch info for job suspend/resume.
Major re-write of switch plugin web page for new job preemption functions.
switch/nrt: add more logging for job suspend/resume.
switch/nrt: document long-term job suspension support on web page.
-
Morris Jette authored
-
- 03 Aug, 2012 1 commit
-
Danny Auble authored
-