- 20 Dec, 2017 14 commits
-
-
Morris Jette authored
-
Morris Jette authored
Previously written to "admin_comment" in commit 540d0e5c. Bug 4529.
-
Danny Auble authored
-
Danny Auble authored
This reverts commit 36a86e50. Turns out this needs more work than expected: hash_part_inx is used afterwards, and HASH_FCN no longer uses or sets it. For future reference, it looks like in the new code you would just call HASH_TO_BKT after calling HASH_FCN. I didn't want to spend time testing it, so I just rolled back the code. As this is the only place in the entirety of Slurm where we currently use it, I am not super concerned. NOTE: the macro HASH_FCN (which is really HASH_JEN) is identical in both places apart from the differences already explained.
-
Alejandro Sanchez authored
On a job [pack]allocation RPC request, if the allocation succeeds but sending the response message back to the client fails (e.g. srun was killed before it could receive the response), then modify the job_record so that the job_state is set to FAILED, the exit_code is set as if the job got a SIGTERM signal, and the state_reason is set to FAIL_LAUNCH. Users querying the job with sacct can then discern that something bad happened in this scenario, instead of STATE being shown as COMPLETED and the ExitCode as 0:0. Bug 4513.
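The state change described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual slurmctld code: the `demo_job` struct, the `DEMO_*` enum values, and `demo_mark_response_failed()` are hypothetical stand-ins, and storing the raw SIGTERM number in `exit_code` is an assumption about how "as if the job got a SIGTERM signal" is encoded.

```c
#include <signal.h>

/* Hypothetical stand-ins for the real job record fields and constants. */
enum demo_job_state { DEMO_JOB_COMPLETE, DEMO_JOB_FAILED };
enum demo_state_reason { DEMO_REASON_NONE, DEMO_FAIL_LAUNCH };

struct demo_job {
	enum demo_job_state job_state;
	int exit_code;
	enum demo_state_reason state_reason;
};

/*
 * Called when the allocation succeeded but the response could not be
 * delivered to the client. Marks the job so that accounting no longer
 * reports it as a clean completion.
 */
static void demo_mark_response_failed(struct demo_job *job)
{
	job->job_state = DEMO_JOB_FAILED;
	/* Assumption: a signal-terminated status is stored as the signal number. */
	job->exit_code = SIGTERM;
	job->state_reason = DEMO_FAIL_LAUNCH;
}
```

With a record updated this way, a query would see a failed state and a non-zero signal component rather than COMPLETED with 0:0.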
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Felip Moll authored
Use FREE_NULL_BUFFER instead; otherwise we could attempt to free_buffer this a second time if we jump to the rwfail label. Bug 4484.
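The free-and-NULL pattern behind this fix can be sketched as below. All names here (`demo_buf_t`, `demo_free_buffer`, `DEMO_FREE_NULL_BUFFER`, `demo_unpack`) are hypothetical and only mimic the shape of the real macro and cleanup label; the counter exists purely so the single free can be observed.

```c
#include <stdlib.h>

/* Hypothetical buffer type; not Slurm's actual definition. */
typedef struct {
	char *data;
} demo_buf_t;

static int demo_frees; /* counts real frees, for illustration only */

static void demo_free_buffer(demo_buf_t *b)
{
	free(b->data);
	free(b);
	demo_frees++;
}

/* Free the buffer if set, then clear the pointer so a later call is a no-op. */
#define DEMO_FREE_NULL_BUFFER(p)		\
	do {					\
		if (p)				\
			demo_free_buffer(p);	\
		(p) = NULL;			\
	} while (0)

static int demo_unpack(int fail_late)
{
	demo_buf_t *buf = calloc(1, sizeof(*buf));

	if (!buf)
		return -1;

	/* Done with the buffer: free it and clear the pointer. */
	DEMO_FREE_NULL_BUFFER(buf);

	if (fail_late)
		goto rwfail;
	return 0;

rwfail:
	/* Safe even after the earlier free, because buf is now NULL. */
	DEMO_FREE_NULL_BUFFER(buf);
	return -1;
}
```

Had the first cleanup called the plain free function without clearing the pointer, the cleanup at the label would free the same memory twice.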
-
Felip Moll authored
When printing fields in sacct with user-specified units (--units), the nnodes field showed an incorrect string. This commit reverts a65fa572 and avoids the unit conversion, which does not make sense outside the context of Blue Gene systems (deprecated) anyway. Bug 4490.
-
Felip Moll authored
Slurm may generate empty manifest files depending on configuration and library availability. Disable the new empty manifest check to allow builds to proceed with rpm 4.13+ / Fedora 25+. Bug 4453.
-
Felip Moll authored
(Fixing tim@schedmd.com's mistake on prior commit.) Bug 4467.
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
When set, do not flush the Lustre cache through the alpsc_flush_lustre() call, and do not drop caches through /proc/sys/vm/drop_caches either. This avoids a potential source of SIGBUS errors for other jobs sharing the node. Bug 4309.
-
- 19 Dec, 2017 7 commits
-
-
Danny Auble authored
before printing anything for a connection.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
field. Bug 4529
-
Morris Jette authored
fails. The description of the failure will be in the job's "Reason" field. Bug 4529
-
Morris Jette authored
buffer error. Bug 4529
-
Alejandro Sanchez authored
Bug 4222.
-
- 18 Dec, 2017 10 commits
-
-
Morris Jette authored
Bug 4528
-
Brian Christiansen authored
on startup. Just use the checkpointed job_id_sequence. get_next_job_id() will not use a jobid if it's in use in the system. Bug 4538
-
Morris Jette authored
Move "burst_buffer" from the individual command data structures (srun_opt and salloc_opt) to the common slurm_options. This is needed for the sbatch command to support the --bb option. Bug 4528.
-
Morris Jette authored
Some development code was accidentally included in the commit.
-
Morris Jette authored
-
Dominik Bartkiewicz authored
Don't change the state when the RPC from slurmctld to slurmd is only queued. Bug 4531.
-
Morris Jette authored
-
Morris Jette authored
node_features/knl_generic - If the plugin cannot fully load, then do not spawn a background pthread (which will fail with an invalid memory reference).
-
Morris Jette authored
-
Dominik Bartkiewicz authored
Add "Force=1" to knl_generic.conf to override (for testing). Bug 4487.
-
- 16 Dec, 2017 2 commits
-
-
Tim Wickberg authored
-
Felip Moll authored
This patch fixes commit b31bb7 on big-endian machines by swapping the __builtin_clzll and __builtin_ctzll calls depending on the architecture. Previously, bit_ffs and bit_fls returned the incorrect first and last set bits of the bitmap. Bug 4494.
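For reference, the two compiler builtins involved behave as sketched below on a single 64-bit value. This is only an illustration of the builtins (available in GCC and Clang), not the bitstring code itself: the helper names are hypothetical, and which builtin yields the "first" bit of a multi-word bitmap depends on how that bitmap maps bit indices onto words, which is not reproduced here.

```c
#include <stdint.h>

/*
 * Index (0 = least significant) of the lowest set bit, or -1 if the word
 * is zero. __builtin_ctzll counts trailing zero bits and is undefined for
 * a zero argument, hence the explicit check.
 */
static int demo_lowest_set_bit(uint64_t word)
{
	if (word == 0)
		return -1;
	return __builtin_ctzll(word);
}

/*
 * Index of the highest set bit, or -1 if the word is zero.
 * __builtin_clzll counts leading zero bits, so subtract from 63.
 */
static int demo_highest_set_bit(uint64_t word)
{
	if (word == 0)
		return -1;
	return 63 - __builtin_clzll(word);
}
```

Swapping the two builtins gives the opposite end of the word, which is why choosing the wrong one for a given layout reports the wrong first and last bits.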
-
- 15 Dec, 2017 7 commits
-
-
Morris Jette authored
completely instead of right after the parent job finishes. Bug 4516
-
Morris Jette authored
-
Yair Yarom authored
bug 3582
-
Brian Christiansen authored
when a job requests no tasks and more memory than MaxMemPer{CPU|NODE}. e.g. sbatch --wrap="sleep 10" Bug 4515
-
Brian Christiansen authored
This will give expected results. Found while working on Bug 4515.
-
Danny Auble authored
Bug 4478 comment 25.
-
Danny Auble authored
And print an appropriate fatal error message rather than relying upon a random errno value. Bug 4523
-