- 16 Mar, 2012 8 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
It looks like changes were made to the man pages. However, --switch is still used for the info, usage, and help strings. The attached patch fixes those. Rod Schultz, Bull
-
Danny Auble authored
-
Danny Auble authored
already pinged it on startup the unresponding flag would be removed from the frontend node.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
Danny Auble authored
-
- 14 Mar, 2012 2 commits
-
-
Morris Jette authored
Cray - For srun wrapper when creating a job allocation, set the default job name to the executable file's name. Ignore leading directory names in the path.
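As a rough illustration of the naming rule described above, here is a minimal Python sketch. It is not the wrapper's actual code; the function name `default_job_name` and the argument layout are assumptions made for this example.

```python
import os.path


def default_job_name(command_argv):
    """Return the executable's file name with leading directories removed.

    command_argv is a hypothetical list: the executable path followed by
    its arguments. Returns None when no command was given.
    """
    if not command_argv:
        return None
    # basename() drops everything up to and including the last "/".
    return os.path.basename(command_argv[0])
```

With this sketch, a command of `/usr/local/bin/my_app -n 4` would yield the job name `my_app`.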
-
Morris Jette authored
This patch contains the bits of bad_dbtime.diff from CSCS which have not already been committed
-
- 13 Mar, 2012 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Permit the srun and salloc commands to be executed in the background on Cray systems.
-
Morris Jette authored
Add new job state reason of "FrontEndDown" which applies only to Cray and IBM BlueGene systems.
-
Danny Auble authored
-
- 12 Mar, 2012 1 commit
-
-
Danny Auble authored
the queue when trying to place a larger than midplane job.
-
- 02 Mar, 2012 2 commits
-
-
Morris Jette authored
In cray/srun wrapper, only include aprun "-q" option when srun "--quiet" option is used.
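The option mapping above can be sketched as follows. This is an illustrative Python sketch, not the wrapper's real code; `build_aprun_args` and its parameters are hypothetical names.

```python
def build_aprun_args(srun_quiet, command):
    """Build an aprun argument list from (hypothetical) parsed srun options.

    srun_quiet: True only when the user passed srun's --quiet option.
    command:    list holding the executable and its arguments.
    """
    args = ["aprun"]
    # Pass "-q" through only when --quiet was explicitly requested.
    if srun_quiet:
        args.append("-q")
    args.extend(command)
    return args
```

Without `--quiet`, the resulting command line carries no `-q`, so aprun keeps its normal output.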
-
Morris Jette authored
Here's what seems to have happened:
- A job was pending, waiting for resources.
- slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
- As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
- scontrol reconfigure was done again.
- read_slurm_conf called _restore_job_dependencies.
- _restore_job_dependencies called build_feature_list for each job in the job list.
- When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.
The problem was reported by a customer on Slurm 2.2.7. I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies. If the job state is JOB_FAILED, the job is skipped. Regards, Martin
This is an alternative solution to bug316980fix.patch
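The guard described in this message can be modeled with a short Python sketch. It only mirrors the control flow; the real code is C inside slurmctld, and the `Job` class, the string state constants, and the `details` dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the job states used in slurmctld.
JOB_PENDING = "JOB_PENDING"
JOB_FAILED = "JOB_FAILED"


@dataclass
class Job:
    job_id: int
    state: str
    details: Optional[dict]  # None models details freed by delete_job_details


def restore_job_dependencies(jobs):
    """Rebuild per-job data, skipping failed jobs whose details are gone."""
    restored = []
    for job in jobs:
        # The check added by the patch: skip JOB_FAILED jobs entirely.
        if job.state == JOB_FAILED:
            continue
        # Stand-in for build_feature_list(); without the guard above this
        # access would fail for a job whose details were deleted.
        _features = job.details["features"]
        restored.append(job.job_id)
    return restored
```

In this model, a failed job with `details=None` is simply passed over instead of triggering an invalid access.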
-
- 29 Feb, 2012 1 commit
-
-
Morris Jette authored
-
- 28 Feb, 2012 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Rémi Palancher authored
Added default_time field in partition records in Lua job submit plugin.
-
Rémi Palancher authored
Added a new Lua library name to try loading with dlopen() in Lua based plugins.
-
- 27 Feb, 2012 1 commit
-
-
Morris Jette authored
Only report "gres/<name> lacks File parameter" if some nodes define File AND this node does not AND (new part here) the GRES count on this node is non-zero
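The three-part condition above can be written as a single predicate. This is a minimal Python sketch with hypothetical parameter names, not the Slurm source.

```python
def should_report_missing_file(some_nodes_define_file, node_defines_file, gres_count):
    """Return True when the "lacks File parameter" message should be reported.

    All three conditions from the commit message must hold:
      1. some nodes define File for this GRES,
      2. this node does not, and
      3. (the new part) this node's GRES count is non-zero.
    """
    return some_nodes_define_file and not node_defines_file and gres_count != 0
```

A node with a GRES count of zero therefore no longer triggers the message, even if other nodes define File.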
-
- 24 Feb, 2012 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Fixes for jobs with long argument lists
-
- 23 Feb, 2012 1 commit
-
-
Danny Auble authored
-
- 22 Feb, 2012 4 commits
-
-
Pär Andersson authored
Replace two xstrcat() calls per argument with a single xmalloc() call. This significantly speeds up handling of REQUEST_JOB_INFO RPCs when some jobs have long argument lists.
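To show the idea behind this change, the following Python sketch is a loose analog of the C code, not a translation of it. It sizes the output buffer once and then copies each argument into place, rather than growing a string by repeated appends. The space separator and the function name `join_args_single_alloc` are assumptions.

```python
def join_args_single_alloc(argv):
    """Join arguments with spaces using one pre-sized buffer."""
    encoded = [arg.encode() for arg in argv]
    if not encoded:
        return ""
    # Total size: all argument bytes plus one separator between neighbors.
    total = sum(len(arg) for arg in encoded) + len(encoded) - 1
    buf = bytearray(total)  # the single allocation
    pos = 0
    for i, arg in enumerate(encoded):
        if i > 0:
            buf[pos] = 0x20  # ASCII space between arguments
            pos += 1
        buf[pos:pos + len(arg)] = arg
        pos += len(arg)
    return buf.decode()
```

Repeated appends can recopy the growing string on every call, so the cost rises quickly with the number of arguments. Sizing the buffer up front keeps the work proportional to the total argument length.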
-
Pär Andersson authored
-
Pär Andersson authored
Change argc from uint16_t to uint32_t in slurmctld and slurmstepd. The rest of the code already uses uint32_t for argc.
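The commit message does not spell out the failure mode, but a 16-bit counter presumably cannot represent argument counts above 65535. The Python sketch below models unsigned truncation with bit masks; `store_argc` is a hypothetical helper, not a Slurm function.

```python
def store_argc(argc, bits):
    """Model storing argc in an unsigned integer field of the given width."""
    # Unsigned storage keeps only the low `bits` bits of the value.
    return argc & ((1 << bits) - 1)


# A job with 70000 arguments, stored in 16-bit and 32-bit fields.
argc_16 = store_argc(70000, 16)  # wraps around modulo 65536
argc_32 = store_argc(70000, 32)  # preserved
```

In this model the 16-bit field silently loses the high bits, while the 32-bit field keeps the true count.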
-
Pär Andersson authored
-
- 21 Feb, 2012 1 commit
-
-
jette authored
Fixes a bunch of warnings of this type: "warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body"
-