- 18 Mar, 2012 2 commits
-
-
Mark A. Grondona authored
In task_cgroup_memory_fini() the implementation attempts to move the existing slurmstepd task to the root memory cgroup by writing the result of getpid(2) to the root memory cgroup's 'tasks' file. This does not work, however, because slurmstepd is multi-threaded and thus only the main thread is moved. This patch replaces the explicit write to 'tasks' with a call to the new xcgroup_move_process() function, which handles moving all threads in the process.
-
Mark A. Grondona authored
This patch adds a helper function to common/xcgroup.c to aid in moving processes between cgroups. If the cgroup.procs file is writable, the PID is written to that file, as this method moves all threads in a process atomically. If cgroup.procs is not writable, each thread must be moved individually by walking the /proc/PID/task/ directory and writing each thread ID to the 'tasks' file in the cgroup. The second method is racy if the process is concurrently creating threads, but it is better than the current method of moving only one of the process's threads.
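The two strategies described in this commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code in common/xcgroup.c: the names move_process() and write_id() are hypothetical, the cgroup directory is passed in as a plain path, and instead of testing writability up front the sketch simply falls back when the write to cgroup.procs fails.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one numeric ID to <cgroup_dir>/<file>; 0 on success, -1 on error. */
static int write_id(const char *cgroup_dir, const char *file, long id)
{
	char path[4096], buf[32];
	int fd, len, rc = 0;

	snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	len = snprintf(buf, sizeof(buf), "%ld", id);
	if (write(fd, buf, (size_t)len) != len)
		rc = -1;
	close(fd);
	return rc;
}

/* Hypothetical stand-in for xcgroup_move_process(); not the Slurm code. */
int move_process(const char *cgroup_dir, pid_t pid)
{
	char task_dir[64];
	struct dirent *ent;
	DIR *dir;
	int rc = 0;

	assert(cgroup_dir != NULL);

	/* Preferred path: a single write covers the whole process. */
	if (write_id(cgroup_dir, "cgroup.procs", (long)pid) == 0)
		return 0;

	/* Fallback: move each thread listed in /proc/PID/task (racy). */
	snprintf(task_dir, sizeof(task_dir), "/proc/%ld/task", (long)pid);
	dir = opendir(task_dir);
	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(ent->d_name, &end, 10);

		if (end == ent->d_name || *end != '\0')
			continue;	/* skip "." and ".." */
		if (write_id(cgroup_dir, "tasks", tid) != 0)
			rc = -1;
	}
	closedir(dir);
	return rc;
}
```

A thread created between readdir() calls can be missed by the fallback loop, which is the race the commit message mentions.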
-
- 16 Mar, 2012 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
It looks like changes were made to the man pages. However, --switch is still used for the info, usage, and help strings. The attached patch fixes those. Rod Schultz, Bull
-
Danny Auble authored
-
Danny Auble authored
already pinged it on startup the unresponding flag would be removed from the frontend node.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
Danny Auble authored
-
- 14 Mar, 2012 2 commits
-
-
Morris Jette authored
Cray - For srun wrapper when creating a job allocation, set the default job name to the executable file's name. Ignore leading directory names in the path.
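As a rough illustration of "ignore leading directory names", the default name can be derived by taking everything after the last '/' in the executable path. The function name default_job_name() is hypothetical, and this C sketch is not taken from the srun wrapper itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the part of exe_path after the last '/'. */
const char *default_job_name(const char *exe_path)
{
	const char *slash;

	assert(exe_path != NULL);
	slash = strrchr(exe_path, '/');
	return slash ? slash + 1 : exe_path;
}
```

The returned pointer aliases the input string, so the caller would need to copy it if the path buffer is later freed or reused.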
-
Morris Jette authored
This patch contains the bits of bad_dbtime.diff from CSCS which have not already been committed
-
- 13 Mar, 2012 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
permit the srun and salloc commands to be executed in the background on Cray systems
-
Morris Jette authored
Add new job state reason of "FrontEndDown" which applies only to Cray and IBM BlueGene systems.
-
Danny Auble authored
-
- 12 Mar, 2012 1 commit
-
-
Danny Auble authored
the queue when trying to place a larger than midplane job.
-
- 02 Mar, 2012 2 commits
-
-
Morris Jette authored
In cray/srun wrapper, only include aprun "-q" option when srun "--quiet" option is used.
-
Morris Jette authored
Here's what seems to have happened:
  - A job was pending, waiting for resources.
  - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
  - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
  - scontrol reconfigure was done again.
  - read_slurm_conf called _restore_job_dependencies.
  - _restore_job_dependencies called build_feature_list for each job in the job list.
  - When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.
The problem was reported by a customer on Slurm 2.2.7. I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies. If the job state is JOB_FAILED, the job is skipped.
Regards, Martin
This is an alternative solution to bug316980fix.patch
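The guard described in this message might look roughly like the sketch below. The structures are simplified, hypothetical stand-ins for Slurm's job records, and build_features() is only a placeholder for build_feature_list(); the point is just that failed jobs are skipped before their (possibly deleted) details are dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for Slurm's job structures. */
enum job_state { JOB_PENDING, JOB_RUNNING, JOB_FAILED };

struct job_details {
	const char *features;
};

struct job_record {
	enum job_state state;
	struct job_details *details;	/* NULL once the details are deleted */
};

/* Placeholder for build_feature_list(): dereferences job->details. */
static int build_features(const struct job_record *job)
{
	return job->details->features != NULL;
}

/* Returns how many non-failed jobs had a feature string to process. */
int restore_job_features(const struct job_record *jobs, size_t count)
{
	int processed = 0;
	size_t i;

	assert(jobs != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (jobs[i].state == JOB_FAILED)
			continue;	/* details may already be deleted */
		processed += build_features(&jobs[i]);
	}
	return processed;
}
```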
-
- 29 Feb, 2012 1 commit
-
-
Morris Jette authored
-
- 28 Feb, 2012 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Rémi Palancher authored
Added default_time field in partition records in Lua job submit plugin.
-
Rémi Palancher authored
Added a new Lua library name to try loading with dlopen() in Lua based plugins.
-
- 27 Feb, 2012 1 commit
-
-
Morris Jette authored
Only report "gres/<name> lacks File parameter" if some nodes define File AND this node does not AND (new part here) the GRES count on this node is non-zero
-
- 24 Feb, 2012 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Fixes for jobs with long argument lists
-
- 23 Feb, 2012 1 commit
-
-
Danny Auble authored
-
- 22 Feb, 2012 2 commits
-
-
Pär Andersson authored
Replace two xstrcat() calls per argument with a single xmalloc() call. This significantly speeds up handling of REQUEST_JOB_INFO RPCs when some jobs have long argument lists.
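The general idea of sizing the buffer once instead of growing it with repeated concatenation can be sketched as below. This uses plain malloc() rather than Slurm's xmalloc(), and join_args() is a hypothetical name; the actual patch may differ in how it packs the arguments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join arguments with single spaces using one allocation; caller frees. */
char *join_args(int count, char *const args[])
{
	size_t total = 1;	/* terminating NUL */
	char *out, *p;
	int i;

	assert(count == 0 || args != NULL);

	/* First pass: measure, leaving room for one separator per argument. */
	for (i = 0; i < count; i++)
		total += strlen(args[i]) + 1;

	out = malloc(total);
	if (out == NULL)
		return NULL;

	/* Second pass: copy each argument into place exactly once. */
	p = out;
	for (i = 0; i < count; i++) {
		size_t len = strlen(args[i]);

		if (i > 0)
			*p++ = ' ';
		memcpy(p, args[i], len);
		p += len;
	}
	*p = '\0';
	return out;
}
```

Repeated concatenation has to find the end of the growing string, and possibly reallocate it, on every call, so the cost grows much faster than the argument list does; the two-pass version touches each byte a fixed number of times.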
-
Pär Andersson authored
-