- 03 Apr, 2012 1 commit
-
-
Morris Jette authored
Add support for a new SchedulerParameters option, max_depend_depth, which defines the maximum number of jobs to test for circular dependencies (i.e. job A waits for job B to start and job B waits for job A to start). Default value is 10 jobs.
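A bounded circular-dependency test of this kind can be sketched as follows. This is a minimal Python illustration, not Slurm's C implementation: the function name, the `depends_on` mapping, and the exact way tested jobs are counted against the limit are assumptions made for the example.

```python
def has_circular_dependency(job_id, depends_on, max_depend_depth=10):
    """Return True if job_id is found among its own (transitive) dependencies.

    depends_on maps a job to the jobs it waits for (hypothetical structure).
    At most max_depend_depth distinct jobs are tested; beyond that limit the
    search gives up and reports no cycle.
    """
    to_visit = list(depends_on.get(job_id, ()))
    seen = set()
    tested = 0
    while to_visit and tested < max_depend_depth:
        current = to_visit.pop()
        if current == job_id:
            return True  # the chain leads back to the starting job
        if current in seen:
            continue
        seen.add(current)
        tested += 1
        to_visit.extend(depends_on.get(current, ()))
    return False


# Job A waits for job B and job B waits for job A.
print(has_circular_dependency("A", {"A": ["B"], "B": ["A"]}))
```

The limit trades completeness for bounded work: a cycle longer than the limit goes undetected in this sketch.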
-
- 02 Apr, 2012 1 commit
-
-
Morris Jette authored
-
- 30 Mar, 2012 1 commit
-
-
Danny Auble authored
-
- 29 Mar, 2012 2 commits
-
-
Morris Jette authored
The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to get the job the maximum node count in the range while other logic was trying to minimize spreading out of the job across multiple switches. As you note, this problem only happens when a range of node counts is specified and both the select/cons_res plugin and the topology/tree plugin are used, and even then it is not easy to reproduce (you included all of the details below).

Quoting Martin.Perry@Bull.com:
> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> ...
> jkob up infinite 4 idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4). However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1. s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong. It could be in the
> calculation of the number of needed nodes in function _enough_nodes. Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
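The conflict described in this report can be modelled with a toy example. The Python sketch below is not the code in job_test.c: the function names and the `idle` mapping are hypothetical, the node counts come from the sinfo output quoted above (3 idle jkob nodes under s1, 1 under s2, 4 under the top-level s3), and the "consistent" variant is only one possible way to reconcile the two goals, not the fix that was committed.

```python
def pick_best_fit_switch(switch_nodes, needed):
    """Return the switch with the fewest available nodes that still has
    at least `needed` of them, or None if no switch qualifies."""
    fits = [(count, name) for name, count in switch_nodes.items() if count >= needed]
    return min(fits)[1] if fits else None


def allocate_conflicting(switch_nodes, min_nodes, req_nodes):
    """Model of the reported behaviour: the switch is chosen from
    min_nodes, but the job is then held to req_nodes."""
    name = pick_best_fit_switch(switch_nodes, min_nodes)
    if name is None or switch_nodes[name] < req_nodes:
        return None  # "Requested node configuration is not available"
    return name, req_nodes


def allocate_consistent(switch_nodes, min_nodes, req_nodes):
    """Hypothetical reconciliation: use one node count for both steps,
    starting at the top of the range and working down."""
    for needed in range(req_nodes, min_nodes - 1, -1):
        name = pick_best_fit_switch(switch_nodes, needed)
        if name is not None:
            return name, needed
    return None


# Idle jkob nodes reachable through each switch, per the report.
idle = {"s1": 3, "s2": 1, "s3": 4}
print(allocate_conflicting(idle, 2, 4))  # models srun -N 2-4
print(allocate_consistent(idle, 2, 4))
```

In this model `-N 4-4` and `-N 2-3` both succeed even with the conflicting logic, which matches the observation in the report that only some ranges trigger the rejection.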
-
Morris Jette authored
-
- 27 Mar, 2012 2 commits
-
-
Morris Jette authored
When the optional max_time is not specified for --switches=count, the site max (SchedulerParameters=max_switch_wait=seconds) is used for the job. Based on patch from Rod Schultz.
-
Morris Jette authored
Patch by Bill Brophy, Bull.
-
- 26 Mar, 2012 1 commit
-
-
Morris Jette authored
Patch by Don Lipari, LLNL. https://github.com/chaos/slurm/commit/4de11bf0a8cd18207a60e7d3e1fa7a6fde0da431
-
- 21 Mar, 2012 4 commits
-
-
Morris Jette authored
CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark node that is DOWN in ALPS as DOWN in SLURM).
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 20 Mar, 2012 2 commits
-
-
Morris Jette authored
Improve support for overlapping advanced reservations. Patch from Bill Brophy, Bull.
-
Morris Jette authored
task/cgroup: minor job step memcg fixes
-
- 18 Mar, 2012 3 commits
-
-
Mark A. Grondona authored
The current task/cgroup memory code writes to force_empty at job step completion and then waits for the release agent to be triggered to remove the memcg. However, force_empty only causes clean cache pages to be dropped from the memcg and does not actually move charges to the parent [1].

This has two unfortunate side-effects. First, pages that can't be dropped by force_empty are in-use and could stay that way indefinitely (e.g. system library that is in-use until just after force_empty completes). Thus, the step memcg never becomes 'empty' and the release agent is not activated. Second, cached pages that can be freed are likely associated with the job itself, and those files and libraries will have to be paged in again for subsequent job steps.

In contrast, calling rmdir(2) on a memcg with no active tasks causes *all* current charges to move to parent, which is really what we want in this case. This allows cached libraries and binaries to stay resident and be associated with the job, and also ensures that the step memcg is removed immediately as the job step ends.

Thus, this patch replaces the write to force_empty with a call to xcgroup_delete() on the step memcg, which in turn removes the memcg with rmdir(2).

The functionality of this patch depends on the previous fix that uses xcgroup_move_process() to move slurmstepd to the root memcg. Otherwise, there will be leftover slurmstepd threads in the job step memcg, and the rmdir will fail with EBUSY.

[1] Sec 4.3: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
-
Mark A. Grondona authored
In task_cgroup_memory_fini() the implementation attempts to move the existing slurmstepd task to the root memory cgroup by writing the result of getpid(2) to the root memory cgroup's 'tasks' file. This does not work, however, because slurmstepd is multi-threaded and thus only the main thread is moved. This patch replaces the explicit write to 'tasks' with a call to the new xcgroup_move_process() function, which handles moving all threads in the process.
-
Mark A. Grondona authored
This patch adds a helper function to common/xcgroup.c to aid in moving processes between cgroups. If the cgroup.procs file is writable, then the PID is written to that file, as this method moves all threads in a process atomically. If cgroup.procs is not writable, then each thread must be moved individually by walking the /proc/PID/task/ directory and writing each task ID to the 'tasks' file in the cgroup. The second method is racy if a process is concurrently creating threads, but it is better than the current method of just moving one of the process's threads.
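The two strategies can be sketched as below. This is a Python illustration of the approach described in the commit message, not the C helper in common/xcgroup.c: `move_process`, `proc_root`, and the injectable `write` callback are hypothetical names introduced so the logic can be exercised without a real cgroup hierarchy. The kernel's per-cgroup file is named `cgroup.procs`.

```python
import os


def _write_value(path, value):
    # Each write to a cgroup control file is handled as a single request.
    with open(path, "w") as f:
        f.write(f"{value}\n")


def move_process(cgroup_dir, pid, proc_root="/proc", write=_write_value):
    """Move every thread of `pid` into the cgroup at `cgroup_dir`.

    Returns "procs" if the whole process was moved with one write to
    cgroup.procs, or "tasks" if each thread was moved individually.
    """
    procs = os.path.join(cgroup_dir, "cgroup.procs")
    if os.access(procs, os.W_OK):
        # Preferred path: one write moves all threads atomically.
        write(procs, pid)
        return "procs"

    # Fallback: walk /proc/PID/task/ and move each thread ID in turn.
    # Threads created while this loop runs can be missed (the race
    # noted in the commit message).
    tasks = os.path.join(cgroup_dir, "tasks")
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        write(tasks, tid)
    return "tasks"
```

Passing the writer in as a parameter is a testing convenience for this sketch; a real helper would write to the control files directly and report errors per thread.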
-
- 16 Mar, 2012 9 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
It looks like changes were made to the man pages. However, --switch is still used for the info, usage, and help strings. The attached patch fixes those. Rod Schultz, Bull
-
Danny Auble authored
-
Danny Auble authored
already pinged it on startup the unresponding flag would be removed from the frontend node.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
Danny Auble authored
-
- 14 Mar, 2012 2 commits
-
-
Morris Jette authored
Cray - For srun wrapper when creating a job allocation, set the default job name to the executable file's name. Ignore leading directory names in the path.
-
Morris Jette authored
This patch contains the bits of bad_dbtime.diff from CSCS which have not already been committed
-
- 13 Mar, 2012 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Permit the srun and salloc commands to be executed in the background on Cray systems.
-
Morris Jette authored
Add new job state reason of "FrontEndDown" which applies only to Cray and IBM BlueGene systems.
-
Danny Auble authored
-
- 12 Mar, 2012 1 commit
-
-
Danny Auble authored
the queue when trying to place a larger than midplane job.
-
- 02 Mar, 2012 2 commits
-
-
Morris Jette authored
In cray/srun wrapper, only include aprun "-q" option when srun "--quiet" option is used.
-
Morris Jette authored
Here's what seems to have happened:
- A job was pending, waiting for resources.
- slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
- As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
- scontrol reconfigure was done again.
- read_slurm_conf called _restore_job_dependencies.
- _restore_job_dependencies called build_feature_list for each job in the job list.
- When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.

The problem was reported by a customer on Slurm 2.2.7. I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies. If the job state is JOB_FAILED, the job is skipped.

Regards,
Martin

This is an alternative solution to bug316980fix.patch
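The guard described in this message can be illustrated with a small model. The Python sketch below is not Slurm's C code: the `Job` and `JobDetails` classes, the state constants, and the `&`-separated feature string are simplified stand-ins chosen for the example. Only the idea of skipping JOB_FAILED jobs before touching their details comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

JOB_PENDING = "PENDING"  # illustrative state values, not Slurm's enum
JOB_FAILED = "FAILED"


@dataclass
class JobDetails:
    features: str


@dataclass
class Job:
    job_id: int
    state: str
    details: Optional[JobDetails]  # None once the details were deleted


def build_feature_list(job):
    # Dereferences job.details unconditionally, like the function
    # named in the report; fails if the details were deleted.
    return job.details.features.split("&")


def restore_job_dependencies(jobs):
    restored = {}
    for job in jobs:
        if job.state == JOB_FAILED:
            continue  # details may already be gone, so skip the job
        restored[job.job_id] = build_feature_list(job)
    return restored


jobs = [
    Job(1, JOB_PENDING, JobDetails("bigmem&gpu")),
    Job(2, JOB_FAILED, None),
]
print(restore_job_dependencies(jobs))
```

In Python the unguarded access raises AttributeError instead of crashing the process, but the ordering problem is the same: the state check has to come before any use of the details.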
-
- 29 Feb, 2012 1 commit
-
-
Morris Jette authored
-
- 28 Feb, 2012 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-