- 23 Mar, 2012 1 commit
-
-
Morris Jette authored
-
- 22 Mar, 2012 11 commits
-
-
Morris Jette authored
-
Morris Jette authored
Mistakenly changed jobacct_gather_g_getinfo() into jobacct_gather_g_setinfo()
-
Morris Jette authored
This avoids conflicts with the "info" function.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This fixes a race condition in the error handling logic added a couple of days ago for slurmd/slurmstepd communications in commit https://github.com/SchedMD/slurm/commit/ed31e6c7fdb5bcc1b0f0a8e3cbf5327604e64887
-
Morris Jette authored
-
Matthieu Hautreux authored
Access to a secured FS often requires a valid token in the user context. With SLURM, this token can be obtained using one of the available pluggable architectures, SPANK or PAM. SLURM's IO setup may need to access a secured FS (stdout/stderr files). This patch ensures that the pluggable frameworks are activated and called prior to IO setup, and that IO is terminated before the pluggable frameworks' exit calls are made.
-
Matthieu Hautreux authored
Set PR_DUMPABLE as soon as possible, especially before any plugins are loaded. This allows someone debugging to get a coredump.
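A minimal sketch of the call involved, assuming Linux and prctl(2). The helper name enable_core_dumps() is hypothetical and is not taken from the Slurm sources. The kernel clears the dumpable flag when a process changes its effective user or group ID, which is presumably why a daemon that switches users has to set it again explicitly.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper (not a Slurm function): mark the calling process
 * as dumpable so the kernel may write a core file for it.
 * Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    return 0;
}
```

Calling this early in startup, before any plugin code runs, means a crash inside a plugin can still leave a core file, subject to the usual RLIMIT_CORE and core_pattern settings.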
-
Matthieu Hautreux authored
To prepare for io_setup integration in _fork_all_tasks, error handling must no longer always return SLURM_ERROR; it must be able to return SLURM_SUCCESS in case of an io_setup error.
-
- 21 Mar, 2012 14 commits
-
-
Morris Jette authored
Change the owner of slurmctld and slurmdbd log files to the appropriate user. Without this change the files will be created by and owned by the user starting the daemons (likely user root).
-
Morris Jette authored
-
Morris Jette authored
CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark node that is DOWN in ALPS as DOWN in SLURM).
-
Morris Jette authored
-
Morris Jette authored
In the tightly coupled functions slurmd:stepd_completion and slurmstepd:_handle_completion, a jobacct structure is sent from the main daemon to the step daemon to provide the statistics of the child slurmstepd processes and do the aggregation. The structure is sent using jobacct_gather_g_{setinfo,getinfo} over a pipe (JOBACCT_DATA_PIPE). Because {setinfo,getinfo} use a common internal lock, and reading or writing to a pipe is equivalent to holding a lock, slurmd and slurmstepd must avoid using both setinfo and getinfo over a pipe or deadlocks can occur. For example: slurmd(lockforread,write)/slurmstepd(write,lockforread). This patch removes the call to jobacct_gather_g_setinfo in slurmd and the call to jobacct_gather_g_getinfo in slurmstepd, ensuring that slurmd only does getinfo operations over a pipe and slurmstepd only does setinfo operations over a pipe. Instead, jobacct_gather_g_{pack,unpack} are used to marshal/unmarshal the data for transmission over the pipe. Patch by Matthieu Hautreux, CEA. The patch committed here is a variation on the work by Matthieu. Specifically, logic is added to slurmstepd to read a new RPC format that includes an RPC version number and a buffer containing the data structure. However, slurmd will not send the RPC in the new format until SLURM version 2.5.
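The pack-then-send approach described above can be sketched as follows. This is an illustration under stated assumptions, not Slurm's wire format: struct acct_sample, send_sample() and recv_sample() are hypothetical names, and the layout (a 16-bit version followed by two 32-bit counters in network byte order) is invented for the example. The point it shows is that the record is serialized into a private buffer first, so no shared lock needs to be held while the process blocks on pipe I/O.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for an accounting record (not Slurm's jobacct). */
struct acct_sample {
    uint32_t max_rss_kb;
    uint32_t cpu_sec;
};

/* Write exactly len bytes, retrying on EINTR and short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; an early EOF is treated as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Pack into a local buffer first, then do the pipe I/O with no lock held. */
static int send_sample(int fd, uint16_t version, const struct acct_sample *s)
{
    unsigned char wire[10];
    uint16_t v = htons(version);
    uint32_t rss = htonl(s->max_rss_kb);
    uint32_t cpu = htonl(s->cpu_sec);

    memcpy(wire, &v, 2);
    memcpy(wire + 2, &rss, 4);
    memcpy(wire + 6, &cpu, 4);
    return write_all(fd, wire, sizeof(wire));
}

/* Read the whole message, then unpack it; the version lets the reader
 * reject or adapt to formats it does not understand. */
static int recv_sample(int fd, uint16_t *version, struct acct_sample *s)
{
    unsigned char wire[10];
    uint16_t v;
    uint32_t rss, cpu;

    if (read_all(fd, wire, sizeof(wire)) < 0)
        return -1;
    memcpy(&v, wire, 2);
    memcpy(&rss, wire + 2, 4);
    memcpy(&cpu, wire + 6, 4);
    *version = ntohs(v);
    s->max_rss_kb = ntohl(rss);
    s->cpu_sec = ntohl(cpu);
    return 0;
}
```

Carrying a version number at the front of the message is what allows a reader to be taught the new format before any writer starts sending it, which matches the staged rollout described in the commit message.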
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Replace some " \t" with just "\t" (that's a tab)
-
- 20 Mar, 2012 7 commits
-
-
Morris Jette authored
Improve support for overlapping advanced reservations. Patch from Bill Brophy, Bull.
-
Morris Jette authored
-
Morris Jette authored
Added PriorityFlags configuration parameter
-
Morris Jette authored
task/cgroup: minor job step memcg fixes
-
Morris Jette authored
Improve task binding logic by making fuller use of HWLOC library, especially with respect to Opteron 6000 series processors. Work contributed by Komoto Masahiro.
-
Carles Fenoy authored
-
Carles Fenoy authored
-
- 19 Mar, 2012 1 commit
-
-
Morris Jette authored
-
- 18 Mar, 2012 3 commits
-
-
Mark A. Grondona authored
The current task/cgroup memory code writes to force_empty at job step completion and then waits for the release agent to be triggered to remove the memcg. However, force_empty only causes clean cache pages to be dropped from the memcg and does not actually move charges to the parent [1]. This has two unfortunate side-effects. First, pages that can't be dropped by force_empty are in use and could stay that way indefinitely (e.g. a system library that is in use until just after force_empty completes). Thus, the step memcg never becomes 'empty' and the release agent is not activated. Second, cached pages that can be freed are likely associated with the job itself, and those files and libraries will have to be paged in again for subsequent job steps. In contrast, calling rmdir(2) on a memcg with no active tasks causes *all* current charges to move to the parent, which is really what we want in this case. This allows cached libraries and binaries to stay resident and be associated with the job, and also ensures that the step memcg is removed immediately as the job step ends. Thus, this patch replaces the write to force_empty with a call to xcgroup_delete() on the step memcg, which in turn removes the memcg with rmdir(2). The functionality of this patch depends on the previous fix that uses xcgroup_move_process() to move slurmstepd to the root memcg. Otherwise, there will be leftover slurmstepd threads in the job step memcg, and the rmdir will fail with EBUSY. [1] Sec 4.3: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
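The removal step described above can be sketched as a plain rmdir(2) on the cgroup directory. The helper remove_cgroup_dir() is a hypothetical name used for illustration; it is not the body of xcgroup_delete(). The sketch treats an already-missing directory as success and reports EBUSY separately, since that is the failure the message warns about when threads are left behind in the group.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not xcgroup_delete() itself): remove a cgroup
 * directory with rmdir(2).
 * Returns 0 if the directory was removed or did not exist, -1 otherwise. */
static int remove_cgroup_dir(const char *path)
{
    if (rmdir(path) == 0 || errno == ENOENT)
        return 0;

    if (errno == EBUSY)
        fprintf(stderr, "%s: busy, tasks or child groups remain\n", path);
    else
        fprintf(stderr, "rmdir(%s): %s\n", path, strerror(errno));
    return -1;
}
```

A caller would first move any remaining threads out of the group and only then attempt the removal, so that EBUSY indicates a real leftover task and not the caller's own threads.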
-
Mark A. Grondona authored
In task_cgroup_memory_fini() the implementation attempts to move the existing slurmstepd task to the root memory cgroup by writing the result of getpid(2) to the root memory cgroup's 'tasks' file. This does not work, however, because slurmstepd is multi-threaded and thus only the main thread is moved. This patch replaces the explicit write to 'tasks' with a call to the new xcgroup_move_process() function, which handles moving all threads in the process.
-
Mark A. Grondona authored
This patch adds a helper function to common/xcgroup.c to aid in moving processes between cgroups. If the cgroup.procs file is writable then writing the PID to that file is used, as this method moves all threads in a process atomically. If cgroup.procs is not writable, then each thread must be moved individually by walking the /proc/PID/task/ directory and writing each taskid individually to the 'tasks' file in the cgroup. The second method is racy if a process is concurrently creating threads, but it is better than the current method of just moving one of the process's threads.
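The two strategies described above can be sketched as follows. This is a simplified illustration, not the code in common/xcgroup.c: move_process() and write_id() are hypothetical names, and cg_dir is assumed to be the path of the destination cgroup directory.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one numeric id to a control file inside cg_dir.
 * In a real cgroup the file already exists; errors reported by the
 * kernel surface when the buffered write is flushed at fclose().
 * Returns 0 on success, -1 on failure. */
static int write_id(const char *cg_dir, const char *file, long id)
{
    char path[4096];
    FILE *fp;
    int rc;

    snprintf(path, sizeof(path), "%s/%s", cg_dir, file);
    fp = fopen(path, "w");
    if (!fp)
        return -1;
    rc = (fprintf(fp, "%ld\n", id) < 0) ? -1 : 0;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical sketch of the helper described in the commit message.
 * Returns 0 if every thread was moved, -1 otherwise. */
static int move_process(const char *cg_dir, pid_t pid)
{
    char task_dir[64];
    struct dirent *ent;
    DIR *dir;
    int rc = 0;

    /* Preferred path: one write moves all threads atomically. */
    if (write_id(cg_dir, "cgroup.procs", (long)pid) == 0)
        return 0;

    /* Fallback: move each thread listed under /proc/PID/task.
     * Racy if the process is creating threads concurrently. */
    snprintf(task_dir, sizeof(task_dir), "/proc/%ld/task", (long)pid);
    dir = opendir(task_dir);
    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue; /* skip "." and ".." */
        if (write_id(cg_dir, "tasks", tid) < 0)
            rc = -1;
    }
    closedir(dir);
    return rc;
}
```

One way to narrow the race in the fallback path would be to repeat the directory walk until a pass finds no thread left to move, though that is an addition beyond what the commit message describes.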
-
- 16 Mar, 2012 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
-