1. 18 Mar, 2012 3 commits
    • task/cgroup: delete job step memcg instead of using force_empty · a93afcd1
      Mark A. Grondona authored
      The current task/cgroup memory code writes to force_empty at job step
      completion and then waits for the release agent to be triggered to
      remove the memcg. However, force_empty only causes clean cache pages
      to be dropped from the memcg and does not actually move charges to
      the parent [1].
      
      This has two unfortunate side effects. First, pages that can't be
      dropped by force_empty are in use and could stay that way indefinitely
      (e.g. a system library that is in use until just after force_empty
      completes). Thus, the step memcg never becomes 'empty' and the release
      agent is not activated. Second, cached pages that can be freed are
      likely associated with the job itself, and those files and libraries
      will have to be paged in again for subsequent job steps.
      
      In contrast, calling rmdir(2) on a memcg with no active tasks
      causes *all* current charges to move to parent, which is really what
      we want in this case. This allows cached libraries and binaries to
      stay resident and be associated with the job, and also ensures that
      the step memcg is removed immediately as the job step ends.
      
      Thus, this patch replaces the write to force_empty with a call
      to xcgroup_delete() on the step memcg, which in turn removes
      the memcg with rmdir(2).
      
      The functionality of this patch depends on the previous fix that
      uses xcgroup_move_process() to move slurmstepd to the root memcg.
      Otherwise, there will be leftover slurmstepd threads in the job
      step memcg, and the rmdir will fail with EBUSY.
      
       [1] Sec 4.3: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
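
The removal step described above can be sketched as a plain rmdir(2) call with explicit handling of the EBUSY case. This is a minimal illustration, not the code of xcgroup_delete(): the function name `delete_step_cgroup_sketch` and its return convention are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the delete step: remove a cgroup directory.
 * Returns 0 on success, or the errno value reported by rmdir(2). */
int delete_step_cgroup_sketch(const char *step_cgroup_dir)
{
    int err;

    assert(step_cgroup_dir != NULL);
    if (rmdir(step_cgroup_dir) == 0)
        return 0;

    err = errno;
    if (err == EBUSY) {
        /* Tasks (for example leftover slurmstepd threads) are still attached. */
        fprintf(stderr, "%s: still has tasks attached (EBUSY)\n",
                step_cgroup_dir);
    } else {
        fprintf(stderr, "rmdir(%s): %s\n", step_cgroup_dir, strerror(err));
    }
    return err;
}
```

A caller would treat EBUSY as a sign that some thread was not moved out of the step memcg before the removal was attempted.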
    • task/cgroup: use xcgroup_move_process to move slurmstepd to root memcg · 2dd13506
      Mark A. Grondona authored
      In task_cgroup_memory_fini() the implementation attempts to move
      the existing slurmstepd task to the root memory cgroup by writing
      the result of getpid(2) to the root memory cgroup's 'tasks' file.
      This does not work, however, because slurmstepd is multi-threaded
      and thus only the main thread is moved.
      
      This patch replaces the explicit write to 'tasks' with a call to
      the new xcgroup_move_process() helper, which handles moving all
      threads in the process.
    • xcgroup: add xcgroup_move_process helper function · aa912e4a
      Mark A. Grondona authored
      This patch adds a helper function to common/xcgroup.c to aid
      in moving processes between cgroups. If the cgroup.procs file
      is writable, the PID is written to that file, as this method
      moves all threads in a process atomically.
      
      If cgroup.procs is not writable, then each thread must be moved
      individually by walking the /proc/PID/task/ directory and writing
      each task ID to the 'tasks' file in the cgroup. The second method
      is racy if a process is concurrently creating threads, but it is
      better than the current method of moving just one of the
      process's threads.
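
The two strategies described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the contents of common/xcgroup.c: the names `write_id_sketch` and `move_process_sketch`, the path handling, and the return convention are all invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one numeric ID to <cgroup_dir>/<file>. Returns 0 or -1. */
static int write_id_sketch(const char *cgroup_dir, const char *file, long id)
{
    char path[PATH_MAX], buf[32];
    int fd, len, rc = 0;

    len = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (len < 0 || len >= (int)sizeof(path))
        return -1;
    len = snprintf(buf, sizeof(buf), "%ld\n", id);
    if ((fd = open(path, O_WRONLY)) < 0)
        return -1;
    if (write(fd, buf, (size_t)len) != len)
        rc = -1;
    close(fd);
    return rc;
}

/* Hypothetical helper: move every thread of pid into the cgroup. */
int move_process_sketch(const char *cgroup_dir, pid_t pid)
{
    char path[PATH_MAX];
    struct dirent *ent;
    DIR *dir;
    int len, rc = 0;

    assert(cgroup_dir != NULL);

    /* Preferred: cgroup.procs moves the whole thread group in one write. */
    len = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
    if (len > 0 && len < (int)sizeof(path) && access(path, W_OK) == 0)
        return write_id_sketch(cgroup_dir, "cgroup.procs", (long)pid);

    /* Fallback: write each thread ID to 'tasks'. Racy if the process
     * creates threads while the directory is being walked. */
    len = snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    if (len < 0 || len >= (int)sizeof(path) || (dir = opendir(path)) == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue; /* skip "." and ".." */
        if (write_id_sketch(cgroup_dir, "tasks", tid) < 0)
            rc = -1;
    }
    closedir(dir);
    return rc;
}
```

The fallback keeps going after a failed write so that as many threads as possible are moved; a stricter variant could stop at the first error instead.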
  2. 16 Mar, 2012 9 commits
  3. 14 Mar, 2012 2 commits
    • Set Cray srun default job name · 0b24e690
      Morris Jette authored
      Cray - For srun wrapper when creating a job allocation, set the default job
      name to the executable file's name. Ignore leading directory names in the path.
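
The naming rule above amounts to taking the last path component of the executable. A minimal sketch in C follows, shown only to illustrate the rule; it is not the wrapper's code, and the function name `default_job_name_sketch` is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the executable name without leading
 * directories. The result points into exe_path; nothing is copied. */
const char *default_job_name_sketch(const char *exe_path)
{
    const char *slash;

    assert(exe_path != NULL);
    slash = strrchr(exe_path, '/');
    return slash ? slash + 1 : exe_path;
}
```

With this rule, a path such as `/home/user/bin/a.out` gives the job name `a.out`, and a bare `a.out` is returned unchanged.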
    • Change read lock to write lock · d53b7c26
      Morris Jette authored
      This patch contains the parts of bad_dbtime.diff from CSCS that have
      not already been committed.
  4. 13 Mar, 2012 5 commits
  5. 12 Mar, 2012 1 commit
  6. 02 Mar, 2012 2 commits
    • cray/srun wrapper, don't use aprun -q by default · ea9adc17
      Morris Jette authored
      In cray/srun wrapper, only include aprun "-q" option when srun "--quiet"
      option is used.
    • Fix for possible SEGV · ed56303c
      Morris Jette authored
      Here's what seems to have happened:
      
      - A job was pending, waiting for resources.
      - slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
      - As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
      - scontrol reconfigure was done again.
      - read_slurm_conf called _restore_job_dependencies.
      - _restore_job_dependencies called build_feature_list for each job in the job list.
      - When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.
      
      The problem was reported by a customer on Slurm 2.2.7.  I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies.  If the job state is JOB_FAILED, the job is skipped.
      
      Regards,
      Martin
      
      This is an alternative solution to bug316980fix.patch.
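
The guard described in the message above (skip any job whose state is JOB_FAILED before touching its details) can be illustrated with a small self-contained sketch. The types and names here (`sketch_job`, `sketch_job_details`, `restore_jobs_sketch`) are hypothetical and do not mirror Slurm's real data structures; the extra NULL check is an added precaution that the message does not mention.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_job_state {
    SKETCH_JOB_PENDING,
    SKETCH_JOB_RUNNING,
    SKETCH_JOB_FAILED
};

struct sketch_job_details {
    const char *features;
};

struct sketch_job {
    enum sketch_job_state state;
    struct sketch_job_details *details; /* NULL once details are deleted */
};

/* Visit the details of every job that still has them.
 * Returns the number of jobs whose details were read. */
size_t restore_jobs_sketch(const struct sketch_job *jobs, size_t n)
{
    size_t visited = 0;

    assert(jobs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (jobs[i].state == SKETCH_JOB_FAILED)
            continue; /* details were deleted when the job failed */
        if (jobs[i].details == NULL)
            continue; /* defensive check, an assumption of this sketch */
        (void)jobs[i].details->features; /* safe to dereference here */
        visited++;
    }
    return visited;
}
```

Checking the state first matches the fix as described; the NULL check would additionally cover any other path that frees the details.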
  7. 29 Feb, 2012 1 commit
  8. 28 Feb, 2012 5 commits
  9. 27 Feb, 2012 1 commit
    • Reduce gres error logging · 670be35a
      Morris Jette authored
      Only report "gres/<name> lacks File parameter" if some nodes define
      File AND this node does not AND (new part here) the GRES count on
      this node is non-zero.
  10. 24 Feb, 2012 9 commits
  11. 23 Feb, 2012 1 commit
  12. 22 Feb, 2012 1 commit