1. 03 May, 2012 1 commit
    • Matthieu Hautreux's avatar
      Fix segv in slurmctld for job step with relative option · 9bb178c3
      Matthieu Hautreux authored
      Here is the way to reproduce it:
      [root@cuzco27 georgioy]# salloc -n64 -N4 --exclusive
      salloc: Granted job allocation 8
      [root@cuzco27 georgioy]#srun -r 0 -n 30 -N 2 sleep 300&
      [root@cuzco27 georgioy]#srun -r 1 -n 40 -N 3 sleep 300&
      [root@cuzco27 georgioy]# srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
      srun: error: Unable to create job step: Zero Bytes were transmitted or received
      9bb178c3
  2. 27 Apr, 2012 1 commit
  3. 26 Apr, 2012 1 commit
  4. 25 Apr, 2012 1 commit
    • Don Albert's avatar
      Append "*" to default partition name with format and no size · 77645508
      Don Albert authored
      There is a minor problem with the display of partition names in
      "sinfo".  Without options, the partition name field displays an
      asterisk "*" at the end of the name of the default partition.  If you
      specify a formatting option which contains the %P field specifier with
      a width option (e.g., sinfo -o %8P), the asterisk is also appended to
      the default partition name.  With no width option, however, "%P"
      displays the full name string, but no "*" is appended to the default
      partition name.
      
      The attached patch for version 2.4.0-pre4 corrects the problem so that
      the "*" is correctly appended when %P with no width specifier is
      used. The patch will also apply to version 2.3.4.
      
        -Don Albert-
      77645508
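      The intended behavior can be sketched as follows. This is a minimal
      illustration, not sinfo's actual code: the function name
      format_partition_name and its parameters are hypothetical, and a width
      of 0 is assumed to mean "no width specifier given".

```c
#include <stdio.h>

/* Hypothetical helper (not part of sinfo): format a partition name and
 * append "*" when it is the default partition. width == 0 is assumed to
 * mean "no width specifier", so the full name is printed. */
void format_partition_name(char *out, size_t out_len, const char *name,
                           int is_default, int width)
{
    char tmp[256];

    /* Append the marker first so both paths below see the same string. */
    snprintf(tmp, sizeof(tmp), "%s%s", name, is_default ? "*" : "");

    if (width > 0) {
        /* Fixed width: left-justify, pad, and truncate to the width. */
        snprintf(out, out_len, "%-*.*s", width, width, tmp);
    } else {
        /* No width: print the whole string, marker included. */
        snprintf(out, out_len, "%s", tmp);
    }
}
```

      Building the marked name before choosing a code path is one way to keep
      the width and no-width cases from diverging again.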
  5. 24 Apr, 2012 1 commit
  6. 23 Apr, 2012 2 commits
  7. 20 Apr, 2012 1 commit
  8. 17 Apr, 2012 1 commit
  9. 12 Apr, 2012 3 commits
  10. 10 Apr, 2012 6 commits
  11. 05 Apr, 2012 1 commit
    • Don Lipari's avatar
      Prevent users from extending the EndTime of running jobs · 62edab22
      Don Lipari authored
      While safeguards are in place to prevent unauthorized users from extending the
      TimeLimit of their running jobs, there were no such restrictions on extending
      the EndTime.  This patch adds the same constraints to modifying EndTime that
      currently exist for modifying TimeLimit.
      62edab22
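      The rule described above can be sketched as a small predicate. This is
      an illustration only, not slurmctld's actual check: the function name is
      hypothetical, "authorized" is reduced to a single is_admin flag, and it
      is assumed that shortening or keeping the EndTime is always permitted.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical check (not slurmctld code): may this caller change a
 * running job's end time from current_end to new_end?
 * Assumption: only an administrator may move the end time later. */
bool end_time_update_allowed(bool is_admin, time_t current_end, time_t new_end)
{
    if (is_admin)
        return true;
    /* Ordinary users may keep or shorten the end time, never extend it. */
    return new_end <= current_end;
}
```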
  12. 03 Apr, 2012 1 commit
    • Morris Jette's avatar
      Limit depth of circular job dependency check · 0caecbc5
      Morris Jette authored
      Add support for a new SchedulerParameters option, max_depend_depth, defining
      the maximum number of jobs to test for circular dependencies (i.e. job A waits
      for job B to start and job B waits for job A to start). Default value is
      10 jobs.
      0caecbc5
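      A depth-limited circular dependency test might look like the sketch
      below. It is not the code from this commit: the function name, the
      NO_DEP constant, and the simplification that each job waits on at most
      one other job (stored in a plain array) are all assumptions made for
      illustration.

```c
#include <stdbool.h>

#define NO_DEP (-1)  /* hypothetical marker: job has no dependency */

/* dep[i] is the index of the job that job i waits for, or NO_DEP.
 * Follow the chain from `start` and report a cycle if it leads back to
 * `start` within max_depth tested jobs. `start` must be a valid index. */
bool has_circular_dependency(const int *dep, int njobs, int start,
                             int max_depth)
{
    int cur = dep[start];

    for (int tested = 0; tested < max_depth; tested++) {
        if (cur < 0 || cur >= njobs)
            return false;       /* end of chain or unknown job */
        if (cur == start)
            return true;        /* chain returned to where it began */
        cur = dep[cur];
    }
    return false;               /* depth limit reached, stop looking */
}
```

      The depth limit also guarantees termination when the chain enters a
      cycle that does not include the starting job.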
  13. 02 Apr, 2012 1 commit
  14. 30 Mar, 2012 1 commit
  15. 29 Mar, 2012 2 commits
    • Morris Jette's avatar
      Fix in select/cons_res+topology+job with node range count · f64b29a2
      Morris Jette authored
      The problem was conflicting logic in the select/cons_res plugin. Some of the
      code was trying to give the job the maximum node count in the range, while
      other logic was trying to minimize spreading the job across multiple
      switches. As you note, this problem only happens when a range of node counts
      is specified with both the select/cons_res plugin and the topology/tree
      plugin in use, and even then it is not easy to reproduce (you included all
      of the details below).
      
      Quoting Martin.Perry@Bull.com:
      
      > Certain combinations of topology configuration and srun -N option produce
      > spurious job rejection with "Requested node configuration is not
      > available" with select/cons_res. The following example illustrates the
      > problem.
      >
      > [sulu] (slurm) etc> cat slurm.conf
      > ...
      > TopologyPlugin=topology/tree
      > SelectType=select/cons_res
      > SelectTypeParameters=CR_Core
      > ...
      >
      > [sulu] (slurm) etc> cat topology.conf
      > SwitchName=s1 Nodes=xna[13-26]
      > SwitchName=s2 Nodes=xna[41-45]
      > SwitchName=s3 Switches=s[1-2]
      >
      > [sulu] (slurm) etc> sinfo
      > PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
      > ...
      > jkob         up   infinite      4   idle xna[14,19-20,41]
      > ...
      >
      > [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
      > srun: Force Terminated job 79
      > srun: error: Unable to allocate resources: Requested node configuration is
      > not available
      >
      > The problem does not occur with select/linear, or topology/none, or if -N
      > is omitted, or for certain other values for -N (for example, -N 4-4 and -N
      > 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
      > src/plugins/select/cons_res/job_test.c. The srun man page states that when
      > -N is used, "the job will be allocated as many nodes as possible within
      > the range specified and without delaying the initiation of the job."
      > Consistent with this description, the requested number of nodes in the
      > above example is 4 (req_nodes=4).  However, the code that selects the
      > best-fit topology switches appears to make the selection based on the
      > minimum required number of nodes (min_nodes=2). It therefore selects
      > switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
      > than req_nodes the job is rejected with the "node configuration" error.
      >
      > I'm not sure where the code is going wrong.  It could be in the
      > calculation of the number of needed nodes in function _enough_nodes.  Or
      > it could be in the code that initializes/updates req_nodes or rem_nodes. I
      > don't feel confident that I understand the logic well enough to propose a
      > fix without introducing a regression.
      >
      > Regards,
      > Martin
      f64b29a2
    • Morris Jette's avatar
      Format change, no change in logic · ebca432e
      Morris Jette authored
      ebca432e
  16. 27 Mar, 2012 2 commits
  17. 26 Mar, 2012 1 commit
  18. 21 Mar, 2012 4 commits
  19. 20 Mar, 2012 2 commits
  20. 18 Mar, 2012 3 commits
    • Mark A. Grondona's avatar
      task/cgroup: delete job step memcg instead of using force_empty · a93afcd1
      Mark A. Grondona authored
      The current task/cgroup memory code writes to force_empty at job step
      completion and then waits for the release agent to be triggered to
      remove the memcg. However, force_empty only causes clean cache pages
      to be dropped from the memcg and does not actually move charges to
      the parent [1].
      
      This has two unfortunate side-effects. First, pages that can't be
      dropped by force_empty are in use and could stay that way indefinitely
      (e.g. a system library that is in use until just after force_empty
      completes). Thus, the step memcg never becomes 'empty' and the release
      agent is not activated. Second, cached pages that can be freed are
      likely associated with the job itself, and those files and libraries
      will have to be paged in again for subsequent job steps.
      
      In contrast, calling rmdir(2) on a memcg with no active tasks
      causes *all* current charges to move to parent, which is really what
      we want in this case. This allows cached libraries and binaries to
      stay resident and be associated with the job, and also ensures that
      the step memcg is removed immediately as the job step ends.
      
      Thus, this patch replaces the write to force_empty with a call
      to xcgroup_delete() on the step memcg, which in turn removes
      the memcg with rmdir(2).
      
      The functionality of this patch depends on the previous fix that
      uses xcgroup_move_process() to move slurmstepd to the root memcg.
      Otherwise, there will be leftover slurmstepd threads in the job
      step memcg, and the rmdir will fail with EBUSY.
      
       [1] Sec 4.3: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
      a93afcd1
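      The removal step can be sketched in plain C as follows. This is a
      hypothetical stand-in for what the message describes, not the body of
      xcgroup_delete(): the function name delete_step_memcg and the error
      reporting are assumptions. The only system call used is rmdir(2).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: remove a job step's memory cgroup directory.
 * Returns 0 on success, -1 on failure with errno preserved. */
int delete_step_memcg(const char *memcg_path)
{
    int err;

    if (rmdir(memcg_path) == 0)
        return 0;

    err = errno;
    if (err == EBUSY) {
        /* Per the commit message: tasks (for example leftover
         * slurmstepd threads) are still attached to the memcg. */
        fprintf(stderr, "%s: tasks still attached, cannot remove\n",
                memcg_path);
    } else {
        fprintf(stderr, "rmdir(%s): %s\n", memcg_path, strerror(err));
    }
    errno = err;  /* fprintf may have changed errno */
    return -1;
}
```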
    • Mark A. Grondona's avatar
      task/cgroup: use xcgroup_move_process to move slurmstepd to root memcg · 2dd13506
      Mark A. Grondona authored
      In task_cgroup_memory_fini() the implementation attempts to move
      the existing slurmstepd task to the root memory cgroup by writing
      the result of getpid(2) to the root memory cgroup's 'tasks' file.
      This does not work, however, because slurmstepd is multi-threaded
      and thus only the main thread is moved.
      
      This patch replaces the explicit write to 'tasks' with a call to
      the new xcgroup_move_process() function, which handles moving all
      threads in the process.
      2dd13506
    • Mark A. Grondona's avatar
      xcgroup: add xcgroup_move_process helper function · aa912e4a
      Mark A. Grondona authored
      This patch adds a helper function to common/xcgroup.c to aid
      in moving processes between cgroups. If the cgroup.procs file
      is writable, the PID is written to that file, as this
      method moves all threads in a process atomically.
      
      If cgroup.procs is not writable, then each thread must be moved
      individually by walking the /proc/PID/task/ directory and writing
      each thread ID to the 'tasks' file in the cgroup. The
      second method is racy if a process is concurrently creating
      threads, but it is better than the current method of moving
      just one of the process's threads.
      aa912e4a
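      The two methods described above can be sketched as follows. This is an
      illustration under stated assumptions, not the xcgroup_move_process()
      implementation: the names move_process_to_cgroup and write_id are
      hypothetical, the cgroup is identified by its directory path, and
      "cgroup.procs is writable" is approximated by simply attempting the
      write and falling back if it fails.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one numeric id to <cg_dir>/<file>. Returns 0 on success, -1 on
 * failure. O_APPEND mirrors the common "echo ID >> file" usage. */
int write_id(const char *cg_dir, const char *file, long id)
{
    char path[4096], buf[32];
    int fd, len;
    ssize_t n;

    snprintf(path, sizeof(path), "%s/%s", cg_dir, file);
    fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    len = snprintf(buf, sizeof(buf), "%ld\n", id);
    n = write(fd, buf, (size_t)len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Hypothetical sketch: move every thread of `pid` into the cgroup
 * rooted at cg_dir. Returns 0 on success, -1 if any step failed. */
int move_process_to_cgroup(const char *cg_dir, pid_t pid)
{
    char task_dir[64];
    struct dirent *ent;
    DIR *dir;
    int rc = 0;

    /* Preferred: a single write of the PID to cgroup.procs. */
    if (write_id(cg_dir, "cgroup.procs", (long)pid) == 0)
        return 0;

    /* Fallback: write each thread ID to 'tasks'. Threads created while
     * this loop runs can be missed, which is the race noted above. */
    snprintf(task_dir, sizeof(task_dir), "/proc/%ld/task", (long)pid);
    dir = opendir(task_dir);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;           /* skip "." and ".." */
        if (write_id(cg_dir, "tasks", strtol(ent->d_name, NULL, 10)) != 0)
            rc = -1;
    }
    closedir(dir);
    return rc;
}
```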
  21. 16 Mar, 2012 4 commits