  10 Apr, 2011 12 commits
    • tweaks to some tests to reflect recent changes in priority change · 5498bb90
      Moe Jette authored
      and dependency clearing logic
    • api: remove unreferenced and undocumented function · 22ece52e
      Moe Jette authored
      This removes the function "slurm_pack_msg_no_header", which is referenced
      nowhere in the src tree and is not listed in any of the slurm manpages.
      
      As far as I understand the documentation, each slurm message needs to have
      a header; this function may thus be a leftover from very old or initial code.
    • scontrol: refactor if/else statement · 367c71ba
      Moe Jette authored
    • protocol_defs: remove duplicate/identical test · 5c13acad
      Moe Jette authored
      This removes a test statement that appeared identically twice.
    • sprio: add support for the SLURM_CLUSTERS environment variable · 0a0efdf2
      Moe Jette authored
      This adds support for the SLURM_CLUSTERS environment variable to sprio as
      well. It also makes the test for the priority plugin type dependent on
      whether sprio runs with multiple-cluster support (a sketch of the
      environment handling follows this entry).
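      For illustration, here is a minimal standalone sketch of the kind of
      environment handling involved. getenv() is standard C, but the helper
      _build_cluster_list() and the surrounding structure are hypothetical
      stand-ins, not the actual sprio code; the scontrol change below reuses
      the same idea.

        #include <stdio.h>
        #include <stdlib.h>

        /* Hypothetical stand-in for the code that resolves a comma-separated
         * cluster list into per-cluster connection records. */
        static void _build_cluster_list(const char *names)
        {
                printf("would operate on clusters: %s\n", names);
        }

        int main(void)
        {
                /* Honour SLURM_CLUSTERS if set, e.g. SLURM_CLUSTERS=palu,rosa */
                const char *env_val = getenv("SLURM_CLUSTERS");

                if (env_val && *env_val)
                        _build_cluster_list(env_val);
                return 0;
        }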
    • scontrol: add support for the SLURM_CLUSTERS environment variable · c7045c83
      Moe Jette authored
      On our frontend host we support multiple clusters (Cray and non-Cray) by
      setting the SLURM_CLUSTERS environment variable accordingly.
      
      In order to use scontrol (e.g. for hold/release of a user job) from a
      frontend host to control jobs on a remote Cray system, scontrol also
      needs to support the SLURM_CLUSTERS environment variable.
    • slurmctld: keep original nice value when putting job on hold · b414712e
      Moe Jette authored
      The current code erases the old nice value (both negative and positive) when a job is
      put on hold so that the job has a 0 nice component upon release.
      
      This behaviour causes difficulties if the nice value set at submission
      time was put there for a reason, for instance when
       * a system administrator has allowed a negative nice value to be set;
       * the user wants to keep this as a low-priority job so that his/her other
         jobs go first (independent of the hold option);
       * the nice value carries other semantics - at our site, for instance, we
         use it for "base priority values" computed from how much of its quota a
         given group has already (over)used.
      
      Here is an example which illustrates the loss of original nice values:
      
        [2011-03-31T09:47:53] sched: update_job: setting priority to 0 for job_id 55
        [2011-03-31T09:47:53] sched: update_job: setting priority to 0 for job_id 66
        [2011-03-31T09:47:53] sched: update_job: setting priority to 0 for job_id 77
        [2011-03-31T09:47:54] sched: update_job: setting priority to 0 for job_id 88
        [2011-03-31T09:47:54] sched: update_job: setting priority to 0 for job_id 99
        [2011-03-31T09:47:54] sched: update_job: setting priority to 0 for job_id 110
      
      This is from user 'kraused', whose project 's310' is within its allocated
      quota; his jobs therefore start with an initial nice value of -542 (set
      via the job_submit/lua plugin).
      
      However, by putting his jobs on hold, he has lost this advantage:
      
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
           55  kraused      15181        153          0       5028      10000          0      0
           66  kraused      15181        153          0       5028      10000          0      0
           77  kraused      15181        153          0       5028      10000          0      0
           88  kraused      15178        150          0       5028      10000          0      0
           99  kraused      15178        150          0       5028      10000          0      0
          110  kraused      15178        150          0       5028      10000          0      0
      
      I believe that resetting the nice value was put there for a reason; the
      patch therefore performs the reset only if the operation is not a
      user/administrator hold, so that the submit-time nice value survives a
      hold/release cycle (see the sketch below).
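      A minimal standalone sketch of the new behaviour; the field names mimic
      slurmctld conventions, but the structure, the NICE_OFFSET value and the
      'is_hold' flag are simplified stand-ins, not the actual patch.

        #include <stdint.h>
        #include <stdio.h>

        #define NICE_OFFSET 10000       /* neutral nice component (stand-in) */

        struct job_details { uint32_t nice; };

        /* Reset the nice component only when the priority change is not a
         * user/administrator hold; on hold, keep the submit-time value. */
        static void _update_nice(struct job_details *detail_ptr, int is_hold)
        {
                if (!is_hold)
                        detail_ptr->nice = NICE_OFFSET; /* old: unconditional */
        }

        int main(void)
        {
                struct job_details d = { NICE_OFFSET - 542 };   /* nice -542 */

                _update_nice(&d, 1);            /* hold: nice survives */
                printf("nice after hold: %d\n", (int)d.nice - NICE_OFFSET);
                return 0;
        }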
    • slurmctld: test job_specs->min_nodes before altering the value via partition setting · f8ea48bb
      Moe Jette authored
      This fixes a problem when trying to move a pending job from one partition
      to another without supplying any other parameters:
       * if a partition value is present, the job is pending, and no min_nodes
         are supplied, job_specs->min_nodes gets set from the detail_ptr value;
       * this causes subsequent tests for job_specs->min_nodes ==/!= NO_VAL to
         fail.
      
      The following illustrates the behaviour; the example is taken from our
      system:
        palu2:0 ~>scontrol update jobid=3944 partition=night
        slurm_update error: Requested operation not supported on this system
      
        slurmctld.log
        [2011-04-06T14:39:51] update_job: setting partition to night for job_id 3944
        [2011-04-06T14:39:51] Change of size for job 3944 not supported
        [2011-04-06T14:39:51] updating accounting
        [2011-04-06T14:39:51] _slurm_rpc_update_job JobId=3944 uid=21215: Requested operation not supported on this system
      
      ==> The 'Change of size for job 3944' message reveals that the
          !select_g_job_expand_allow() case was triggered after
          job_specs->min_nodes had been set as a result of supplying
          job_specs->partition.
      
      Fix:
      ====
       Since the test for select_g_job_expand_allow() does not depend on the job
       state, it was moved up, before the test for job_specs->partition. At the
       same time, the equality test for INFINITE/NO_VAL min_nodes values was
       moved to the same place (see the sketch below).
       The tests for job_specs->min_nodes below the job_specs->partition setting
       depend on the job state:
       - the 'Reset min and max node counts as needed, insure consistency' part
         requires pending state;
       - the other remaining test applies only to IS_JOB_RUNNING/SUSPENDED.
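      A condensed standalone sketch of the reordering; the types and helpers
      are simplified stand-ins for the slurmctld ones, and only the control
      flow is meant to match the description above.

        #include <stdbool.h>
        #include <stdint.h>

        #define NO_VAL   0xfffffffeU
        #define INFINITE 0xffffffffU

        struct job_specs   { uint32_t min_nodes; const char *partition; };
        struct job_details { uint32_t min_nodes; };

        static bool select_g_job_expand_allow(void) { return false; } /* stub */
        static bool job_is_pending(void)            { return true;  } /* stub */

        static int _update(struct job_specs *specs, struct job_details *detail)
        {
                /* Moved up: the resize check is independent of the job state
                 * and now runs before min_nodes can be filled in below. */
                if ((specs->min_nodes != NO_VAL) &&
                    (specs->min_nodes != INFINITE) &&
                    !select_g_job_expand_allow())
                        return -1;      /* "not supported on this system" */

                if (specs->partition && job_is_pending() &&
                    (specs->min_nodes == NO_VAL))
                        specs->min_nodes = detail->min_nodes;   /* fill-in */

                /* ... state-dependent min_nodes tests follow here ... */
                return 0;
        }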
    • slurmctld: case of authorized operator releasing user hold · 1895a10a
      Moe Jette authored
      This patch fixes a case where the priority is not recalculated on
      'scontrol release', which happens when an authorized operator releases a
      job, or when the job is released via e.g. the job_submit plugin.
      
      The patch reorders the tests in update_job() to
       * first test whether the job has been held by the user and, only if not,
       * test whether an authorized operator changed the priority or the updated
         priority is being reduced (see the sketch below).
      
      Due to earlier permission checks, we have either
       * job_ptr->user_id == uid or 
       * authorized,
      where in both cases the release-user-hold operation is authorized.
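      A simplified standalone sketch of the reordered test; the names echo
      update_job(), but the structure, the priority value chosen by the stub
      and the flags are illustrative assumptions.

        #include <stdbool.h>
        #include <stdint.h>

        struct job { uint32_t priority; bool held_by_user; };

        /* Stand-in for the priority recalculation done on release. */
        static void set_job_prio(struct job *j) { j->priority = 15181; }

        /* Reordered: handle the release of a user hold first; due to the
         * earlier permission checks, reaching this point means the caller
         * is either the job owner or an authorized operator. */
        static void _update_priority(struct job *j, uint32_t new_prio,
                                     bool authorized)
        {
                if ((j->priority == 0) && j->held_by_user) {
                        set_job_prio(j);        /* recalculate on release */
                        j->held_by_user = false;
                } else if (authorized || (new_prio < j->priority)) {
                        j->priority = new_prio;
                }
        }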
    • scontrol: set uid when releasing a job · 6353467b
      Moe Jette authored
      This fix is related to an earlier one; the problem was observed when
      trying to 'scontrol release' a job previously submitted by the same user
      via 'sbatch --hold'.
      
      Within the job_submit/lua plugin, the user is automatically assigned a
      partition. Hence, even though no submitter uid checks are normally
      expected, a partition access check can be performed in the process of
      releasing a job.
      
      In this case, the error message was
      
      [2011-03-30T18:37:17] _part_access_check: uid 4294967294 access to partition usup denied, bad group
      [2011-03-30T18:37:17] error: _slurm_rpc_update_job JobId=12856 uid=21215: User's group not permitted to use this partition
      
      and, as before (in scontrol_update_job()), it was fixed by supplying the
      UID of the requesting user (a minimal sketch follows).
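      An illustrative sketch only: the uid 4294967294 in the log is slurm's
      NO_VAL placeholder (0xfffffffe), and the fix amounts to filling in the
      requester's uid before the update message is sent. The message type here
      is a simplified stand-in for slurm's job_desc_msg_t.

        #include <stdint.h>
        #include <sys/types.h>
        #include <unistd.h>     /* getuid() */

        typedef struct { uint32_t user_id; /* ... */ } job_desc_msg_t;

        static void _fill_requester_uid(job_desc_msg_t *msg)
        {
                /* Previously left at NO_VAL (4294967294), which made the
                 * partition access check fail with "bad group". */
                msg->user_id = getuid();
        }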
    • add function args to header · a269b6f4
      Moe Jette authored
    • slurmstepd: avoid coredump in case of NULL job · e0d92b8a
      Moe Jette authored
      We build slurm with --enable-memory-leak-debug and encountered the same
      core dump twice when user 'root' was trying to run jobs during a
      maintenance session.

      The root user is not in the accounting database, which explains the errors
      seen below. The gdb session shows that in this invocation _step_cleanup()
      was entered with a NULL job pointer.
      
      palu7:0 log>stat /var/crash/palu7-slurmstepd-6602.core 
      ...
      Modify: 2011-04-04 19:34:44.000000000 +0200
      
      slurmctld.log
      [2011-04-04T19:34:44] _slurm_rpc_submit_batch_job JobId=3254 usec=1773
      [2011-04-04T19:34:44] ALPS RESERVATION #5, JobId 3254: BASIL -n 1920 -N 0 -d 1 -m 1333
      [2011-04-04T19:34:44] sched: Allocate JobId=3254 NodeList=nid000[03-13,18-29,32-88] #CPUs=1920
      [2011-04-04T19:34:44] error: slurmd error 4005 running JobId=3254 on front_end=palu7: User not found on host
      [2011-04-04T19:34:44] update_front_end: set state of palu7 to DRAINING
      [2011-04-04T19:34:44] completing job 3254
      [2011-04-04T19:34:44] Requeue JobId=3254 due to node failure
      [2011-04-04T19:34:44] sched: job_complete for JobId=3254 successful
      [2011-04-04T19:34:44] requeue batch job 3254
      [2011-04-04T20:28:43] sched: Cancel of JobId=3254 by UID=0, usec=57285
      
      (gdb) core-file palu7-slurmstepd-6602.core 
      [New Thread 6604]
      Core was generated by `/opt/slurm/2.3.0/sbin/slurmstepd'.
      Program terminated with signal 11, Segmentation fault.
      #0  main (argc=1, argv=0x7fffd65a1fd8) at slurmstepd.c:413
      413             jobacct_gather_g_destroy(job->jobacct);
      (gdb) print job
      $1 = (slurmd_job_t *) 0x0
      (gdb) list
      408
      409     #ifdef MEMORY_LEAK_DEBUG
      410     static void
      411     _step_cleanup(slurmd_job_t *job, slurm_msg_t *msg, int rc)
      412     {
      413             jobacct_gather_g_destroy(job->jobacct);
      414             if (!job->batch)
      415                     job_destroy(job);
      416             /*
      417              * The message cannot be freed until the jobstep is complete
      (gdb) print msg
      $2 = (slurm_msg_t *) 0x916008
      (gdb) print rc
      $3 = -1
      (gdb) 
      
      The patch adds a test for a NULL job argument around the calls that need
      to dereference the job pointer (sketched below).
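      Based on the listing above, a minimal compilable sketch of the guard; the
      types and the two destroy helpers are stand-ins for the slurmd ones, and
      the actual patch may structure the tests differently.

        /* Stand-in types; the real ones come from the slurmd headers. */
        typedef struct { int batch; void *jobacct; } slurmd_job_t;
        typedef struct { int dummy; } slurm_msg_t;

        static void jobacct_gather_g_destroy(void *p) { (void)p; }  /* stub */
        static void job_destroy(slurmd_job_t *j)      { (void)j; }  /* stub */

        static void
        _step_cleanup(slurmd_job_t *job, slurm_msg_t *msg, int rc)
        {
                (void)msg; (void)rc;
                if (job) {      /* NULL when the step was never set up */
                        jobacct_gather_g_destroy(job->jobacct);
                        if (!job->batch)
                                job_destroy(job);
                }
                /* msg cleanup continues as before */
        }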