1. 21 Mar, 2012 40 commits
    • Mark A. Grondona's avatar
      spank: add spank_option_getopt to spank api · 22895652
      Mark A. Grondona authored
      Add a new call to process spank options from a plugin.
      
      The spank_option_getopt() function will search the current
      spank environment for use of the option passed as an argument.
      The current option cache, and the local environment are checked
      for the use of the given spank option. This call is an alternative
      to use of a global variable in combination with the option callback,
      and is also needed for processing options in the isolated contexts
      of slurm_spank_job_prolog() and slurm_spank_job_epilog().
      22895652
    • Mark A. Grondona's avatar
      spank: clear unneeded spank option environment vars · 1dbecf48
      Mark A. Grondona authored
      Add spank_clear_remote_options_env() to clear any spank options
      passed through the environment after they are no longer needed.
      This is done in slurmd after running the spank job prolog or epilog,
      as well as in the spank_post_opt function, after the env has been
      searched for spank variables.
      1dbecf48
    • Mark A. Grondona's avatar
      spank: always set options in environment · e1aae025
      Mark A. Grondona authored
      Always set spank options in the environment and spank job environment
      to ensure that used options are propagated to the job prolog and
      epilog. (Previously, spank options were set in the environment
      only in allocator context.)
      e1aae025
    • Mark A. Grondona's avatar
      spank: avoid loading plugins with no callbacks for current context · 83f7922b
      Mark A. Grondona authored
      In slurmd and job prolog/epilog contexts, avoid loading plugins that
      have no callbacks in the context in which they are loaded. That is,
      for slurmd, if a plugin has no slurm_spank_slurmd_init or
      slurm_spank_slurmd_exit callback, there is no reason to keep that
      plugin loaded.
      83f7922b
    • Mark A. Grondona's avatar
      slurmd: Refactor code to run prolog/epilog · cac3ae6d
      Mark A. Grondona authored
      We now want to return error on failure of either spank prolog/epilog
      or regular prolog/epilog scripts, so add a common function _run_job_script
      to handle return of shared error code.
      
      For now, we continue to run the normal prolog or epilog even if the spank
      prolog/epilog fails. In the future, a failure of the spank prolog/epilog may
      short-circuit the run of the normal scripts.
      cac3ae6d
    • Mark A. Grondona's avatar
      slurmd: Call spank prolog and epilog hooks · e33b820c
      Mark A. Grondona authored
      Call spank_job_prolog() and spank_job_epilog() at prolog/epilog
      time by invoking "slurmstepd spank [prolog|epilog]"
      
      The prolog and epilog spank plugin hooks are not called within the
      virtual address space of slurmd for at least a couple of reasons,
      including
      
       1. Plugins dlopened in the address space of slurmd cannot be dlopened
         a second time. Therefore, static and global state in the DSO may
         be "dirty" in that some state may be preserved from the last epilog
         or prolog call, or even from the slurmd_init callback.
      
       2. The prolog and epilog need to be guaranteed reentrant. The safest
         way to guarantee this is to ensure prolog/epilog hooks are called
         from a separate address space.
      
       3. To satisfy "principle of least surprise" we want to have new plugins
         installed run their prolog/epilog hooks on the next job, just as
         if an update to the prolog/epilog script was made. The only way to
         guarantee this is to reload the spank plugin stack from plugstack.conf
         on each run. Because of #1 above, this needs to be done in a separate
         process.
      e33b820c
    • Mark A. Grondona's avatar
      slurmd: Always set conf->stepd_loc to slurmstepd path · 6722705f
      Mark A. Grondona authored
      Make it much simpler for code to get the current slurmstepd path
      by setting slurmd's conf->stepd_loc to the default slurmstepd path
      if that path was not overridden on the command line.
      
      This allows slurmd code to directly use conf->stepd_loc, instead of
      requiring the duplicated code that created the default slurmstepd
      path if conf->stepd_loc was not set at each call site.
      6722705f
    • Mark A. Grondona's avatar
      use exponential backoff in waitpid_timeout · 8e201175
      Mark A. Grondona authored
      Make waitpid_timeout() return more quickly when the child exits before
      1s but after the initial call to waitpid(2).
      8e201175
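A minimal sketch of the idea behind a timed waitpid(2) with exponential backoff, not the actual SLURM code. The name waitpid_backoff, the 10 ms starting interval, and the 1 s cap are assumptions made for illustration; the commit itself only says that the function should return more quickly when the child exits shortly after the first waitpid(2) call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for nanosleep(2) */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Sleep for ms milliseconds. An early wakeup by a signal only shortens
 * one polling interval, so the return value is ignored. */
static void sleep_ms(int ms)
{
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: poll for the exit of pid for up to timeout_ms.
 * Returns pid (and fills *status) if the child was reaped, 0 if the
 * timeout expired first, or -1 on a waitpid(2) error.
 */
static pid_t waitpid_backoff(pid_t pid, int *status, int timeout_ms)
{
    int delay_ms = 10;              /* assumed initial polling interval */
    const int max_delay_ms = 1000;  /* assumed upper bound per sleep */
    int waited_ms = 0;

    assert(timeout_ms >= 0);

    for (;;) {
        pid_t rc = waitpid(pid, status, WNOHANG);
        if (rc == pid)
            return pid;
        if (rc < 0 && errno != EINTR)
            return -1;
        if (waited_ms >= timeout_ms)
            return 0;

        /* Never sleep past the deadline. */
        int remaining = timeout_ms - waited_ms;
        int nap = delay_ms < remaining ? delay_ms : remaining;
        sleep_ms(nap);
        waited_ms += nap;

        /* Double the interval each round, up to the cap. */
        delay_ms *= 2;
        if (delay_ms > max_delay_ms)
            delay_ms = max_delay_ms;
    }
}
```

With short early intervals, a child that exits a few milliseconds after the first check is reaped almost immediately, while a long-running child is still polled only about once per second.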
    • Mark A. Grondona's avatar
      abstract timed waitpid from run_script to separate function · 08162bb1
      Mark A. Grondona authored
      Abstract the code for a waitpid(2) with timeout into a waitpid_timeout()
      function for future use from other callers. For now, the function goes
      into src/slurmd/common/run_script.c, since that is the original use
      of the functionality.
      08162bb1
    • Mark A. Grondona's avatar
      slurmstepd: refactor spank prolog/epilog code · e409986a
      Mark A. Grondona authored
      Add new handle_spank_mode() function in slurmstepd to handle
      when slurmstepd is called with "spank prolog" or "spank epilog".
      In this function, the slurmd_conf_lite is read to handle reinitializing
      the log facility as defined by slurmd config.
      e409986a
    • Mark A. Grondona's avatar
      slurmd/slurmstepd: factor out read/write of slurmd_conf_lite · 00e71ef3
      Mark A. Grondona authored
      Factor out the read and write of the packed slurmd_conf_lite
      data between slurmd and slurmstepd. This simplifies the code
      in which that data is handled, and will allow for other callers
      in the future.
      00e71ef3
    • Mark A. Grondona's avatar
      slurmstepd: Add new mode to run spank job prolog/epilog · 1e01c729
      Mark A. Grondona authored
      The spank_job_prolog() and spank_job_epilog() spank calls need
      to be run in a different address space from slurmd. This not only allows
      reinitializing the spank plugin stack on each run of the prolog or
      epilog, but also ensures that any static data in plugins does not
      propagate to each invocation of the job prolog and epilog (e.g. global
      variables). Additionally, it is much safer to run these plugins
      in a new process because we may be calling prolog/epilog for multiple
      jobs at the same time.
      
      This patch runs spank_job_prolog() or spank_job_epilog() from slurmstepd
      when slurmstepd is invoked as
      
       slurmstepd spank [prolog|epilog]
      
      The environment variables SLURM_JOBID and SLURM_UID are used to set
      the jobid and uid for the prolog/epilog. Spank plugin options may
      also be passed through the current environment.
      1e01c729
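The invocation described above can be sketched as a plain fork/exec that sets SLURM_JOBID and SLURM_UID in the child only. This is an illustration under assumptions, not the slurmd implementation: run_in_child is a hypothetical helper, and calling setenv(3) after fork(2) is only reasonable here because the sketch is single-threaded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for setenv(3) */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] in a new process with SLURM_JOBID and
 * SLURM_UID exported to it. Returns the child's exit status, or -1 if the
 * child could not be created or did not exit normally.
 */
static int run_in_child(char *const argv[], unsigned jobid, unsigned uid)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: a fresh address space, so no state leaks between runs. */
        char buf[32];
        snprintf(buf, sizeof(buf), "%u", jobid);
        setenv("SLURM_JOBID", buf, 1);
        snprintf(buf, sizeof(buf), "%u", uid);
        setenv("SLURM_UID", buf, 1);
        execv(argv[0], argv);
        _exit(127);  /* exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass something like { "/path/to/slurmstepd", "spank", "prolog", NULL } as argv, where the path is whatever the site has configured. The parent's own environment is left untouched because the variables are set after the fork.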
    • Mark A. Grondona's avatar
      slurmstepd: Move handling of cmdline to a function · a136a5ab
      Mark A. Grondona authored
      Move special handling of slurmstepd cmdline to a function for
      future expansion.
      a136a5ab
    • Mark A. Grondona's avatar
      spank: add prolog and epilog callbacks to spank api · d3a6ec23
      Mark A. Grondona authored
      Add slurm_spank_job_prolog and slurm_spank_job_epilog callbacks
      to the spank API, to be called just before the job prolog and epilog
      scripts are executed.
      
      These callbacks are not active until the hooks spank_job_prolog and
      spank_job_epilog are added to slurmd.
      d3a6ec23
    • Mark A. Grondona's avatar
      spank: Add S_TYPE_JOB_SCRIPT context for prolog/epilog · 21773e76
      Mark A. Grondona authored
      Add new spank context "job script" for use during job prolog/epilog.
      21773e76
    • Mark A. Grondona's avatar
      d405c1ed
    • Mark A. Grondona's avatar
      spank: Add spank callbacks for slurmd · 069b164c
      Mark A. Grondona authored
      Add support for slurm_spank_slurmd_init and slurm_spank_slurmd_exit
      symbols in spank plugins, to be called at slurmd startup and shutdown.
      
      These are not functional yet until slurmd calls spank_slurmd_init()
      and spank_slurmd_exit().
      069b164c
    • Mark A. Grondona's avatar
      spank: handle slurmd context in some callbacks · 63765b58
      Mark A. Grondona authored
      Currently spank_get_item and spank_job_control* are not valid in
      slurmd context. Handle this case in the relevant functions.
      63765b58
    • Mark A. Grondona's avatar
      spank: Add context type for slurmd · ab388e1e
      Mark A. Grondona authored
      Prepare for spank plugins run in the context of slurmd daemon by
      adding a new S_CTX_SLURMD context type.
      ab388e1e
    • Mark A. Grondona's avatar
      spank: remove spank_set_remote_options_env · d436efcd
      Mark A. Grondona authored
      The spank_set_remote_options_env() function is not used anywhere except
      internal to plugstack.c, so remove it from plugstack.h. Then redefine
      it to take a spank_stack argument so that it doesn't refer to the
      global_spank_stack. Finally rename to spank_stack_set_remote_options_env()
      to clarify the intent.
      d436efcd
    • Mark A. Grondona's avatar
      spank: refactor initialization code · 66cfa45a
      Mark A. Grondona authored
      Refactor the post_opt handling code embedded in _spank_init() into
      a spank_stack_post_opt() function, then call this in remote context
      from a new spank_init_remote() function.
      66cfa45a
    • Mark A. Grondona's avatar
      spank: handle missing plugstack.conf · 3344092a
      Mark A. Grondona authored
      Instead of trying to handle missing plugstack.conf early in the code,
      just treat missing plugstack.conf the same as an empty config.
      3344092a
    • Mark A. Grondona's avatar
      spank: abstract spank_stack initialization code · 443aee4d
      Mark A. Grondona authored
      Move struct spank_stack initialization code into a spank_stack_init()
      function so that it can be called from multiple call sites.
      443aee4d
    • Mark A. Grondona's avatar
      spank: consolidate common code in _do_call_stack · e4e3baab
      Mark A. Grondona authored
      Simplify code in _do_call_stack() by extracting case statement
      to assign current callback symbol to its own function. Since all
      spank functions have the same prototype we can then use the same
      code to call _all_ callbacks, reducing greatly the number of lines
      of code required.
      e4e3baab
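The consolidation described above can be sketched as follows. Every name here (handle_t, callback_f, cb_type, struct plugin, callback_for, call_stack) is a hypothetical stand-in rather than the code in plugstack.c, and stopping at the first negative return is an assumption of the sketch. The point is only that once every callback shares one prototype, a single lookup function plus a single call site can serve all callback types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the spank handle and callback prototype. */
typedef struct handle { int calls; } handle_t;
typedef int (*callback_f)(handle_t *h, int argc, char **argv);

enum cb_type { CB_INIT, CB_POST_OPT, CB_EXIT, CB_COUNT };

struct plugin {
    const char *name;
    callback_f ops[CB_COUNT];  /* NULL when the plugin lacks that hook */
};

/* The one place that maps a callback type to the function to run. */
static callback_f callback_for(const struct plugin *p, enum cb_type type)
{
    assert(type < CB_COUNT);
    return p->ops[type];
}

/* Run one callback type across the whole stack with a single call site.
 * Assumption: stop at the first callback that returns a negative value. */
static int call_stack(struct plugin *stack, size_t n,
                      enum cb_type type, handle_t *h)
{
    for (size_t i = 0; i < n; i++) {
        callback_f fn = callback_for(&stack[i], type);
        if (fn == NULL)
            continue;  /* plugin does not implement this hook */
        int rc = fn(h, 0, NULL);
        if (rc < 0)
            return rc;
    }
    return 0;
}

/* Two sample callbacks used to exercise the dispatcher. */
static int count_call(handle_t *h, int argc, char **argv)
{
    (void)argc; (void)argv;
    h->calls++;
    return 0;
}

static int always_fail(handle_t *h, int argc, char **argv)
{
    (void)h; (void)argc; (void)argv;
    return -1;
}
```

Adding a new callback type then means adding one enum value and one table slot, instead of another near-identical branch around each call.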
    • Mark A. Grondona's avatar
      spank: consolidate checks for spank_get/set/unsetenv calls · 61cd1115
      Mark A. Grondona authored
      Consolidate common code in spank_getenv, spank_setenv, spank_unsetenv
      which checks for validity of the current context, spank handle, etc.
      61cd1115
    • Mark A. Grondona's avatar
      spank: consolidate error checks in job control functions · c3227f9a
      Mark A. Grondona authored
      Consolidate checks for correct spank context in spank_job_control*
      functions to avoid code duplication.
      c3227f9a
    • Mark A. Grondona's avatar
      spank: consolidate globals in plugstack.c · 2eb0b999
      Mark A. Grondona authored
      The use of globals in plugstack.c is cumbersome and prevents the
      future expansion of spank plugins, e.g. calling spank plugins from
      multiple contexts within the same process or reinitializing the
      spank plugin state.
      
      This patch consolidates the current globals (spank_stack, spank_ctx,
      spank_optval, and option_cache) into a global "spank stack" structure
      and expands many of the functions internal to plugstack.c to operate
      on a struct spank_stack instead of globally.
      2eb0b999
    • Mark A. Grondona's avatar
      spank: fix handling of remote spank_init_post_opt · 3a522459
      Mark A. Grondona authored
      There was likely a typo/thinko/patcho in the handling of the
      return code from _do_call_stack(SPANK_INIT_POST_OPT) in _spank_init
      in "remote" context. This error caused spank_init() to always
      succeed, since the test less than zero would always return 0 or 1.
      3a522459
    • Mark A. Grondona's avatar
      spank: refuse to load the same plugin more than once · 7a60bf95
      Mark A. Grondona authored
      Avoid loading the same plugin more than once in plugstack.c.
      Most likely this will be a configuration error, so we should
      catch it early. If the same .so appears in the plugin stack
      more than once, it is likely to cause very strange errors,
      since dlopen() will only map the library a single time.
      7a60bf95
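A duplicate check of this kind can be sketched with a small registry of already-loaded plugin paths. The struct, the function name, and the fixed capacity are all hypothetical; the sketch compares path strings only, so two different paths that resolve to the same file (for example through a symlink) would not be caught.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PLUGINS 16  /* arbitrary capacity for the sketch */

/* Hypothetical registry of plugin paths that have already been loaded.
 * It stores the caller's pointers, so the strings must outlive it. */
struct plugin_registry {
    const char *paths[MAX_PLUGINS];
    size_t count;
};

/*
 * Record path as loaded. Returns 0 on success, or -1 if the same path
 * is already present or the registry is full. A real loader would only
 * call dlopen() after this returns 0.
 */
static int registry_add(struct plugin_registry *r, const char *path)
{
    assert(r != NULL && path != NULL);

    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->paths[i], path) == 0)
            return -1;  /* same plugin listed twice: refuse */
    }
    if (r->count == MAX_PLUGINS)
        return -1;

    r->paths[r->count++] = path;
    return 0;
}
```

Rejecting the second entry up front turns a confusing runtime failure into an immediate configuration error.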
    • Morris Jette's avatar
      change owner of slurmctld and slurmdbd log files · 3470c651
      Morris Jette authored
      Change the owner of slurmctld and slurmdbd log files to the appropriate
      user. Without this change the files will be created by and owned by the
      user starting the daemons (likely user root).
      3470c651
    • Morris Jette's avatar
      Merge branch 'slurm-2.3' · e78802d3
      Morris Jette authored
      e78802d3
    • Morris Jette's avatar
      CRAY: Fix support for SlurmdTimeout=0 · 4dd9e697
      Morris Jette authored
      CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
      node that is DOWN in ALPS as DOWN in SLURM).
      4dd9e697
    • Morris Jette's avatar
      Add delay to test for job info propagation · 7636f0f2
      Morris Jette authored
      7636f0f2
    • Morris Jette's avatar
      Modify the step completion RPC between slurmd and slurmstepd · ed31e6c7
      Morris Jette authored
      In the tightly coupled functions slurmd:stepd_completion and
      slurmstepd:_handle_completion, a jobacct structure is
      sent from the main daemon to the step daemon to provide
      the statistics of the child slurmstepd processes and do the aggregation.
      
      The structure is sent using
      jobacct_gather_g_{setinfo,getinfo} over a pipe (JOBACCT_DATA_PIPE).
      Because {setinfo,getinfo} use a common internal lock, and reading
      or writing to a pipe is equivalent to holding a lock, slurmd and
      slurmstepd have to avoid using both setinfo and getinfo over a
      pipe, or deadlocks can occur. For example:
      slurmd(lockforread,write)/slurmstepd(write,lockforread).
      
      This patch removes the call to jobacct_gather_g_setinfo in slurmd
      and the call to jobacct_gather_g_getinfo in slurmstepd, ensuring
      that slurmd only does getinfo operations over a pipe and slurmstepd
      only does setinfo over a pipe. Instead, jobacct_gather_g_{pack,unpack}
      are used to marshal/unmarshal the data for transmission over the
      pipe.
      Patch by Matthieu Hautreux, CEA.
      
      The patch committed here is a variation on the work by Matthieu.
      Specifically, the logic is added to slurmstepd to read a new format
      of RPC including an RPC version number and buffer with the data
      structure. The slurmd however will not send the RPC in the new format
      until SLURM version 2.5.
      ed31e6c7
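The pack-then-send approach with a version number can be sketched as below. The struct layout, field names, version value, and helper names are all hypothetical and do not reflect the real jobacct data or the real RPC format. The sketch only shows the shape of the technique: serialize into a private buffer first, then do plain pipe I/O on that buffer, so no shared lock is held while a process blocks on the pipe.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define STATS_RPC_VERSION 1  /* hypothetical version number */
#define PACKED_LEN (sizeof(uint16_t) + 2 * sizeof(uint32_t))

/* Hypothetical stand-in for the accounting data being transferred. */
struct step_stats {
    uint32_t user_cpu_sec;
    uint32_t max_rss_kb;
};

/* Serialize version + fields in network byte order into buf, which must
 * hold at least PACKED_LEN bytes. Returns the number of bytes written. */
static size_t pack_stats(const struct step_stats *s, unsigned char *buf)
{
    assert(s != NULL && buf != NULL);
    uint16_t ver = htons(STATS_RPC_VERSION);
    uint32_t cpu = htonl(s->user_cpu_sec);
    uint32_t rss = htonl(s->max_rss_kb);
    memcpy(buf, &ver, sizeof(ver));
    memcpy(buf + 2, &cpu, sizeof(cpu));
    memcpy(buf + 6, &rss, sizeof(rss));
    return PACKED_LEN;
}

/* Returns 0 on success, -1 on a short buffer or an unknown version. */
static int unpack_stats(const unsigned char *buf, size_t len,
                        struct step_stats *out)
{
    uint16_t ver;
    uint32_t cpu, rss;
    if (len < PACKED_LEN)
        return -1;
    memcpy(&ver, buf, sizeof(ver));
    if (ntohs(ver) != STATS_RPC_VERSION)
        return -1;
    memcpy(&cpu, buf + 2, sizeof(cpu));
    memcpy(&rss, buf + 6, sizeof(rss));
    out->user_cpu_sec = ntohl(cpu);
    out->max_rss_kb = ntohl(rss);
    return 0;
}

/* Write exactly len bytes, retrying on EINTR and short writes. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; an early EOF counts as an error. */
static int read_all(int fd, unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Putting the version first lets a reader reject or special-case a format it does not understand before touching the payload, which is what makes a staged rollout between two daemons possible.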
    • Morris Jette's avatar
      Add possible reason for failure to test · 3bdcf40f
      Morris Jette authored
      3bdcf40f
    • Morris Jette's avatar
      Merge branch 'slurm-2.3' · 644fc9a7
      Morris Jette authored
      644fc9a7
    • Morris Jette's avatar
      Minor test mods for old RedHat distro · 455283c2
      Morris Jette authored
      455283c2
    • Morris Jette's avatar
      Merge branch 'slurm-2.3' · f23f6ccc
      Morris Jette authored
      f23f6ccc
    • Morris Jette's avatar
      make test work better on different systems · 47aebf2c
      Morris Jette authored
      47aebf2c
    • Morris Jette's avatar
      result of autogen.sh · 304cccb6
      Morris Jette authored
      304cccb6