- 21 Mar, 2012 40 commits
-
-
Mark A. Grondona authored
Add a new call to process spank options from a plugin. The spank_option_getopt() function will search the current spank environment for use of the option passed as an argument. The current option cache, and the local environment are checked for the use of the given spank option. This call is an alternative to use of a global variable in combination with the option callback, and is also needed for processing options in the isolated contexts of slurm_spank_job_prolog() and slurm_spank_job_epilog().
-
Mark A. Grondona authored
Add spank_clear_remote_options_env() to clear any spank options passed through the environment after they are no longer needed. This is done in slurmd after running the spank job prolog or epilog, as well as in the spank_post_opt function, after the env has been searched for spank variables.
-
Mark A. Grondona authored
Always set spank options in the environment and spank job environment to ensure that used options are propagated to the job prolog and epilog. (Previously, spank options were set in the environment only in allocator context.)
-
Mark A. Grondona authored
In slurmd and job prolog/epilog contexts, avoid loading plugins that have no callbacks in the context in which they are loaded. That is, for slurmd, if there are no slurm_spank_slurmd_init or slurm_spank_slurmd_exit callbacks, there is no reason to keep the current plugin loaded.
-
Mark A. Grondona authored
We now want to return an error on failure of either spank prolog/epilog or regular prolog/epilog scripts, so add a common function _run_job_script to handle return of the shared error code. For now, we continue to run the normal prolog or epilog even if the spank prolog/epilog fails. In the future, a failure of the spank prolog/epilog may short-circuit the run of the normal scripts.
-
Mark A. Grondona authored
Call spank_job_prolog() and spank_job_epilog() at prolog/epilog time by invoking "slurmstepd spank [prolog|epilog]".
The prolog and epilog spank plugin hooks are not called within the virtual address space of slurmd for at least a couple of reasons, including:
1. Plugins dlopened in the address space of slurmd cannot be dlopened a second time. Therefore, static and global state in the DSO may be "dirty" in that some state may be preserved from the last epilog or prolog call, or even from the slurmd_init callback.
2. The prolog and epilog need to be guaranteed reentrant. The safest way to guarantee this is to ensure prolog/epilog hooks are called from a separate address space.
3. To satisfy "principle of least surprise" we want to have new plugins installed run their prolog/epilog hooks on the next job, just as if an update to the prolog/epilog script was made. The only way to guarantee this is to reload the spank plugin stack from plugstack.conf on each run. Because of #1 above, this needs to be done in a separate process.
-
Mark A. Grondona authored
Greatly simplify the ability of code to get at the current slurmstepd path by setting slurmd's conf->stepd_loc to the default slurmstepd path if that path was not overridden on the command line. This allows slurmd code to directly use conf->stepd_loc, instead of requiring the duplicated code that created the default slurmstepd path if conf->stepd_loc was not set at each call site.
-
Mark A. Grondona authored
Make waitpid_timeout() return more quickly when the child exits before 1s but after the initial call to waitpid(2).
-
Mark A. Grondona authored
Abstract the code for a waitpid(2) with timeout into a waitpid_timeout() function for future use from other callers. For now, the function goes into src/slurmd/common/run_script.c, since that is the original use of the functionality.
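A waitpid(2) with a timeout can be sketched as below. This is a minimal illustration only, not the code in src/slurmd/common/run_script.c: the name waitpid_timeout_sketch(), its millisecond argument, its return convention, and the fixed 10 ms polling step are all assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the Slurm function.
 * Polls waitpid(2) with WNOHANG until the child is reaped or roughly
 * timeout_ms milliseconds have passed.
 * Returns pid if the child was reaped, 0 on timeout, -1 on error.
 */
static pid_t waitpid_timeout_sketch(pid_t pid, int timeout_ms, int *status)
{
	const struct timespec step = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	int waited_ms = 0;

	for (;;) {
		pid_t rc = waitpid(pid, status, WNOHANG);
		if (rc == pid)
			return pid;		/* child reaped */
		if (rc < 0 && errno != EINTR)
			return -1;		/* e.g. ECHILD */
		if (waited_ms >= timeout_ms)
			return 0;		/* child still running */
		nanosleep(&step, NULL);
		waited_ms += 10;		/* approximate elapsed time */
	}
}

/* Demo: fork a child that exits with code 7, then reap it. */
static int demo_child_exit_status(void)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(7);
	if (waitpid_timeout_sketch(pid, 2000, &status) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Polling in short steps means a child that exits shortly after the first waitpid(2) call is noticed within one step rather than after a full second.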
-
Mark A. Grondona authored
Add new handle_spank_mode() function in slurmstepd to handle when slurmstepd is called with "spank prolog" or "spank epilog". In this function, the slurmd_conf_lite is read to handle reinitializing the log facility as defined by slurmd config.
-
Mark A. Grondona authored
Factor out the read and write of the packed slurmd_conf_lite data between slurmd and slurmstepd. This simplifies the code in which that data is handled, and will allow for other callers in the future.
-
Mark A. Grondona authored
The spank_job_prolog() and spank_job_epilog() spank calls need to be run in a different address space from slurmd. This not only allows reinitializing the spank plugin stack on each run of the prolog or epilog, but also ensures that any static data in plugins does not propagate to each invocation of the job prolog and epilog (e.g. global variables). Additionally, it is much safer to run these plugins in a new process because we may be calling prolog/epilog for multiple jobs at the same time. This patch runs spank_job_prolog() or spank_job_epilog() from slurmstepd when slurmstepd is invoked as "slurmstepd spank [prolog|epilog]". The environment variables SLURM_JOBID and SLURM_UID are used to set the jobid and uid for the prolog/epilog. Spank plugin options may also be passed through the current environment.
-
Mark A. Grondona authored
Move special handling of slurmstepd cmdline to a function for future expansion.
-
Mark A. Grondona authored
Add slurm_spank_job_prolog and slurm_spank_job_epilog callbacks to the spank API, to be called just before the job prolog and epilog scripts are executed. These callbacks are not active until the hooks spank_job_prolog and spank_job_epilog are added to slurmd.
-
Mark A. Grondona authored
Add new spank context "job script" for use during job prolog/epilog.
-
Mark A. Grondona authored
-
Mark A. Grondona authored
Add support for slurm_spank_slurmd_init and slurm_spank_slurmd_exit symbols in spank plugins, to be called at slurmd startup and shutdown. These are not functional yet until slurmd calls spank_slurmd_init() and spank_slurmd_exit().
-
Mark A. Grondona authored
Currently spank_get_item and spank_job_control* are not valid in slurmd context. Handle this case in the relevant functions.
-
Mark A. Grondona authored
Prepare for spank plugins run in the context of slurmd daemon by adding a new S_CTX_SLURMD context type.
-
Mark A. Grondona authored
The spank_set_remote_options_env() function is not used anywhere except internal to plugstack.c, so remove it from plugstack.h. Then redefine it to take a spank_stack argument so that it doesn't refer to the global_spank_stack. Finally rename to spank_stack_set_remote_options_env() to clarify the intent.
-
Mark A. Grondona authored
Refactor the post_opt handling code embedded in _spank_init() into a spank_stack_post_opt() function, then call this in remote context from a new spank_init_remote() function.
-
Mark A. Grondona authored
Instead of trying to handle missing plugstack.conf early in the code, just treat missing plugstack.conf the same as an empty config.
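The idea of treating a missing file like an empty one can be sketched as follows. This is an illustration under assumptions, not the plugstack.c code: count_config_entries() is a hypothetical helper, and the parsing here (counting lines that are neither blank nor start with '#') stands in for the real config parser.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the number of non-blank, non-comment
 * lines in the file at path. A missing file (ENOENT) is reported as
 * 0 entries, exactly like an empty file; other open errors return -1.
 * Assumes every line fits in the 1024-byte buffer.
 */
static int count_config_entries(const char *path)
{
	char line[1024];
	int n = 0;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return (errno == ENOENT) ? 0 : -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		const char *p = line;

		while (*p == ' ' || *p == '\t')
			p++;			/* skip leading whitespace */
		if (*p != '\0' && *p != '\n' && *p != '#')
			n++;
	}
	fclose(fp);
	return n;
}
```

With this shape, callers need no separate "does the config exist" branch: zero entries simply means there is nothing to load.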
-
Mark A. Grondona authored
Move struct spank_stack initialization code into a spank_stack_init() function so that it can be called from multiple call sites.
-
Mark A. Grondona authored
Simplify code in _do_call_stack() by extracting case statement to assign current callback symbol to its own function. Since all spank functions have the same prototype we can then use the same code to call _all_ callbacks, reducing greatly the number of lines of code required.
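The refactoring described above can be illustrated roughly as follows. The type and function names (spank_cb_f, struct plugin_ops, callback_for, call_stack) and the three-entry enum are assumptions for this sketch, not the identifiers used in plugstack.c.

```c
#include <stddef.h>

/* All callbacks share one prototype (hypothetical signature). */
typedef int (spank_cb_f)(void *handle, int argc, char **argv);

enum step_fn { STEP_INIT, STEP_INIT_POST_OPT, STEP_EXIT };

struct plugin_ops {
	spank_cb_f *init;
	spank_cb_f *init_post_opt;
	spank_cb_f *exit_cb;
};

/* The switch only selects the symbol; it does not call anything. */
static spank_cb_f *callback_for(const struct plugin_ops *ops,
				enum step_fn type)
{
	switch (type) {
	case STEP_INIT:          return ops->init;
	case STEP_INIT_POST_OPT: return ops->init_post_opt;
	case STEP_EXIT:          return ops->exit_cb;
	}
	return NULL;
}

/* One call site serves every callback type. */
static int call_stack(const struct plugin_ops *plugins, size_t n,
		      enum step_fn type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		spank_cb_f *fn = callback_for(&plugins[i], type);
		int rc;

		if (fn == NULL)
			continue;	/* plugin lacks this callback */
		rc = fn(NULL, 0, NULL);
		if (rc < 0)
			return rc;	/* stop at the first failure */
	}
	return 0;
}

/* Demo callbacks used only to exercise the sketch. */
static int demo_ok(void *h, int ac, char **av)
{
	(void)h; (void)ac; (void)av;
	return 0;
}

static int demo_fail(void *h, int ac, char **av)
{
	(void)h; (void)ac; (void)av;
	return -1;
}
```

Because the selection and the invocation are separate, adding a new callback type means adding one enum value and one case, with no new call code.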
-
Mark A. Grondona authored
Consolidate common code in spank_getenv, spank_setenv, spank_unsetenv which checks for validity of the current context, spank handle, etc.
-
Mark A. Grondona authored
Consolidate checks for correct spank context in spank_job_control* functions to avoid code duplication.
-
Mark A. Grondona authored
The use of globals in plugstack.c is cumbersome and prevents the future expansion of spank plugins, e.g. calling spank plugins from multiple contexts within the same process or reinitializing the spank plugin state. This patch consolidates the current globals (spank_stack, spank_ctx, spank_optval, and option_cache) into a global "spank stack" structure and expands many of the functions internal to plugstack.c to operate on a struct spank_stack instead of globally.
-
Mark A. Grondona authored
There was likely a typo/thinko/patcho in the handling of the return code from _do_call_stack(SPANK_INIT_POST_OPT) in _spank_init in "remote" context. This error caused spank_init() to always succeed, since the test less than zero would always return 0 or 1.
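The original faulty line is not shown in this log, so the following is only a guess at the class of bug described: a "less than zero" comparison whose 0-or-1 result is stored and later tested as if it were the return code. All names here are hypothetical.

```c
/* Stand-in for a call that fails with a negative return code. */
static int failing_call(void)
{
	return -1;
}

/* Buggy shape: '<' binds tighter than '=', so rc receives 0 or 1. */
static int buggy_init(void)
{
	int rc;

	rc = failing_call() < 0;	/* rc is 1 here, never negative */
	if (rc < 0)
		return -1;		/* unreachable */
	return 0;			/* failure reported as success */
}

/* Fixed shape: parenthesize the assignment, then compare. */
static int fixed_init(void)
{
	int rc;

	if ((rc = failing_call()) < 0)
		return rc;
	return 0;
}
```

In this sketch buggy_init() returns 0 even though the underlying call failed, while fixed_init() propagates the -1.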
-
Mark A. Grondona authored
Avoid loading the same plugin more than once in plugstack.c. Most likely this will be a configuration error, so we should catch it early. If the same .so appears in the plugin stack more than once, it is likely to cause very strange errors, since dlopen() will only map the library a single time.
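A duplicate check of this kind might look like the sketch below. It assumes plugins are identified by their path string and compared with strcmp(); the struct, the fixed-size array, and the function name are hypothetical and are not taken from plugstack.c.

```c
#include <string.h>

#define MAX_PLUGINS 16

/* Hypothetical record of plugin paths loaded so far. */
struct plugin_list {
	const char *paths[MAX_PLUGINS];
	size_t count;
};

/*
 * Returns 0 if path was recorded, -1 if the same path is already
 * present (a likely configuration error) or the list is full.
 * The caller keeps ownership of the path strings.
 */
static int plugin_list_add(struct plugin_list *list, const char *path)
{
	size_t i;

	for (i = 0; i < list->count; i++) {
		if (strcmp(list->paths[i], path) == 0)
			return -1;	/* duplicate: reject early */
	}
	if (list->count >= MAX_PLUGINS)
		return -1;
	list->paths[list->count++] = path;
	return 0;
}
```

Note that comparing path strings would not catch two different paths that resolve to the same file; handling that case is outside this sketch.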
-
Morris Jette authored
Change the owner of slurmctld and slurmdbd log files to the appropriate user. Without this change the files will be created by and owned by the user starting the daemons (likely user root).
-
Morris Jette authored
-
Morris Jette authored
CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark node that is DOWN in ALPS as DOWN in SLURM).
-
Morris Jette authored
-
Morris Jette authored
In the tightly coupled functions slurmd:stepd_completion and slurmstepd:_handle_completion, a jobacct structure is sent from the main daemon to the step daemon to provide the statistics of the children slurmstepd and do the aggregation. The structure is sent using jobacct_gather_g_{setinfo,getinfo} over a pipe (JOBACCT_DATA_PIPE). As {setinfo,getinfo} use a common internal lock, and reading from or writing to a pipe is equivalent to holding a lock, slurmd and slurmstepd have to avoid using both setinfo and getinfo over a pipe, or deadlocks can occur. For example: slurmd(lockforread,write)/slurmstepd(write,lockforread). This patch removes the call to jobacct_gather_g_setinfo in slurmd and the call to jobacct_gather_g_getinfo in slurmstepd, ensuring that slurmd only does getinfo operations over a pipe and slurmstepd only does setinfo over a pipe. Instead, jobacct_gather_g_{pack,unpack} are used to marshal/unmarshal the data for transmission over the pipe. Patch by Matthieu Hautreux, CEA. The patch committed here is a variation on the work by Matthieu. Specifically, the logic is added to slurmstepd to read a new format of RPC including an RPC version number and a buffer with the data structure. The slurmd, however, will not send the RPC in the new format until SLURM version 2.5.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-