- 22 Mar, 2012 1 commit
-
-
Morris Jette authored
-
- 21 Mar, 2012 4 commits
-
-
Morris Jette authored
Change the owner of slurmctld and slurmdbd log files to the appropriate user. Without this change the files will be created by and owned by the user starting the daemons (likely user root).
-
Morris Jette authored
CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark node that is DOWN in ALPS as DOWN in SLURM).
-
Morris Jette authored
in the tightly coupled functions slurmd:stepd_completion and slurmstepd:_handle_completion, a jobacct structure is send from the main daemon to the step daemon to provide the statistics of the children slurmstepd and do the aggregation. The methodology used to send the structure is the use of jobacct_gather_g_{setinfo,getinfo} over a pipe (JOBACCT_DATA_PIPE). As {setinfo,getinfo} use a common internal lock and reading or writing to a pipe is equivalent to holding a lock, slurmd and slurmstepd have to avoid using both setinfo and getinfo over a pipe or deadlock situations can occured. For example : slurmd(lockforread,write)/slurmstepd(write,lockforread). This patch remove the call to jobacct_gather_g_setinfo in slurmd and the call to jobacct_gather_g_getinfo in slurmstepd ensuring that slurmd only do getinfo operations over a pipe and slurmstepd only do setinfo over a pipe. Instead jobacct_gather_g_{pack,unpack} are used to marshall/unmarshall the data for transmission over the pipe. Patch by Matthieu Hautreux, CEA. The patch committed here is a variation on the work by Matthieu. Specifically, the logic is added to slurmstepd to read a new format of RPC including an RPC version number and buffer with the data structure. The slurmd however will not send the RPC in the new format until SLURM version 2.5.
-
Morris Jette authored
-
- 20 Mar, 2012 3 commits
-
-
Morris Jette authored
Improve support for overlapping advanced reservations. Patch from Bill Brophy, Bull.
-
Morris Jette authored
-
Morris Jette authored
Improve task binding logic by making fuller use of HWLOC library, especially with respect to Opteron 6000 series processors. Work contributed by Komoto Masahiro.
-
- 16 Mar, 2012 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
to a bluegene cluster.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
already pinged it on startup the unresponding flag would be removed from the frontend node.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
Danny Auble authored
-
- 15 Mar, 2012 4 commits
-
-
Danny Auble authored
state change while the realtime server is running.
-
Danny Auble authored
running on.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
- 14 Mar, 2012 2 commits
-
-
Morris Jette authored
Cray - For srun wrapper when creating a job allocation, set the default job name to the executable file's name. Ignore leading directory names in the path.
-
Morris Jette authored
Cray - Enable logging of BASIL communications with environment variables. Set XML_LOG to enable logging. Set XML_LOG_LOC to specify path to log file or "SLURM" to write to SlurmctldLogFile or unset for "slurm_basil_xml.log". Based on work by Steve Tronfinoff, CSCS.
-
- 13 Mar, 2012 5 commits
-
-
Morris Jette authored
permit the srun and salloc commands to be executed in the background on Cray systems
-
Morris Jette authored
permit the srun and salloc commands to be executed in the background on Cray systems
-
Morris Jette authored
Add new job state reason of "FrontEndDown" which applies only to Cray and IBM BlueGene systems.
-
Danny Auble authored
-
Danny Auble authored
-
- 12 Mar, 2012 1 commit
-
-
Danny Auble authored
the queue when trying to place a larger than midplane job.
-
- 09 Mar, 2012 1 commit
-
-
Danny Auble authored
-
- 07 Mar, 2012 1 commit
-
-
Danny Auble authored
an admin updates the node to idle/resume the compute nodes will go instantly to idle instead of idle* which means no response.
-
- 06 Mar, 2012 2 commits
-
-
Danny Auble authored
gone. Previously it had a timelimit which has proven to not be the right thing.
-
Danny Auble authored
-
- 02 Mar, 2012 1 commit
-
-
Morris Jette authored
In cray/srun wrapper, only include aprun "-q" option when srun "--quiet" option is used.
-
- 29 Feb, 2012 1 commit
-
-
Morris Jette authored
-
- 28 Feb, 2012 1 commit
-
-
Morris Jette authored
-
- 24 Feb, 2012 3 commits
-
-
Morris Jette authored
Change default SchedulerParameters max_switch_wait field value from 60 to 300 seconds.
-
Morris Jette authored
-
Morris Jette authored
-