- 05 Dec, 2012 1 commit
-
-
Morris Jette authored
Especially for newly started jobs, the PrologSlurmctld can change a job's QOS based upon resource allocation.
-
- 04 Dec, 2012 1 commit
-
-
Danny Auble authored
DB2 so hard.
-
- 30 Nov, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
on them. This should only happen in extreme conditions.
-
- 29 Nov, 2012 7 commits
-
-
Danny Auble authored
with associations get the deleted associations as well.
-
Francois Diakhate authored
request resources that reach a 'Max' limit.
-
Danny Auble authored
-
Danny Auble authored
user mark the state canceled instead of completed.
-
Morris Jette authored
-
Danny Auble authored
so it gets sent again. This isn't a major problem since the start will happen when the job ends, but this does make things cleaner.
-
Morris Jette authored
-
- 28 Nov, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
you query against that with -N and -E you will get all jobs during that time instead of only the ones running on -N. Signed-off-by: Danny Auble <da@schedmd.com>
-
- 27 Nov, 2012 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
was already in error and isn't deallocating and underlying hardware goes bad one could get overlapping blocks in error making the code assert when a new job request comes in.
-
Danny Auble authored
overcommit.
-
Danny Auble authored
overcommit.
-
Morris Jette authored
Previously only requeued the job once
-
- 26 Nov, 2012 2 commits
-
-
Danny Auble authored
where needed)
-
jette authored
Otherwise an aborted slurmstepd can cause the srun process to hang indefinitely, a problem reported in trouble ticket 149.
-
- 22 Nov, 2012 1 commit
-
-
Danny Auble authored
introduce step accounting for a Cray.
-
- 21 Nov, 2012 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This is needed if the munge daemon is under very heavy load (e.g. with 1000 slurmd daemons per compute node).
-
- 20 Nov, 2012 3 commits
-
-
Danny Auble authored
slurmctld restart.
-
Morris Jette authored
Modify sbcast logic to continue when a slurmd daemon restarts. Previously, a file transmission in progress would be aborted when any of the slurmd daemons restarted. Now it reconnects, revalidates the credential, and resumes the file transmission.
-
Morris Jette authored
-
- 19 Nov, 2012 3 commits
-
-
Danny Auble authored
allocation.
-
Morris Jette authored
NOTE: If you were setting the environment variable SLURMSTEPD_OOM_ADJ=-17, it should be set to -1000 for Linux kernel 2.6.36 or later.
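
The note above can be illustrated with a small sketch. It assumes the change of value follows the kernel's move from `/proc/<pid>/oom_adj` (where -17 disables the OOM killer for a process) to `/proc/<pid>/oom_score_adj` (where -1000 does), introduced in Linux 2.6.36. The function name `oom_disable_value` is hypothetical and is not part of Slurm.

```python
import re

OOM_ADJ_DISABLE = -17           # legacy /proc/<pid>/oom_adj "never kill" value
OOM_SCORE_ADJ_DISABLE = -1000   # /proc/<pid>/oom_score_adj "never kill" value


def oom_disable_value(kernel_release):
    """Return the value to use for SLURMSTEPD_OOM_ADJ on a given kernel.

    kernel_release is a string such as "2.6.32-279.el6" or "3.2.0-4-amd64".
    Hypothetical helper for illustration only.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", kernel_release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % kernel_release)
    # A missing patch level (e.g. "3.2") is treated as 0.
    version = tuple(int(part or 0) for part in m.groups())
    if version >= (2, 6, 36):
        return OOM_SCORE_ADJ_DISABLE
    return OOM_ADJ_DISABLE
```

On a live system the argument could come from `platform.release()` in the Python standard library.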
-
Danny Auble authored
-
- 09 Nov, 2012 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
- 08 Nov, 2012 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
Signed-off-by: Danny Auble <da@schedmd.com>
-
- 07 Nov, 2012 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
aprun process instead of a perl script.
-
Janne Blomqvist authored
The attached patch changes the default timestamp format in logfiles to conform to RFC 5424 (the current version of the syslog RFC). It is identical to the current default "ISO 8601" timestamp used by slurm, with the exception that the timezone offset is appended. This has the following benefits: 1) It's unambiguous. 2) It avoids potential confusion for admins running cluster(s) in different timezones. 3) It might help debug issues related to DST transitions. (More on that later.) (To be pedantic, an RFC 5424 timestamp is still a valid ISO 8601 timestamp, but the converse is not necessarily true. RFC 3339 is a "profile" of ISO 8601, that is, a subset, recommended for internet protocols. The RFC 5424 timestamp, in turn, is a subset of the RFC 3339 timestamps.) The previous behavior can be restored by running configure with the --disable-rfc5424time flag.
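
As an illustration of the difference described above, the following sketch formats the same instant with and without the timezone offset. It is not Slurm's logging code; the function names are hypothetical and the millisecond precision is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone


def iso8601_local(dt):
    """Timestamp without an offset, i.e. the older style described above."""
    return dt.replace(tzinfo=None).isoformat(timespec="milliseconds")


def rfc5424_timestamp(dt):
    """Same timestamp with the numeric timezone offset appended."""
    if dt.tzinfo is None:
        raise ValueError("an offset-aware datetime is required")
    return dt.isoformat(timespec="milliseconds")


# Example instant: 7 Nov 2012, 14:30:00.123 in a zone one hour ahead of UTC.
stamp = datetime(2012, 11, 7, 14, 30, 0, 123000,
                 tzinfo=timezone(timedelta(hours=1)))
print(iso8601_local(stamp))      # 2012-11-07T14:30:00.123
print(rfc5424_timestamp(stamp))  # 2012-11-07T14:30:00.123+01:00
```

The second form stays unambiguous when logs from clusters in different timezones are compared, which is the benefit the commit message names.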
-
Danny Auble authored
-
Danny Auble authored
specifying the number of tasks and not the number of nodes.
-
- 05 Nov, 2012 1 commit
-
-
Morris Jette authored
On job kill request, send SIGCONT, SIGTERM, wait KillWait and send SIGKILL. Previously just sent SIGKILL to tasks.
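
The signal order described above can be sketched as follows. This is a minimal POSIX-only illustration, not Slurm's implementation: the function `kill_sequence` and its `send` and `sleep` parameters are hypothetical, and `kill_wait` stands in for the KillWait setting in seconds. SIGCONT goes first because a stopped process cannot act on SIGTERM until it is resumed.

```python
import os
import signal
import time


def kill_sequence(pid, kill_wait, send=os.kill, sleep=time.sleep):
    """Send SIGCONT and SIGTERM, wait kill_wait seconds, then send SIGKILL.

    send and sleep are injectable so the order can be checked without
    touching a real process. Returns the list of signals delivered.
    """
    sent = []
    for sig in (signal.SIGCONT, signal.SIGTERM):
        send(pid, sig)
        sent.append(sig)
    sleep(kill_wait)  # grace period for the task to clean up and exit
    try:
        send(pid, signal.SIGKILL)
        sent.append(signal.SIGKILL)
    except ProcessLookupError:
        # The process already exited during the grace period.
        pass
    return sent
```

Injecting `send` and `sleep` is a design choice for the example only; it allows the sequence to be exercised with a recording stub instead of a live task.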
-