- 26 Oct, 2016 1 commit
-
-
Morris Jette authored
Add new SchedulerParameter (max_array_tasks) to limit the maximum number of tasks in a job array independently from the maximum task ID (MaxArraySize). bug 2676
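The difference between the two limits can be sketched as follows. This is only an illustration, not Slurm's implementation: the helper names are hypothetical, the array is assumed to be given as first/last/step, and task IDs are assumed to have to stay below MaxArraySize.

```c
#include <stdbool.h>

/* Hypothetical helper: number of tasks in an array spec "first-last:step". */
static unsigned array_task_count(unsigned first, unsigned last, unsigned step)
{
    if (step == 0 || last < first)
        return 0;
    return (last - first) / step + 1;
}

/* Hypothetical check: the two limits are applied independently.
 * max_array_size bounds the highest task ID (assumed: ID < max_array_size),
 * max_array_tasks bounds how many tasks the array contains. */
static bool array_spec_allowed(unsigned first, unsigned last, unsigned step,
                               unsigned max_array_size,
                               unsigned max_array_tasks)
{
    unsigned count = array_task_count(first, last, step);
    if (count == 0)
        return false;
    /* Highest ID actually generated, which may be below "last". */
    unsigned highest = first + (count - 1) * step;
    if (highest >= max_array_size)
        return false;
    return count <= max_array_tasks;
}
```

With these assumptions, an array of IDs 0-1000 in steps of 100 has only 11 tasks, so it can pass a small task-count limit even though its highest ID is large.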
-
- 25 Oct, 2016 17 commits
-
-
Morris Jette authored
-
Morris Jette authored
Add SbcastParameters configuration option to control default file destination directory and compression algorithm. bug 2977
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
Replace sjstat, seff and sjobexit RPM packages with a single "contribs" package.
-
Danny Auble authored
-
Morris Jette authored
Remove separate slurm_blcr package. If Slurm is built with BLCR support, the files will now be part of the main Slurm packages. bug 2061
-
Morris Jette authored
-
Morris Jette authored
Document that node Weight takes precedence over load with LLN scheduling. bug 3204
-
Morris Jette authored
-
Dimitar Pashov authored
-
Brian Christiansen authored
The test cluster had jobs in the job table which prevented the cluster from being deleted. This also caused problems for other tests because accounts would be added to the test cluster and the accounts couldn't be deleted because the cluster still had jobs.
-
Brian Christiansen authored
This prevents it from being added to any stray clusters and makes it easier to clean up. test21.36 wasn't destroying the created test cluster, so this account was being added to that cluster and could not be deleted on subsequent runs. test21.36 is fixed now too.
-
Tim Wickberg authored
Follow on to commit c3266fca for 17.02+.
-
Tim Wickberg authored
-
Tim Wickberg authored
task/cray's _get_numa_nodes() function needs to run before task/cgroup cleans up the cgroup hierarchies, otherwise ALPS memory compaction will never run. Also move task_p_add_pid() outside the #ifdef HAVE_NATIVE_CRAY block so that the plugin will load (albeit without any functionality) on non-Cray systems for testing purposes. Revise documentation and provided slurm.conf templates as well. Bug 3154.
-
Morris Jette authored
Do not include SLURM_JOB_DERIVED_EC, SLURM_JOB_EXIT_CODE, or SLURM_JOB_EXIT_CODE2 in PrologSlurmctld environment (not available yet). bug 1431
-
- 24 Oct, 2016 17 commits
-
-
Morris Jette authored
-
Yu Watanabe authored
bug 2390
-
Tim Wickberg authored
Many open Coverity issues point back to this. The results of passing a negative value to strerror() are undefined; return a static string instead.
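A minimal sketch of that guard, assuming a wrapper of this shape (the function name and the message text are hypothetical, not the Slurm function this commit touches):

```c
#include <string.h>

/* Hypothetical wrapper: never hand a negative value to strerror(). */
static const char *safe_strerror(int errnum)
{
    if (errnum < 0)
        return "Unknown negative error number";  /* static string */
    return strerror(errnum);
}
```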
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Doug Parisek authored
the pro/epilog of jobs.
-
Tim Wickberg authored
-
Yu Watanabe authored
Even if several commands, e.g. sview, are not built and installed, the corresponding man pages are still installed. The attached patch stops installing man pages when the corresponding commands are not built and installed. bug 2393
-
Morris Jette authored
-
Danny Auble authored
This reverts commit 428347cf. Decided we didn't want a core dump on every fatal, as fatal is used in other programs instead of just the daemons.
-
Jacek Budzowski authored
There is a problem with gathering batch step statistics for jobs that are allocated on more than one node: sstat asks the wrong node for batch step stats. It requests info from the last node in the hostlist, while it should ask the first host in the hostlist (i.e. BatchHost), because the batch step actually executes only on the first node.

For example, take a job allocated on nodes n000[1-2] with BatchHost=n0001. You can check its statistics by running sstat, with the -vv switch for more verbose output (e.g. sstat -j 1234.batch -vv). You will then see the lines:

  sstat: debug: slurm_job_step_stat: getting pid information of job 1234.4294967294 on nodes n0002
  sstat: debug: job step 1234.4294967294 has already completed

The problem lies in the sstat source code. For the batch step, the host is taken from the hostlist_pop function, which returns the last host from the given hostlist. This should be replaced with the hostlist_shift function, which returns the first host from the given hostlist. Patch attached. bug 2975
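The pop versus shift distinction described above can be illustrated with a toy list. This is a sketch only: struct toy_hostlist, toy_pop and toy_shift are hypothetical stand-ins and do not reproduce Slurm's hostlist API.

```c
#include <stddef.h>

/* Toy stand-in for a host list; NOT Slurm's hostlist_t. */
struct toy_hostlist {
    const char **hosts;
    size_t first;   /* index of the first remaining host */
    size_t count;   /* number of remaining hosts */
};

/* Remove and return the last host (the behaviour described for hostlist_pop). */
static const char *toy_pop(struct toy_hostlist *hl)
{
    if (hl->count == 0)
        return NULL;
    hl->count--;
    return hl->hosts[hl->first + hl->count];
}

/* Remove and return the first host (the behaviour described for hostlist_shift). */
static const char *toy_shift(struct toy_hostlist *hl)
{
    if (hl->count == 0)
        return NULL;
    hl->count--;
    return hl->hosts[hl->first++];
}
```

For a two-node list such as n0001, n0002, popping yields n0002 while shifting yields n0001, which is the node where the batch step runs.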
-
Morris Jette authored
-
Morris Jette authored
burst_buffer/cray: Accept new jobs on backup slurmctld daemon without access to dw_wlm_cli command. No burst buffer actions will take place. Newly submitted jobs will be accepted and stay in pending state. Jobs dependent upon stage-in or stage-out will remain in their current state until the action can take place.
-
Brian Christiansen authored
-
Morris Jette authored
-
Dorian Krause authored
This commit fixes a bug in the multi-prog handling. When running

  salloc -N 2 srun -O --multi-prog mp.conf

where mp.conf reads

  0-192 true

srun crashes can be observed. valgrind reports:

==6857== Invalid read of size 4
==6857==    at 0x45938D: bit_realloc (bitstring.c:189)
==6857==    by 0x5977A9: _update_task_mask (multi_prog.c:335)
==6857==    by 0x597A5E: _validate_ranks (multi_prog.c:403)
==6857==    by 0x597D1E: verify_multi_name (multi_prog.c:469)
==6857==    by 0x6E7B4BE: launch_p_handle_multi_prog_verify (launch_slurm.c:453)
==6857==    by 0x58A25D: launch_g_handle_multi_prog_verify (launch.c:493)
==6857==    by 0x58E556: _opt_args (opt.c:1927)
==6857==    by 0x58A3B9: initialize_and_process_args (opt.c:270)
==6857==    by 0x591F82: init_srun (srun_job.c:459)
==6857==    by 0x427E70: srun (srun.c:193)
==6857==    by 0x428E23: main (srun.wrapper.c:17)
==6857==  Address 0x5ace440 is 16 bytes inside a block of size 28 free'd
==6857==    at 0x4C2BB4A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6857==    by 0x446886: slurm_xrealloc (xmalloc.c:139)
==6857==    by 0x45944C: bit_realloc (bitstring.c:191)
==6857==    by 0x5977A9: _update_task_mask (multi_prog.c:335)
==6857==    by 0x597A5E: _validate_ranks (multi_prog.c:403)
==6857==    by 0x597D1E: verify_multi_name (multi_prog.c:469)
==6857==    by 0x6E7B4BE: launch_p_handle_multi_prog_verify (launch_slurm.c:453)
==6857==    by 0x58A25D: launch_g_handle_multi_prog_verify (launch.c:493)
==6857==    by 0x58E556: _opt_args (opt.c:1927)
==6857==    by 0x58A3B9: initialize_and_process_args (opt.c:270)
==6857==    by 0x591F82: init_srun (srun_job.c:459)
==6857==    by 0x427E70: srun (srun.c:193)
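The trace shows a read from a block that an earlier realloc had already freed. One general way to avoid that class of bug is to keep the buffer pointer in a single place and update it on every growth, so no caller retains the pre-realloc address. The sketch below illustrates that pattern with a hypothetical struct task_mask; it is not Slurm's bitstring code and not necessarily the change made in this commit.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bit mask; not Slurm's bitstr_t. */
struct task_mask {
    unsigned char *bits;
    size_t nbytes;
};

/* Grow the mask so that bit index "bit" fits. Returns 0 on success and
 * -1 on allocation failure. The pointer stored in the struct is updated
 * here, so callers never keep using the address from before realloc(). */
static int task_mask_reserve(struct task_mask *m, size_t bit)
{
    size_t need = bit / 8 + 1;
    unsigned char *p;

    if (need <= m->nbytes)
        return 0;
    p = realloc(m->bits, need);
    if (!p)
        return -1;                      /* old block is still valid */
    memset(p + m->nbytes, 0, need - m->nbytes);
    m->bits = p;
    m->nbytes = need;
    return 0;
}

/* Set one bit, growing the mask first if required. */
static int task_mask_set(struct task_mask *m, size_t bit)
{
    if (task_mask_reserve(m, bit) != 0)
        return -1;
    m->bits[bit / 8] |= (unsigned char)(1u << (bit % 8));
    return 0;
}

/* Return 1 if the bit is set, 0 otherwise (including out of range). */
static int task_mask_test(const struct task_mask *m, size_t bit)
{
    if (bit / 8 >= m->nbytes)
        return 0;
    return (m->bits[bit / 8] >> (bit % 8)) & 1;
}
```

Starting from an empty mask (bits = NULL, nbytes = 0), setting ranks 0 through 192 one at a time grows the buffer repeatedly, which mirrors the "0-192" range in the reproducer.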
-
- 21 Oct, 2016 5 commits
-
-
Morris Jette authored
Do not process SALLOC_HINT, SBATCH_HINT or SLURM_HINT environment variables if any of the following salloc, sbatch or srun command line options are specified: -B, --cpu_bind, --hint, --ntasks-per-core, or --threads-per-core. bug 3118
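The precedence rule can be sketched as below. The struct and function names are hypothetical and do not correspond to the option-parsing code in salloc, sbatch or srun; the sketch only shows the command line taking priority over the environment.

```c
#include <stdbool.h>

/* Hypothetical option state; field names are illustrative only. */
struct hint_opts {
    /* true if any of -B, --cpu_bind, --hint, --ntasks-per-core or
     * --threads-per-core was given on the command line */
    bool cli_binding_set;
    const char *hint;       /* effective hint, or a null pointer if none */
};

/* Apply the value of a *_HINT environment variable unless one of the
 * command line options above was specified. */
static void apply_hint_env(struct hint_opts *o, const char *env_val)
{
    if (o->cli_binding_set)
        return;                 /* command line wins, ignore environment */
    if (env_val && *env_val)
        o->hint = env_val;
}
```

A caller would typically pass the result of getenv("SLURM_HINT") (or the SALLOC_/SBATCH_ variant) as env_val after the command line has been parsed.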
-
Tim Wickberg authored
-
Morris Jette authored
Without this change, only the error was available, but no identification of the specific plugin that failed.
-
Morris Jette authored
Coverity was complaining that the return value of s_p_get_* was ignored. I added typecasting of the return value to (void) where needed. No change in logic, just making Coverity happy ;)
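For illustration, the pattern looks roughly like this. toy_get_uint is a hypothetical stand-in for the s_p_get_* family, assumed here to return true when the key is found and to leave the output untouched otherwise.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical getter: stores a value and returns true only for "Weight". */
static bool toy_get_uint(unsigned *out, const char *key)
{
    if (strcmp(key, "Weight") == 0) {
        *out = 10;
        return true;
    }
    return false;
}

/* The default is set first, so the return value carries no extra
 * information here; the (void) cast documents that it is ignored on
 * purpose and silences the static analyzer. */
static unsigned load_uint_or_default(const char *key, unsigned dflt)
{
    unsigned value = dflt;
    (void) toy_get_uint(&value, key);
    return value;
}
```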
-
Morris Jette authored
-