- 23 Mar, 2016 16 commits
-
-
Tim Wickberg authored
1) Use mmap'd I/O.
2) Move buffer handling down to the individual compression routines. This will eventually allow them to read more than buffer_len at a time in order to emit a buffer_len message, and thus minimize the number of messages sent. (Current logic sends the same number of messages, but the payload for each can be compressed. I expect more performance gains are available by limiting the message count.)
3) Move buffer_len to a global, and change function signatures.
-
Danny Auble authored
-
Danny Auble authored
Move files from common to new dir location for bcast. This also sets up the 3 locations that need to be linked to compression libs.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
And switch CHUNK macro -> chunk within the function.
-
Tim Wickberg authored
Add orig_len to the data structure to avoid guessing the uncompressed size. Have zlib use this as well to avoid xrealloc() calls. (A future improvement to zlib would avoid use of the temporary buffer + memcpy calls.) Note that this handles each block independently. Stream mode would be better; switch to that in the future for additional performance gains.
-
Tim Wickberg authored
Also change sbcast info line to print type as int rather than as bool.
-
Tim Wickberg authored
Note one quirk - the SLURM_COMPRESS env var must be set to a type. Setting it to 'true' or '1' will not work, whereas previously that would have enabled zlib.
-
Tim Wickberg authored
If no type is given, the current default is zlib. 'lz4' and 'zlib' are the only two types currently supported. Print an error if an unknown type is given, but continue with compression disabled.
-
Tim Wickberg authored
Stub out lz4 functions, will need to be filled in later. sbcast/srun also need work to allow them to specify compression routine.
-
Brian Christiansen authored
-
Morris Jette authored
When a node's MCDRAM configuration changes, the amount of High Bandwidth Memory (HBM) changes also. HBM is treated like a GRES that can be allocated to jobs, so we update each node's GRES value when its MCDRAM configuration changes.
-
- 22 Mar, 2016 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
Just in case some job fails to terminate as expected.
-
- 21 Mar, 2016 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
# Conflicts:
#	src/plugins/burst_buffer/cray/burst_buffer_cray.c
-
Danny Auble authored
gang scheduling before doing code for gang scheduling.
-
Morris Jette authored
burst_buffer/cray: Set environment variables just before starting job rather than at job submission time to reflect persistent buffers created or modified while the job is pending. bug 2545
-
Danny Auble authored
buffer is found. Bug 2576. What happened was a function was doing a double read lock, which isn't awesome to begin with but not really horrible (if all you are doing is read locks anyway). The problem was that after the first lock was taken, a different thread went for a write lock, so when the second read lock came in it created a deadlock.
-
Tim Wickberg authored
Coverity 77851.
-
- 18 Mar, 2016 4 commits
-
-
Morris Jette authored
Jobs below the specified threshold will not have resources reserved for them. bug 2565
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
Avoid possibly aborting an srun that gets simultaneous SIGSTOP+SIGCONT while creating the job step. The result is that the signal handler gets an argument (the signal received) of zero. Here's a log:

Window 1:
$ srun hostname
srun: Job step creation temporarily disabled, retrying
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 0
srun: Cancelled pending job step

Window 2:
$ kill -STOP 18696 ; kill -CONT 18696
$ kill -STOP 18696 ; kill -CONT 18696
$ kill -STOP 18696 ; kill -CONT 18696
....

bug 2494
-
- 17 Mar, 2016 5 commits
-
-
Morris Jette authored
Copy logic from select/cons_res to select/serial that is equivalent to commit ec50cb2f
-
Morris Jette authored
Change how a node's allocated CPU count is calculated to avoid double counting CPUs allocated to multiple jobs at the same time. Previous logic would sum the maximum number of CPUs allocated by each partition for any time slice, which could double count CPUs allocated to multiple jobs. New logic ORs bitmap of allocated CPUs for every partition and time slice, then counts the total for a given node. This avoids double counting CPUs allocated to multiple jobs, but does not remove from the count CPUs which have been allocated to jobs which might be suspended by the gang scheduler (either for time slicing or preemption).
-
Tim Wickberg authored
-
Tim Wickberg authored
Update NEWS as well.
-
Tim Wickberg authored
The uid is used as part of the hash function, so we must remove the old reference and recalculate if it may change; otherwise _delete_assoc_hash will not find it again when the association is removed, causing slurmctld to segfault. Bug 2560.
-
- 16 Mar, 2016 7 commits
-
-
Morris Jette authored
Add --gres-flags=enforce-binding option to the salloc, sbatch and srun commands. If set, the only CPUs available to the job will be those bound to the selected GRES (i.e. the CPUs identified in the gres.conf file will be strictly enforced rather than advisory). bug 1725
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
Previous gang scheduling logic maintained information about resources originally allocated to the job and made scheduling decisions on that basis. bug 2494
-
Morris Jette authored
This will improve ability to diagnose problems if the srun is killed by a signal.
-
Morris Jette authored
Update gang scheduling table when job manually suspended or resumed. Prior logic could mess up job suspend/resume sequencing. bug 2494
-
Danny Auble authored
-