- 23 Mar, 2016 25 commits
-
-
Tim Wickberg authored
Relies on Linux-specific behavior of setfsuid/gid, so disable on other platforms for now.
-
Danny Auble authored
Fix linking issues so that the compression libs link only to file_bcast, which in turn gets pulled in by anyone linking to the .la.
-
Danny Auble authored
Move compression stuff over to the file_bcast lib, so we only have to link directly with that and not pull libs all over the place.
-
Tim Wickberg authored
- Fix issue where max_out calculation for zlib was incorrect: use deflateBound to properly calculate the required buffer size, since 1024 is not sufficient padding for incompressible input. - Work towards sending file offsets across the wire in preparation for mmap'd output for lz4.
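To illustrate why a fixed 1024-byte pad falls short, here is a minimal Python sketch. It is not the Slurm code, which is C and calls zlib's deflateBound directly. Python's zlib module does not expose deflateBound, so the sketch instead measures how much a zlib stream grows when fed random, effectively incompressible bytes. The 16 MiB input size and the helper name zlib_overhead are assumptions chosen for illustration.

```python
import os
import zlib


def zlib_overhead(data):
    """Return how many bytes larger the zlib stream is than its input."""
    return len(zlib.compress(data)) - len(data)


# Random bytes are effectively incompressible, so deflate falls back to
# stored blocks, and each stored block adds a few bytes of framing.
payload = os.urandom(16 * 1024 * 1024)  # 16 MiB, an arbitrary test size
overhead = zlib_overhead(payload)

print("input bytes:", len(payload))
print("overhead bytes:", overhead)
print("fits in input + 1024:", overhead <= 1024)
```

Because the framing cost grows with the input size, no constant pad is safe for every input; a bound computed from the input length, as deflateBound provides in C, is.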
-
Tim Wickberg authored
-
Tim Wickberg authored
Rather than compressing at most block_len into a message, compress up to (10 * block_len) into a single message. 10x arbitrarily chosen to mitigate buffer issues when uncompressing in slurmd. Testing against a 100MB zero file, this reduces the messages required from 13 to 2.
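The message counts quoted above can be reproduced with simple arithmetic. This is a sketch under stated assumptions: the commit message does not give block_len, so 8 MiB is assumed here because it is consistent with the 13 and 2 figures for a 100 MiB file. It also assumes the compressed payload always fits in one message, which holds for a zero-filled file but not in general. The function name message_count is hypothetical.

```python
MIB = 1024 * 1024


def message_count(file_size, block_len, blocks_per_message=1):
    """Messages needed when each message carries up to
    blocks_per_message * block_len bytes of uncompressed input."""
    input_per_message = block_len * blocks_per_message
    return -(-file_size // input_per_message)  # ceiling division


file_size = 100 * MIB
block_len = 8 * MIB  # assumed value; not stated in the commit message

print(message_count(file_size, block_len))      # 13
print(message_count(file_size, block_len, 10))  # 2
```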
-
Tim Wickberg authored
-
Tim Wickberg authored
Regression in last commit required zlib to send each chunk as a separate block, rather than packing multiple chunks per block.
-
Tim Wickberg authored
-
Tim Wickberg authored
1) Use mmap'd I/O. 2) Move buffer handling down to the individual compression routines. This will eventually allow them to read more than buffer_len at a time in order to emit a buffer_len message, and thus minimize the number of messages sent. (Current logic sends the same number of messages, but the payload for each can be compressed. I expect more performance gains are available by limiting the message count.) 3) Move buffer_len to a global, and change function signatures.
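The "same number of messages, compressed payload" behavior described above might look like the following Python sketch. It maps the file read-only and compresses one buffer_len slice per message. The function name compress_file_blocks is hypothetical, zlib stands in for whichever routine is selected, and the real implementation is C inside Slurm.

```python
import mmap
import zlib


def compress_file_blocks(path, buffer_len):
    """Yield one zlib-compressed payload per buffer_len slice of the file.

    Assumes a non-empty file: mmap cannot map a zero-length file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for offset in range(0, len(mapped), buffer_len):
                # Slicing the mapping copies only the requested range.
                yield zlib.compress(mapped[offset:offset + buffer_len])
```

A 10000-byte file with buffer_len set to 4096 yields three payloads, the same count as an uncompressed transfer would need.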
-
Danny Auble authored
-
Danny Auble authored
Move files from common to new dir location for bcast. This also sets up the 3 locations that need to be linked to compression libs.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
And switch CHUNK macro -> chunk within the function.
-
Tim Wickberg authored
Add orig_len to data structure to avoid guessing uncompressed size. Have zlib use this as well to avoid xrealloc() calls. (Future improvement to zlib would avoid use of the temporary buffer + memcpy calls.) Note that this is handling each block independently. Stream mode would be better, switch to that in the future for additional performance gains.
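A minimal Python sketch of the idea, assuming a hypothetical two-field message layout (only orig_len is named in the commit message; the Block type and the data field are invented here). Carrying the uncompressed size alongside the payload lets the receiver size its output buffer once instead of guessing and growing it. Each block is handled independently, as noted above.

```python
import zlib
from collections import namedtuple

# Hypothetical message layout; only orig_len comes from the commit message.
Block = namedtuple("Block", ["orig_len", "data"])


def compress_block(raw):
    """Compress one block and record its uncompressed length."""
    return Block(orig_len=len(raw), data=zlib.compress(raw))


def decompress_block(block):
    """Decompress one block using orig_len as the initial buffer size."""
    # bufsize must be positive, so guard against an empty block.
    out = zlib.decompress(block.data, bufsize=max(block.orig_len, 1))
    if len(out) != block.orig_len:
        raise ValueError("decompressed length does not match orig_len")
    return out
```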
-
Tim Wickberg authored
Also change sbcast info line to print type as int rather than as bool.
-
Tim Wickberg authored
Note one quirk - SLURM_COMPRESS env var must be set to a type. Setting to 'true' or '1' will not work, when it would have enabled zlib before.
-
Tim Wickberg authored
If no type is given, the current default is zlib. 'lz4' and 'zlib' are the only two types supported currently. Print an error if an unknown type is given, but continue with compression disabled.
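The selection rules above can be sketched in Python as follows. The numeric codes, the function name parse_compress_type, the exact-match comparison, and the use of None for "no type given" are all assumptions for illustration; the real option parsing is C code in Slurm and may differ.

```python
import sys

# Hypothetical numeric codes; the real values in Slurm may differ.
COMPRESS_OFF = 0
COMPRESS_ZLIB = 1
COMPRESS_LZ4 = 2


def parse_compress_type(arg):
    """Map a user-supplied type string to a compression code."""
    if arg is None:
        return COMPRESS_ZLIB  # no type given: default to zlib
    if arg == "zlib":
        return COMPRESS_ZLIB
    if arg == "lz4":
        return COMPRESS_LZ4
    # Unknown type: report it, then carry on with compression disabled.
    print("error: unknown compression type: %s" % arg, file=sys.stderr)
    return COMPRESS_OFF
```

Under these rules a value such as 'true' or '1' is treated as an unknown type, so it prints an error and leaves compression off.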
-
Tim Wickberg authored
Stub out lz4 functions, will need to be filled in later. sbcast/srun also need work to allow them to specify compression routine.
-
Brian Christiansen authored
-
Morris Jette authored
When a node's MCDRAM configuration changes, the amount of High Bandwidth Memory (HBM) changes also. HBM is treated like a GRES that can be allocated to jobs, so we update each node's GRES value when its MCDRAM configuration changes.
-
- 22 Mar, 2016 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
Just in case some job fails to terminate as expected.
-
- 21 Mar, 2016 6 commits
-
-
Danny Auble authored
-
Danny Auble authored
# Conflicts: # src/plugins/burst_buffer/cray/burst_buffer_cray.c
-
Danny Auble authored
gang scheduling before doing code for gang scheduling.
-
Morris Jette authored
burst_buffer/cray: Set environment variables just before starting job rather than at job submission time to reflect persistent buffers created or modified while the job is pending. bug 2545
-
Danny Auble authored
buffer is found. Bug 2576 What happened was that a function was taking a read lock twice, which isn't awesome to begin with, but not really horrible (if all you are doing is read locks anyway). The problem was that after the first read lock was taken, a different thread went for a write lock, so when the second read lock came in it created a deadlock.
-
Tim Wickberg authored
Coverity 77851.
-
- 18 Mar, 2016 4 commits
-
-
Morris Jette authored
Jobs below the specified threshold will not have resources reserved for them. bug 2565
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
Avoid possibly aborting srun that gets simultaneous SIGSTOP+SIGCONT while creating the job step. The result is that the signal handler gets an argument (the signal received) of zero. Here's a log, window 1:
$ srun hostname
srun: Job step creation temporarily disabled, retrying
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 18
srun: I Got signal 0
srun: Cancelled pending job step
Window 2:
$ kill -STOP 18696 ; kill -CONT 18696
$ kill -STOP 18696 ; kill -CONT 18696
$ kill -STOP 18696 ; kill -CONT 18696
....
bug 2494
-
- 17 Mar, 2016 3 commits
-
-
Morris Jette authored
Copy logic from select/cons_res to select/serial that is equivalent to commit ec50cb2f
-
Morris Jette authored
Change how a node's allocated CPU count is calculated to avoid double counting CPUs allocated to multiple jobs at the same time. Previous logic would sum the maximum number of CPUs allocated by each partition for any time slice, which could double count CPUs allocated to multiple jobs. New logic ORs bitmap of allocated CPUs for every partition and time slice, then counts the total for a given node. This avoids double counting CPUs allocated to multiple jobs, but does not remove from the count CPUs which have been allocated to jobs which might be suspended by the gang scheduler (either for time slicing or preemption).
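The difference between the two calculations can be shown with a small Python sketch that uses integers as CPU bitmaps. The function names and the list-of-lists layout (one list of per-time-slice bitmaps per partition) are assumptions for illustration, not the data structures used in Slurm.

```python
def popcount(bitmap):
    """Count the set bits in an integer bitmap."""
    return bin(bitmap).count("1")


def alloc_cpus_old(partitions):
    """Previous logic: sum each partition's largest per-slice CPU count."""
    return sum(
        max((popcount(s) for s in slices), default=0)
        for slices in partitions
    )


def alloc_cpus_new(partitions):
    """New logic: OR every bitmap together, then count the set bits."""
    combined = 0
    for slices in partitions:
        for s in slices:
            combined |= s
    return popcount(combined)


# One node with 4 CPUs; each partition lists one bitmap per time slice.
partitions = [
    [0b0011],          # partition A: CPUs 0 and 1
    [0b0110, 0b0001],  # partition B: CPUs 1 and 2, then CPU 0
]

print(alloc_cpus_old(partitions))  # 4: CPU 1 is counted twice
print(alloc_cpus_new(partitions))  # 3: CPUs 0, 1 and 2
```

As the commit message notes, the OR still includes CPUs held by jobs the gang scheduler may have suspended, so the new count is an upper bound on CPUs in active use.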
-
Tim Wickberg authored
-