- 08 Aug, 2017 8 commits
-
-
Morris Jette authored
Coverity CID 45173
-
Isaac Hartung authored
If one or more clusters in a federation are down, then print an appropriate warning and exit rather than cause the test to fail. bug 4033
-
Tim Wickberg authored
Mistake made in f9687bbc; strlcpy returns a count of chars copied instead of a pointer to the string.
-
Tim Wickberg authored
Otherwise the log will be spammed with "Buffer size limit exceeded". Bug 3624.
-
Tim Wickberg authored
Gets rid of one weird strlcpy call.
-
Tim Wickberg authored
-
Tim Wickberg authored
Ensure proper termination in places that were otherwise missing it, and remove some awkward termination handling in other locations.
-
Tim Wickberg authored
-
- 07 Aug, 2017 9 commits
-
-
Tim Wickberg authored
-
Justin Lecher authored
Starting from glibc-2.25 the macros major and minor are only available from sys/sysmacros.h. This patch uses an autoconf macro to detect the location and includes the header accordingly. Bug 3982.
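A minimal sketch of the conditional include described above, assuming the autoconf macro is `AC_HEADER_MAJOR`, which defines `MAJOR_IN_MKDEV` or `MAJOR_IN_SYSMACROS` in `config.h`. The commit message does not name the macro, so this is an assumption. The `__linux__` fallback and the `device_numbers()` helper are hypothetical additions so that the sketch builds without a generated `config.h`.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* AC_HEADER_MAJOR (assumed here) defines at most one of these in config.h. */
#if defined(MAJOR_IN_MKDEV)
#  include <sys/mkdev.h>
#elif defined(MAJOR_IN_SYSMACROS) || defined(__linux__)
/* The __linux__ test is a fallback for this sketch only (no config.h). */
#  include <sys/sysmacros.h>
#endif

/*
 * Hypothetical helper: split the device number of a special file into
 * its major and minor parts. Returns 0 on success, -1 if stat() fails.
 */
static int device_numbers(const char *path, unsigned int *maj_out,
			  unsigned int *min_out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	*maj_out = major(st.st_rdev);
	*min_out = minor(st.st_rdev);
	return 0;
}
```

Calling `device_numbers("/dev/null", &mj, &mn)` fills in the device numbers of the null device. The point is that `major()` and `minor()` resolve on both older and newer glibc.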
-
Artem Polyakov authored
Unlike MPI, UCX requires each sender to have a unique tag, otherwise messages will get mixed up. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Fix direct connection table size to be equal to the number of nodes in the job instead of the job step. This was triggering an assert when, for example, the allocation was 16 nodes but srun was only using 2 of them and not the first ones. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Danny Auble authored
middle of the other id_*. I think it is ok to have the _id at the end for this since we want the offset near it.
-
Danny Auble authored
double ',' as we do with QOS in the TRES limits. So I removed the logic to check for it.
-
Danny Auble authored
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Bug 4019
-
- 05 Aug, 2017 1 commit
-
-
Morris Jette authored
Change to slurm_mutex_init Coverity CID 171460
-
- 04 Aug, 2017 22 commits
-
-
Morris Jette authored
-
Artem Polyakov authored
-
Danny Auble authored
just unfreed memory, no real concern.
-
Artem Polyakov authored
-
Danny Auble authored
-
Danny Auble authored
-
Artem Polyakov authored
Replace "bool" with "int" as the return type of `pmixp_io_fd()`. This was causing an interesting hidden bug affecting performance. Because the function returned a boolean value, it always returned "1" instead of the actual fd number. In slurmstepd, fd=1 is set to /dev/null, a char device that is always read/write ready from the poll() perspective. So poll was continuously interrupted and progress was fine, but CPU usage was ~100%.
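The return-type bug above can be reproduced in a few lines. This is an illustrative sketch, not the plugin's code: `struct io_engine` and both accessor functions are hypothetical names. It only shows that C converts any non-zero integer to `1` when it is returned through a `bool`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an I/O engine that owns a file descriptor. */
struct io_engine {
	int fd;
};

/* Buggy variant: any non-zero fd is converted to true, i.e. 1. */
static bool io_fd_as_bool(const struct io_engine *eng)
{
	return eng->fd;
}

/* Fixed variant: the descriptor number is returned unchanged. */
static int io_fd_as_int(const struct io_engine *eng)
{
	return eng->fd;
}
```

With `eng.fd == 7`, storing the result of `io_fd_as_bool(&eng)` in an `int` yields 1, while `io_fd_as_int(&eng)` yields 7. A caller that passes the first value to poll() ends up watching descriptor 1 rather than the intended one.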
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
(introduced in prev commit "Fix collective error path (timeout)") Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
The UCP worker has to be re-armed after ucp_progress was called on it. The original implementation was done with wrong considerations in mind. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Danny Auble authored
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
-
Artem Polyakov authored
Memory hooks are installed by UCX during its load time. To prevent that we need to `export UCX_MEM_MALLOC_HOOKS=no`. With the previous approach the ucx lib was loaded during pmix plugin dlopen and we had no control over environment variables from the plugin itself. The only working variant was to add the mentioned variable to slurmd's environment. To improve user experience with this feature we want to be able to transparently disable memory hooks from the plugin itself. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
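One way to get the effect described above is to set the variable in the process environment before the UCX library is loaded. The sketch below shows only that step. The function name `disable_ucx_malloc_hooks()` is hypothetical, and the way the plugin actually defers loading UCX is not shown in the commit message.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/*
 * Hypothetical helper: must run before the UCX library is loaded,
 * because UCX reads the variable at load time.
 * Returns 0 on success, -1 on failure (as setenv() does).
 */
static int disable_ucx_malloc_hooks(void)
{
	/* Third argument 1: overwrite any value inherited from slurmd. */
	return setenv("UCX_MEM_MALLOC_HOOKS", "no", 1);
}
```

The library load (for example via `dlopen()`) would follow a successful call, so that UCX sees the variable without the administrator having to change slurmd's environment.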
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Put a request that is already completed directly on the _rcv_complete list.
Ping-pong latency effect:
size      UCXv8     UCXv9
1         17.7      17.8
2         17.7      17.7
4         17.9      17.7
8         18.3      18.1
16        18.4      18.3
32        18.3      18.3
64        19.0      18.8
128       19.4      19.3
256       19.8      19.5
512       19.9      19.7
1024      19.7      19.6
2048      20.9      20.7
4096      23.7      23.5
8192      27.2      27.2
16384     30.3      30.0
32768     35.5      35.3
65536     47.4      47.2
131072    71.4      71.2
262144    2197.0    2202.2
524288    2727.1    2745.2
1048576   3580.4    3577.7
2097152   6120.7    6125.0
4194304   10329.2   10265.2
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Ping-pong latency effect:
size      UCXv7     UCXv8
1         18.8      17.7
2         18.7      17.7
4         18.7      17.9
8         19.2      18.3
16        19.3      18.4
32        19.4      18.3
64        19.5      19.0
128       20.2      19.4
256       20.4      19.8
512       20.6      19.9
1024      20.5      19.7
2048      21.6      20.9
4096      24.6      23.7
8192      27.7      27.2
16384     30.7      30.3
32768     36.1      35.5
65536     47.8      47.4
131072    72.2      71.4
262144    2229.3    2197.0
524288    2817.8    2727.1
1048576   3693.1    3580.4
2097152   6148.5    6120.7
4194304   10230.7   10329.2
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Avoid extra ucx_worker_progress call
- Avoid extra initialization in _ucx_progress call
The effect on ping-pong latency test (~500 ns improvement):
size      UCXv6     UCXv7
1         19.2      18.8
2         19.2      18.7
4         19.2      18.7
8         19.6      19.2
16        19.9      19.3
32        19.8      19.4
64        20.1      19.5
128       20.5      20.2
256       21.0      20.4
512       21.1      20.6
1024      20.9      20.5
2048      22.1      21.6
4096      25.3      24.6
8192      28.5      27.7
16384     31.2      30.7
32768     37.0      36.1
65536     48.1      47.8
131072    72.6      72.2
262144    2104.7    2229.3
524288    2722.0    2817.8
1048576   3756.2    3693.1
2097152   6206.3    6148.5
4194304   10281.3   10230.7
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Although a degradation is observed for some message sizes, overall latency grows more steadily with size and most points are improved.
size      UCXv5     UCXv6
1         20.3      19.2
2         18.3      19.2
4         17.7      19.2
8         20.3      19.6
16        20.3      19.9
32        20.3      19.8
64        21.0      20.1
128       21.0      20.5
256       21.7      21.0
512       22.0      21.1
1024      22.0      20.9
2048      23.0      22.1
4096      26.0      25.3
8192      29.0      28.5
16384     31.7      31.2
32768     37.3      37.0
65536     49.0      48.1
131072    74.3      72.6
262144    2227.3    2104.7
524288    2801.0    2722.0
1048576   3795.0    3756.2
2097152   6292.3    6206.3
4194304   10314.3   10281.3
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-