- 04 Aug, 2017 40 commits
-
-
Artem Polyakov authored
-
Danny Auble authored
just unfreed memory, no real concern.
-
Artem Polyakov authored
-
Danny Auble authored
-
Danny Auble authored
-
Artem Polyakov authored
Replace "bool" with "int" as the return type of `pmixp_io_fd()`. The wrong type caused a subtle hidden bug that hurt performance. Because the function returned a boolean value, it always returned "1" instead of the actual fd number. For slurmstepd, fd=1 is set to /dev/null, a char device that is always read/write ready from the poll() perspective. As a result poll() returned immediately every time: progress was fine, but CPU usage was ~100%.
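The truncation described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the plugin's actual code: `struct io_engine`, `io_fd_as_bool()` and `io_fd_as_int()` are hypothetical stand-ins for the engine and for the accessor before and after the fix.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the I/O engine; only the fd matters here. */
struct io_engine {
    int fd;
};

/* Buggy variant: the int is converted to bool on return,
 * so every nonzero descriptor comes back as 1. */
bool io_fd_as_bool(const struct io_engine *eng)
{
    return eng->fd;
}

/* Fixed variant: the descriptor number is returned unchanged. */
int io_fd_as_int(const struct io_engine *eng)
{
    return eng->fd;
}
```

With `eng.fd = 7`, the first accessor yields 1 and the second yields 7, so a caller that passes the result to poll() ends up watching fd 1 in the buggy case.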
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
(introduced in prev commit "Fix collective error path (timeout)") Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
The UCP worker has to be re-armed after ucp_progress was called on it. The original implementation was based on wrong assumptions. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Danny Auble authored
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
-
Artem Polyakov authored
Memory hooks are installed by UCX at its load time. To prevent that we need to `export UCX_MEM_MALLOC_HOOKS=no`. With the previous approach the UCX library was loaded during the pmix plugin dlopen, so we had no control over environment variables from the plugin itself. The only working option was to add the mentioned variable to slurmd's environment. To improve the user experience with this feature we want to be able to transparently disable memory hooks from the plugin itself. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
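The ordering constraint in this message (the variable must be set before the library is loaded) can be sketched as follows. This is an illustration under assumptions, not the plugin's code: `lib_loader_fn`, `load_with_hooks_disabled()` and `demo_loader()` are hypothetical names, and the loader callback stands in for a real `dlopen()` call so the sketch has no dependency on UCX being installed.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical loader callback; a real caller would wrap dlopen() here. */
typedef void *(*lib_loader_fn)(const char *path);

/* Set the variable first, then load, so the library sees it at load time. */
void *load_with_hooks_disabled(const char *path, lib_loader_fn loader)
{
    if (setenv("UCX_MEM_MALLOC_HOOKS", "no", 1) != 0)
        return NULL;
    return loader(path);
}

/* Stand-in loader for demonstration: succeeds only if the variable
 * is already visible with the value "no" when it runs. */
void *demo_loader(const char *path)
{
    static int handle;
    const char *val = getenv("UCX_MEM_MALLOC_HOOKS");

    (void)path;
    return (val != NULL && strcmp(val, "no") == 0) ? &handle : NULL;
}
```

Calling `load_with_hooks_disabled("libucp.so", demo_loader)` returns a non-NULL handle because the variable is set before the loader runs. The library name is only a placeholder in this sketch.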
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Put a request that is already completed directly on the _rcv_complete list.

Ping-pong latency effect:

size      UCXv8     UCXv9
1         17.7      17.8
2         17.7      17.7
4         17.9      17.7
8         18.3      18.1
16        18.4      18.3
32        18.3      18.3
64        19.0      18.8
128       19.4      19.3
256       19.8      19.5
512       19.9      19.7
1024      19.7      19.6
2048      20.9      20.7
4096      23.7      23.5
8192      27.2      27.2
16384     30.3      30.0
32768     35.5      35.3
65536     47.4      47.2
131072    71.4      71.2
262144    2197.0    2202.2
524288    2727.1    2745.2
1048576   3580.4    3577.7
2097152   6120.7    6125.0
4194304   10329.2   10265.2

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Ping-pong latency effect:

size      UCXv7     UCXv8
1         18.8      17.7
2         18.7      17.7
4         18.7      17.9
8         19.2      18.3
16        19.3      18.4
32        19.4      18.3
64        19.5      19.0
128       20.2      19.4
256       20.4      19.8
512       20.6      19.9
1024      20.5      19.7
2048      21.6      20.9
4096      24.6      23.7
8192      27.7      27.2
16384     30.7      30.3
32768     36.1      35.5
65536     47.8      47.4
131072    72.2      71.4
262144    2229.3    2197.0
524288    2817.8    2727.1
1048576   3693.1    3580.4
2097152   6148.5    6120.7
4194304   10230.7   10329.2

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Avoid extra ucx_worker_progress call
- Avoid extra initialization in _ucx_progress call

The effect on ping-pong latency test (~500 ns improvement):

size      UCXv6     UCXv7
1         19.2      18.8
2         19.2      18.7
4         19.2      18.7
8         19.6      19.2
16        19.9      19.3
32        19.8      19.4
64        20.1      19.5
128       20.5      20.2
256       21.0      20.4
512       21.1      20.6
1024      20.9      20.5
2048      22.1      21.6
4096      25.3      24.6
8192      28.5      27.7
16384     31.2      30.7
32768     37.0      36.1
65536     48.1      47.8
131072    72.6      72.2
262144    2104.7    2229.3
524288    2722.0    2817.8
1048576   3756.2    3693.1
2097152   6206.3    6148.5
4194304   10281.3   10230.7

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Although a degradation is observed for some message sizes, overall performance grows more steadily and most of the points are improved.

size      UCXv5     UCXv6
1         20.3      19.2
2         18.3      19.2
4         17.7      19.2
8         20.3      19.6
16        20.3      19.9
32        20.3      19.8
64        21.0      20.1
128       21.0      20.5
256       21.7      21.0
512       22.0      21.1
1024      22.0      20.9
2048      23.0      22.1
4096      26.0      25.3
8192      29.0      28.5
16384     31.7      31.2
32768     37.3      37.0
65536     49.0      48.1
131072    74.3      72.6
262144    2227.3    2104.7
524288    2801.0    2722.0
1048576   3795.0    3756.2
2097152   6292.3    6206.3
4194304   10314.3   10281.3

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
This helps to slightly improve the performance:

size      UCXv4     UCXv5
1         20.0      20.3
2         20.0      18.3
4         20.0      17.7
8         21.0      20.3
16        21.0      20.3
32        21.0      20.3
64        22.0      21.0
128       22.0      21.0
256       22.7      21.7
512       23.0      22.0
1024      23.0      22.0
2048      24.7      23.0
4096      27.3      26.0
8192      30.7      29.0
16384     33.0      31.7
32768     38.7      37.3
65536     51.0      49.0
131072    76.0      74.3
262144    2275.0    2227.3
524288    2969.7    2801.0
1048576   3835.3    3795.0
2097152   6274.7    6292.3
4194304   10728.0   10314.3

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Effect on the ping-pong latency test:

size      UCXv3     UCXv4
1         24        20
2         22        20
4         23        20
8         30        21
16        27        21
32        22        21
64        22        22
128       29        22
256       24        22
512       26        23
1024      24        23
2048      27        24
4096      33        27
8192      31        30
16384     32        33
32768     58        38
65536     56        51
131072    79        76
262144    2258      2231
524288    3044      2868
1048576   3899      3749
2097152   6328      6210
4194304   10828     10641

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Improves ping-pong latency test results:

size      UCXv2     UCXv3
1         30        25
2         30        25
4         30        25
8         30        26
16        30        26
32        31        26
64        31        27
128       31        27
256       32        28
512       33        28
1024      33        28
2048      34        30
4096      37        32
8192      42        38
16384     45        42
32768     52        47
65536     64        59
131072    88        83
262144    2248      2224
524288    2879      2782
1048576   3770      3624
2097152   6234      6135
4194304   10473     10427

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Analysis indicates that slurm's `List` introduces significant overhead (up to 7 us for an append operation). A more lightweight and less generic list container was developed for the PMIx plugin with performance considerations in mind. Internal ping-pong latency test results demonstrating the improvements:

size      UCXv1     UCXv2
1         46        30
2         56        30
4         72        30
8         85        30
16        93        30
32        122       31
64        120       31
128       126       31
256       137       32
512       155       33
1024      142       33
2048      153       34
4096      153       37
8192      170       42
16384     173       45
32768     177       52
65536     218       64
131072    236       88
262144    2735      2248
524288    3254      2879
1048576   4303      3770
2097152   6917      6234
4194304   11579     10473

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
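For illustration, a lightweight list of the kind this message describes might look like the sketch below. It is an assumption about the general approach, not the container that was actually added to the plugin: all `lw_*` names are hypothetical, and the design choice shown (caller-owned elements, tail pointer for O(1) append, no locking and no allocation inside the list) is one common way to avoid the overhead of a generic container.

```c
#include <stddef.h>

/* Hypothetical list element; storage is owned by the caller. */
typedef struct lw_elem {
    struct lw_elem *next;
    void *data;
} lw_elem_t;

/* Hypothetical list head with a tail pointer for O(1) append. */
typedef struct {
    lw_elem_t *head;
    lw_elem_t *tail;
    size_t count;
} lw_list_t;

void lw_list_init(lw_list_t *l)
{
    l->head = NULL;
    l->tail = NULL;
    l->count = 0;
}

/* Append without allocating or locking: just relink pointers. */
void lw_list_append(lw_list_t *l, lw_elem_t *e)
{
    e->next = NULL;
    if (l->tail != NULL)
        l->tail->next = e;
    else
        l->head = e;
    l->tail = e;
    l->count++;
}

/* Remove and return the first element, or NULL if the list is empty. */
lw_elem_t *lw_list_pop_front(lw_list_t *l)
{
    lw_elem_t *e = l->head;

    if (e == NULL)
        return NULL;
    l->head = e->next;
    if (l->head == NULL)
        l->tail = NULL;
    e->next = NULL;
    l->count--;
    return e;
}
```

Elements come back from `lw_list_pop_front()` in the order they were appended, which is what a message queue needs.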
-
Artem Polyakov authored
Introduce 2 ways of measuring p2p latency:
- node0 sends messages from the main thread, all other communication is done in the progress thread.
- all communication is performed in the progress thread.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Pack header directly in the send buffer, so we don't need iovec. This helps with future UCX support. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
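The idea of packing the header into the same buffer as the payload, so that a single contiguous region can be sent without an iovec, can be sketched like this. The header layout `demo_hdr_t` and the function `pack_message()` are hypothetical and chosen only for the example; the plugin's real header and buffer handling are not shown in this log.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size header, for illustration only. */
typedef struct {
    uint32_t magic;
    uint32_t payload_len;
} demo_hdr_t;

/* Build one contiguous buffer: header bytes first, payload right after.
 * Returns a malloc'ed buffer the caller must free, or NULL on failure. */
uint8_t *pack_message(const demo_hdr_t *hdr, const void *payload,
                      size_t *out_len)
{
    size_t total = sizeof(*hdr) + hdr->payload_len;
    uint8_t *buf = (uint8_t *)malloc(total);

    if (buf == NULL)
        return NULL;
    memcpy(buf, hdr, sizeof(*hdr));
    if (hdr->payload_len > 0)
        memcpy(buf + sizeof(*hdr), payload, hdr->payload_len);
    *out_len = total;
    return buf;
}
```

The result can be handed to a transport that accepts a single pointer and length, at the cost of one extra copy of the payload.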
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Boris Karasev authored
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
-
Artem Polyakov authored
- Remove debug leftover that was causing undeleted temp directories
- Fix compiler warnings
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- fix I/O engine finalization order.
- _send_progress may be called when the I/O engine is not yet operating.
- fix I/O engine message queue management
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-