- 04 Aug, 2017 20 commits
-
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Fix I/O engine finalization order.
- _send_progress may be called when the I/O engine is not yet operating.
- Fix I/O engine message queue management.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Latency results:
1 0.000079
2 0.000082
4 0.000078
8 0.000084
16 0.000084
32 0.000083
64 0.000091
128 0.000100
256 0.000113
512 0.000125
1024 0.000177
2048 0.000427
4096 0.000514
8192 0.000522
16384 0.000590
32768 0.000872
65536 0.001745
131072 0.002758
262144 0.004833
524288 0.009263
1048576 0.018100
2097152 0.036140
4194304 0.071862
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Since we are setting the TCP_NODELAY option on the socket, data is no longer buffered. This means that the original approach, which writes the header and then moves on to the message payload, results in two TCP messages instead of the desired one. This commit uses writev to enable an atomic send of non-contiguous data.
Latency:
1 0.000084
2 0.000112
4 0.000109
8 0.000107
16 0.000102
32 0.000101
64 0.000102
128 0.000114
256 0.000119
512 0.000134
1024 0.000173
2048 0.000411
4096 0.000517
8192 0.000526
16384 0.000587
32768 0.000894
65536 0.001694
131072 0.002778
262144 0.004981
524288 0.009312
1048576 0.018522
2097152 0.036365
4194304 0.072543
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
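The gather-write idea described above can be illustrated with a minimal sketch. It is not the plugin's code: the 4-byte length header and the `send_framed` helper are assumptions made for illustration, and it uses Python's `os.writev` wrapper around the same system call.

```python
import os
import socket
import struct


def send_framed(sock, payload):
    """Send header and payload with a single gather write (hypothetical framing)."""
    # Assumption: a 4-byte big-endian length prefix stands in for the real header.
    header = struct.pack("!I", len(payload))
    total = len(header) + len(payload)
    # One writev call hands both buffers to the kernel together, so the
    # header is not pushed out as a separate small write.
    sent = os.writev(sock.fileno(), [header, payload])
    if sent < total:
        # writev may write only part of the data; send the remainder.
        sock.sendall((header + payload)[sent:])
    return total
```

With two separate write calls and TCP_NODELAY set, the header would typically leave as its own small segment. Passing both buffers in one call avoids that without copying them into a temporary contiguous buffer.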
-
Artem Polyakov authored
Use TCP_NODELAY to reduce the performance overhead.
Latency results for direct TCP connections (small-message latency significantly improved):
1 0.000244
2 0.000238
4 0.000239
8 0.000238
16 0.000237
32 0.000240
64 0.000250
128 0.000252
256 0.000250
512 0.000254
1024 0.000335
2048 0.000327
4096 0.000530
8192 0.000540
16384 0.000576
32768 0.000953
65536 0.001634
131072 0.002767
262144 0.004726
524288 0.009316
1048576 0.018161
2097152 0.036179
4194304 0.071893
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
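For readers unfamiliar with the option, here is a minimal sketch of enabling TCP_NODELAY, which disables Nagle's algorithm so small writes are sent without waiting to be coalesced. The helper name `make_nodelay_socket` is hypothetical and the snippet is not taken from the plugin.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns the option on for this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```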
-
Artem Polyakov authored
Add some developer code that allows measuring the latency of the communication subsystem in use. Here are the initial results for the existing communication options:
Environment:
SLURM_PMIX_DIRECT_CONN=[true | false]
SLURM_PMIX_TIMEOUT=1000000
SLURM_PMIX_WANT_PP=1
SLURM_PMIX_PP_LOW_PWR2=0
SLURM_PMIX_PP_UP_PWR2=22
SLURM_PMIX_PP_ITER_SMALL=100
SLURM_PMIX_PP_ITER_LARGE=20
Launch cmdline:
$ srun --mpi=pmix -N2 -n2 sleep 10000
SLURM RPC results:
1 0.000669
2 0.000669
4 0.000669
8 0.000659
16 0.000668
32 0.000664
64 0.000668
128 0.000661
256 0.000668
512 0.000692
1024 0.000776
2048 0.000750
4096 0.000743
8192 0.000819
16384 0.001230
32768 0.001385
65536 0.002137
131072 0.003798
262144 0.006664
524288 0.011702
1048576 0.021605
2097152 0.042588
4194304 0.084673
Direct TCP connections (better for large messages, but not yet for small):
1 0.078999
2 0.078999
4 0.078999
8 0.078999
16 0.078999
32 0.078999
64 0.079000
128 0.078999
256 0.079000
512 0.079000
1024 0.078998
2048 0.000409
4096 0.000639
8192 0.000521
16384 0.000618
32768 0.000933
65536 0.001622
131072 0.002749
262144 0.004704
524288 0.009281
1048576 0.018133
2097152 0.036172
4194304 0.071875
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
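As a rough illustration of this kind of measurement, the sketch below runs a ping-pong over a local socket pair. It assumes that the LOW_PWR2/UP_PWR2 settings are exponents of two bounding the message size, which is consistent with the 23 sizes listed (1 to 4194304) but is not stated in the commit. The function names, the single iteration count, and the choice to report the average round-trip time are all assumptions of the sketch, not the plugin's method.

```python
import socket
import threading
import time


def message_sizes(low_pwr2, up_pwr2):
    """Return message sizes 2**low_pwr2 .. 2**up_pwr2 (assumed meaning of the settings)."""
    return [1 << p for p in range(low_pwr2, up_pwr2 + 1)]


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock, sizes, iters):
    """Echo every message back to the sender."""
    for size in sizes:
        for _ in range(iters):
            sock.sendall(recv_exact(sock, size))


def ping_pong(sizes, iters):
    """Return {size: average round-trip time in seconds} over a local socket pair."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, sizes, iters))
    peer.start()
    results = {}
    for size in sizes:
        msg = b"x" * size
        start = time.perf_counter()
        for _ in range(iters):
            a.sendall(msg)
            recv_exact(a, size)
        results[size] = (time.perf_counter() - start) / iters
    peer.join()
    a.close()
    b.close()
    return results
```

Calling `ping_pong(message_sizes(0, 22), 20)` would cover the same size range as the tables above, although timings over a local socket pair are not comparable to the two-node results reported in the commit.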
-
Artem Polyakov authored
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- The SLURM protocol is used to initiate communication and deliver the sender's TCP port number to the receiver.
- After the first message is received, all communication goes through the direct TCP connections.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Introduce the notion of an endpoint to distinguish between broadcast and point-to-point exchanges.
- Fix magic number naming: change all PMIX_ prefixes to PMIXP_ to avoid conflicts with PMIx library macros.
- Fix I/O engine cleanup logic, as I/O engines are now used more than once.
- Make the namespace subsystem return xmalloc'd pointers instead of malloc'd ones to avoid confusion, as everything else is xmalloc'd.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Introduce data structure reuse to avoid extra malloc/free overhead.
- Introduce direct TCP connection notation (disabled for now).
This commit is intended for testing the SLURM protocol portion of the code to see that nothing is broken. Direct communication will be enabled in the next commit.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Provide the ability to initialize an I/O engine without knowing the fd. This will be used when we send a connection request using the SLURM protocol and queue all other sends while waiting for the connection to be established.
- Simplify some of the I/O API and I/O internals.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
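The deferred-connection behavior described above can be sketched as follows. This is only an illustration of queueing sends until a connection exists: the `PendingSender` class and its methods are hypothetical, and it works with a socket object where the plugin works with a raw fd.

```python
from collections import deque


class PendingSender:
    """Queue outgoing messages until a connected socket is attached (illustrative)."""

    def __init__(self):
        self.sock = None       # not known at construction time
        self.queue = deque()   # messages waiting for the connection

    def send(self, data):
        """Enqueue a message and try to make progress."""
        self.queue.append(data)
        self._progress()

    def attach(self, sock):
        """Provide the connected socket and flush everything queued so far."""
        self.sock = sock
        self._progress()

    def _progress(self):
        # Nothing can be sent until the connection is established.
        if self.sock is None:
            return
        while self.queue:
            self.sock.sendall(self.queue.popleft())
```

Messages keep their submission order because the queue is drained from the front, both on attach and on later sends.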
-
Artem Polyakov authored
- Sort out the API: headers and callbacks.
- Change "pmix_" prefix to "pmixp_" for previously unused send functions.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
Remove "health check" functionality as it is not needed. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
This field is a stub for now, but will be used to pass the port number where corresponding stepd is listening for incoming connections. Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Artem Polyakov authored
- Fix some function names.
- Separate generic I/O from the SLURM-protocol-specific code.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
-
Morris Jette authored
Coverity CID 45177
-
Morris Jette authored
Coverity, CID 44795
-
Morris Jette authored
Coverity CID 171452
-
- 03 Aug, 2017 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
All of these were pre-existing Coverity errors, but I changed nearby code, variable names, etc. so they looked like new errors.
-
Morris Jette authored
-
Morris Jette authored
Coverity reported problem, CID 45194
-
Morris Jette authored
CID 44936
-
Morris Jette authored
-
Morris Jette authored
Coverity CID 171494
-
Morris Jette authored
-
Morris Jette authored
-
- 02 Aug, 2017 10 commits
-
-
Tim Wickberg authored
Bug 3956.
-
Tim Shaw authored
Add translation code for the RPCs as well. Bug 3956.
-
Morris Jette authored
-
Morris Jette authored
Add pack_job_id and pack_job_offset to accounting database. Modified sacct to accept pack job ID specification using "#+#" notation. Modified sstat to accept pack job ID specification using "#+#" notation.
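To make the "#+#" notation concrete, here is a minimal parsing sketch. The commit does not spell out the grammar, so the sketch assumes two non-negative decimal integers separated by "+" (pack job ID, then pack job offset). The `parse_pack_job_id` helper is hypothetical and is not the sacct or sstat implementation.

```python
def parse_pack_job_id(spec):
    """Parse "<job_id>+<offset>" or a plain "<job_id>" (assumed grammar)."""
    job, sep, offset = spec.partition("+")

    def is_number(text):
        # Accept only plain ASCII decimal digits.
        return text.isascii() and text.isdigit()

    if not is_number(job) or (sep and not is_number(offset)):
        raise ValueError("invalid pack job specification: %r" % spec)
    # A plain job ID has no offset component.
    return int(job), (int(offset) if sep else None)
```

For example, `parse_pack_job_id("1234+2")` yields the pair `(1234, 2)`, matching the pack_job_id and pack_job_offset fields named in the commit.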
-
Morris Jette authored
-
Tim Wickberg authored
-
Dominik Bartkiewicz authored
NULL is returned if the token is not found, so testing against '\0' is wrong (although it does work with older compilers). Fixes a new GCC 7.1 warning.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
was matching more than expected.
-