Numba multi-threaded parallelization of _filter_by_index_sum
