fn run_core_latency_bench(nr_cpus: usize) -> Vec<Vec<f64>>
Run a core-to-core latency benchmark using atomic ping-pong.
The hot loop uses only wrapping_add(1), with no multiply or checked add,
so debug builds don’t inflate measurements with overflow checks.
Each pair is run for 3 attempts with warmup, and the minimum is taken.
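
A minimal sketch of the technique described above, not this function's actual implementation. The names ping_pong_once and min_latency_ns and their parameters are hypothetical. The sketch also assumes no CPU pinning (std has no portable API for it), so it measures whichever two cores the OS schedules the threads on, whereas a real per-pair benchmark would presumably pin each thread to a specific CPU.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

/// Hypothetical helper: one ping-pong attempt between two threads.
/// Runs `warmup` untimed round trips, then `measured` timed ones, and
/// returns the mean one-way latency in nanoseconds.
fn ping_pong_once(warmup: u64, measured: u64) -> f64 {
    assert!(measured > 0, "need at least one timed round trip");
    let flag = Arc::new(AtomicU64::new(0));
    let responder_flag = Arc::clone(&flag);
    let total = warmup + measured;

    // Responder: waits for each odd value (ping), answers with the next
    // even value (pong).
    let responder = thread::spawn(move || {
        let mut expected: u64 = 1;
        for _ in 0..total {
            while responder_flag.load(Ordering::Acquire) != expected {
                std::hint::spin_loop();
            }
            let reply = expected.wrapping_add(1);
            responder_flag.store(reply, Ordering::Release);
            expected = reply.wrapping_add(1);
        }
    });

    let mut value: u64 = 0;
    // One round trip: publish an odd value, spin until the even reply.
    // Only wrapping_add(1) is used, so debug builds add no overflow checks.
    let mut round_trip = |flag: &AtomicU64| {
        value = value.wrapping_add(1);
        flag.store(value, Ordering::Release);
        value = value.wrapping_add(1);
        while flag.load(Ordering::Acquire) != value {
            std::hint::spin_loop();
        }
    };

    // Warmup: lets the responder thread start and the cache line settle.
    for _ in 0..warmup {
        round_trip(&flag);
    }

    let start = Instant::now();
    for _ in 0..measured {
        round_trip(&flag);
    }
    let elapsed = start.elapsed();

    responder.join().expect("responder thread panicked");
    // Each round trip is two one-way transfers.
    elapsed.as_nanos() as f64 / (2.0 * measured as f64)
}

/// Hypothetical helper: repeat the attempt and keep the minimum, which
/// filters out attempts disturbed by scheduling noise.
fn min_latency_ns(attempts: usize, warmup: u64, measured: u64) -> f64 {
    (0..attempts)
        .map(|_| ping_pong_once(warmup, measured))
        .fold(f64::INFINITY, f64::min)
}
```

Calling min_latency_ns(3, 1_000, 100_000) would mirror the 3-attempt, take-the-minimum scheme; the warmup and iteration counts here are illustrative, not values taken from the source.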