Profiling shows that this is a highly contested mutex, causing dimishing results compared to a OS lock. std::mutex implementations can spin for a while before falling back to an OS lock. This avoids wasting precious CPU cycles in a no-op.