This commit adds srcu_read_{,un}lock_fast(), which is similar
to srcu_read_{,un}lock_lite(), but avoids the array-indexing and
pointer-following overhead. On a microbenchmark featuring tight
loops around empty readers, this results in about a 20% speedup
compared to RCU Tasks Trace on my x86 laptop.
Please note that SRCU-fast has drawbacks compared to RCU Tasks
Trace, including:
o Lack of CPU stall warnings.
o SRCU-fast readers permitted only where rcu_is_watching().
o A pointer-sized return value from srcu_read_lock_fast() must
be passed to the corresponding srcu_read_unlock_fast().
o In the absence of readers, a synchronize_srcu() having _fast()
readers will incur the latency of at least two normal RCU grace
periods.
o RCU Tasks Trace priority boosting could be easily added.
Boosting SRCU readers is more difficult.
SRCU-fast also has a drawback compared to SRCU-lite, namely that the
return value from srcu_read_lock_fast()-fast is a 64-bit pointer and
that from srcu_read_lock_lite() is only a 32-bit int.
[ paulmck: Apply feedback from Akira Yokosawa. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
||
|---|---|---|
| .. | ||
| acpi | ||
| asm-generic | ||
| clocksource | ||
| crypto | ||
| cxl | ||
| drm | ||
| dt-bindings | ||
| hyperv | ||
| keys | ||
| kunit | ||
| kvm | ||
| linux | ||
| math-emu | ||
| media | ||
| memory | ||
| misc | ||
| net | ||
| pcmcia | ||
| ras | ||
| rdma | ||
| rv | ||
| scsi | ||
| soc | ||
| sound | ||
| target | ||
| trace | ||
| uapi | ||
| ufs | ||
| vdso | ||
| video | ||
| xen | ||