twx-linux/include
Waiman Long e76d28bdf9 cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked()
When cgroup_rstat_updated() isn't being called concurrently with
cgroup_rstat_flush_locked(), its run time is pretty short. When
both are called concurrently, the cgroup_rstat_updated() run time
can spike to a pretty high value due to high cpu_lock hold time in
cgroup_rstat_flush_locked(). This can be problematic if the task calling
cgroup_rstat_updated() is a realtime task running on an isolated CPU
with a strict latency requirement. The cgroup_rstat_updated() call can
happen when there is a page fault even though the task is running in
user space most of the time.

The percpu cpu_lock is used to protect the update tree -
updated_next and updated_children. This protection is only needed when
cgroup_rstat_cpu_pop_updated() is being called. The subsequent flushing
operation which can take a much longer time does not need that protection
as it is already protected by cgroup_rstat_lock.

To reduce the cpu_lock hold time, we need to perform all the
cgroup_rstat_cpu_pop_updated() calls up front with the lock
released afterward before doing any flushing. This patch adds a new
cgroup_rstat_updated_list() function to return a singly linked list of
cgroups to be flushed.

Some instrumentation code are added to measure the cpu_lock hold time
right after lock acquisition to after releasing the lock. Parallel
kernel build on a 2-socket x86-64 server is used as the benchmarking
tool for measuring the lock hold time.

The maximum cpu_lock hold time before and after the patch are 100us and
29us respectively. So the worst case time is reduced to about 30% of
the original. However, there may be some OS or hardware noises like NMI
or SMI in the test system that can worsen the worst case value. Those
noises are usually tuned out in a real production environment to get
a better result.

OTOH, the lock hold time frequency distribution should give a better
idea of the performance benefit of the patch.  Below were the frequency
distribution before and after the patch:

     Hold time        Before patch       After patch
     ---------        ------------       -----------
       0-01 us           804,139         13,738,708
      01-05 us         9,772,767          1,177,194
      05-10 us         4,595,028              4,984
      10-15 us           303,481              3,562
      15-20 us            78,971              1,314
      20-25 us            24,583                 18
      25-30 us             6,908                 12
      30-40 us             8,015
      40-50 us             2,192
      50-60 us               316
      60-70 us                43
      70-80 us                 7
      80-90 us                 2
        >90 us                 3

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-11-12 15:23:07 -06:00
..
acpi
asm-generic Kbuild updates for v6.7 2023-11-04 08:07:19 -10:00
clocksource
crypto
drm drm next and fixes for 6.7-rc1 2023-11-07 17:10:02 -08:00
dt-bindings linux-watchdog 6.7-rc1 tag 2023-11-09 13:54:25 -08:00
keys
kunit
kvm KVM/arm64 updates for 6.7 2023-10-31 16:37:07 -04:00
linux cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked() 2023-11-12 15:23:07 -06:00
math-emu
media
memory
misc
net Follow up to networking PR for 6.7 2023-11-01 16:33:20 -10:00
pcmcia
ras
rdma
rv
scsi SCSI misc on 20231102 2023-11-02 15:13:50 -10:00
soc IOMMU Updates for Linux v6.7 2023-11-09 13:37:28 -08:00
sound
target
trace Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
uapi drm next and fixes for 6.7-rc1 2023-11-07 17:10:02 -08:00
ufs
vdso
video
xen