twx-linux/tools/testing/selftests
Jakub Kicinski 95d1815f09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Incorrect error check in nft_expr_inner_parse(), from Dan Carpenter.

2) Add DATA_SENT state to SCTP connection tracking helper, from
   Sriram Yagnaraman.

3) Consolidate nf_confirm for ipv4 and ipv6, from Florian Westphal.

4) Add bitmask support for ipset, from Vishwanath Pai.

5) Handle icmpv6 redirects as RELATED, from Florian Westphal.

6) Add WARN_ON_ONCE() to impossible case in flowtable datapath,
   from Li Qiong.

7) A large batch of IPVS updates to replace timer-based estimators by
   kthreads to scale up wrt. CPUs and workload (millions of estimators).

Julian Anastasov says:

	This patchset implements stats estimation in kthread context.
It replaces the code that runs on a single CPU in timer context every 2
seconds and causes latency splats, as shown in reports [1], [2], [3].
The solution targets setups with thousands of IPVS services,
destinations and multi-CPU boxes.

	The estimation is spread over multiple (configured) CPUs and multiple
time slots (timer ticks) by using multiple chains organized under RCU
rules.  When stats are not needed, it is recommended to use
run_estimation=0, as already implemented before this change.
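
A toy model of the time-slot spreading described above, written in
Python rather than kernel C. It assumes a 2-second period split into 50
timer ticks. The names TickWheel and run_period are hypothetical and do
not come from the patchset. It only illustrates that each estimator is
visited once per period while the work per tick stays small.

```python
from collections import defaultdict

PERIOD_TICKS = 50  # assumption: one 2-second period split into 50 timer ticks


class TickWheel:
    """Toy model: estimators sit in per-tick rows that are visited round-robin."""

    def __init__(self, ticks=PERIOD_TICKS):
        self.ticks = ticks
        self.rows = [[] for _ in range(ticks)]
        self.current = 0  # row that the next tick will process

    def add(self, est, row):
        # Park the estimator in one row; it is only touched on that row's tick.
        self.rows[row % self.ticks].append(est)

    def tick(self):
        """Process one row, advance to the next, and return what was handled."""
        handled = list(self.rows[self.current])
        self.current = (self.current + 1) % self.ticks
        return handled


def run_period(wheel):
    """Run one full period and count how often each estimator was handled."""
    counts = defaultdict(int)
    for _ in range(wheel.ticks):
        for est in wheel.tick():
            counts[est] += 1
    return dict(counts)


wheel = TickWheel()
for i in range(200):
    wheel.add(f"est{i}", row=i)  # spread 200 estimators evenly over the rows
counts = run_period(wheel)
```

In this model no tick handles more than 4 of the 200 estimators, which is
the point of using many short chains instead of one long timer callback.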

RCU Locking:

- As stats are now RCU-locked, tot_stats, svc and dest, which
hold estimator structures, are now always freed from an RCU
callback. This ensures an RCU grace period after the
ip_vs_stop_estimator() call.

Kthread data:

- every kthread works over its own data structure and all
such structures are attached to an array. For now we limit
kthreads depending on the number of CPUs.

- even while there can be a kthread structure, its task
may not be running, e.g. before the first service is added,
while the sysctl var is set to an empty cpulist, or
when run_estimation is set to 0 to disable the estimation.

- the allocated kthread context may grow from 1 to 50
allocated structures for timer ticks, which saves memory for
setups with a small number of estimators

- a task and its structure may be released if all
estimators are unlinked from its chains, leaving the
slot in the array empty

- every kthread data structure allows a limited number
of estimators. Kthread 0 is also used to initially
calculate the max number of estimators to allow in every
chain considering a sub-100 microsecond cond_resched
rate. This number can be from 1 to hundreds.

- kthread 0 has an additional job of optimizing the
adding of estimators: they are first added to a
temp list (est_temp_list) and later kthread 0
distributes them to other kthreads. The optimization
is based on the fact that a newly added estimator
should be estimated after 2 seconds, so we have the
time to offload the adding to the chain from the controlling
process to kthread 0.

- to add new estimators we use the last added kthread
context (est_add_ktid). The new estimators are linked to
the chains just before the estimated one, based on add_row.
This ensures their estimation will start after 2 seconds.
If estimators are added in bursts, a common case when all
services and dests are initially configured, we may
spread the estimators to more chains and, as a result,
reduce the initial delay below 2 seconds.
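
A rough Python sketch of the placement idea in the last item. It is a
toy model under stated assumptions, not the kernel's add_row logic. It
assumes 50 rows per 2-second period and a hypothetical per-row limit
CHAIN_MAX (per the text above, the real limit is calculated at runtime).
The function name place is also hypothetical. A new estimator goes into
the row just before the one being estimated, so it waits almost a full
period. When that row is full, earlier rows are used and the initial
delay shrinks.

```python
N_ROWS = 50     # assumption: rows (timer ticks) per 2-second period
CHAIN_MAX = 3   # hypothetical per-row limit, chosen small for the example


def place(rows, current_row, est, n_rows=N_ROWS, chain_max=CHAIN_MAX):
    """Link est into the first non-full row, walking backwards from the row
    just before current_row. Returns the number of ticks until est is first
    estimated, taking the current row as being processed now (tick 0)."""
    # back runs from 1 to n_rows - 1, so the current row itself is never used.
    for back in range(1, n_rows):
        row = (current_row - back) % n_rows
        if len(rows[row]) < chain_max:
            rows[row].append(est)
            return n_rows - back
    # Toy behaviour only: the real code would move on to another kthread context.
    raise RuntimeError("all rows are full in this toy model")


rows = [[] for _ in range(N_ROWS)]
# A burst of 7 estimators arrives while row 10 is being estimated.
delays = [place(rows, 10, f"est{i}") for i in range(7)]
```

With these assumed numbers the first three estimators land in row 9 and
wait 49 ticks. The next three land in row 8 and wait 48 ticks, and the
seventh waits 47 ticks, so a burst starts slightly earlier than a full
period.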

Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.

The new IPVS estimators do not use workqueue infrastructure
because:

- The estimation can take a long time when using multiple IPVS rules (e.g.
  millions of estimator structures), especially when the box has multiple
  CPUs, due to the for_each_possible_cpu usage that expects packets from
  any CPU. With the est_nice sysctl we have more control over how to prioritize
  the estimation kthreads compared to other processes/kthreads that have
  latency requirements (such as servers). As a benefit, we can see these
  kthreads in top and decide if we will need some further control to limit
  their CPU usage (max number of structures to estimate per kthread).

- with kthreads we run code that is read-mostly, no write/lock
  operations to process the estimators in 2-second intervals.

- work items are one-shot: as estimators are processed every
  2 seconds, they need to be re-added every time. This again
  loads the timers (add_timer) if we use delayed works, as there are
  no kthreads to do the timings.

[1] Report from Yunhong Jiang:
    https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust:
    https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  ipvs: run_estimation should control the kthread tasks
  ipvs: add est_cpulist and est_nice sysctl vars
  ipvs: use kthreads for stats estimation
  ipvs: use u64_stats_t for the per-cpu counters
  ipvs: use common functions for stats allocation
  ipvs: add rcu protection to stats
  netfilter: flowtable: add a 'default' case to flowtable datapath
  netfilter: conntrack: set icmpv6 redirects as RELATED
  netfilter: ipset: Add support for new bitmask parameter
  netfilter: conntrack: merge ipv4+ipv6 confirm functions
  netfilter: conntrack: add sctp DATA_SENT state
  netfilter: nft_inner: fix IS_ERR() vs NULL check
====================

Link: https://lore.kernel.org/r/20221211101204.1751-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12 14:45:36 -08:00
alsa selftests: alsa: Handle pkg-config failure more gracefully 2022-05-31 18:02:18 +02:00
amd-pstate cpufreq: amd-pstate: Add explanation for X86_AMD_PSTATE_UT 2022-10-05 11:05:18 -06:00
arm64 Merge branch 'for-next/kselftest' into for-next/core 2022-09-30 09:18:11 +01:00
bpf selftests/bpf: test case for relaxed prunning of active_lock.id 2022-12-10 13:36:22 -08:00
breakpoints
capabilities
cgroup - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
clone3
core
cpu-hotplug selftests/cpu-hotplug: Add log info when test success 2022-10-05 11:05:18 -06:00
cpufreq
damon selftest/damon: add a test for duplicate context dirs creation 2022-10-03 14:03:06 -07:00
dma selftests dma: fix compile error for dma_map_benchmark 2022-06-16 14:03:21 -06:00
dmabuf-heaps
drivers selftests: mlxsw: Move IPv6 decap_error test to shared directory 2022-12-08 18:46:32 -08:00
efivarfs
exec linux-kselftest-next-5.18-rc1 2022-03-23 12:53:00 -07:00
filesystems Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
firmware selftests: firmware: Add firmware upload selftests 2022-04-29 16:49:36 +02:00
fpu
ftrace selftests/ftrace: fix dynamic_events dependency check 2022-10-18 14:27:23 -06:00
futex selftests/futex: fix build for clang 2022-10-18 14:13:11 -06:00
gpio selftests: gpio: fix include path to kernel headers for out of tree builds 2022-07-20 14:35:18 +02:00
ia64
intel_pstate selftests/intel_pstate: fix build for ARCH=x86_64 2022-10-18 14:13:19 -06:00
ipc
ir kselftests/ir : Improve readability of modprobe error message 2022-05-16 13:34:19 -06:00
kcmp selftests/kcmp: Make the test output consistent and clear 2022-07-08 10:55:43 -06:00
kexec selftests/kexec: fix build for ARCH=x86_64 2022-10-18 14:13:25 -06:00
kmod
kselftest
kvm KVM: selftests: add svm part to triple_fault_test 2022-11-17 11:40:00 -05:00
landlock selftests/landlock: Build without static libraries 2022-10-19 22:10:56 +02:00
lib
livepatch Merge branch 'for-6.1/sysfs-patched-object' into for-linus 2022-10-05 13:00:03 +02:00
lkdtm lkdtm: Update tests for memcpy() run-time warnings 2022-09-07 16:37:27 -07:00
locking
media_tests
membarrier
memfd
memory-hotplug selftests/memory-hotplug: Remove the redundant warning information 2022-10-18 14:21:18 -06:00
mincore
mount
mount_setattr
move_mount_set_group
mqueue selftests: mqueue: drop duplicate min definition 2022-04-19 19:28:47 -06:00
nci NFC: nci: Extend virtual NCI deinit test 2022-11-21 10:49:58 +00:00
net selftests: net: Fix O=dir builds 2022-12-08 19:26:18 -08:00
netfilter netfilter: conntrack: set icmpv6 redirects as RELATED 2022-11-30 23:01:20 +01:00
nolibc selftests/nolibc: Avoid generated files being committed 2022-08-31 05:17:45 -07:00
nsfs
ntb
openat2
perf_events selftests/perf_events: Add a SIGTRAP stress test with disables 2022-10-17 16:32:06 +02:00
pid_namespace selftests: fix header dependency for pid_namespace selftests 2022-04-04 13:32:31 -06:00
pidfd selftests/pidfd_test: Remove the erroneous ',' 2022-11-02 03:09:57 -06:00
powerpc selftests/powerpc: Update bhrb filter sampling test for multiple branch filters 2022-09-28 19:22:13 +10:00
prctl
proc proc: test how it holds up with mapping'less process 2022-10-11 18:51:11 -07:00
pstore
ptp
ptrace
rcutorture torture: Create kvm-check-branches.sh output in proper location 2022-06-21 15:57:04 -07:00
resctrl selftests/resctrl: Fix null pointer dereference on open failed 2022-04-26 09:20:00 -06:00
rlimits
rseq selftests/rseq: check if libc rseq support is registered 2022-06-28 09:08:28 +02:00
rtc
safesetid LSM: SafeSetID: add setgroups() testing to selftest 2022-07-15 18:24:42 +00:00
sched
seccomp selftests/seccomp: Fix compile warning when CC=clang 2022-07-27 12:12:16 -07:00
sgx selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning 2022-08-15 16:50:07 -06:00
sigaltstack
size
sparc64
splice
static_keys
sync remove CONFIG_ANDROID 2022-07-01 10:41:09 +02:00
syscall_user_dispatch
sysctl selftests/sysctl: add sysctl macro test 2022-05-03 10:15:07 +02:00
tc-testing selftests: tc-testing: Add matchJSON to tdc 2022-10-26 20:22:33 -07:00
timens Revert "selftests/timens: add a test for vfork+exit" 2022-09-13 10:38:43 -07:00
timers selftests: timers: clocksource-switch: adapt to kselftest framework 2022-07-14 14:36:52 -06:00
tmpfs
tpm2 selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle 2022-10-05 00:25:56 +03:00
uevent
user
user_events tracing/user_events: Use bits vs bytes for enabled status page data 2022-09-29 10:17:37 -04:00
vDSO selftests/vDSO: fix array_size.cocci warning 2022-04-04 13:27:11 -06:00
vm - Alistair Popple has a series which addresses a race which causes page 2022-10-14 12:28:43 -07:00
watchdog
wireguard wireguard: selftests: do not install headers on UML 2022-09-20 11:26:14 -07:00
x86 selftests/x86/corrupt_xstate_header: Use provided __cpuid_count() macro 2022-04-25 15:13:03 -06:00
zram
.gitignore
gen_kselftest_tar.sh
kselftest_deps.sh selftests: Make the usage formatting consistent in kselftest_deps.sh 2022-06-27 14:14:27 -06:00
kselftest_harness.h selftests/harness: Pass variant to teardown 2022-04-04 13:37:48 -06:00
kselftest_install.sh
kselftest_module.h selftest: Taint kernel when test module loaded 2022-07-11 16:58:11 -06:00
kselftest.h selftests: Provide local define of __cpuid_count() 2022-04-25 15:12:36 -06:00
lib.mk selftests: net: Fix cross-tree inclusion of scripts 2022-10-20 21:09:22 -07:00
Makefile selftests: Add a basic HSR test. 2022-12-01 20:26:22 -08:00
run_kselftest.sh