twx-linux/tools/include/uapi/linux
Jason Xing 45e359be1c net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt
This patch provides a setsockopt option that lets applications adjust the
maximum number of descs handled in one send syscall. It mitigates the
case where the default value (32) is too small and causes send syscalls
to be triggered more frequently.

Given the diversity and complexity of applications, there is no single
ideal value that fits all cases. So keep 32 as the default value, as
before.

The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default during initialization, as a
  per-socket control.
- Limit max_tx_budget to the range [32, xs->tx->nentries].
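From user space, the option would be set with a plain setsockopt() call on an AF_XDP socket, presumably after its TX ring is configured, since the upper bound is the TX ring size. The sketch below assumes the option is read at level SOL_XDP as an unsigned int. The fallback values 283 and 9 are assumed to match linux/socket.h and linux/if_xdp.h and are only used if the installed headers predate this patch. The helper names are hypothetical. The sketch also checks the [32, xs->tx->nentries] range locally before calling into the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Fallbacks for older headers; assumed to match linux/socket.h and
 * linux/if_xdp.h. Prefer the real header definitions when present. */
#ifndef SOL_XDP
#define SOL_XDP 283
#endif
#ifndef XDP_MAX_TX_SKB_BUDGET
#define XDP_MAX_TX_SKB_BUDGET 9
#endif

/* Default value and lower bound described in the commit message. */
#define XSK_DEFAULT_TX_BUDGET 32u

/* Hypothetical helper: mirrors the documented range
 * [32, number of TX ring entries] so bad values are caught early. */
static bool tx_budget_in_range(unsigned int budget, unsigned int tx_ring_entries)
{
	return budget >= XSK_DEFAULT_TX_BUDGET && budget <= tx_ring_entries;
}

/* Hypothetical helper: set the per-socket TX budget. Out-of-range
 * values are rejected locally with EINVAL; otherwise the result of
 * setsockopt() is returned (0 on success, -1 with errno set). */
static int xsk_set_tx_budget(int xsk_fd, unsigned int budget,
			     unsigned int tx_ring_entries)
{
	if (!tx_budget_in_range(budget, tx_ring_entries)) {
		errno = EINVAL;
		return -1;
	}
	return setsockopt(xsk_fd, SOL_XDP, XDP_MAX_TX_SKB_BUDGET,
			  &budget, sizeof(budget));
}
```

A typical call would be `xsk_set_tx_budget(xsk_fd, 128, tx_ring_size)` once the socket and its TX ring exist. The local range check is a convenience only; the kernel performs its own validation.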

The idea comes from real production workloads. We use a user-level
stack with xsk support to accelerate sending packets and minimize
syscalls. When packets are aggregated, it is not hard to hit the upper
bound (namely, 32). As soon as the user-space stack gets the -EAGAIN
error from sendto(), it loops and retries until all the expected descs
in the tx ring have been sent to the driver. Enlarging the
XDP_MAX_TX_SKB_BUDGET value reduces the frequency of sendto() calls and
increases throughput/PPS.
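The retry loop and the effect of the budget can be illustrated with a toy model. It assumes each kick (in a real stack, something like `sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0)`) moves at most `budget` descs to the driver and fails with EAGAIN while descs remain. The struct and function names are hypothetical, and this is a simplified model, not kernel code.

```c
#include <errno.h>

/* Toy model of the kernel side. */
struct toy_tx_ring {
	unsigned int pending; /* descs not yet handed to the driver */
	unsigned int budget;  /* per-call limit, i.e. max_tx_budget */
};

/* One simulated sendto() kick: move up to `budget` descs, then report
 * EAGAIN if anything is still pending. */
static int toy_kick(struct toy_tx_ring *ring)
{
	unsigned int n = ring->pending < ring->budget ? ring->pending : ring->budget;

	ring->pending -= n;
	if (ring->pending > 0) {
		errno = EAGAIN;
		return -1;
	}
	return 0;
}

/* User-space retry loop: keep kicking until the ring is drained.
 * Returns the number of kicks (simulated syscalls), or 0 on an
 * unexpected error. *eagain_count receives the EAGAIN returns. */
static unsigned int drain_tx_ring(struct toy_tx_ring *ring,
				  unsigned int *eagain_count)
{
	unsigned int kicks = 0;

	*eagain_count = 0;
	for (;;) {
		kicks++;
		if (toy_kick(ring) == 0)
			return kicks;
		if (errno != EAGAIN)
			return 0;
		(*eagain_count)++;
	}
}
```

In this model, draining 1000 pending descs takes 32 kicks (31 of them EAGAIN) with a budget of 32, and 8 kicks (7 EAGAIN) with a budget of 128. The number of calls is the ceiling of pending/budget.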

Here is what I did in production, along with some numbers:
For one application I looked at recently, I suggested using 128 as
max_tx_budget because, with the default configuration, I saw two
limitations: 1) XDP_MAX_TX_SKB_BUDGET, and 2) the socket sndbuf, which
is 212992 bytes as set by net.core.wmem_default. For
XDP_MAX_TX_SKB_BUDGET, I counted how many descs were transmitted to the
driver per sendto() call, based on the [1] patch, and then calculated
the probability of hitting the upper bound. I chose 128 as a suitable
value because 1) it covers most of the cases, and 2) a higher number
would not bring evident improvements. After tuning the parameters, I
observed a stable improvement of around 4% in both PPS and throughput,
along with lower resource consumption as measured by strace -c -p xxx:
1) %time decreased by 7.8%
2) the error counter decreased from 18367 to 572
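The selection step described above (estimate how often a given budget would be hit, then pick one that covers most cases) can be sketched as follows. This is not the author's tool. It assumes you already have one sample per sendto() call giving the number of descs queued at that moment, for example collected with tracing similar to [1]. The candidate list and acceptable hit rate are hypothetical tuning inputs.

```c
#include <stddef.h>

/* Hypothetical helper: fraction of sampled sendto() calls whose desc
 * demand exceeds `budget`, i.e. calls that would hit the upper bound
 * and return EAGAIN. */
static double hit_rate(const unsigned int *demand, size_t n, unsigned int budget)
{
	size_t hits = 0;

	if (n == 0)
		return 0.0;
	for (size_t i = 0; i < n; i++)
		if (demand[i] > budget)
			hits++;
	return (double)hits / (double)n;
}

/* Hypothetical helper: return the smallest candidate (candidates must
 * be sorted ascending) whose hit rate is at most `max_rate`, or 0 if
 * none qualifies. */
static unsigned int pick_budget(const unsigned int *demand, size_t n,
				const unsigned int *candidates, size_t ncand,
				double max_rate)
{
	for (size_t i = 0; i < ncand; i++)
		if (hit_rate(demand, n, candidates[i]) <= max_rate)
			return candidates[i];
	return 0;
}
```

Preferring the smallest qualifying candidate reflects the reasoning in the message: once most calls fit, a larger budget brings little further benefit.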

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-10 14:48:29 +02:00
tc_act headers: Remove some left-over license text 2022-09-27 07:48:01 -07:00
bits.h tools headers: Synchronize uapi/linux/bits.h with the kernel sources 2025-05-20 12:57:19 -03:00
bpf_common.h
bpf_perf_event.h
bpf.h bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE 2025-05-30 19:53:51 -07:00
btf.h docs/bpf: Document the semantics of BTF tags with kind_flag 2025-02-05 16:17:59 -08:00
const.h treewide: fix typo 'unsigned __init128' -> 'unsigned __int128' 2025-03-05 12:00:03 -05:00
elf.h tools/include: Add uapi/linux/elf.h 2025-03-03 20:00:12 +01:00
erspan.h
fadvise.h
fanotify.h selftests/fs/mount-notify: build with tools include dir 2025-05-12 11:40:12 +02:00
filter.h
fs.h tools headers UAPI: sync linux/fs.h with the kernel sources 2025-05-11 17:48:16 -07:00
fscrypt.h tools headers: Update the fs headers with the kernel sources 2025-06-16 14:05:10 -03:00
hw_breakpoint.h Move bp_type_idx to include/linux/hw_breakpoint.h 2023-03-10 21:05:16 +01:00
if_link.h netkit: Allow for configuring needed_{head,tail}room 2025-01-06 09:48:49 +01:00
if_tun.h
if_xdp.h net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt 2025-07-10 14:48:29 +02:00
in.h tools headers: Update the socket headers with the kernel sources 2025-04-10 09:28:24 -07:00
io_uring.h tools headers: Grab copy of io_uring.h 2023-10-19 16:42:03 -06:00
kcmp.h
kvm.h tools headers UAPI: Sync linux/kvm.h with the kernel sources 2025-06-16 14:05:11 -03:00
memfd.h selftests/mm: fix additional build errors for selftests 2024-04-25 20:56:42 -07:00
mman.h mm: add MAP_DROPPABLE for designating always lazily freeable mappings 2024-07-19 20:22:12 +02:00
mount.h selftests/fs/statmount: build with tools include dir 2025-05-12 11:40:12 +02:00
netdev.h net: devmem: TCP tx netlink api 2025-05-13 11:12:48 +02:00
netlink.h
nsfs.h selftests/fs/statmount: build with tools include dir 2025-05-12 11:40:12 +02:00
perf_event.h perf/uapi: Clean up <uapi/linux/perf_event.h> a bit 2025-05-22 11:03:41 +02:00
pkt_cls.h net/sched: Remove uapi support for tcindex classifier 2024-01-02 14:25:51 +00:00
pkt_sched.h net/sched: Remove uapi support for CBQ qdisc 2024-01-02 14:25:51 +00:00
prctl.h tools headers: Synchronize prctl.h ABI header 2025-05-21 13:57:41 +02:00
seccomp.h tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older systems 2023-09-13 08:48:48 -03:00
seg6_local.h
seg6.h
stat.h tools headers: Update the fs headers with the kernel sources 2025-06-16 14:05:10 -03:00
stddef.h stddef: make __struct_group() UAPI C++-friendly 2024-12-20 09:05:53 -08:00
tcp.h
tls.h
types.h tools/include: make uapi/linux/types.h usable from assembly 2025-04-06 12:55:31 -07:00
userfaultfd.h selftests/mm: fix additional build errors for selftests 2024-04-25 20:56:42 -07:00