twx-linux/tools/include/uapi/linux
Lorenz Bauer 9c02bec959 bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-07-25 13:55:55 -07:00
..
tc_act headers: Remove some left-over license text 2022-09-27 07:48:01 -07:00
bpf_common.h
bpf_perf_event.h tools, headers: Sync struct bpf_perf_event_data 2021-01-26 00:15:03 +01:00
bpf.h bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign 2023-07-25 13:55:55 -07:00
btf.h bpf: Add btf enum64 support 2022-06-07 10:20:42 -07:00
const.h tools headers UAPI: Sync the linux/const.h with the kernel headers 2023-05-10 14:19:20 -03:00
erspan.h
ethtool.h tools: include: Add ethtool_drvinfo definition to UAPI header 2021-08-24 14:48:40 -07:00
fadvise.h
fcntl.h tools headers uapi: Sync linux/fcntl.h with the kernel sources 2023-07-11 12:29:23 -03:00
filter.h
fs.h treewide: uapi: Replace zero-length arrays with flexible-array members 2022-06-28 21:26:05 +02:00
fscrypt.h tools headers UAPI: Sync linux/fscrypt.h with the kernel sources 2022-12-19 12:46:36 -03:00
hw_breakpoint.h Move bp_type_idx to include/linux/hw_breakpoint.h 2023-03-10 21:05:16 +01:00
if_link.h macvlan: Add netlink attribute for broadcast cutoff 2023-03-29 09:03:32 +01:00
if_tun.h treewide: uapi: Replace zero-length arrays with flexible-array members 2022-06-28 21:26:05 +02:00
if_xdp.h selftests/xsk: add basic multi-buffer test 2023-07-19 09:56:50 -07:00
in.h tools headers UAPI: Sync the linux/in.h with the kernel sources 2023-05-26 16:03:27 -03:00
kcmp.h
kvm.h tools headers UAPI: Sync linux/kvm.h with the kernel sources 2023-07-11 12:36:38 -03:00
mman.h tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources 2023-07-11 11:41:15 -03:00
mount.h tools include UAPI: Sync linux/mount.h copy with the kernel sources 2023-07-11 13:01:23 -03:00
netdev.h xsk: add new netlink attribute dedicated for ZC max frags 2023-07-19 09:56:49 -07:00
netlink.h
openat2.h tools headers UAPI: Sync openat2.h with the kernel sources 2021-03-06 16:54:22 -03:00
perf_event.h tools include UAPI: Sync uapi/linux/perf_event.h with the kernel sources 2023-04-10 19:25:12 -03:00
pkt_cls.h treewide: uapi: Replace zero-length arrays with flexible-array members 2022-06-28 21:26:05 +02:00
pkt_sched.h sch_htb: Hierarchical QoS hardware offload 2021-01-22 20:41:29 -08:00
prctl.h tools headers UAPI: Sync linux/prctl.h with the kernel sources 2023-07-11 13:30:40 -03:00
sched.h
seg6_local.h
seg6.h treewide: uapi: Replace zero-length arrays with flexible-array members 2022-06-28 21:26:05 +02:00
stat.h tools headers uapi: Sync linux/stat.h with the kernel sources 2022-10-25 17:40:48 -03:00
stddef.h tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI 2022-11-03 13:45:21 +01:00
tcp.h
tls.h
types.h
usbdevice_fs.h treewide: uapi: Replace zero-length arrays with flexible-array members 2022-06-28 21:26:05 +02:00
vhost.h tools include UAPI: Sync linux/vhost.h with the kernel sources 2023-07-14 10:16:03 -03:00