5d26eaae15
82119 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
fbde105f13 |
bpf-fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjpp+sACgkQ6rmadz2v
bTo+9Q//bUzEc2C64NbG0DTCcxnkDEadzBLIB0BAwnAkuRjR8HJiPGoCdBJhUqzM
/hNfIHTtDdyspU2qZbM+r4nVJ6zRAwIHrT2d/knERxXtRQozaWvUlRhVmf5tdWYm
DkbThS9sAfAOs21YjV7OWPrf7bC7T9syQTAfN0CE8cujZY7OnqCyzNwfb8iIusyo
+Ctm0/qUDVtd6SjdPAQjzp82fHIIwnMFZtWJiZml5LklL1Mx5cuVrT/sWgr5KATW
vZ9rUfgaiJkAsSX0sSlLnAI76+kJRB+IkmK1TRdWFlwW6dTsa/7MkDeXXPN1dEDi
o5ZqhcvaY0eAMbU4iX72Juf6gVFF6AgVwsrHmM79ICjg5umCLN/90QqYPc0ChRxl
EYuSWVQ6/cgV3W6l+KU53cwmRjjdSzyJQFei03COZ0iKF6xic0cynS3BKMQL6HkU
3BfTj19h+dxt7qywRaJFsrWK4t/uBX6N75XlVa9od/sk91tR/ibtJ6hcyuJGATr5
nkfMkyN155upAffUnkhv37TXtMXyX8/kd7BddCet31JJXyJuJZ0vYuOcur6awGyN
aB0T1ueG15sTfGf0zpxVNWhVqswHI/1Suk8EXwbDeHRcsmtrp8XWYawf5StIMwW0
Uy8GjS5KVl5bcrfDbcvj79jajpVjwnvFR1Sir9C4aROm5OpH3Hs=
=77JH
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Finish constification of 1st parameter of bpf_d_path() (Rong Tao)
- Harden userspace-supplied xdp_desc validation (Alexander Lobakin)
- Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
Borkmann)
- Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)
- Use correct context to unpin bpf hash map with special types (KaFai
Wan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add test for unpinning htab with internal timer struct
bpf: Avoid RCU context warning when unpinning htab with internal structs
xsk: Harden userspace-supplied xdp_desc validation
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
bpf: Finish constification of 1st parameter of bpf_d_path()
|
||
|
|
8bd9238e51 |
Some messenger improvements from Eric and Max, a patch to address the
issue (also affected userspace) of incorrect permissions being granted to users who have access to multiple different CephFS instances within the same cluster from Kotresh and a bunch of assorted CephFS fixes from Slava. -----BEGIN PGP SIGNATURE----- iQFFBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmjpShsTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzizoaB/C7qTw5Olh8NDpX8I+ljEO50XNduurf fp2eNn0ub5brhcvh8iACSPKE2oer/bDvv3b2SN9310GmBX3f7H2Ht5TeH2tGBRN0 clg+C2DmY/2watHovo+ua7YAd+HiPH2XMbpeU38Pu1nEdmiU6cQ0YaOn8n2p+c1E bID0dMHWb4HTmFRURqWqKPDkM1fLHRxIVgyOMaov5vs0T7XdglwPja3S2W6epvqF hKSMSvO/j9qYlOsBM6G6IuHDMJomzBqOQKqsQqC4XZN6uXeaKPTLYRnzxKfJUEWj P5JTaum7NGGtfIs0L9wr6zpou/GY2zTFiyXhLZsLJMn894bBO5nArg== =wGIB -----END PGP SIGNATURE----- Merge tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: - some messenger improvements (Eric and Max) - address an issue (also affected userspace) of incorrect permissions being granted to users who have access to multiple different CephFS instances within the same cluster (Kotresh) - a bunch of assorted CephFS fixes (Slava) * tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client: ceph: add bug tracking system info to MAINTAINERS ceph: fix multifs mds auth caps issue ceph: cleanup in ceph_alloc_readdir_reply_buffer() ceph: fix potential NULL dereference issue in ceph_fill_trace() libceph: add empty check to ceph_con_get_out_msg() libceph: pass the message pointer instead of loading con->out_msg libceph: make ceph_con_get_out_msg() return the message pointer ceph: fix potential race condition on operations with CEPH_I_ODIRECT flag ceph: refactor wake_up_bit() pattern of calling ceph: fix potential race condition in ceph_ioctl_lazyio() ceph: fix overflowed constant issue in ceph_do_objects_copy() ceph: fix wrong sizeof argument issue in register_session() ceph: add checking of wait_for_completion_killable() return value ceph: make ceph_start_io_*() killable libceph: Use HMAC-SHA256 library instead of crypto_shash |
||
|
|
07ca98f906 |
xsk: Harden userspace-supplied xdp_desc validation
Turned out certain clearly invalid values passed in xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead
to UBs or just invalid frames to be queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
can cause positive integer overflow and wraparound, the same way low
enough desc->addr with a non-zero pool->tx_metadata_len can cause
negative integer overflow. Both scenarios can then pass the
validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function old new delta
xskq_cons_peek_desc 299 330 +31
xsk_tx_peek_release_desc_batch 973 1002 +29
xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much.
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Cc: stable@vger.kernel.org # 6.8+
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20251008165659.4141318-1-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
||
|
|
80b7065ec1 |
Bunch of unrelated fixes
- polling fix for trans fd that ought to have been fixed otherwise back in March, but apparently came back somewhere else... - USB transport buffer overflow fix - Some dentry lifetime rework to handle metadata update for currently opened files in uncached mode, or inode type change in cached mode - a double-put on invalid flush found by syzbot - and finally /sys/fs/9p/caches not advancing buffer and overwriting itself for large contents Thanks to everyone involved! -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmjns5cACgkQq06b7GqY 5nDGLg/+Ltsiwj51CDwyJxFPX4ki1jM3j2iA8BNpdjjzCev6Jp1vuWBPiV+sTtgM zKmIw9joyvzaIXKzA3h7tMe7h7y8H4p7KZyD3YSDSNW5XIb5UYXeQwYVw39OxENk nVQkQ41H+bPepmz3R0t/F4la9myVLBtfdcPWJN+JfkOLgdeemCGf+TShJ6F+9Qaq PTd3UKfdv8JRlKQmeLH7pPlBoWYRoC2Tq3pZbyC3D1A50bWkTfxusW0k3MUzRyHO t/jNbhHt0ai+densQ6O5M+ALMI7KIIzR+obVaBHfzSUcwFeGhmW2x84Mo0qX6YMF 0B/fY3mCXi8QO6my1hfwmbRtY5mEV967wM/uF7E/SuvyNVizOMBZn8FgWfVPVszr ix1iNBgQJdoOEMbHvDfCAtGNorFrMiybyLvoabL7YCwFo68LdYqj2br3/feajoYM 5g46nUcidH4fVBLtiBGE8w0ko8aa3zLB4iu/fdATuEZ1r+gPt40SWo1mKahoCgY3 BOtYxEZTnd4l4GPJDQIaYgVckgSiex0xcE9pRJ5LtBwdyw9g4jwfb39WHx4MT2sE u5xWDbadU/ZM6ppLJ/iBkotvU9D7ohKTviN1IfCwjqL7IMBNz5c+nBwLQZWK4YTR 2ySKpv1VOMROglt1KvnHfdpmdJ+CjAJLLNE+XSz9AHXinK5v8aM= =bUN1 -----END PGP SIGNATURE----- Merge tag '9p-for-6.18-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: "A bunch of unrelated fixes: - polling fix for trans fd that ought to have been fixed otherwise back in March, but apparently came back somewhere else... - USB transport buffer overflow fix - Some dentry lifetime rework to handle metadata update for currently opened files in uncached mode, or inode type change in cached mode - a double-put on invalid flush found by syzbot - and finally /sys/fs/9p/caches not advancing buffer and overwriting itself for large contents Thanks to everyone involved!" * tag '9p-for-6.18-rc1' of https://github.com/martinetd/linux: 9p: sysfs_init: don't hardcode error to ENOMEM 9p: fix /sys/fs/9p/caches overwriting itself 9p: clean up comment typos 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN net/9p: fix double req put in p9_fd_cancelled net/9p: Fix buffer overflow in USB transport layer fs/9p: Add p9_debug(VFS) in d_revalidate fs/9p: Invalidate dentry if inode type change detected in cached mode fs/9p: Refresh metadata in d_revalidate for uncached mode too |
||
|
|
18a7e218cf |
Including fixes from netfilter.
Current release - regressions:
- mlx5: fix pre-2.40 binutils assembler error
Current release - new code bugs:
- net: psp: don't assume reply skbs will have a socket
- eth: fbnic: fix missing programming of the default descriptor
Previous releases - regressions:
- page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
- tcp:
- take care of zero tp->window_clamp in tcp_set_rcvlowat()
- don't call reqsk_fastopen_remove() in tcp_conn_request().
- eth: ice: release xa entry on adapter allocation failure
- eth: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
Previous releases - always broken:
- netfilter: validate objref and objrefmap expressions
- sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()
- eth: mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
- eth: mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
- eth: ocelot: fix use-after-free caused by cyclic delayed work
Misc:
- add support for MediaTek PCIe 5G HP DRMR-H01
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjnlD0SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkQnYP/iQAwtLSyE2lqdP+OsfnwhdpeeMpy2B/
rfN7hk4OS1Rs1WBEf+Y45Lm5yC2poRleEf3QrixcqAHrtr4SED1JgZynjzjtYJ6c
mmCdzZPZMfBEVG7rcOusSMI+ebWCA1yvSzjqw6CctwzW6Sac4S3KRyYzUNuTJ+De
3APf6NptD324veJuyP01opafaExBpu21DfaEXeu4kP030S74J5TMfjt7dNvVpYDt
qwb7s6syJ9O+U3INmi6SVxJY9711Cj8tjUXzz94FWc/kC28D3UnIKUO1tL/qNthe
do0OhNa1YlsmoGJuRthxQa9ubyScnP6hKcy8VlanvuDya0JVyzdod/RDO6A2L9bp
bdJIM2wnLd4Sqe7QcGPBFj0KNahgqG1lzj0uHKhrWBetp+5wWySuk9Mohg8yZaGP
w53X0+WhAeferpcdMPFbwBL9pcgmHo2FxDMaHKcIrIAFb+TIzSZA6CXWB2RFRSgH
smgQ9EYUbfUtP6nZCs1lfOux7CdN94tuKA9jnUXFrlIda44akxVe/wVi9bRazrVx
TcpF0BEeQTyM4wCoVsk/JzTNeK2VZtj8i1B7TrZvR68UkwGrXm8DB4xG42LxdRaw
PEtu8mbUGiWtkojmUAVpYc/TfMdQOVyBUijd2uAe1pXV4yPDz5cWNkEPX8vvAiRu
rIe8v1VAz+gg
=LlXo
-----END PGP SIGNATURE-----
Merge tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- mlx5: fix pre-2.40 binutils assembler error
Current release - new code bugs:
- net: psp: don't assume reply skbs will have a socket
- eth: fbnic: fix missing programming of the default descriptor
Previous releases - regressions:
- page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
- tcp:
- take care of zero tp->window_clamp in tcp_set_rcvlowat()
- don't call reqsk_fastopen_remove() in tcp_conn_request()
- eth:
- ice: release xa entry on adapter allocation failure
- usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
Previous releases - always broken:
- netfilter: validate objref and objrefmap expressions
- sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()
- eth:
- mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
- mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
- ocelot: fix use-after-free caused by cyclic delayed work
Misc:
- add support for MediaTek PCIe 5G HP DRMR-H01"
* tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
net: airoha: Fix loopback mode configuration for GDM2 port
selftests: drv-net: pp_alloc_fail: add necessary optoins to config
selftests: drv-net: pp_alloc_fail: lower traffic expectations
selftests: drv-net: fix linter warnings in pp_alloc_fail
eth: fbnic: fix reporting of alloc_failed qstats
selftests: drv-net: xdp: add test for interface level qstats
selftests: drv-net: xdp: rename netnl to ethnl
eth: fbnic: fix saving stats from XDP_TX rings on close
eth: fbnic: fix accounting of XDP packets
eth: fbnic: fix missing programming of the default descriptor
selftests: netfilter: query conntrack state to check for port clash resolution
selftests: netfilter: nft_fib.sh: fix spurious test failures
bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
netfilter: nft_objref: validate objref and objrefmap expressions
net: pse-pd: tps23881: Fix current measurement scaling
net/mlx5: fix pre-2.40 binutils assembler error
net/mlx5e: Do not fail PSP init on missing caps
net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed
net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tables
net: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
...
|
||
|
|
6140f1d43b |
libceph: add empty check to ceph_con_get_out_msg()
This moves the list_empty() checks from the two callers (v1 and v2) into the base messenger.c library. Now the v1/v2 specializations do not need to know about con->out_queue; that implementation detail is now hidden behind the ceph_con_get_out_msg() function. [ idryomov: instead of changing prepare_write_message() to return a bool, move ceph_con_get_out_msg() call out to arrive to the same pattern as in messenger_v2.c ] Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
||
|
|
7399212dcf |
libceph: pass the message pointer instead of loading con->out_msg
This pointer is in a register anyway, so let's use that instead of reloading from memory everywhere. [ idryomov: formatting ] Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
||
|
|
59699a5a71 |
libceph: make ceph_con_get_out_msg() return the message pointer
The caller in messenger_v1.c loads it anyway, so let's keep the pointer in the register instead of reloading it from memory. This eliminates a tiny bit of unnecessary overhead. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
||
|
|
27c0a7b05d |
libceph: Use HMAC-SHA256 library instead of crypto_shash
Use the HMAC-SHA256 library functions instead of crypto_shash. This is simpler and faster. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
||
|
|
bbf0c98b3a |
bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
net/bridge/br_private.h:1627 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 7 locks held by socat/410: #0: ffff88800d7a9c90 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_stream_connect+0x43/0xa0 #1: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x62/0x1830 [..] #6: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: nf_hook.constprop.0+0x8a/0x440 Call Trace: lockdep_rcu_suspicious.cold+0x4f/0xb1 br_vlan_fill_forward_path_pvid+0x32c/0x410 [bridge] br_fill_forward_path+0x7a/0x4d0 [bridge] Use to correct helper, non _rcu variant requires RTNL mutex. Fixes: bcf2766b1377 ("net: bridge: resolve forwarding path for VLAN tag actions in bridge devices") Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> |
||
|
|
f359b809d5 |
netfilter: nft_objref: validate objref and objrefmap expressions
Referencing a synproxy stateful object from OUTPUT hook causes kernel
crash due to infinite recursive calls:
BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12)
[...]
Call Trace:
__find_rr_leaf+0x99/0x230
fib6_table_lookup+0x13b/0x2d0
ip6_pol_route+0xa4/0x400
fib6_rule_lookup+0x156/0x240
ip6_route_output_flags+0xc6/0x150
__nf_ip6_route+0x23/0x50
synproxy_send_tcp_ipv6+0x106/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
nft_synproxy_do_eval+0x263/0x310
nft_do_chain+0x5a8/0x5f0 [nf_tables
nft_do_chain_inet+0x98/0x110
nf_hook_slow+0x43/0xc0
__ip6_local_out+0xf0/0x170
ip6_local_out+0x17/0x70
synproxy_send_tcp_ipv6+0x1a2/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
[...]
Implement objref and objrefmap expression validate functions.
Currently, only NFT_OBJECT_SYNPROXY object type requires validation.
This will also handle a jump to a chain using a synproxy object from the
OUTPUT hook.
Now when trying to reference a synproxy object in the OUTPUT hook, nft
will produce the following error:
synproxy_crash.nft: Error: Could not process rule: Operation not supported
synproxy name mysynproxy
^^^^^^^^^^^^^^^^^^^^^^^^
Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support")
Reported-by: Georg Pfuetzenreuter <georg.pfuetzenreuter@suse.com>
Closes: https://bugzilla.suse.com/1250237
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
||
|
|
2215336295 |
hyperv-next for v6.18
-----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmjkpakTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXip5B/48MvTFJ1qwRGPVzevZQ8Z4SDogEREp 69VS/xRf1YCIzyXyanwqf1dXLq8NAqicSp6ewpJAmNA55/9O0cwT2EtohjeGCu61 krPIvS3KT7xI0uSEniBdhBtALYBscnQ0e3cAbLNzL7bwA6Q6OmvoIawpBADgE/cW aZNCK9jy+WUqtXc6lNtkJtST0HWGDn0h04o2hjqIkZ+7ewjuEEJBUUB/JZwJ41Od UxbID0PAcn9O4n/u/Y/GH65MX+ddrdCgPHEGCLAGAKT24lou3NzVv445OuCw0c4W ilALIRb9iea56ZLVBW5O82+7g9Ag41LGq+841MNlZjeRNONGykaUpTWZ =OR26 -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt |
||
|
|
23f3770e1a |
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
Cilium has a BPF egress gateway feature which forces outgoing K8s Pod
traffic to pass through dedicated egress gateways which then SNAT the
traffic in order to interact with stable IPs outside the cluster.
The traffic is directed to the gateway via vxlan tunnel in collect md
mode. A recent BPF change utilized the bpf_redirect_neigh() helper to
forward packets after the arrival and decap on vxlan, which turned out
over time that the kmalloc-256 slab usage in kernel was ever-increasing.
The issue was that vxlan allocates the metadata_dst object and attaches
it through a fake dst entry to the skb. The latter was never released
though given bpf_redirect_neigh() was merely setting the new dst entry
via skb_dst_set() without dropping an existing one first.
Fixes: b4ab31414970 ("bpf: Add redirect_neigh helper as redirect drop-in")
Reported-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Reported-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jordan Rife <jrife@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251003073418.291171-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
||
|
|
81538c8e42 |
NFSD 6.18 Release Notes
Mike Snitzer has prototyped a mechanism for disabling I/O caching in
NFSD. This is introduced in v6.18 as an experimental feature. This
enables scaling NFSD in /both/ directions:
- NFS service can be supported on systems with small memory
footprints, such as low-cost cloud instances
- Large NFS workloads will be less likely to force the eviction of
server-local activity, helping it avoid thrashing
Jeff Layton contributed a number of fixes to the new attribute
delegation implementation (based on a pending Internet RFC) that we
hope will make attribute delegation reliable enough to enable by
default, as it is on the Linux NFS client.
The remaining patches in this pull request are clean-ups and minor
optimizations. Many thanks to the contributors, reviewers, testers,
and bug reporters who participated during the v6.18 NFSD development
cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmjjxZMACgkQM2qzM29m
f5dhcA//f6ha4lQ2qq3HfJCa4jLe55LEEYiPPgsBLyuAJtQcEQEVJtb1+0qmdeoE
4ROW/ooE6TmgRX3hJxpUqMeJPj99lC/Y29gsDfWvl9+EM1Dii4JborEBXFwKXn5b
KQhM/UK0H8nsZSDys5nr45eai0LQhjL6kbG2k2OoRxDjj1VpOdqVMYWKBVFDwYLc
LDqv/zh4mB30faMATfLkLpGQb2J9sWd6+Ks45m0H6+KkEEjINlHB5WTX601Omyw5
7YgIiR9/OPLiEcY0rK1iDjiY5g3sFza/W+hqzBADi9i5BDdadYUhUvQWgR+Y4SDD
04UltS5eVRtLM74S3JErWqQ4yuS5/aiUcinNeHV6wyfunuipusG2Hc5ZehcJAH78
Pt8D9KcBCsCMv2sBJl3vH7++rpBeiP9WHONrydK25Omk5as7eOGFjxE++KQ69BOB
+AtHIKEtw1VqX8IpLaG1EL3Ux22z4/PRw2sVx1Oql3mgOCXIsBTr7KPVhjwfBoQO
qlGpnj+xGV1b082BBMfTp1TLtRdD2PKOMaUQeF8XhyEXw92iGQfi0fCrAsbpFKfn
D3+BK3mR+tOcaEhpHhqU/XwiYfSTU0NYrimmjEvGbFhsT0Tz+RSoNQUBth2tJaqZ
hMEBmVujtu5LWnD0+dUNuylL10I4XDge+NVEmJa9pg2nFmjcojw=
=/nLS
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Mike Snitzer has prototyped a mechanism for disabling I/O caching in
NFSD. This is introduced in v6.18 as an experimental feature. This
enables scaling NFSD in /both/ directions:
- NFS service can be supported on systems with small memory
footprints, such as low-cost cloud instances
- Large NFS workloads will be less likely to force the eviction of
server-local activity, helping it avoid thrashing
Jeff Layton contributed a number of fixes to the new attribute
delegation implementation (based on a pending Internet RFC) that we
hope will make attribute delegation reliable enough to enable by
default, as it is on the Linux NFS client.
The remaining patches in this pull request are clean-ups and minor
optimizations. Many thanks to the contributors, reviewers, testers,
and bug reporters who participated during the v6.18 NFSD development
cycle"
* tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
nfsd: discard nfserr_dropit
SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
NFSD: Add io_cache_{read,write} controls to debugfs
NFSD: Do the grace period check in ->proc_layoutget
nfsd: delete unnecessary NULL check in __fh_verify()
NFSD: Allow layoutcommit during grace period
NFSD: Disallow layoutget during grace period
sunrpc: fix "occurence"->"occurrence"
nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in
nfsd: nfserr_jukebox in nlm_fopen should lead to a retry
NFSD: Reduce DRC bucket size
NFSD: Delay adding new entries to LRU
SUNRPC: Move the svc_rpcb_cleanup() call sites
NFS: Remove rpcbind cleanup for NFSv4.0 callback
nfsd: unregister with rpcbind when deleting a transport
NFSD: Drop redundant conversion to bool
sunrpc: eliminate return pointer in svc_tcp_sendmsg()
sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length
nfsd: decouple the xprtsec policy check from check_nfsd_access()
NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()
...
|
||
|
|
21b29e74ff |
tcp: take care of zero tp->window_clamp in tcp_set_rcvlowat()
Some applications (like selftests/net/tcp_mmap.c) call SO_RCVLOWAT
on their listener, before accept().
This has an unfortunate effect on wscale selection in
tcp_select_initial_window() during 3WHS.
For instance, tcp_mmap was negotiating wscale 4, regardless
of tcp_rmem[2] and sysctl_rmem_max.
Do not change tp->window_clamp if it is zero
or bigger than our computed value.
Zero value is special, it allows tcp_select_initial_window()
to enable autotuning.
Note that SO_RCVLOWAT use on listener is probably not wise,
because tp->scaling_ratio has a default value, possibly wrong.
Fixes: d1361840f8c5 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251003184119.2526655-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
95920c2ed0 |
page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
Helge reported that the introduction of PP_MAGIC_MASK let to crashes on boot on his 32-bit parisc machine. The cause of this is the mask is set too wide, so the page_pool_page_is_pp() incurs false positives which crashes the machine. Just disabling the check in page_pool_is_pp() will lead to the page_pool code itself malfunctioning; so instead of doing this, this patch changes the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel pointers for page_pool-tagged pages. The fix relies on the kernel pointers that alias with the pp_magic field always being above PAGE_OFFSET. With this assumption, we can use the lowest bit of the value of PAGE_OFFSET as the upper bound of the PP_DMA_INDEX_MASK, which should avoid the false positives. Because we cannot rely on PAGE_OFFSET always being a compile-time constant, nor on it always being >0, we fall back to disabling the dma_index storage when there are not enough bits available. This leaves us in the situation we were in before the patch in the Fixes tag, but only on a subset of architecture configurations. This seems to be the best we can do until the transition to page types in complete for page_pool pages. v2: - Make sure there's at least 8 bits available and that the PAGE_OFFSET bit calculation doesn't wrap Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/ Fixes: ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool") Cc: stable@vger.kernel.org # 6.15+ Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Tested-by: Helge Deller <deller@gmx.de> Link: https://patch.msgid.link/20250930114331.675412-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2e7cbbbe3d |
tcp: Don't call reqsk_fastopen_remove() in tcp_conn_request().
syzbot reported the splat below in tcp_conn_request(). [0] If a listener is close()d while a TFO socket is being processed in tcp_conn_request(), inet_csk_reqsk_queue_add() does not set reqsk->sk and calls inet_child_forget(), which calls tcp_disconnect() for the TFO socket. After the cited commit, tcp_disconnect() calls reqsk_fastopen_remove(), where reqsk_put() is called due to !reqsk->sk. Then, reqsk_fastopen_remove() in tcp_conn_request() decrements the last req->rsk_refcnt and frees reqsk, and __reqsk_free() at the drop_and_free label causes the refcount underflow for the listener and double-free of the reqsk. Let's remove reqsk_fastopen_remove() in tcp_conn_request(). Note that other callers make sure tp->fastopen_rsk is not NULL. [0]: refcount_t: underflow; use-after-free. WARNING: CPU: 12 PID: 5563 at lib/refcount.c:28 refcount_warn_saturate (lib/refcount.c:28) Modules linked in: CPU: 12 UID: 0 PID: 5563 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:refcount_warn_saturate (lib/refcount.c:28) Code: ab e8 8e b4 98 ff 0f 0b c3 cc cc cc cc cc 80 3d a4 e4 d6 01 00 75 9c c6 05 9b e4 d6 01 01 48 c7 c7 e8 df fb ab e8 6a b4 98 ff <0f> 0b e9 03 5b 76 00 cc 80 3d 7d e4 d6 01 00 0f 85 74 ff ff ff c6 RSP: 0018:ffffa79fc0304a98 EFLAGS: 00010246 RAX: d83af4db1c6b3900 RBX: ffff9f65c7a69020 RCX: d83af4db1c6b3900 RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: ffffffffac78a280 RBP: 000000009d781b60 R08: 0000000000007fff R09: ffffffffac6ca280 R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9f65c7b4f100 R13: ffff9f65c7d23c00 R14: ffff9f65c7d26000 R15: ffff9f65c7a64ef8 FS: 00007f9f962176c0(0000) GS:ffff9f65fcf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000000180 CR3: 000000000dbbe006 CR4: 0000000000372ef0 Call Trace: <IRQ> tcp_conn_request (./include/linux/refcount.h:400 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/net/sock.h:1965 ./include/net/request_sock.h:131 net/ipv4/tcp_input.c:7301) tcp_rcv_state_process (net/ipv4/tcp_input.c:6708) tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1670) tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1906) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438) ip6_input (net/ipv6/ip6_input.c:500) ipv6_rcv (net/ipv6/ip6_input.c:311) __netif_receive_skb (net/core/dev.c:6104) process_backlog (net/core/dev.c:6456) __napi_poll (net/core/dev.c:7506) net_rx_action (net/core/dev.c:7569 net/core/dev.c:7696) handle_softirqs (kernel/softirq.c:579) do_softirq (kernel/softirq.c:480) </IRQ> Fixes: 45c8a6cc2bcd ("tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251001233755.1340927-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2f3119686e |
net/sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()
If new_asoc->peer.adaptation_ind=0 and sctp_ulpevent_make_authkey=0
and sctp_ulpevent_make_authkey() returns 0, then the variable
ai_ev remains zero and the zero will be dereferenced
in the sctp_ulpevent_free() function.
Signed-off-by: Alexandr Sapozhnikov <alsp705@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Fixes: 30f6ebf65bc4 ("sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT")
Link: https://patch.msgid.link/20251002091448.11-1-alsp705@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
070a542f08 |
NFS Client Updates for Linux 6.18
New Features:
* Add a Kconfig option to redirect dfprintk() to the trace buffer
* Enable use of the RWF_DONTCACHE flag on the NFS client
* Add striped layout handling to pNFS flexfiles
* Add proper localio handling for READ and WRITE O_DIRECT
Bugfixes:
* Handle NFS4ERR_GRACE errors during delegation recall
* Fix NFSv4.1 backchannel max_resp_sz verification check
* Fix mount hang after CREATE_SESSION failure
* Fix d_parent->d_inode locking in nfs4_setup_readdir()
Other Cleanups and Improvements:
* Improvements to write handling tracepoints
* Fix a few trivial spelling mistakes
* Cleanups to the rpcbind cleanup call sites
* Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page
* Remove unused NFS_WBACK_BUSY() macro
* Remove __GFP_NOWARN flags
* Unexport rpc_malloc() and rpc_free()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmjgOEkACgkQ18tUv7Cl
QOuX7RAA33AUq4NBxzDOgz4u4eNU/a8z2AazRgAtfmPVLTitrx/kqcfVEtHAdHFi
cWkN2SO+TVzxIGOrudqNyjV2cjfUJV4ZJBkNY6lJvxPNAH27Dk9P2iMF12QYtHOq
qyrwqoUQcyBkmtpgFUyHzydA4J17JDl5A7I/tOkro3ZfV4gmYAUwVdS+VtJoosLp
7FnXv+W5FBWkfKrIT+vPyiBqxl0gZXmzUkJK2lG9m9NvE2Jk2MbPFyhdUEA5JybJ
akNLdBnFwNWw2rLulSqs68ZbCGz6NY634q1Z+ZsRJ907ZdBqJ7zIBFv4yc/bMpZm
Q9kh1M0OyvK0MlLRFe3efOLxRoN9nJPd+kuaw9eYw5V57Jrwj6QGV4nud2C8nzs8
iB+LuJli+FRCeD84SY8NnMFKpXphHCeMXcBMRMsLTOSotJZFithO95+w1pKlK64A
lxY1JXOQYelwJZxfhGPovwac4t1arpDjsumRlTmq12KaQnM3Z1gR2PUgeLxEPHQM
f6gEiN9KDOhW/gZrFQxNs2hVAH68RDKpWxeR2XeVJlJYf37Hgh8bNGEiURi3G57n
ED7tFbK9lzHVFR07hiP3Cvzop4z2mzadgHo+1vzdXmZZQA4gc4MSFfszFLCnQopw
LEb7RqpVVXtQb+f7A+LuD+a2rLEnW+gTf6iLqCR8hAB5k1xmcYQ=
=8wnU
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add a Kconfig option to redirect dfprintk() to the trace buffer
- Enable use of the RWF_DONTCACHE flag on the NFS client
- Add striped layout handling to pNFS flexfiles
- Add proper localio handling for READ and WRITE O_DIRECT
Bugfixes:
- Handle NFS4ERR_GRACE errors during delegation recall
- Fix NFSv4.1 backchannel max_resp_sz verification check
- Fix mount hang after CREATE_SESSION failure
- Fix d_parent->d_inode locking in nfs4_setup_readdir()
Other Cleanups and Improvements:
- Improvements to write handling tracepoints
- Fix a few trivial spelling mistakes
- Cleanups to the rpcbind cleanup call sites
- Convert the SUNRPC xdr_buf to use a scratch folio instead of
scratch page
- Remove unused NFS_WBACK_BUSY() macro
- Remove __GFP_NOWARN flags
- Unexport rpc_malloc() and rpc_free()"
* tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits)
NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
nfs/localio: add tracepoints for misaligned DIO READ and WRITE support
nfs/localio: add proper O_DIRECT support for READ and WRITE
nfs/localio: refactor iocb initialization
nfs/localio: refactor iocb and iov_iter_bvec initialization
nfs/localio: avoid issuing misaligned IO using O_DIRECT
nfs/localio: make trace_nfs_local_open_fh more useful
NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
sunrpc: unexport rpc_malloc() and rpc_free()
NFSv4/flexfiles: Add support for striped layouts
NFSv4/flexfiles: Update layout stats & error paths for striped layouts
NFSv4/flexfiles: Write path updates for striped layouts
NFSv4/flexfiles: Commit path updates for striped layouts
NFSv4/flexfiles: Read path updates for striped layouts
NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
NFSv4/flexfiles: Add data structure support for striped layouts
NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
NFSv4/flexfiles: Remove cred local variable dependency
nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing
NFS: Enable use of the RWF_DONTCACHE flag on the NFS client
...
|
||
|
|
1b54b0756f |
net: doc: Fix typos in docs
Fix typos in doc comments. Signed-off-by: Bhanu Seshu Kumar Valluri <bhanuseshukumar@gmail.com> Link: https://patch.msgid.link/20251001105715.50462-1-bhanuseshukumar@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
7a0f94361f |
net: psp: don't assume reply skbs will have a socket
Rx path may be passing around unreferenced sockets, which means
that skb_set_owner_edemux() may not set skb->sk and PSP will crash:
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:psp_reply_set_decrypted (./include/net/psp/functions.h:132 net/psp/psp_sock.c:287)
tcp_v6_send_response.constprop.0 (net/ipv6/tcp_ipv6.c:979)
tcp_v6_send_reset (net/ipv6/tcp_ipv6.c:1140 (discriminator 1))
tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1683)
tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1912)
Fixes: 659a2899a57d ("tcp: add datapath logic for PSP with inline key exchange")
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251001022426.2592750-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
07fdad3a93 |
Networking changes for 6.18.
Core & protocols
----------------
- Improve drop account scalability on NUMA hosts for RAW and UDP sockets
and the backlog, almost doubling the Pps capacity under DoS.
- Optimize the UDP RX performance under stress, reducing contention,
revisiting the binary layout of the involved data structs and
implementing NUMA-aware locking. This improves UDP RX performance by
an additional 50%, even more under extreme conditions.
- Add support for PSP encryption of TCP connections; this mechanism has
some similarities with IPsec and TLS, but offers superior HW offloads
capabilities.
- Ongoing work to support Accurate ECN for TCP. AccECN allows more than
one congestion notification signal per RTT and is a building block for
Low Latency, Low Loss, and Scalable Throughput (L4S).
- Reorganize the TCP socket binary layout for data locality, reducing
the number of touched cachelines in the fastpath.
- Refactor skb deferral free to better scale on large multi-NUMA hosts,
this improves TCP and UDP RX performances significantly on such HW.
- Increase the default socket memory buffer limits from 256K to 4M to
better fit modern link speeds.
- Improve handling of setups with a large number of nexthop, making dump
operating scaling linearly and avoiding unneeded synchronize_rcu() on
delete.
- Improve bridge handling of VLAN FDB, storing a single entry per bridge
instead of one entry per port; this makes the dump order of magnitude
faster on large switches.
- Restore IP ID correctly for encapsulated packets at GSO segmentation
time, allowing GRO to merge packets in more scenarios.
- Improve netfilter matching performance on large sets.
- Improve MPTCP receive path performance by leveraging recently
introduced core infrastructure (skb deferral free) and adopting recent
TCP autotuning changes.
- Allow bridges to redirect to a backup port when the bridge port is
administratively down.
- Introduce MPTCP 'laminar' endpoint that con be used only once per
connection and simplify common MPTCP setups.
- Add RCU safety to dst->dev, closing a lot of possible races.
- A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing
code duplication.
- Supports pulling data from an skb frag into the linear area of an XDP
buffer.
Things we sprinkled into general kernel code
--------------------------------------------
- Generate netlink documentation from YAML using an integrated
YAML parser.
Driver API
----------
- Support using IPv6 Flow Label in Rx hash computation and RSS queue
selection.
- Introduce API for fetching the DMA device for a given queue, allowing
TCP zerocopy RX on more H/W setups.
- Make XDP helpers compatible with unreadable memory, allowing more
easily building DevMem-enabled drivers with a unified XDP/skbs
datapath.
- Add a new dedicated ethtool callback enabling drivers to provide the
number of RX rings directly, improving efficiency and clarity in RX
ring queries and RSS configuration.
- Introduce a burst period for the health reporter, allowing better
handling of multiple errors due to the same root cause.
- Support for DPLL phase offset exponential moving average, controlling
the average smoothing factor.
Device drivers
--------------
- Add a new Huawei driver for 3rd gen NIC (hinic3).
- Add a new SpacemiT driver for K1 ethernet MAC.
- Add a generic abstraction for shared memory communication devices
(dibps)
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- Use multiple per-queue doorbell, to avoid MMIO contention issues
- support adjacent functions, allowing them to delegate their
SR-IOV VFs to sibling PFs
- support RSS for IPSec offload
- support exposing raw cycle counters in PTP and mlx5
- support for disabling host PFs.
- Intel (100G, ice, idpf):
- ice: support for SRIOV VFs over an Active-Active link aggregate
- ice: support for firmware logging via debugfs
- ice: support for Earliest TxTime First (ETF) hardware offload
- idpf: support basic XDP functionalities and XSk
- Broadcom (bnxt):
- support Hyper-V VF ID
- dynamic SRIOV resource allocations for RoCE
- Meta (fbnic):
- support queue API, zero-copy Rx and Tx
- support basic XDP functionalities
- devlink health support for FW crashes and OTP mem corruptions
- expand hardware stats coverage to FEC, PHY, and Pause
- Wangxun:
- support ethtool coalesce options
- support for multiple RSS contexts
- Ethernet virtual:
- Macsec:
- replace custom netlink attribute checks with policy-level checks
- Bonding:
- support aggregator selection based on port priority
- Microsoft vNIC:
- use page pool fragments for RX buffers instead of full pages to
improve memory efficiency
- Ethernet NICs consumer, and embedded:
- Qualcomm: support Ethernet function for IPQ9574 SoC
- Airoha: implement wlan offloading via NPU
- Freescale
- enetc: add NETC timer PTP driver and add PTP support
- fec: enable the Jumbo frame support for i.MX8QM
- Renesas (R-Car S4): support HW offloading for layer 2 switching
- support for RZ/{T2H, N2H} SoCs
- Cadence (macb): support TAPRIO traffic scheduling
- TI:
- support for Gigabit ICSS ethernet SoC (icssm-prueth)
- Synopsys (stmmac): a lot of cleanups
- Ethernet PHYs:
- Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
driver
- Support bcm63268 GPHY power control
- Support for Micrel lan8842 PHY and PTP
- Support for Aquantia AQR412 and AQR115
- CAN:
- a large CAN-XL preparation work
- reorganize raw_sock and uniqframe struct to minimize memory usage
- rcar_canfd: update the CAN-FD handling
- WiFi:
- extended Neighbor Awareness Networking (NAN) support
- S1G channel representation cleanup
- improve S1G support
- WiFi drivers:
- Intel (iwlwifi):
- major refactor and cleanup
- Broadcom (brcm80211):
- support for AP isolation
- RealTek (rtw88/89) rtw88/89:
- preparation work for RTL8922DE support
- MediaTek (mt76):
- HW restart improvements
- MLO support
- Qualcomm/Atheros (ath10k_
- GTK rekey fixes
- Bluetooth drivers:
- btusb: support for several new IDs for MT7925
- btintel: support for BlazarIW core
- btintel_pcie: support for _suspend() / _resume()
- btintel_pcie: support for Scorpious, Panther Lake-H484 IDs
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjdBdsSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkWnAP/2Jz0ExnYMDxLxkkKqMte+3JNeF/KNjB
zVIslYN/5ekkd1TMXyDLFlWHqUOUMSF9+cK5ZRMHG6/lzOueSzFuuuDFsWs6Kf2f
q7Q9MzlXhR9++YCsdES1uS5x3PPjMInzo2ZivMFa6fUDLLFSzeAOKL9k+RS0EggU
VlXv2Wkm73R0O6KAssgDsHke9cnNz+F0DzhQ0S3qkyZF9tS5NrDeUl7fZ47XZgwb
ZuXdEzKmTTepo2XvZGxJoF2D7nekWFlHhLjEPpDJtET19nwhukCry41/FplJwlKR
dWsYkqLkrOEQKFQ+q++/5c3BgFOG+IrQLbP5bGQF1tZRMl4Dqp9PDxK5fKJfccbS
0VY3Y2qWOSYejT2Ji2kEHR5E5rPyFm3Y5A4Rnpz0yEHj14vL2v4zf7CZRkCyyDfD
doIZXFGkM0+N7QeXLEN833cje9zjaXuP9GAE+2bb+wAWAZAqof4KX8JgHh+y5Rwm
pvUtvFxmEtntlMwNBap8aT3FquGtfZncU8pzAE5kvWjuMvyF0NVWiE5rB2kSQM/X
NLmUdvDyjwwJqthXAJG0fl+a0mNJ/kOAqSOKJDJrfKDGWa+ovwY0iY06SpK0wIbO
Wz7tpMk5MSlYXW8xWMlbyhvvU6T9xuoQ2KV4QTdMxc6Ir3sNX6YkQr+gjQjxB0gx
ST2QF6lZeWFh
=w2Kz
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core & protocols:
- Improve drop account scalability on NUMA hosts for RAW and UDP
sockets and the backlog, almost doubling the Pps capacity under DoS
- Optimize the UDP RX performance under stress, reducing contention,
revisiting the binary layout of the involved data structs and
implementing NUMA-aware locking. This improves UDP RX performance
by an additional 50%, even more under extreme conditions
- Add support for PSP encryption of TCP connections; this mechanism
has some similarities with IPsec and TLS, but offers superior HW
offloads capabilities
- Ongoing work to support Accurate ECN for TCP. AccECN allows more
than one congestion notification signal per RTT and is a building
block for Low Latency, Low Loss, and Scalable Throughput (L4S)
- Reorganize the TCP socket binary layout for data locality, reducing
the number of touched cachelines in the fastpath
- Refactor skb deferral free to better scale on large multi-NUMA
hosts, this improves TCP and UDP RX performances significantly on
such HW
- Increase the default socket memory buffer limits from 256K to 4M to
better fit modern link speeds
- Improve handling of setups with a large number of nexthop, making
dump operating scaling linearly and avoiding unneeded
synchronize_rcu() on delete
- Improve bridge handling of VLAN FDB, storing a single entry per
bridge instead of one entry per port; this makes the dump order of
magnitude faster on large switches
- Restore IP ID correctly for encapsulated packets at GSO
segmentation time, allowing GRO to merge packets in more scenarios
- Improve netfilter matching performance on large sets
- Improve MPTCP receive path performance by leveraging recently
introduced core infrastructure (skb deferral free) and adopting
recent TCP autotuning changes
- Allow bridges to redirect to a backup port when the bridge port is
administratively down
- Introduce MPTCP 'laminar' endpoint that con be used only once per
connection and simplify common MPTCP setups
- Add RCU safety to dst->dev, closing a lot of possible races
- A significant crypto library API for SCTP, MPTCP and IPv6 SR,
reducing code duplication
- Supports pulling data from an skb frag into the linear area of an
XDP buffer
Things we sprinkled into general kernel code:
- Generate netlink documentation from YAML using an integrated YAML
parser
Driver API:
- Support using IPv6 Flow Label in Rx hash computation and RSS queue
selection
- Introduce API for fetching the DMA device for a given queue,
allowing TCP zerocopy RX on more H/W setups
- Make XDP helpers compatible with unreadable memory, allowing more
easily building DevMem-enabled drivers with a unified XDP/skbs
datapath
- Add a new dedicated ethtool callback enabling drivers to provide
the number of RX rings directly, improving efficiency and clarity
in RX ring queries and RSS configuration
- Introduce a burst period for the health reporter, allowing better
handling of multiple errors due to the same root cause
- Support for DPLL phase offset exponential moving average,
controlling the average smoothing factor
Device drivers:
- Add a new Huawei driver for 3rd gen NIC (hinic3)
- Add a new SpacemiT driver for K1 ethernet MAC
- Add a generic abstraction for shared memory communication
devices (dibps)
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- Use multiple per-queue doorbell, to avoid MMIO contention
issues
- support adjacent functions, allowing them to delegate their
SR-IOV VFs to sibling PFs
- support RSS for IPSec offload
- support exposing raw cycle counters in PTP and mlx5
- support for disabling host PFs.
- Intel (100G, ice, idpf):
- ice: support for SRIOV VFs over an Active-Active link
aggregate
- ice: support for firmware logging via debugfs
- ice: support for Earliest TxTime First (ETF) hardware offload
- idpf: support basic XDP functionalities and XSk
- Broadcom (bnxt):
- support Hyper-V VF ID
- dynamic SRIOV resource allocations for RoCE
- Meta (fbnic):
- support queue API, zero-copy Rx and Tx
- support basic XDP functionalities
- devlink health support for FW crashes and OTP mem corruptions
- expand hardware stats coverage to FEC, PHY, and Pause
- Wangxun:
- support ethtool coalesce options
- support for multiple RSS contexts
- Ethernet virtual:
- Macsec:
- replace custom netlink attribute checks with policy-level
checks
- Bonding:
- support aggregator selection based on port priority
- Microsoft vNIC:
- use page pool fragments for RX buffers instead of full pages
to improve memory efficiency
- Ethernet NICs consumer, and embedded:
- Qualcomm: support Ethernet function for IPQ9574 SoC
- Airoha: implement wlan offloading via NPU
- Freescale
- enetc: add NETC timer PTP driver and add PTP support
- fec: enable the Jumbo frame support for i.MX8QM
- Renesas (R-Car S4):
- support HW offloading for layer 2 switching
- support for RZ/{T2H, N2H} SoCs
- Cadence (macb): support TAPRIO traffic scheduling
- TI:
- support for Gigabit ICSS ethernet SoC (icssm-prueth)
- Synopsys (stmmac): a lot of cleanups
- Ethernet PHYs:
- Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
driver
- Support bcm63268 GPHY power control
- Support for Micrel lan8842 PHY and PTP
- Support for Aquantia AQR412 and AQR115
- CAN:
- a large CAN-XL preparation work
- reorganize raw_sock and uniqframe struct to minimize memory
usage
- rcar_canfd: update the CAN-FD handling
- WiFi:
- extended Neighbor Awareness Networking (NAN) support
- S1G channel representation cleanup
- improve S1G support
- WiFi drivers:
- Intel (iwlwifi):
- major refactor and cleanup
- Broadcom (brcm80211):
- support for AP isolation
- RealTek (rtw88/89) rtw88/89:
- preparation work for RTL8922DE support
- MediaTek (mt76):
- HW restart improvements
- MLO support
- Qualcomm/Atheros (ath10k):
- GTK rekey fixes
- Bluetooth drivers:
- btusb: support for several new IDs for MT7925
- btintel: support for BlazarIW core
- btintel_pcie: support for _suspend() / _resume()
- btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"
* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
net: stmmac: Add support for Allwinner A523 GMAC200
dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
Revert "Documentation: net: add flow control guide and document ethtool API"
octeontx2-pf: fix bitmap leak
octeontx2-vf: fix bitmap leak
net/mlx5e: Use extack in set rxfh callback
net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
net/mlx5e: Introduce mlx5e_rss_init_params
net/mlx5e: Remove unused mdev param from RSS indir init
net/mlx5: Improve QoS error messages with actual depth values
net/mlx5e: Prevent entering switchdev mode with inconsistent netns
net/mlx5: HWS, Generalize complex matchers
net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
selftests/net: add tcp_port_share to .gitignore
Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
net: add NUMA awareness to skb_attempt_defer_free()
net: use llist for sd->defer_list
net: make softnet_data.defer_count an atomic
selftests: drv-net: psp: add tests for destroying devices
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
...
|
||
|
|
d8e97cc476 |
SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it. This unblocks the eventual removal of the selection of CRYPTO from NFSD_V4, which will no longer be needed by nfsd itself due to switching to the crypto library functions. But NFSD_V4 selects RPCSEC_GSS_KRB5, which still needs CRYPTO. It makes more sense for RPCSEC_GSS_KRB5 to select CRYPTO itself, like most other kconfig options that need CRYPTO do. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> |
||
|
|
f1455695d2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8). Conflicts: tools/testing/selftests/drivers/net/bonding/Makefile 87951b566446 selftests: bonding: add test for passive LACP mode c2377f1763e9 selftests: bonding: add test for LACP actor port priority Adjacent changes: drivers/net/ethernet/cadence/macb.h fca3dc859b20 net: macb: remove illusion about TBQPH/RBQPH being per-queue 89934dbf169e net: macb: Add TAPRIO traffic scheduling support drivers/net/ethernet/cadence/macb_main.c fca3dc859b20 net: macb: remove illusion about TBQPH/RBQPH being per-queue 89934dbf169e net: macb: Add TAPRIO traffic scheduling support Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
1a98f5699b |
Revert "Documentation: net: add flow control guide and document ethtool API"
This reverts commit 7bd80ed89d72285515db673803b021469ba71ee8. I should not have merged it to begin with due to pending review and changes to be addressed. Link: https://patch.msgid.link/c6f3af12df9b7998920a02027fc8893ce82afc4c.1759239721.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
ae28ed4578 |
bpf-next-6.18
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/ UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg= =LeQi -----END PGP SIGNATURE----- Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung) Applied as a stable branch in bpf-next and net-next trees. - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki) Also a stable branch in bpf-next and net-next trees. - Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann) - Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman) This is a significant change in the verification logic. More details, motivation, long term plans are in the cover letter/merge commit. - Support signed BPF programs (KP Singh) This is another major feature that took years to materialize. Algorithm details are in the cover letter/marge commit - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich) - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan) - Fix USDT SIB argument handling in libbpf (Jiawei Zhao) - Allow uprobe-bpf program to change context registers (Jiri Olsa) - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan) - Allow access to union arguments in tracing programs (Leon Hwang) - Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong) - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko) - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi) - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi) - Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana) - Use tnums to improve is_branch_taken() logic (Paul Chaignon) - Add support for atomic operations in arena in riscv JIT (Pu Lehui) - Report arena faults to BPF error stream (Puranjay Mohan) - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet) - Add bpf_strcasecmp() kfunc (Rong Tao) - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen) * tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits) libbpf: Replace AF_ALG with open coded SHA-256 selftests/bpf: Add stress test for rqspinlock in NMI selftests/bpf: Add test case for different expected_attach_type bpf: Enforce expected_attach_type for tailcall compatibility bpftool: Remove duplicate string.h header bpf: Remove duplicate crypto/sha2.h header libbpf: Fix error when st-prefix_ops and ops from differ btf selftests/bpf: Test changing packet data from kfunc selftests/bpf: Add stacktrace map lookup_and_delete_elem test case selftests/bpf: Refactor stacktrace_map case with skeleton bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE selftests/bpf: Fix flaky bpf_cookie selftest selftests/bpf: Test changing packet data from global functions with a kfunc bpf: Emit struct bpf_xdp_sock type in vmlinux BTF selftests/bpf: Task_work selftest cleanup fixes MAINTAINERS: Delete inactive maintainers from AF_XDP bpf: Mark kfuncs as __noclone selftests/bpf: Add kprobe multi write ctx attach test selftests/bpf: Add kprobe write ctx attach test selftests/bpf: Add uprobe context ip register change test ... |
||
|
|
94b04355e6 |
Drivers: hv: Add CONFIG_HYPERV_VMBUS option
At present VMBus driver is hinged off of CONFIG_HYPERV which entails lot of builtin code and encompasses too much. It's not always clear what depends on builtin hv code and what depends on VMBus. Setting CONFIG_HYPERV as a module and fudging the Makefile to switch to builtin adds even more confusion. VMBus is an independent module and should have its own config option. Also, there are scenarios like baremetal dom0/root where support is built in with CONFIG_HYPERV but without VMBus. Lastly, there are more features coming down that use CONFIG_HYPERV and add more dependencies on it. So, create a fine grained HYPERV_VMBUS option and update Kconfigs for dependency on VMBus. Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Signed-off-by: Wei Liu <wei.liu@kernel.org> |
||
|
|
ffe381923d |
sunrpc: unexport rpc_malloc() and rpc_free()
These are not used outside of sunrpc code. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> |
||
|
|
56a0810d8c |
audit/stable-6.18 PR 20250926
-----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmjWq5oUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPTjRAAwYapnw+ZGdFtTGIDT63HtlKjCGHg DRR8J1RYWhxQL78dInjl7hlGPd4ULdpdF6zsh27X/8OsdFotw4NhDyPJwS1qWZv9 uBJMy/s1Qi1V/rrtDygLGgkQ9ICfl/hgVh3L+g9AXU8H9IapMULp33z+2ueFU4rA PXgXppgNQTOhIQml0tagY7iPlLaaI1uPv/Dbvt792CSrKZReC+uiDSQKD6SUy5oJ NBRs0emdCqbllo8Eo7wTGdfzUttsPWYHe7X9BGCMK2bHp0BpMnFBDtuipUAgjNE8 O16EkAtBMpEBW9VEFvDYW1jMFO7ccD8b09CbqPLdE7E0GeigTiODg+FdncKEpZn0 Dl4xPbIoPBHVrDHKFK3HcuEdUs0FZH3NpTLFRg0/nWbg3CfSOFq1ZKhSbwLTZ48V 2Iq22G0hIIl3yTEePSoR8xCSQkWf6hA1SVvzBqw5Xn1tnkdIUuM+KzeZUPKxCOiH r5b3ufrN5YMAcmc59q393sNuSMd7s97fohhK8/HouB93EcVNM2UjLEKVJnhMhYRE N21O17jwQG9F+OYTnmtMzuUF6yxwSAmkzQOg6F+lalJ8MECnNrZOEeyuA3d5ISi5 4ZrXHWw90qaDy9lCV1o0UwWt9na+WxeMCJNpI07h5V1k3x7BULiI6WeP7J1qnY9r YlLv/6Hgx29dtqE= =iQal -----END PGP SIGNATURE----- Merge tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Proper audit support for multiple LSMs As the audit subsystem predated the work to enable multiple LSMs, some additional work was needed to support logging the different LSM labels for the subjects/tasks and objects on the system. Casey's patches add new auxillary records for subjects and objects that convey the additional labels. - Ensure fanotify audit events are always generated Generally speaking security relevant subsystems always generate audit events, unless explicitly ignored. However, up to this point fanotify events had been ignored by default, but starting with this pull request fanotify follows convention and generates audit events by default. - Replace an instance of strcpy() with strscpy() - Minor indentation, style, and comment fixes * tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix skb leak when audit rate limit is exceeded audit: init ab->skb_list earlier in audit_buffer_alloc() audit: add record for multiple object contexts audit: add record for multiple task security contexts lsm: security_lsmblob_to_secctx module selection audit: create audit_stamp structure audit: add a missing tab audit: record fanotify event regardless of presence of rules audit: fix typo in auditfilter.c comment audit: Replace deprecated strcpy() with strscpy() audit: fix indentation in audit_log_exit() |
||
|
|
5628f3fe3b |
net: add NUMA awareness to skb_attempt_defer_free()
Instead of sharing sd->defer_list & sd->defer_count with many cpus, add one pair for each NUMA node. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250928084934.3266948-4-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
844c9db7f7 |
net: use llist for sd->defer_list
Get rid of sd->defer_lock and adopt llist operations. We optimize skb_attempt_defer_free() for the common case, where the packet is queued. Otherwise sd->defer_count is increasing, until skb_defer_free_flush() clears it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250928084934.3266948-3-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
9c94ae6bb0 |
net: make softnet_data.defer_count an atomic
This is preparation work to remove the softnet_data.defer_lock, as it is contended on hosts with large number of cores. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250928084934.3266948-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
f857478d62 |
netdevsim: a basic test PSP implementation
Provide a PSP implementation for netdevsim. Use psp_dev_encapsulate() and psp_dev_rcv() to do actual encapsulation and decapsulation on skbs, but perform no encryption or decryption. In order to make encryption with a bad key result in a drop on the peer's rx side, we stash our psd's generation number in the first byte of each key before handing to the peer. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-2-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
a1b501a8c6 |
page_pool: Clamp pool size to max 16K pages
page_pool_init() returns E2BIG when the page_pool size goes above 32K pages. As some drivers are configuring the page_pool size according to the MTU and ring size, there are cases where this limit is exceeded and the queue creation fails. The page_pool size doesn't have to cover a full queue, especially for larger ring size. So clamp the size instead of returning an error. Do this in the core to avoid having each driver do the clamping. The current limit was deemed to high [1] so it was reduced to 16K to avoid page waste. [1] https://lore.kernel.org/all/1758532715-820422-3-git-send-email-tariqt@nvidia.com/ Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250926131605.2276734-2-dtatulea@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
2ade91705b |
tipc: adjust tipc_nodeid2string() to return string length
Since the value returned by 'tipc_nodeid2string()' is not used, the function may be adjusted to return the length of the result, which is helpful to drop a few calls to 'strlen()' in 'tipc_link_create()' and 'tipc_link_bc_create()'. Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250926074113.914399-1-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
9c328f5474 |
net: nfc: nci: Add parameter validation for packet data
Syzbot reported an uninitialized value bug in nci_init_req, which was
introduced by commit 5aca7966d2a7 ("Merge tag
'perf-tools-fixes-for-v6.17-2025-09-16' of
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools").
This bug arises due to very limited and poor input validation
that was done at nic_valid_size(). This validation only
validates the skb->len (directly reflects size provided at the
userspace interface) with the length provided in the buffer
itself (interpreted as NCI_HEADER). This leads to the processing
of memory content at the address assuming the correct layout
per what opcode requires there. This leads to the accesses to
buffer of `skb_buff->data` which is not assigned anything yet.
Following the same silent drop of packets of invalid sizes at
`nic_valid_size()`, add validation of the data in the respective
handlers and return error values in case of failure. Release
the skb if error values are returned from handlers in
`nci_nft_packet` and effectively do a silent drop
Possible TODO: because we silently drop the packets, the
call to `nci_request` will be waiting for completion of request
and will face timeouts. These timeouts can get excessively logged
in the dmesg. A proper handling of them may require to export
`nci_request_cancel` (or propagate error handling from the
nft packets handlers).
Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=740e04c2a93467a0f8c8
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Tested-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Deepak Sharma <deepak.sharma.472935@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250925132846.213425-1-deepak.sharma.472935@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
||
|
|
7bd80ed89d |
Documentation: net: add flow control guide and document ethtool API
Introduce a new document, flow_control.rst, to provide a comprehensive guide on Ethernet Flow Control in Linux. The guide explains how flow control works, how autonegotiation resolves pause capabilities, and how to configure it using ethtool and Netlink. In parallel, document the pause and pause-stat attributes in the ethtool.yaml netlink spec. This enables the ynl tool to generate kernel-doc comments for the corresponding enums in the UAPI header, making the C interface self-documenting. Finally, replace the legacy flow control section in phy.rst with a reference to the new document and add pointers in the relevant C source files. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250924120241.724850-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
4ed9db2dc5 |
net: rtnetlink: fix typo in rtnl_unregister_all() comment
Corrected "rtnl_unregster()" -> "rtnl_unregister()" in the documentation comment of "rtnl_unregister_all()" Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250929085418.49200-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
59701b1870 |
mptcp: minor move_skbs_to_msk() cleanup
Such function is called only by __mptcp_data_ready(), which in turn is always invoked when msk is not owned by the user: we can drop the redundant, related check. Additionally mptcp needs to propagate the socket error only for current subflow. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-7-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
68c7af988b |
mptcp: factor out a basic skb coalesce helper
The upcoming patch will introduced backlog processing for MPTCP socket, and we want to leverage coalescing in such data path. Factor out the relevant bits not touching memory accounting to deal with such use-case. Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-6-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
c4ebc4ee4e |
mptcp: remove unneeded mptcp_move_skb()
Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"),
sk_shutdown can't change during the main recvmsg loop, we can drop
the related race breaker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-5-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
9a0afe0db4 |
mptcp: introduce the mptcp_init_skb helper
Factor out all the skb initialization step in a new helper and use it. Note that this change moves the MPTCP CB initialization earlier: we can do such step as soon as the skb leaves the subflow socket receive queues. Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-4-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
e118cdc34d |
mptcp: rcvbuf auto-tuning improvement
Apply to the MPTCP auto-tuning the same improvements introduced for the
TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch
'tcp-receive-side-improvements'").
The main difference is that TCP subflow and the main MPTCP socket need
to account separately for OoO: MPTCP does not care for TCP-level OoO
and vice versa, as a consequence do not reflect MPTCP-level rcvbuf
increase due to OoO packets at the subflow level.
This refeactor additionally allow dropping the msk receive buffer update
at receive time, as the latter only intended to cope with subflow receive
buffer increase due to OoO packets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-3-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
a755677974 |
tcp: make tcp_rcvbuf_grow() accessible to mptcp code
To leverage the auto-tuning improvements brought by commit 2da35e4b4df9
("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack need
to access the mentioned helper.
Acked-by: Geliang Tang <geliang@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-2-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
9aa59323f2 |
mptcp: leverage skb deferral free
Usage of the skb deferral API is straight-forward; with multiple subflows actives this allow moving part of the received application load into multiple CPUs. Also fix a typo in the related comment. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-1-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
f017c1f768 |
tcp: use skb->len instead of skb->truesize in tcp_can_ingest()
Some applications are stuck to the 20th century and still use
small SO_RCVBUF values.
After the blamed commit, we can drop packets especially
when using LRO/hw-gro enabled NIC and small MSS (1500) values.
LRO/hw-gro NIC pack multiple segments into pages, allowing
tp->scaling_ratio to be set to a high value.
Whenever the receive queue gets full, we can receive a small packet
filling RWIN, but with a high skb->truesize, because most NIC use 4K page
plus sk_buff metadata even when receiving less than 1500 bytes of payload.
Even if we refine how tp->scaling_ratio is estimated,
we could have an issue at the start of the flow, because
the first round of packets (IW10) will be sent based on
the initial tp->scaling_ratio (1/2)
Relax tcp_can_ingest() to use skb->len instead of skb->truesize,
allowing the peer to use final RWIN, assuming a 'perfect'
scaling_ratio of 1.
Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250927092827.2707901-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
d210ee58da |
bluetooth-next pull request for net-next:
core: - MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver - Avoid a couple dozen -Wflex-array-member-not-at-end warnings - bcsp: receive data only if registered - HCI: Fix using LE/ACL buffers for ISO packets - hci_core: Detect if an ISO link has stalled - ISO: Don't initiate CIS connections if there are no buffers - ISO: Use sk_sndtimeo as conn_timeout drivers: - btusb: Check for unexpected bytes when defragmenting HCI frames - btusb: Add new VID/PID 13d3/3627 for MT7925 - btusb: Add new VID/PID 13d3/3633 for MT7922 - btusb: Add USB ID 2001:332a for D-Link AX9U rev. A1 - btintel: Add support for BlazarIW core - btintel_pcie: Add support for _suspend() / _resume() - btintel_pcie: Define hdev->wakeup() callback - btintel_pcie: Add Bluetooth core/platform as comments - btintel_pcie: Add id of Scorpious, Panther Lake-H484 - btintel_pcie: Refactor Device Coredump -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmjYBiUZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKW1wD/9z6vGnsfEB1uogUCX6dxZw yysJ41FqsUKESKiWREEDs4cwnPNJRRDNUCjjaex36qXTc8LTUJwxeiTdIPd5qgPC ZJ0Xp+uo9FnqnEPLJMsUZ1j1tCCSLUSHF9Y570XpvRupPbHiVwrFVYF/R36CnuUV mZknBSf1NQSHAUbAoMqa3xls3XUGy12NDA9ObBUmYt5530kigYNOmx0OjSqZKvWa Sd/quLY+oJ41wy7TAMqGGe5rKrDYZdicpGzjdmaRETrFnB8ebevw/1QTLyjdwbMt CI+sOvwPGpfQUpETCrjgmscz4UDkgP+XAwEV2xn6t0Y9MGbRSZrYO8DQ0Zk5UhKt e1fi4yHp5xOA/IvBeQonL1Ln6shMkIt5GF+0D125jb4OfxamOqn3qJ2isy3WVJDj tNRj4Z9OsokQUfcCkJq+9AaFGpnNaIc4/Vu65sqJhnnYAU+gL3epPEziTl0tVAlq H0urgd/oX6Egip9jsRwQWhblodGwPEAMbAafyhjx4dc3RFkj45bCAK6TY9BaMz2T lPndUgmCvBalJrVDXrx/M5ozCRnWqUPfIv5d7RGHs/W1KFIxw4wFEEJrdwhqv/ps RFFDmeQUPtECp97Tbk81Whb5h19V5zWBygcFq96jEbmz9JUlwqmaE8EvUrMCHbDK yThQP9erV7ULkO44QWEniQ== =vr6/ -----END PGP SIGNATURE----- Merge tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver - Avoid a couple dozen -Wflex-array-member-not-at-end warnings - bcsp: receive data only if registered - HCI: Fix using LE/ACL buffers for ISO packets - hci_core: Detect if an ISO link has stalled - ISO: Don't initiate CIS connections if there are no buffers - ISO: Use sk_sndtimeo as conn_timeout drivers: - btusb: Check for unexpected bytes when defragmenting HCI frames - btusb: Add new VID/PID 13d3/3627 for MT7925 - btusb: Add new VID/PID 13d3/3633 for MT7922 - btusb: Add USB ID 2001:332a for D-Link AX9U rev. A1 - btintel: Add support for BlazarIW core - btintel_pcie: Add support for _suspend() / _resume() - btintel_pcie: Define hdev->wakeup() callback - btintel_pcie: Add Bluetooth core/platform as comments - btintel_pcie: Add id of Scorpious, Panther Lake-H484 - btintel_pcie: Refactor Device Coredump * tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (30 commits) Bluetooth: Avoid a couple dozen -Wflex-array-member-not-at-end warnings Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements Bluetooth: ISO: don't leak skb in ISO_CONT RX Bluetooth: ISO: free rx_skb if not consumed Bluetooth: ISO: Fix possible UAF on iso_conn_free Bluetooth: SCO: Fix UAF on sco_conn_free Bluetooth: bcsp: receive data only if registered Bluetooth: btusb: Add new VID/PID 13d3/3633 for MT7922 Bluetooth: btusb: Add new VID/PID 13d3/3627 for MT7925 Bluetooth: remove duplicate h4_recv_buf() in header Bluetooth: btusb: Check for unexpected bytes when defragmenting HCI frames Bluetooth: hci_core: Print information of hcon on hci_low_sent Bluetooth: hci_core: Print number of packets in conn->data_q Bluetooth: Add function and line information to bt_dbg Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO Bluetooth: hci_core: Detect if an ISO link has stalled Bluetooth: ISO: Use sk_sndtimeo as conn_timeout Bluetooth: HCI: Fix using LE/ACL buffers for ISO packets Bluetooth: ISO: Don't initiate CIS connections if there are no buffers MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver ... ==================== Link: https://patch.msgid.link/20250927154616.1032839-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
1fb0e47161 |
net: remove one stac/clac pair from move_addr_to_user()
Convert the get_user() and __put_user() code to the
fast masked_user_access_begin()/unsafe_{get|put}_user()
variant.
This patch increases the performance of an UDP recvfrom()
receiver (netserver) on 120 bytes messages by 7 %
on an AMD EPYC 7B12 64-Core Processor platform.
Presence of audit_sockaddr() makes difficult
to avoid the stac/clac pair in the copy_to_user() call,
this is left for a future patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250925230929.3727873-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
2b235765e9 |
scm: use masked_user_access_begin() in put_cmsg()
Use the greatest and latest uaccess construct to get an optimal code. Before : lea (%r9,%rcx,1),%r10 movabs $<USER_PTR_MAX>,%r11 mov $0xfffffff2,%eax cmp %rcx,%r10 jb ffffffff81cdc312 <put_cmsg+0x152> cmp %r11,%r10 ja ffffffff81cdc312 <put_cmsg+0x152> stac lfence mov %r9,(%rcx) After: movabs $<USER_PTR_MAX>,%r9 cmp %r9,%rax cmova %r9,%rax stac mov %rcx,(%rax) Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250925224914.3590290-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2804359536 |
net: ethtool: remove duplicated mm.o from Makefile
Fixes: 2b30f8291a30 ("net: ethtool: add support for MAC Merge layer")
Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250926131323.222192-1-m.heidelberg@cab.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|