When using Rust on the x86 architecture, we are currently using the
unstable target.json feature to specify the compilation target. Rustc is
going to change how softfloat is specified in the target.json file on
x86, thus update generate_rust_target.rs to specify softfloat using the
new option.
Note that if you enable this parameter with a compiler that does not
recognize it, then that triggers a warning but it does not break the
build.
[ For future reference, this solves the following error:
RUSTC L rust/core.o
error: Error loading target specification: target feature
`soft-float` is incompatible with the ABI but gets enabled in
target spec. Run `rustc --print target-list` for a list of
built-in targets
- Miguel ]
Cc: <stable@vger.kernel.org> # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust/pull/136146
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86
Link: https://lore.kernel.org/r/20250203-rustc-1-86-x86-softfloat-v1-1-220a72a5003e@google.com
[ Added 6.13.y too to Cc: stable tag and added reasoning to avoid
over-backporting. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
The uretprobe syscall is implemented as a performance enhancement on
x86_64 by having the kernel inject a call to it on function exit; User
programs cannot call this system call explicitly.
As such, this syscall is considered a kernel implementation detail and
should not be filtered by seccomp.
Enhance the seccomp bpf test suite to check that uretprobes can be
attached to processes without the killing the process regardless of
seccomp policy.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20250202162921.335813-3-eyal.birger@gmail.com
[kees: Skip archs without __NR_uretprobe]
Signed-off-by: Kees Cook <kees@kernel.org>
Pull pci fixes from Bjorn Helgaas:
- When saving a device's state, always save the upstream bridge's PM L1
Substates configuration as well because the bridge never saves its
own state, and restoring a device needs the state for both ends; this
was a regression that caused link and power management errors after
suspend/resume (Ilpo Järvinen)
- Correct TPH Control Register write, where we wrote the ST Mode where
the THP Requester Enable value was intended (Robin Murphy)
* tag 'pci-v6.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/TPH: Restore TPH Requester Enable correctly
PCI/ASPM: Fix L1SS saving
I'm stepping down as ath10k, ath11k and ath12k maintainer so remove me from
MAINTAINERS file and Device Tree bindings. Jeff continues as the maintainer.
As my quicinc.com email will not work anymore so add an entry to .mailmap file
to direct the mail to my kernel.org address.
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Acked-by: Jeff Johnson <jjohnson@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20250203180445.1429640-1-kvalo@kernel.org
Pull xen fixes from Juergen Gross:
"Three fixes for xen_hypercall_hvm() that was introduced in the 6.13
cycle"
* tag 'for-linus-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: remove unneeded dummy push from xen_hypercall_hvm()
x86/xen: add FRAME_END to xen_hypercall_hvm()
x86/xen: fix xen_hypercall_hvm() to not clobber %rbx
Merge amd-pstate driver fixes for 6.14-rc2 from Mario Limonciello:
"* Fix some error cleanup paths with mutex use and boost
* Fix a ref counting issue
* Fix a schedutil issue"
* tag 'amd-pstate-v6.14-2025-02-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Fix cpufreq_policy ref counting
cpufreq/amd-pstate: Fix max_perf updation with schedutil
cpufreq/amd-pstate: Remove the goto label in amd_pstate_update_limits
cpufreq/amd-pstate: Fix per-policy boost flag incorrect when fail
ASAN generates special synthetic symbols to help check for ODR
violations. These synthetic symbols lack debug information, so
gendwarfksyms emits warnings when processing them. No code should ever
have a dependency on these symbols, so we should not be exporting them,
just like the __cfi symbols.
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250122-gendwarfksyms-kasan-rust-v1-1-5ee5658f4fb6@google.com
[ Fixed typo in commit message. Slightly reworded title. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Pull networking fixes from Paolo Abeni:
"Interestingly the recent kmemleak improvements allowed our CI to catch
a couple of percpu leaks addressed here.
We (mostly Jakub, to be accurate) are working to increase review
coverage over the net code-base tweaking the MAINTAINER entries.
Current release - regressions:
- core: harmonize tstats and dstats
- ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels
- eth: tun: revert fix group permission check
- eth: stmmac: revert "specify hardware capability value when FIFO
size isn't specified"
Previous releases - regressions:
- udp: gso: do not drop small packets when PMTU reduces
- rxrpc: fix race in call state changing vs recvmsg()
- eth: ice: fix Rx data path for heavy 9k MTU traffic
- eth: vmxnet3: fix tx queue race condition with XDP
Previous releases - always broken:
- sched: pfifo_tail_enqueue: drop new packet when sch->limit == 0
- ethtool: ntuple: fix rss + ring_cookie check
- rxrpc: fix the rxrpc_connection attend queue handling
Misc:
- recognize Kuniyuki Iwashima as a maintainer"
* tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
Revert "net: stmmac: Specify hardware capability value when FIFO size isn't specified"
MAINTAINERS: add a sample ethtool section entry
MAINTAINERS: add entry for ethtool
rxrpc: Fix race in call state changing vs recvmsg()
rxrpc: Fix call state set to not include the SERVER_SECURING state
net: sched: Fix truncation of offloaded action statistics
tun: revert fix group permission check
selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()
netem: Update sch->q.qlen before qdisc_tree_reduce_backlog()
selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0
pfifo_tail_enqueue: Drop new packet when sch->limit == 0
selftests: mptcp: connect: -f: no reconnect
net: rose: lock the socket in rose_bind()
net: atlantic: fix warning during hot unplug
rxrpc: Fix the rxrpc_connection attend queue handling
net: harmonize tstats and dstats
selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported
selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure
ethtool: ntuple: fix rss + ring_cookie check
ethtool: rss: fix hiding unsupported fields in dumps
...
Merge series from Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:
The Nullity of sps->cstream needs to be checked in sof_ipc_msg_data() and not
assume that it is not NULL.
The sps->stream must be cleared to NULL on close since this is used as a check
to see if we have active PCM stream.
MS-SMB2 section 2.2.13.2.10 specifies that 'epoch' should be a 16-bit
unsigned integer used to track lease state changes. Change the data
type of all instances of 'epoch' from unsigned int to __u16. This
simplifies the epoch change comparisons and makes the code more
compliant with the protocol spec.
Cc: stable@vger.kernel.org
Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Commit 1db806ec06 ("PCI/ASPM: Save parent L1SS config in
pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both the
Upstream Port and its upstream bridge when handling an Upstream Port, which
matches what the L1SS restore side does. However, parent->state_saved can
be set true at an earlier time when the upstream bridge saved other parts
of its state. Then later when attempting to save the L1SS config while
handling the Upstream Port, parent->state_saved is true in
pci_save_aspm_l1ss_state() resulting in early return and skipping saving
bridge's L1SS config because it is assumed to be already saved. Later on
restore, junk is written into L1SS config which causes issues with some
devices.
Remove parent->state_saved check and unconditionally save L1SS config also
for the upstream bridge from an Upstream Port which ought to be harmless
from correctness point of view. With the Upstream Port check now present,
saving the L1SS config more than once for the bridge is no longer a problem
(unlike when the parent->state_saved check got introduced into the fix
during its development).
Link: https://lore.kernel.org/r/20250131152913.2507-1-ilpo.jarvinen@linux.intel.com
Fixes: 1db806ec06 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Reported by: Rafael J. Wysocki <rafael@kernel.org>
Closes: https://lore.kernel.org/r/CAJZ5v0iKmynOQ5vKSQbg1J_FmavwZE-nRONovOZ0mpMVauheWg@mail.gmail.com
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/d7246feb-4f3f-4d0c-bb64-89566b170671@molgen.mpg.de
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360
Richard Henderson <richard.henderson@linaro.org> writes[1]:
> There was a Spec benchmark (I forget which) which was memory bound and ran
> twice as fast with 32-bit pointers.
>
> I copied the idea from DEC to the ELF abi, but never did all the other work
> to allow the toolchain to take advantage.
>
> Amusingly, a later Spec changed the benchmark data sets to not fit into a
> 32-bit address space, specifically because of this.
>
> I expect one could delete the ELF bit and personality and no one would
> notice. Not even the 10 remaining Alpha users.
In [2] it was pointed out that parts of setarch weren't working
properly on alpha because it has it's own SET_PERSONALITY
implementation. In the discussion that followed Richard Henderson
pointed out that the 32bit pointer support for alpha was never
completed.
Fix this by removing alpha's 32bit pointer support.
As a bit of paranoia refuse to execute any alpha binaries that have
the EF_ALPHA_32BIT flag set. Just in case someone somewhere has
binaries that try to use alpha's 32bit pointer support.
Link: https://lkml.kernel.org/r/CAFXwXrkgu=4Qn-v1PjnOR4SG0oUb9LSa0g6QXpBq4ttm52pJOQ@mail.gmail.com [1]
Link: https://lkml.kernel.org/r/20250103140148.370368-1-glaubitz@physik.fu-berlin.de [2]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/87y0zfs26i.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: Kees Cook <kees@kernel.org>
The find_preferred_mode() functions takes the mode_config mutex, but due
to the order most tests have, is called with the crtc_ww_class_mutex
taken. This raises a warning for a circular dependency when running the
tests with lockdep.
Reorder the tests to call find_preferred_mode before the acquire context
has been created to avoid the issue.
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129-test-kunit-v2-4-fe59c43805d5@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
On an aarch64 kernel with CONFIG_PAGE_SIZE_64KB=y,
arena_htab tests cause a segmentation fault and soft lockup.
The same failure is not observed with 4k pages on aarch64.
It turns out arena_map_free() is calling
apply_to_existing_page_range() with the address returned by
bpf_arena_get_kern_vm_start(). If this address is not page-aligned
the code ends up calling apply_to_pte_range() with that unaligned
address causing soft lockup.
Fix it by round up GUARD_SZ to PAGE_SIZE << 1 so that the
division by 2 in bpf_arena_get_kern_vm_start() returns
a page-aligned value.
Fixes: 317460317a ("bpf: Introduce bpf_arena.")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20250205170059.427458-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For systems which load firmware on the cs35l41 which use ACPI, the
_SUB value is used to differentiate firmware and tuning files for the
individual systems. In the case where a system does not have a _SUB
defined in ACPI node for cs35l41, there needs to be a fallback to
allow the files for that system to be differentiated. Since all
ACPI nodes for cs35l41 should have a HID defined, the HID should be a
safe option.
Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Tested-by: André Almeida <andrealmeid@igalia.com>
Link: https://patch.msgid.link/20250205164806.414020-1-sbinding@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This reverts commit 8865d22656, which caused breakage for platforms
which are not using xgmac2 or gmac4. Only these two cores have the
capability of providing the FIFO sizes from hardware capability fields
(which are provided in priv->dma_cap.[tr]x_fifo_size.)
All other cores can not, which results in these two fields containing
zero. We also have platforms that do not provide a value in
priv->plat->[tr]x_fifo_size, resulting in these also being zero.
This causes the new tests introduced by the reverted commit to fail,
and produce e.g.:
stmmaceth f0804000.eth: Can't specify Rx FIFO size
An example of such a platform which fails is QEMU's npcm750-evb.
This uses dwmac1000 which, as noted above, does not have the capability
to provide the FIFO sizes from hardware.
Therefore, revert the commit to maintain compatibility with the way
the driver used to work.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/4e98f967-f636-46fb-9eca-d383b9495b86@roeck-us.net
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Steven Price <steven.price@arm.com>
Fixes: 8865d22656 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified")
Link: https://patch.msgid.link/E1tfeyR-003YGJ-Gb@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I feel like we don't do a good enough keeping authors of driver
APIs around. The ethtool code base was very nicely compartmentalized
by Michal. Establish a precedent of creating MAINTAINERS entries
for "sections" of the ethtool API. Use Andrew and cable test as
a sample entry. The entry should ideally cover 3 elements:
a core file, test(s), and keywords. The last one is important
because we intend the entries to cover core code *and* reviews
of drivers implementing given API!
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204215750.169249-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michal did an amazing job converting ethtool to Netlink, but never
added an entry to MAINTAINERS for himself. Create a formal entry
so that we can delegate (portions) of this code to folks.
Over the last 3 years majority of the reviews have been done by
Andrew and I. I suppose Michal didn't want to be on the receiving
end of the flood of patches.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250204215729.168992-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 3ba11e684d ("pinctrl: pinconf-generic: print hex value")
unconditionally switched to printing hex values in
pinconf_generic_dump_one(). However, if a dump format is registered for the
dumped pin, the hex value is printed as well. This hex value does not
necessarily correspond 1:1 with the hardware register value (as noted by
commit 3ba11e684d ("pinctrl: pinconf-generic: print hex value")). As a
result, user-facing output may include information like:
output drive strength (0x100 uA).
To address this, check if a dump format is registered for the dumped
property, and print the unsigned value instead when applicable.
Fixes: 3ba11e684d ("pinctrl: pinconf-generic: print hex value")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://lore.kernel.org/20250205101058.2034860-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Enable TISCI Interrupt Router and Interrupt Aggregator drivers.
These IPs are found in all TI K3 SoCs like J721E, AM62X and is required
for core functionality like DMA, GPIO Interrupts which is necessary
during boot, thus make them built-in.
bloat-o-meter summary on vmlinux:
add/remove: 460/1 grow/shrink: 4/0 up/down: 162483/-8 (162475)
...
Total: Before=31615984, After=31778459, chg +0.51%
These configs were previously selected for ARCH_K3 in respective Kconfigs
till commit b8b26ae398 ("irqchip/ti-sci-inta : Add module build support")
and commit 2d95ffaecb ("irqchip/ti-sci-intr: Add module build support")
dropped them and few driver configs (TI_K3_UDMA, TI_K3_RINGACC)
dependent on these also got disabled due to this. While re-enabling the
TI_SCI_INT_*_IRQCHIP configs, these configs with missing dependencies
(which are already part of arm64 defconfig) also get re-enabled which
explains the slightly larger size increase from the bloat-o-meter summary.
Fixes: 2d95ffaecb ("irqchip/ti-sci-intr: Add module build support")
Fixes: b8b26ae398 ("irqchip/ti-sci-inta : Add module build support")
Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Tested-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://lore.kernel.org/r/20250205062229.3869081-1-vaishnav.a@ti.com
Signed-off-by: Nishanth Menon <nm@ti.com>
After commit 36008fe6e3 ("smb: client: don't try following DFS links
in cifs_tree_connect()"), TCP_Server_Info::leaf_fullpath will no
longer be changed, so there is no need to kstrdup() it.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the client attempts to tree connect to a domain-based DFS
namespace from a DFS interlink target, the server will return
STATUS_BAD_NETWORK_NAME and the following will appear on dmesg:
CIFS: VFS: BAD_NETWORK_NAME: \\dom\dfs
Since a DFS share might contain several DFS interlinks and they expire
after 10 minutes, the above message might end up being flooded on
dmesg when mounting or accessing them.
Print this only once per share.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Some servers don't respect the DFSREF_STORAGE_SERVER bit, so
unconditionally tree connect to DFS link target and then decide
whether or not continue chasing DFS referrals for DFS interlinks.
Otherwise the client would fail to mount such shares.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
David Howells says:
====================
rxrpc: Call state fixes
Here some call state fixes for AF_RXRPC.
(1) Fix the state of a call to not treat the challenge-response cycle as
part of an incoming call's state set. The problem is that it makes
handling received of the final packet in the receive phase difficult
as that wants to change the call state - but security negotiations may
not yet be complete.
(2) Fix a race between the changing of the call state at the end of the
request reception phase of a service call, recvmsg() collecting the last
data and sendmsg() trying to send the reply before the I/O thread has
advanced the call state.
Link: https://lore.kernel.org/20250203110307.7265-2-dhowells@redhat.com
====================
Link: https://patch.msgid.link/20250204230558.712536-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's a race in between the rxrpc I/O thread recording the end of the
receive phase of a call and recvmsg() examining the state of the call to
determine whether it has completed.
The problem is that call->_state records the I/O thread's view of the call,
not the application's view (which may lag), so that alone is not
sufficient. To this end, the application also checks whether there is
anything left in call->recvmsg_queue for it to pick up. The call must be
in state RXRPC_CALL_COMPLETE and the recvmsg_queue empty for the call to be
considered fully complete.
In rxrpc_input_queue_data(), the latest skbuff is added to the queue and
then, if it was marked as LAST_PACKET, the state is advanced... But this
is two separate operations with no locking around them.
As a consequence, the lack of locking means that sendmsg() can jump into
the gap on a service call and attempt to send the reply - but then get
rejected because the I/O thread hasn't advanced the state yet.
Simply flipping the order in which things are done isn't an option as that
impacts the client side, causing the checks in rxrpc_kernel_check_life() as
to whether the call is still alive to race instead.
Fix this by moving the update of call->_state inside the skb queue
spinlocked section where the packet is queued on the I/O thread side.
rxrpc's recvmsg() will then automatically sync against this because it has
to take the call->recvmsg_queue spinlock in order to dequeue the last
packet.
rxrpc's sendmsg() doesn't need amending as the app shouldn't be calling it
to send a reply until recvmsg() indicates it has returned all of the
request.
Fixes: 93368b6bd5 ("rxrpc: Move call state changes from recvmsg to I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250204230558.712536-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RXRPC_CALL_SERVER_SECURING state doesn't really belong with the other
states in the call's state set as the other states govern the call's Rx/Tx
phase transition and govern when packets can and can't be received or
transmitted. The "Securing" state doesn't actually govern the reception of
packets and would need to be split depending on whether or not we've
received the last packet yet (to mirror RECV_REQUEST/ACK_REQUEST).
The "Securing" state is more about whether or not we can start forwarding
packets to the application as recvmsg will need to decode them and the
decoding can't take place until the challenge/response exchange has
completed.
Fix this by removing the RXRPC_CALL_SERVER_SECURING state from the state
set and, instead, using a flag, RXRPC_CALL_CONN_CHALLENGING, to track
whether or not we can queue the call for reception by recvmsg() or notify
the kernel app that data is ready. In the event that we've already
received all the packets, the connection event handler will poke the app
layer in the appropriate manner.
Also there's a race whereby the app layer sees the last packet before rxrpc
has managed to end the rx phase and change the state to one amenable to
allowing a reply. Fix this by queuing the packet after calling
rxrpc_end_rx_phase().
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250204230558.712536-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of tc offload, when user space queries the kernel for tc action
statistics, tc will query the offloaded statistics from device drivers.
Among other statistics, drivers are expected to pass the number of
packets that hit the action since the last query as a 64-bit number.
Unfortunately, tc treats the number of packets as a 32-bit number,
leading to truncation and incorrect statistics when the number of
packets since the last query exceeds 0xffffffff:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 58 sec used 0 sec
Action statistics:
Sent 1133877034176 bytes 536959475 pkt (dropped 0, overlimits 0 requeues 0)
[...]
According to the above, 2111-byte packets were redirected which is
impossible as only 64-byte packets were transmitted and the MTU was
1500.
Fix by treating packets as a 64-bit number:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 61 sec used 0 sec
Action statistics:
Sent 1370624380864 bytes 21416005951 pkt (dropped 0, overlimits 0 requeues 0)
[...]
Which shows that only 64-byte packets were redirected (1370624380864 /
21416005951 = 64).
Fixes: 3804070235 ("net/sched: Enable netdev drivers to update statistics of offloaded actions")
Reported-by: Joe Botha <joe@atomic.ac>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204123839.1151804-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>