TWx Linux Repository
Go to file
Taehee Yoo 59a931c5b7 xdp: fix invalid wait context of page_pool_destroy()
If the driver uses a page pool, it creates a page pool with
page_pool_create().
The reference count of page pool is 1 as default.
A page pool will be destroyed only when a reference count reaches 0.
page_pool_destroy() is used to destroy page pool, it decreases a
reference count.
When a page pool is destroyed, ->disconnect() is called, which is
mem_allocator_disconnect().
This function internally acquires mutex_lock().

If the driver uses XDP, it registers a memory model with
xdp_rxq_info_reg_mem_model().
The xdp_rxq_info_reg_mem_model() internally increases a page pool
reference count if a memory model is a page pool.
Now the reference count is 2.

To destroy a page pool, the driver should call both page_pool_destroy()
and xdp_unreg_mem_model().
The xdp_unreg_mem_model() internally calls page_pool_destroy().
Only page_pool_destroy() decreases a reference count.

If a driver calls page_pool_destroy() then xdp_unreg_mem_model(), we
will face an invalid wait context warning.
Because xdp_unreg_mem_model() calls page_pool_destroy() with
rcu_read_lock().
The page_pool_destroy() internally acquires mutex_lock().

Splat looks like:
=============================
[ BUG: Invalid wait context ]
6.10.0-rc6+ #4 Tainted: G W
-----------------------------
ethtool/1806 is trying to lock:
ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
other info that might help us debug this:
context-{5:5}
3 locks held by ethtool/1806:
stack backtrace:
CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
__lock_acquire+0x1681/0x4de0
? _printk+0x64/0xe0
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x1b3/0x580
? mem_allocator_disconnect+0x73/0x150
? __wake_up_klogd.part.0+0x16/0xc0
? __pfx_lock_acquire+0x10/0x10
? dump_stack_lvl+0x91/0xc0
__mutex_lock+0x15c/0x1690
? mem_allocator_disconnect+0x73/0x150
? __pfx_prb_read_valid+0x10/0x10
? mem_allocator_disconnect+0x73/0x150
? __pfx_llist_add_batch+0x10/0x10
? console_unlock+0x193/0x1b0
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___mutex_lock+0x10/0x10
? tick_nohz_tick_stopped+0x16/0x90
? __irq_work_queue_local+0x1e5/0x330
? irq_work_queue+0x39/0x50
? __wake_up_klogd.part.0+0x79/0xc0
? mem_allocator_disconnect+0x73/0x150
mem_allocator_disconnect+0x73/0x150
? __pfx_mem_allocator_disconnect+0x10/0x10
? mark_held_locks+0xa5/0xf0
? rcu_is_watching+0x11/0xb0
page_pool_release+0x36e/0x6d0
page_pool_destroy+0xd7/0x440
xdp_unreg_mem_model+0x1a7/0x2a0
? __pfx_xdp_unreg_mem_model+0x10/0x10
? kfree+0x125/0x370
? bnxt_free_ring.isra.0+0x2eb/0x500
? bnxt_free_mem+0x5ac/0x2500
xdp_rxq_info_unreg+0x4a/0xd0
bnxt_free_mem+0x1356/0x2500
bnxt_close_nic+0xf0/0x3b0
? __pfx_bnxt_close_nic+0x10/0x10
? ethnl_parse_bit+0x2c6/0x6d0
? __pfx___nla_validate_parse+0x10/0x10
? __pfx_ethnl_parse_bit+0x10/0x10
bnxt_set_features+0x2a8/0x3e0
__netdev_update_features+0x4dc/0x1370
? ethnl_parse_bitset+0x4ff/0x750
? __pfx_ethnl_parse_bitset+0x10/0x10
? __pfx___netdev_update_features+0x10/0x10
? mark_held_locks+0xa5/0xf0
? _raw_spin_unlock_irqrestore+0x42/0x70
? __pm_runtime_resume+0x7d/0x110
ethnl_set_features+0x32d/0xa20

To fix this problem, it uses rhashtable_lookup_fast() instead of
rhashtable_lookup() with rcu_read_lock().
Using xa without rcu_read_lock() here is safe.
xa is freed by __xdp_mem_allocator_rcu_free() and this is called by
call_rcu() of mem_xa_remove().
The mem_xa_remove() is called by page_pool_destroy() if a reference
count reaches 0.
The xa is already protected by the reference count mechanism well in the
control plane.
So removing rcu_read_lock() for page_pool_destroy() is safe.

Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 20:40:21 -07:00
arch ARM: SoC fixes for 6.10, part 3 2024-07-12 09:00:25 -07:00
block block: unmap and free user mapped integrity via submitter 2024-06-12 11:00:50 -06:00
certs kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
crypto This push fixes a bug in the new ecc P521 code as well as a buggy 2024-05-20 08:47:54 -07:00
Documentation ARM: SoC fixes for 6.10, part 3 2024-07-12 09:00:25 -07:00
drivers net: netconsole: Disable target before netpoll cleanup 2024-07-14 07:38:44 -07:00
fs for-6.10-rc7-tag 2024-07-12 12:08:42 -07:00
include ipv6: fix source address selection with route leak 2024-07-14 07:34:16 -07:00
init gcc: disable '-Warray-bounds' for gcc-9 2024-06-15 10:43:04 -07:00
io_uring io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it 2024-07-02 09:42:10 -06:00
ipc Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
kernel bpf: Defer work in bpf_timer_cancel_and_free 2024-07-10 15:59:44 -07:00
lib 21 hotfixes, 15 of which are cc:stable. 2024-07-10 14:59:41 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio 2024-07-09 15:41:11 -07:00
net xdp: fix invalid wait context of page_pool_destroy() 2024-07-14 20:40:21 -07:00
rust rust: avoid unused import warning in rusttest 2024-06-11 23:33:28 +02:00
samples tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
scripts kbuild: scripts/gdb: bring the "abspath" back 2024-06-27 04:20:32 +09:00
security integrity-v6.10-fix 2024-07-05 16:21:54 -07:00
sound ASoC: Fixes for v6.10 2024-07-11 17:11:50 +02:00
tools selftests: vrf_route_leaking: add local test 2024-07-14 07:34:16 -07:00
usr kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
virt KVM fixes for 6.10 2024-06-21 08:03:55 -04:00
.clang-format clang-format: Update with v6.7-rc4's for_each macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: create a list of all built DTB files 2024-02-19 18:20:39 +09:00
.mailmap MAINTAINERS: mailmap: update Lorenzo Stoakes's email address 2024-07-06 11:39:51 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS pci-v6.10-fixes-2 2024-07-05 12:33:00 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS ARM: SoC fixes for 6.10, part 3 2024-07-12 09:00:25 -07:00
Makefile Linux 6.10-rc7 2024-07-07 14:23:46 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.