A recent commit moved the error handling of sqpoll thread and tctx
failures into the thread itself, as part of fixing an issue. However, it
missed that tctx allocation may also fail, and that
io_sq_offload_create() does its own error handling for the task_struct
in that case.
Remove the manual task putting in io_sq_offload_create(), as
io_sq_thread() will notice that the tctx did not get setup and hence it
should put itself and exit.
Reported-by: syzbot+763e12bbf004fb1062e4@syzkaller.appspotmail.com
Fixes: ac0b8b327a ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This function exists in both tctx.h (where it belongs) and in io_uring.h
as a remnant of before the tctx handling code got split out. Remove the
io_uring.h definition and ensure that sqpoll.c includes the tctx.h
header to get the definition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Without explicitly setting a parent for the watchdog device, the device is
registered with a NULL parent. This causes device_add() (called internally
by devm_watchdog_register_device()) to register the device under
/sys/devices/virtual, since no parent is provided. The result is:
DEVPATH=/devices/virtual/watchdog/watchdog0
To fix this, assign &pdev->dev as the parent of the watchdog device before
calling devm_watchdog_register_device(). This ensures the device is
associated with the Portwell EC platform device and placed correctly in
sysfs as:
DEVPATH=/devices/platform/portwell-ec/watchdog/watchdog0
This aligns the device hierarchy with expectations and avoids misplacement
under the virtual class.
Fixes: 8357967533 ("platform/x86: portwell-ec: Add GPIO and WDT driver for Portwell EC")
Signed-off-by: Ivan Hu <ivan.hu@canonical.com>
Link: https://lore.kernel.org/r/20250616074819.63547-1-ivan.hu@canonical.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
An aoe device's rq_list contains accepted block requests that are
waiting to be transmitted to the aoe target. This queue was added as
part of the conversion to blk_mq. However, the queue was not cleaned out
when an aoe device is downed which caused blk_mq_freeze_queue() to sleep
indefinitely waiting for those requests to complete, causing a hang. This
fix cleans out the queue before calling blk_mq_freeze_queue().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665
Fixes: 3582dd2917 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: Justin Sanders <jsanders.devel@gmail.com>
Link: https://lore.kernel.org/r/20250610170600.869-1-jsanders.devel@gmail.com
Tested-By: Valentin Kleibel <valentin@vrvis.at>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As-per the SBI specification, an SBI remote fence operation applies
to the entire address space if either:
1) start_addr and size are both 0
2) size is equal to 2^XLEN-1
>From the above, only #1 is checked by SBI SFENCE calls so fix the
size parameter check in SBI SFENCE calls to cover #2 as well.
Fixes: 13acfec2db ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250605061458.196003-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
We can only pass negative error codes to bch2_err_str(); if it's a
positive integer it's not an error and we trip an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A path exists in a particular snapshot: we should do the pathwalk in the
snapshot ID of the inode we started from, _not_ change snapshot ID as we
walk inodes and dirents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we find a directory connectivity problem, we should do the repair
in the oldest snapshot that has the issue - so that we don't end up
duplicating work or making a real mess of things.
Oldest snapshot IDs have the highest integer value, so - just walk
inodes in reverse order.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch_subvolume.fs_path_parent needs to be updated as well, it should
match inode.bi_parent_subvol.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The bch2_subvolume_get_snapshot() call needs to happen before the dirent
lookup - the dirent is in the parent subvolume.
Also, check for loops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
kthread creation checks for pending signals, which is _very_ annoying if
we have to do a long recovery and don't go rw until we've done
significant work.
Check if we'll be going rw and pre-allocate kthreads/workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Print out more info when we find a key (extent, dirent, xattr) for a
missing inode - was there a good inode in an older snapshot, full(ish)
list of keys for that missing inode, so we can make better decisions on
how to repair.
If it looks like it should've been deleted, autofix it. If we ever hit
the non-autofix cases, we'll want to write more repair code (possibly
reconstituting the inode).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
After cd3cdb1ef7 ("Single err message for btree node reads"),
all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what
the actual error type was if the recovery pass was scanning for btree
nodes. This lead to the code continuing despite things like bad node
formats when they earlier would have caused a jump to fsck_err, because
btree_err only jumps when the return from __btree_err does not match
fsck_fix. Ultimately this lead to undefined behavior by attempting to
unpack a key based on an invalid format.
Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause
__btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes.
Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com
Fixes: cd3cdb1ef7 ("bcachefs: Single err message for btree node reads")
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a mount option for rewinding the journal, bringing the entire
filesystem to where it was at a previous point in time.
This is for extreme disaster recovery scenarios - it's not intended as
an undelete operation.
The option takes a journal sequence number; the desired sequence number
can be determined with 'bcachefs list_journal'
Caveats:
- The 'journal_transaction_names' option must have been enabled (it's on
by default). The option controls emitting of extra debug info in the
journal, so we can see what individual transactions were doing;
It also enables journalling of keys being overwritten, which is what
we rely on here.
- A full fsck run will be automatically triggered since alloc info will
be inconsistent. Only leaf node updates to non-alloc btrees are
rewound, since rewinding interior btree updates isn't possible or
desirable.
- We can't do anything about data that was deleted and overwritten.
Lots of metadata updates after the point in time we're rewinding to
shouldn't cause a problem, since we segragate data and metadata
allocations (this is in order to make repair by btree node scan
practical on larger filesystems; there's a small 64-bit per device
bitmap in the superblock of device ranges with btree nodes, and we try
to keep this small).
However, having discards enabled will cause problems, since buckets
are discarded as soon as they become empty (this is why we don't
implement fstrim: we don't need it).
Hopefully, this feature will be a one-off thing that's never used
again: this was implemented for recovering from the "vfs i_nlink 0 ->
subvol deletion" bug, and that bug was unusually disastrous and
additional safeguards have since been implemented.
But if it does turn out that we need this more in the future, I'll
have to implement an option so that empty buckets aren't discarded
immediately - lagging by perhaps 1% of device capacity.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
pldmfw calls crc32 code and depends on it being enabled, else
there is a link error as follows. So PLDMFW should select CRC32.
lib/pldmfw/pldmfw.o: In function `pldmfw_flash_image':
pldmfw.c:(.text+0x70f): undefined reference to `crc32_le_base'
This problem was introduced by commit b8265621f4 ("Add pldmfw library
for PLDM firmware update").
It manifests as of commit d69ea414c9 ("ice: implement device flash
update via devlink").
And is more likely to occur as of commit 9ad19171b6 ("lib/crc: remove
unnecessary prompt for CONFIG_CRC32 and drop 'default y'").
Found by chance while exercising builds based on tinyconfig.
Fixes: b8265621f4 ("Add pldmfw library for PLDM firmware update")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250613-pldmfw-crc32-v1-1-f3fad109eee6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of a SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning
zero values for "top-level events" without any error indication.
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
valid CPUs for reading RMID counter via the MSR interface.
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
Fixes: 328ea68874 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
Pull x86 fixes from Dave Hansen:
"This is a pretty scattered set of fixes. The majority of them are
further fixups around the recent ITS mitigations.
The rest don't really have a coherent story:
- Some flavors of Xen PV guests don't support large pages, but the
set_memory.c code assumes all CPUs support them.
Avoid problems with a quick CPU feature check.
- The TDX code has some wrappers to help retry calls to the TDX
module. They use function pointers to assembly functions and the
compiler usually generates direct CALLs. But some new compilers,
plus -Os turned them in to indirect CALLs and the assembly code was
not annotated for indirect calls.
Force inlining of the helper to fix it up.
- Last, a FRED issue showed up when single-stepping. It's fine when
using an external debugger, but was getting stuck returning from a
SIGTRAP handler otherwise.
Clear the FRED 'swevent' bit to ensure that forward progress is
made"
* tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "mm/execmem: Unify early execmem_cache behaviour"
x86/its: explicitly manage permissions for ITS pages
x86/its: move its_pages array to struct mod_arch_specific
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/mm/pat: don't collapse pages without PSE set
x86/virt/tdx: Avoid indirect calls to TDX assembly functions
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
The function core_scsi3_decode_spec_i_port(), in its error code path,
unconditionally calls core_scsi3_lunacl_undepend_item() passing the
dest_se_deve pointer, which may be NULL.
This can lead to a NULL pointer dereference if dest_se_deve remains
unset.
SPC-3 PR SPEC_I_PT: Unable to locate dest_tpg
Unable to handle kernel paging request at virtual address dfff800000000012
Call trace:
core_scsi3_lunacl_undepend_item+0x2c/0xf0 [target_core_mod] (P)
core_scsi3_decode_spec_i_port+0x120c/0x1c30 [target_core_mod]
core_scsi3_emulate_pro_register+0x6b8/0xcd8 [target_core_mod]
target_scsi3_emulate_pr_out+0x56c/0x840 [target_core_mod]
Fix this by adding a NULL check before calling
core_scsi3_lunacl_undepend_item()
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/20250612101556.24829-1-mlombard@redhat.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To pick up the changes in:
243c90e917 ("build_bug.h: more user friendly error messages in BUILD_BUG_ON_ZERO()")
This also needed to pick the __BUILD_BUG_ON_ZERO_MSG() in
linux/compiler.h, that needed to be polished to avoid hitting old clang
problems with _Static_assert on arrays of structs:
Debian clang version 11.0.1-2~deb10u1
Debian clang version 11.0.1-2~deb10u1
$ make NO_LIBTRACEEVENT=1 ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf CC=clang
<SNIP>
btf_dump.c:895:18: error: type name does not allow storage class to be specified
for (i = 0; i < ARRAY_SIZE(pads); i++) {
^
/git/perf-6.16.0-rc1/tools/include/linux/kernel.h:91:59: note: expanded from macro 'ARRAY_SIZE'
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
^
/git/perf-6.16.0-rc1/tools/include/linux/compiler-gcc.h:26:28: note: expanded from macro '__must_be_array'
#define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
^
/git/perf-6.16.0-rc1/tools/include/linux/build_bug.h:17:2: note: expanded from macro 'BUILD_BUG_ON_ZERO'
__BUILD_BUG_ON_ZERO_MSG(e, ##__VA_ARGS__, #e " is true")
^
/git/perf-6.16.0-rc1/tools/include/linux/compiler.h:248:67: note: expanded from macro '__BUILD_BUG_ON_ZERO_MSG'
#define __BUILD_BUG_ON_ZERO_MSG(e, msg, ...) ((int)sizeof(struct {_Static_assert(!(e), msg);}))
^
/usr/include/x86_64-linux-gnu/sys/cdefs.h:438:5: note: expanded from macro '_Static_assert'
extern int (*__Static_assert_function (void)) \
^
These also failed:
toolsbuilder@five:~$ grep FAIL dm.log/summary | grep clang
1 72.87 almalinux:8 : FAIL clang version 19.1.7 ( 19.1.7-2.module_el8.10.0+3990+33d0d926)
15 73.39 centos:stream : FAIL clang version 17.0.6 (Red Hat 17.0.6-1.module_el8+767+9fa966b8)
36 87.14 opensuse:15.4 : FAIL clang version 15.0.7
37 80.08 opensuse:15.5 : FAIL clang version 15.0.7
40 72.12 oraclelinux:8 : FAIL clang version 16.0.6 (Red Hat 16.0.6-2.0.1.module+el8.9.0+90129+d3ee8717)
42 74.12 rockylinux:8 : FAIL clang version 16.0.6 (Red Hat 16.0.6-2.module+el8.9.0+1651+e10a8f6d)
toolsbuilder@five:~$
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/aEszb7SSIJB6Lp6f@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
827547bc3a ("KVM: SVM: Add architectural definitions/assets for Bus Lock Threshold")
That triggers:
CC /tmp/build/perf-tools/arch/x86/util/kvm-stat.o
LD /tmp/build/perf-tools/arch/x86/util/perf-util-in.o
LD /tmp/build/perf-tools/arch/x86/perf-util-in.o
LD /tmp/build/perf-tools/arch/perf-util-in.o
LD /tmp/build/perf-tools/perf-util-in.o
AR /tmp/build/perf-tools/libperf-util.a
LINK /tmp/build/perf-tools/perf
The SVM_EXIT_BUS_LOCK exit reason was added to SVM_EXIT_REASONS, used in
kvm-stat.c.
This addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/aErcjuTTCVEZ-8Nb@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>