Commit Graph

1367546 Commits

Author SHA1 Message Date
Sean Christopherson
fa079a0616 irqbypass: Drop pointless and misleading THIS_MODULE get/put
Drop irqbypass.ko's superfluous and misleading get/put calls on
THIS_MODULE.  A module taking a reference to itself is useless; no amount
of checks will prevent doom and destruction if the caller hasn't already
guaranteed the liveliness of the module (this goes for any module).  E.g.
if try_module_get() fails because irqbypass.ko is being unloaded, then the
kernel has already hit a use-after-free by virtue of executing code whose
lifecycle is tied to irqbypass.ko.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20250516230734.2564775-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20 13:52:36 -07:00
Sean Christopherson
cd4178d194 KVM: arm64: WARN if unmapping a vLPI fails in any path
When unmapping a vLPI, WARN if nullifying vCPU affinity fails, not just if
failure occurs when freeing an ITE.  If undoing vCPU affinity fails, then
odds are very good that vLPI state tracking has has gotten out of whack,
i.e. that KVM and the GIC disagree on the state of an IRQ/vLPI.  At best,
inconsistent state means there is a lurking bug/flaw somewhere.  At worst,
the inconsistency could eventually be fatal to the host, e.g. if an ITS
command fails because KVM's view of things doesn't match reality/hardware.

Note, only the call from kvm_arch_irq_bypass_del_producer() by way of
kvm_vgic_v4_unset_forwarding() doesn't already WARN.  Common KVM's
kvm_irq_routing_update() WARNs if kvm_arch_update_irqfd_routing() fails.
For that path, if its_unmap_vlpi() fails in kvm_vgic_v4_unset_forwarding(),
the only possible causes are that the GIC doesn't have a v4 ITS (from
its_irq_set_vcpu_affinity()):

        /* Need a v4 ITS */
        if (!is_v4(its_dev->its))
                return -EINVAL;

        guard(raw_spinlock)(&its_dev->event_map.vlpi_lock);

        /* Unmap request? */
        if (!info)
                return its_vlpi_unmap(d);

or that KVM has gotten out of sync with the GIC/ITS (from its_vlpi_unmap()):

        if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))
                return -EINVAL;

All of the above failure scenarios are warnable offences, as they should
never occur absent a kernel/KVM bug.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/all/aFWY2LTVIxz5rfhh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20 13:52:29 -07:00
Paolo Bonzini
28224ef02b KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
Allow userspace to advertise TDG.VP.VMCALL subfunctions that the
kernel also supports.  For each output register of GetTdVmCallInfo's
leaf 1, add two fields to KVM_TDX_CAPABILITIES: one for kernel-supported
TDVMCALLs (userspace can set those blindly) and one for user-supported
TDVMCALLs (userspace can set those if it knows how to handle them).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 14:20:20 -04:00
Paolo Bonzini
4580dbef5c KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 14:09:50 -04:00
Binbin Wu
25e8b1dd48 KVM: TDX: Exit to userspace for GetTdVmCallInfo
Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX,
to allow userspace to provide information about the support of
TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API.

GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>,
<ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>,
<Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and
<Instruction.WRMSR>. They must be supported by VMM to support TDX guests.

For GetTdVmCallInfo
- When leaf (r12) to enumerate TDVMCALL functionality is set to 0,
  successful execution indicates all GHCI base TDVMCALLs listed above are
  supported.

  Update the KVM TDX document with the set of the GHCI base APIs.

- When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it
  indicates the TDX guest is querying the supported TDVMCALLs beyond
  the GHCI base TDVMCALLs.
  Exit to userspace to let userspace set the TDVMCALL sub-function bit(s)
  accordingly to the leaf outputs.  KVM could set the TDVMCALL bit(s)
  supported by itself when the TDVMCALLs don't need support from userspace
  after returning from userspace and before entering guest. Currently, no
  such TDVMCALLs implemented, KVM just sets the values returned from
  userspace.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Adjust userspace API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 13:55:47 -04:00
Binbin Wu
cf207eac06 KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
Handle TDVMCALL for GetQuote to generate a TD-Quote.

GetQuote is a doorbell-like interface used by TDX guests to request VMM
to generate a TD-Quote signed by a service hosting TD-Quoting Enclave
operating on the host.  A TDX guest passes a TD Report (TDREPORT_STRUCT) in
a shared-memory area as parameter.  Host VMM can access it and queue the
operation for a service hosting TD-Quoting enclave.  When completed, the
Quote is returned via the same shared-memory area.

KVM only checks the GPA from the TDX guest has the shared-bit set and drops
the shared-bit before exiting to userspace to avoid bleeding the shared-bit
into KVM's exit ABI.  KVM forwards the request to userspace VMM (e.g. QEMU)
and userspace VMM queues the operation asynchronously.  KVM sets the return
code according to the 'ret' field set by userspace to notify the TDX guest
whether the request has been queued successfully or not.  When the request
has been queued successfully, the TDX guest can poll the status field in
the shared-memory area to check whether the Quote generation is completed
or not.  When completed, the generated Quote is returned via the same
buffer.

Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is
required to handle the KVM exit reason as the initial support for TDX,
by reentering KVM to ensure that the TDVMCALL is complete.  While at it,
add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
[Adjust userspace API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 13:09:32 -04:00
Binbin Wu
b5aafcb4ef KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs
Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and
return it for unimplemented TDVMCALL subfunctions.

Returning TDVMCALL_STATUS_INVALID_OPERAND when a subfunction is not
implemented is vague because TDX guests can't tell the error is due to
the subfunction is not supported or an invalid input of the subfunction.
New GHCI spec adds TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED to avoid the
ambiguity. Use it instead of TDVMCALL_STATUS_INVALID_OPERAND.

Before the change, for common guest implementations, when a TDX guest
receives TDVMCALL_STATUS_INVALID_OPERAND, it has two cases:
1. Some operand is invalid. It could change the operand to another value
   retry.
2. The subfunction is not supported.

For case 1, an invalid operand usually means the guest implementation bug.
Since the TDX guest can't tell which case is, the best practice for
handling TDVMCALL_STATUS_INVALID_OPERAND is stopping calling such leaf,
treating the failure as fatal if the TDVMCALL is essential or ignoring
it if the TDVMCALL is optional.

With this change, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED could be sent to
old TDX guest that do not know about it, but it is expected that the
guest will make the same action as TDVMCALL_STATUS_INVALID_OPERAND.
Currently, no known TDX guest checks TDVMCALL_STATUS_INVALID_OPERAND
specifically; for example Linux just checks for success.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Return it for untrapped KVM_HC_MAP_GPA_RANGE. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-20 13:09:31 -04:00
Paolo Bonzini
2f3fc29ae8 KVM/riscv fixes for 6.16, take #1
- Fix the size parameter check in SBI SFENCE calls
 - Don't treat SBI HFENCE calls as NOPs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmhVTUsACgkQrUjsVaLH
 LAc3MQ/9EoP9phFp7jbmW4zmnssmDP+ADiV1fry81122bCMTC+3mtlkuQNHQ/S29
 8rpUgln+nzVAGmnmTVecRirVqFHSQkiHDyPJU6lqk4VmuJhrsTzQn3lqpfNntKXn
 li+Wzh9jCSlJCqsZwAzzxchowKmBnfiPV8i/f4uFeLDQ1SQ6e+n/kFGoghjfZdiT
 m8PW1iHV5LFRiW+vlVi9dT9Xso3ecyf8QqhEtN2+XO+nZrGu3LM6QVTkcbh32yRK
 bfECy9QpDosqOpDbFeksejzDkrxtU1AHvrKwhVMBmNr/RMzDLYtUzBCJ28gKCnfM
 nMoegVbKxIsnMwFr7CfastA3kJIyeeyWcrk69dcq8XbbXVMRUvngpj4lU5JChayV
 QjHTJ9VI9ahPS7EWUaSwr8IlezLp5mvrUlGAaOISsE/sv9fPSjYc3/rUAIQd3WvB
 w1kXOzTpNXzHlVYX+5z1nSoZ8WbAbDFm1QQ3/Y4PubDWtRANXIlNQHh63jgQdPqL
 knBFXwVrrp2ylTo0G0gD98YNBmdwqd/utO1NyvsFgLOmOoJiCBSKhlBhkl1tr+BR
 OtOPSItebUTvr3clrYaN19NOmG2OjDjvu8JDk1O3KjmEjeX2y+0XDMr3a3R9zO7G
 whOTPmBJAnV3ZRr8t0HT2oLpJm9rECOBdgrGmzU1MvcagiIcr3A=
 =/0O/
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-fixes-6.16-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 6.16, take #1

- Fix the size parameter check in SBI SFENCE calls
- Don't treat SBI HFENCE calls as NOPs
2025-06-20 13:07:24 -04:00
Paolo Bonzini
a09e359d43 KVM/arm64 fixes for 6.16, take #3
- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
   missing synchronisation
 
 - A small fix for the irqbypass hook fixes, tightening the check and
   ensuring that we only deal with MSI for both the old and the new
   route entry
 
 - Rework the way the shadow LRs are addressed in a nesting
   configuration, plugging an embarrassing bug as well as simplifying
   the whole process
 
 - Add yet another fix for the dreaded arch_timer_edge_cases selftest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmhUCB0ACgkQI9DQutE9
 ekNlyA//fvzrp/Z/wVwTUGU+Zba91t6GCA4JAOtpZSY+SQTuFb8aNw00//n65Iy1
 IMQUhRxBlPOXKTYyEXFnZ8hcO+7mmIUEyhSXsYpMxhmSHys9uE89X2oJvg6f+8EK
 ofsAGb89OBJu/PEmVZxK4NyVyw1vdnriDv0YkujIRN69zO1PF9bB6Kr9VkJ4fd4e
 lwdbbGtxkQJ5ls5O7SDcOFQfej0zEg4q8Go3/ynBZUYOfkATR/Vqhy31rGa0XUnK
 wium+39JxfMUzwP5hi90QmlRUpbCNo90lF7Nm6ZbgJ0wjBPki9ekNBvFitFBblt2
 YWcv4jmFU+W23J0AGaImCsn20JA1577+Rcd6/wsE4olUpjpIOdJut15kIMyDY+Br
 B+kfd80GsJHSoD98VqZTYH/U4FAqkvZ2ATZnRoZosnRoZHi3DY7RX83CBJPZbo9D
 P5K41dbEVrTzF6JNj9jvOMfN1ttHu4sNOTO2XHe6+dEf63MhENpbOz2/ILz2HU5w
 DgRDj6GfTnevTFqwPbxeX9SsFSfTzqJCHH8kMhvCIUXCnsP8tEv+u7vT3tREWTUM
 2NRAL0nPydvh5Um+/q39BXIV4RpFwqgV7pnuGaHZDvkQcSc0lWrgsRPVxSO8NKMY
 +mmxRl413AapLids1EKMmHlxo4bcVnMCc0U8W5T/g4iEkc6GYEA=
 =Ztxh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.16, take #3

- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
  missing synchronisation

- A small fix for the irqbypass hook fixes, tightening the check and
  ensuring that we only deal with MSI for both the old and the new
  route entry

- Rework the way the shadow LRs are addressed in a nesting
  configuration, plugging an embarrassing bug as well as simplifying
  the whole process

- Add yet another fix for the dreaded arch_timer_edge_cases selftest
2025-06-20 13:07:10 -04:00
Mark Rutland
04c5355b2a KVM: arm64: VHE: Centralize ISBs when returning to host
The VHE hyp code has recently gained a few ISBs. Simplify this to one
unconditional ISB in __kvm_vcpu_run_vhe(), and remove the unnecessary
ISB from the kvm_call_hyp_ret() macro.

While kvm_call_hyp_ret() is also used to invoke
__vgic_v3_get_gic_config(), but no ISB is necessary in that case either.

For the moment, an ISB is left in kvm_call_hyp(), as there are many more
users, and removing the ISB would require a more thorough audit.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-8-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:34:59 +01:00
Mark Rutland
3a300a33e4 KVM: arm64: Remove cpacr_clear_set()
We no longer use cpacr_clear_set().

Remove cpacr_clear_set() and its helper functions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-7-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:06:20 +01:00
Mark Rutland
186b58bacd KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd()
The hyp code FPSIMD/SVE/SME trap handling logic has some rather messy
open-coded manipulation of CPTR/CPACR. This is benign for non-nested
guests, but broken for nested guests, as the guest hypervisor's CPTR
configuration is not taken into account.

Consider the case where L0 provides FPSIMD+SVE to an L1 guest
hypervisor, and the L1 guest hypervisor only provides FPSIMD to an L2
guest (with L1 configuring CPTR/CPACR to trap SVE usage from L2). If the
L2 guest triggers an FPSIMD trap to the L0 hypervisor,
kvm_hyp_handle_fpsimd() will see that the vCPU supports FPSIMD+SVE, and
will configure CPTR/CPACR to NOT trap FPSIMD+SVE before returning to the
L2 guest. Consequently the L2 guest would be able to manipulate SVE
state even though the L1 hypervisor had configured CPTR/CPACR to forbid
this.

Clean this up, and fix the nested virt issue by always using
__deactivate_cptr_traps() and __activate_cptr_traps() to manage the CPTR
traps. This removes the need for the ad-hoc fixup in
kvm_hyp_save_fpsimd_host(), and ensures that any guest hypervisor
configuration of CPTR/CPACR is taken into account.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-6-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:06:20 +01:00
Mark Rutland
59e6e101a6 KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync()
There's no need for fpsimd_sve_sync() to write to CPTR/CPACR. All
relevant traps are always disabled earlier within __kvm_vcpu_run(), when
__deactivate_cptr_traps() configures CPTR/CPACR.

With irrelevant details elided, the flow is:

handle___kvm_vcpu_run(...)
{
	flush_hyp_vcpu(...) {
		fpsimd_sve_flush(...);
	}

	__kvm_vcpu_run(...) {
		__activate_traps(...) {
			__activate_cptr_traps(...);
		}

		do {
			__guest_enter(...);
		} while (...);

		__deactivate_traps(....) {
			__deactivate_cptr_traps(...);
		}
	}

	sync_hyp_vcpu(...) {
		fpsimd_sve_sync(...);
	}
}

Remove the unnecessary write to CPTR/CPACR. An ISB is still necessary,
so a comment is added to describe this requirement.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-5-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:06:20 +01:00
Mark Rutland
e62dd50784 KVM: arm64: Reorganise CPTR trap manipulation
The NVHE/HVHE and VHE modes have separate implementations of
__activate_cptr_traps() and __deactivate_cptr_traps() in their
respective switch.c files. There's some duplication of logic, and it's
not currently possible to reuse this logic elsewhere.

Move the logic into the common switch.h header so that it can be reused,
and de-duplicate the common logic.

This rework changes the way SVE traps are deactivated in VHE mode,
aligning it with NVHE/HVHE modes:

* Before this patch, VHE's __deactivate_cptr_traps() would
  unconditionally enable SVE for host EL2 (but not EL0), regardless of
  whether the ARM64_SVE cpucap was set.

* After this patch, VHE's __deactivate_cptr_traps() will take the
  ARM64_SVE cpucap into account. When ARM64_SVE is not set, SVE will be
  trapped from EL2 and below.

The old and new behaviour are both benign:

* When ARM64_SVE is not set, the host will not touch SVE state, and will
  not reconfigure SVE traps. Host EL0 access to SVE will be trapped as
  expected.

* When ARM64_SVE is set, the host will configure EL0 SVE traps before
  returning to EL0 as part of reloading the EL0 FPSIMD/SVE/SME state.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-4-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:06:19 +01:00
Mark Rutland
257d0aa8e2 KVM: arm64: VHE: Synchronize CPTR trap deactivation
Currently there is no ISB between __deactivate_cptr_traps() disabling
traps that affect EL2 and fpsimd_lazy_switch_to_host() manipulating
registers potentially affected by CPTR traps.

When NV is not in use, this is safe because the relevant registers are
only accessed when guest_owns_fp_regs() && vcpu_has_sve(vcpu), and this
also implies that SVE traps affecting EL2 have been deactivated prior to
__guest_entry().

When NV is in use, a guest hypervisor may have configured SVE traps for
a nested context, and so it is necessary to have an ISB between
__deactivate_cptr_traps() and fpsimd_lazy_switch_to_host().

Due to the current lack of an ISB, when a guest hypervisor enables SVE
traps in CPTR, the host can take an unexpected SVE trap from within
fpsimd_lazy_switch_to_host(), e.g.

| Unhandled 64-bit el1h sync exception on CPU1, ESR 0x0000000066000000 -- SVE
| CPU: 1 UID: 0 PID: 164 Comm: kvm-vcpu-0 Not tainted 6.15.0-rc4-00138-ga05e0f012c05 #3 PREEMPT
| Hardware name: FVP Base RevC (DT)
| pstate: 604023c9 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __kvm_vcpu_run+0x6f4/0x844
| lr : __kvm_vcpu_run+0x150/0x844
| sp : ffff800083903a60
| x29: ffff800083903a90 x28: ffff000801f4a300 x27: 0000000000000000
| x26: 0000000000000000 x25: ffff000801f90000 x24: ffff000801f900f0
| x23: ffff800081ff7720 x22: 0002433c807d623f x21: ffff000801f90000
| x20: ffff00087f730730 x19: 0000000000000000 x18: 0000000000000000
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
| x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
| x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff000801f90d70
| x5 : 0000000000001000 x4 : ffff8007fd739000 x3 : ffff000801f90000
| x2 : 0000000000000000 x1 : 00000000000003cc x0 : ffff800082f9d000
| Kernel panic - not syncing: Unhandled exception
| CPU: 1 UID: 0 PID: 164 Comm: kvm-vcpu-0 Not tainted 6.15.0-rc4-00138-ga05e0f012c05 #3 PREEMPT
| Hardware name: FVP Base RevC (DT)
| Call trace:
|  show_stack+0x18/0x24 (C)
|  dump_stack_lvl+0x60/0x80
|  dump_stack+0x18/0x24
|  panic+0x168/0x360
|  __panic_unhandled+0x68/0x74
|  el1h_64_irq_handler+0x0/0x24
|  el1h_64_sync+0x6c/0x70
|  __kvm_vcpu_run+0x6f4/0x844 (P)
|  kvm_arm_vcpu_enter_exit+0x64/0xa0
|  kvm_arch_vcpu_ioctl_run+0x21c/0x870
|  kvm_vcpu_ioctl+0x1a8/0x9d0
|  __arm64_sys_ioctl+0xb4/0xf4
|  invoke_syscall+0x48/0x104
|  el0_svc_common.constprop.0+0x40/0xe0
|  do_el0_svc+0x1c/0x28
|  el0_svc+0x30/0xcc
|  el0t_64_sync_handler+0x10c/0x138
|  el0t_64_sync+0x198/0x19c
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x0000,000002c0,02df4fb9,97ee773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: Unhandled exception ]---

Fix this by adding an ISB between __deactivate_traps() and
fpsimd_lazy_switch_to_host().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250617133718.4014181-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:06:19 +01:00
Mark Rutland
cade3d57e4 KVM: arm64: VHE: Synchronize restore of host debug registers
When KVM runs in non-protected VHE mode, there's no context
synchronization event between __debug_switch_to_host() restoring the
host debug registers and __kvm_vcpu_run() unmasking debug exceptions.
Due to this, it's theoretically possible for the host to take an
unexpected debug exception due to the stale guest configuration.

This cannot happen in NVHE/HVHE mode as debug exceptions are masked in
the hyp code, and the exception return to the host will provide the
necessary context synchronization before debug exceptions can be taken.

For now, avoid the problem by adding an ISB after VHE hyp code restores
the host debug registers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250617133718.4014181-2-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 13:06:19 +01:00
Zenghui Yu
56a1498450 KVM: arm64: selftests: Close the GIC FD in arch_timer_edge_cases
Close the GIC FD to free the reference it holds to the VM so that we can
correctly clean up the VM. This also gets rid of the

	"KVM: debugfs: duplicate directory 395722-4"

warning when running arch_timer_edge_cases.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250608095402.1131-1-yuzenghui@huawei.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 09:58:21 +01:00
Sean Christopherson
1fbe6861a6 KVM: arm64: Explicitly treat routing entry type changes as changes
Explicitly treat type differences as GSI routing changes, as comparing MSI
data between two entries could get a false negative, e.g. if userspace
changed the type but left the type-specific data as-

Note, the same bug was fixed in x86 by commit bcda70c56f3e ("KVM: x86:
Explicitly treat routing entry type changes as changes").

Fixes: 4bf3693d36af ("KVM: arm64: Unmap vLPIs affected by changes to GSI routing information")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-3-seanjc@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19 09:58:21 +01:00
Marc Zyngier
8a8ff069c7 KVM: arm64: nv: Fix tracking of shadow list registers
Wei-Lin reports that the tracking of shadow list registers is
majorly broken when resync'ing the L2 state after a run, as
we confuse the guest's LR index with the host's, potentially
losing the interrupt state.

While this could be fixed by adding yet another side index to
track it (Wei-Lin's fix), it may be better to refactor this
code to avoid having a side index altogether, limiting the
risk to introduce this class of bugs.

A key observation is that the shadow index is always the number
of bits in the lr_map bitmap. With that, the parallel indexing
scheme can be completely dropped.

While doing this, introduce a couple of helpers that abstract
the index conversion and some of the LR repainting, making the
whole exercise much simpler.

Reported-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Reviewed-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250614145721.2504524-1-r09922117@csie.ntu.edu.tw
Link: https://lore.kernel.org/r/86qzzkc5xa.wl-maz@kernel.org
2025-06-19 09:58:20 +01:00
Anup Patel
2e7be16299 RISC-V: KVM: Don't treat SBI HFENCE calls as NOPs
The SBI specification clearly states that SBI HFENCE calls should
return SBI_ERR_NOT_SUPPORTED when one of the target hart doesn’t
support hypervisor extension (aka nested virtualization in-case
of KVM RISC-V).

Fixes: c7fa3c48de86 ("RISC-V: KVM: Treat SBI HFENCE calls as NOPs")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250605061458.196003-3-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-06-17 10:18:40 +05:30
Anup Patel
6aba0cb5bb RISC-V: KVM: Fix the size parameter check in SBI SFENCE calls
As-per the SBI specification, an SBI remote fence operation applies
to the entire address space if either:
1) start_addr and size are both 0
2) size is equal to 2^XLEN-1

>From the above, only #1 is checked by SBI SFENCE calls so fix the
size parameter check in SBI SFENCE calls to cover #2 as well.

Fixes: 13acfec2dbcc ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250605061458.196003-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-06-17 10:18:40 +05:30
Linus Torvalds
e04c78d86a Linux 6.16-rc2 2025-06-15 13:49:41 -07:00
Linus Torvalds
08215f5486 Kbuild fixes for v6.16
- Move warnings about linux/export.h from W=1 to W=2
 
  - Fix structure type overrides in gendwarfksyms
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmhO6/MVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGoXMP/iVDFmVDE3qlMMmZQh+UBnfrkQ4j
 rhHMlCqUDcUWV26CgR5u7lRDIKK9sIIZq9a0EqZjabspQAMz5C/cxbICGu7r9Iji
 DLKJJ8g8nv3qORsDXBKcj5Ijz5QzyBoOJ5FtoS/RUQjMcl6+XeFFWcJSVF9WCM+S
 YuKP2/ILM9L77/2bLMM+3PLgp1FMWgiHql1QlQAHdv0ZiteGmdbwfbCdnfrx6TJc
 JTICUfrn9Me/nlJ2HAKJ+ExHEl0qQwNOGKDd2wwOcgMkCRCfK6bMjh0B2A1Elgpm
 BaDZVIHpjW6nH4Y14eC9b+Hj7aFSCfPxnmqcHNMM0965KuZhBnvJAXUooy2k57Eh
 TND8jyOSzX1zzHrxDZiFsmiPWmbv3KqSQomBy+Lt5Y3pf4ittdZOCCgLph2rHPtm
 xP7fWxxZ2i10KzZCHT5y/FevoZqozD4U4i+CwgSEAolPi7jmyrdgDfBIq/eM+EwG
 xFFbnJ6GDcNqrVGq8DMGZZcQzOFhFebibkh7d9QbW30FKg1vsUU7oQfKSYBwHi7i
 J5tOANXiL5AfDLgWQC7ugKSqAOBKxrYikguPZiY8cJsxYsphc+FFLm2N+O1x+ruV
 JOwT1qnF+KjCeJ0Ew969naEQ5lnFXgz6uyNGTIiI8bKhofHe0PgswAf3TnUmuGYP
 gl5dYNZzdX9Pdn5o
 =iO3g
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Move warnings about linux/export.h from W=1 to W=2

 - Fix structure type overrides in gendwarfksyms

* tag 'kbuild-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  gendwarfksyms: Fix structure type overrides
  kbuild: move warnings about linux/export.h from W=1 to W=2
2025-06-15 09:14:27 -07:00
Sami Tolvanen
2f6b47b295 gendwarfksyms: Fix structure type overrides
As we always iterate through the entire die_map when expanding
type strings, recursively processing referenced types in
type_expand_child() is not actually necessary. Furthermore,
the type_string kABI rule added in commit c9083467f7b9
("gendwarfksyms: Add a kABI rule to override type strings") can
fail to override type strings for structures due to a missing
kabi_get_type_string() check in this function.

Fix the issue by dropping the unnecessary recursion and moving
the override check to type_expand(). Note that symbol versions
are otherwise unchanged with this patch.

Fixes: c9083467f7b9 ("gendwarfksyms: Add a kABI rule to override type strings")
Reported-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-06-16 00:49:48 +09:00
Masahiro Yamada
a6a7946bd6 kbuild: move warnings about linux/export.h from W=1 to W=2
This hides excessive warnings, as nobody builds with W=2.

Fixes: a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1")
Fixes: 7d95680d64ac ("scripts/misc-check: check unnecessary #include <linux/export.h> when W=1")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
2025-06-16 00:41:40 +09:00
Linus Torvalds
8c6bc74c7f three smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmhNnYsACgkQiiy9cAdy
 T1HNMwv+N3G4LGJljr25eIk6ycFA5HLH/XCCo/BjUiue1MBPIDqckpEmnM1e583+
 cDnW/h01R8TLpM5yYppZSymvEoVfErAOsGlCEb+a1CSwbnK4RyGQfowNgJYMUsvA
 pJDFjqwfZjXoPMz/KO4WKBkrOqXgo41aAVwCm4b4N/KNBz06G52IX1PD9P8gaech
 LLts/3JxX6oUU/w06fCGqqPwtf3qa/EIOBD9qaAQ3SPLXCLdpdd6szuLcux5yWNf
 ncXB//fZZFpT4Ylb46bexENHV4Q664KdI6f11+IcaR1kgvEhSrgwpXb/c4G3wmk4
 wl0+Qn7kYdjcPCLtAkmo2LzcaDzLN3I9I5zYl+y2r7MiUixPzN27oYr6bNE4M+Z8
 FIaQ3VOtpY7jHnvILjtjTP2xKF/Oo1KgeqHS+yBTkp7TvUPY8AWG/em+mzb+ZShv
 9zCTjdt8k6taIh1uaZIDLd5tSQAE2wN20XjBbV4CHeM6z6GoFneXNCeZAf84+xRe
 pSLQpof1
 =QF6Q
 -----END PGP SIGNATURE-----

Merge tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - SMB3.1.1 POSIX extensions fix for char remapping

 - Fix for repeated directory listings when directory leases enabled

 - deferred close handle reuse fix

* tag 'v6.16-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: improve directory cache reuse for readdir operations
  smb: client: fix perf regression with deferred closes
  smb: client: disable path remapping with POSIX extensions
2025-06-14 10:13:32 -07:00
Linus Torvalds
ac91b4de44 IOMMU Fix for Linux v6.16-rc1:
Including:
 
 	- NVidia Tegra: Fix PTE size calculation.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmhNgMwACgkQK/BELZcB
 GuOcaA//dpoLQ71jVJDI2pFS3AwNA1MUK8kNH0xsEIIpNv5L1F/dtAdeonaX/jzT
 SWtxiRfkcbT3a+gfQ5Mfa6QeVEfpdzS86mjt3v8ncJruDjlrkJdGngx7EeIa4BDU
 ApNuVLcHUCGRS4b64lSjKR71DP+23lbbtkDL4Ec9UGflBQ+d8Xj/El3cCaGTR9rh
 PcDwKBC+C5LHjlY3dr12oAfN52jB129J60BELu+tbvTIz5HB+UK8v8j6I2F4g6eh
 Qwmb/aETjD9ELtyBR4Lc5olkXCaLuQL5B7Dx0PvRxLeX6I6AUEH/IhVOlUR1QguU
 J3FFiet9MqvJmMg9SxTfPKcQJnvAL3Qp5fve61hhXKCZqXhgGoTUOpKRXDn9ThYL
 flRu4X5TvxGFy+7KDptDodAdwZIskWPKAetYcFKmQFDvS8RJHIYLlSSChJnqv/wh
 9cMT9zZbgH8g2PRGB0GagCHT/YyuW0rrO/ZDTvpuOaSP9jTGQyF8M0VWuirKqspq
 ZOFl4XI4Ym7KM+e1O3T0wxvW1/kafFzYaydrPcb/VdqmNU5VRnNudzxR9aSlaO9i
 GPdtjiXw6vNQK4+0t1qN2f66bnnEAwQHZRsteHHet77wjREl3Nbxior2GtBTofAa
 yisAy6grpPpOfSMfnOpLAqI7Vi2nxy8rukvvK7VBk9Cu0aFZoF8=
 =Xz0y
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fix from Joerg Roedel:

 - Fix PTE size calculation for NVidia Tegra

* tag 'iommu-fixes-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/tegra: Fix incorrect size calculation
2025-06-14 10:01:47 -07:00
Linus Torvalds
f713ffa363 block-6.16-20250614
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmhNaUIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvUOD/0WlwBN8WxAA+rzUXo42QJ3W+XruQ+VhQdx
 Hs/DBEH6KZji86ZVzoJOIwsdlSL2/6PxRIZVqwr3Q8aYnNedUnsjcD4frNBl76EA
 wFfPttjL7DcaOvVhY0n37IrQNmaeQC1R1O2JxhiWTBzNoNf2iWj84vSSgbgcfDVR
 trfhRvEwRgmAy037/72pUFYN+JRlv80D03SGfWTQtp6/qq+AA/z5XqWdg9I/opVM
 7+H5GoWHfPSG0wQo+Dms3mHV4zm5tOOfMmGIR2o4DoueKgMNgnUXRT8dc7DDBsqV
 0moKRHKbTbeN1fz3zqcko0Mp1gq+62hF/eXppQSeJMpMuAbcxaA+/ZFv7Ho9ZwYF
 jJwcp0O5e8XbRHFqrYWysKGKSvYfvTjr08X70+QFzm9ZJaGCtJYd2ceUNmyO2p6s
 m54gUnPq5d3nABbpCkAdP5sAv0yVV5idIoezCHIaBYQv8qPpKDrdHHXTQY/VX05x
 VBGmg9hUZSDMiGkR1d4oKTBayehuWVIpyczhy65KbAfoBA62hAl+aAldkpvLRo1r
 gKsrMSGP/H6zBU/IRaMGc/bnEnP6zFkn5vxnGwpDcD2tdJn0g+yEjIvJSXrmGJ0w
 lwzqYd3/vhFPmaEDxE3PyOOGBVCOPqGic+Y6OEIuHA3p2HFO3bsh6+64+iqls/so
 EmiHPp7n5g==
 =N1zM
 -----END PGP SIGNATURE-----

Merge tag 'block-6.16-20250614' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Fix for a deadlock on queue freeze with zoned writes

 - Fix for zoned append emulation

 - Two bio folio fixes, for sparsemem and for very large folios

 - Fix for a performance regression introduced in 6.13 when plug
   insertion was changed

 - Fix for NVMe passthrough handling for polled IO

 - Document the ublk auto registration feature

 - loop lockdep warning fix

* tag 'block-6.16-20250614' of git://git.kernel.dk/linux:
  nvme: always punt polled uring_cmd end_io work to task_work
  Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
  block: Fix bvec_set_folio() for very large folios
  bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP
  block: use plug request list tail for one-shot backmerge attempt
  block: don't use submit_bio_noacct_nocheck in blk_zone_wplug_bio_work
  block: Clear BIO_EMULATES_ZONE_APPEND flag on BIO completion
  ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)
  loop: move lo_set_size() out of queue freeze
2025-06-14 09:25:22 -07:00
Linus Torvalds
6d13760ea3 io_uring-6.16-20250614
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmhNaVcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpt2QEACK7lbIuv6R3AIAeYrBliLzYUC8kIZ8DGbT
 4fKN343AqnuCCFrBR7ew4tHaKmQ3KBaMpoO4FkoV++//vWqJfs3cUOIkntb82UY2
 tolnc743QzZFBOmjMP8XhtJ2o/KmQYYCMjteLcVCnqT3IoUfJ2cd0lf37Rd4x1BK
 XJacW231lCgmeBa/s336MEm6HphiokmGISrji0bGBiQqQYmWHiQ/0FnrWAlEpZD8
 mGEA75u4L3JFbJvetfsgvN+ifF+l/l9F92S689gkPkUDaq3b81sUzk0+cBpFGva+
 tIWGM3PzRoDwM+MuUg/R9mCFWP3LIbZCB6Um2I8Ek7AGgPND/3ocQn4RLzR9gTez
 /Q/Z7WBL/xrWkOy5fNgfy7pDqkBgHxmPztlXMUWnd29d2i50Hh6lP+eyg+wNhqVG
 GAO1f4Oholr/KNI5rJzEX0hrC3X9wI551ryCdftTXZqJKKlaEChWBVUEwAvm7ZxE
 Oi7Ni8WC6ZVnij6Gb3thgIbXv1z3XwtcvwHTTH0w3Rf+3Iy+i9dYq//QN46XXmd5
 hglOvHOUpcQE/dtbgW5Uuo0QvBxyljbwmOJNDog69wX5DC4n5wfdr/E5TGzWnwTq
 FrYID16p3gK/F0PbkIwJwMOqxnzMMvtAhZbkAgrrnoI9rLOSt3qZKefajJxYVMid
 FCchOkj21A==
 =79aU
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.16-20250614' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix for a race between SQPOLL exit and fdinfo reading.

   It's slim and I was only able to reproduce this with an artificial
   delay in the kernel. Followup sparse fix as well to unify the access
   to ->thread.

 - Fix for multiple buffer peeking, avoiding truncation if possible.

 - Run local task_work for IOPOLL reaping when the ring is exiting.

   This currently isn't done due to an assumption that polled IO will
   never need task_work, but a fix on the block side is going to change
   that.

* tag 'io_uring-6.16-20250614' of git://git.kernel.dk/linux:
  io_uring: run local task_work from ring exit IOPOLL reaping
  io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
  io_uring: consistently use rcu semantics with sqpoll thread
  io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()
2025-06-14 08:44:54 -07:00
Linus Torvalds
588adb24b7 Rust fixes for v6.16
'kernel' crate:
 
   - 'hrtimer': fix future compile error when the 'impl_has_hr_timer!'
     macro starts to get called.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmhNQGMACgkQGXyLc2ht
 IW0l9g//fk7D9KC9Dv86miWNgk7RAuHpmCoK3PM4mIr983wA8DO7sl+0cSIJzxEp
 sYe0TGuXw/2jng1wPide9lG77yNe4HwJR2aeZJuFwxwki9yfd5rMqK7yizXxC8iY
 3EUyewp4qiBe4r3IFTfl4syo0TqdoSJ+J9wKOADzw/OzwZW4xO/shDZmCicYMRDJ
 Z20ke6a0benwmSYRkBm4XV3vS+rVf1NH54KrNK02KTqmQa0ErFMN2lCOUmwUfrTX
 980T0hfjpzwzB9LVn76d62bv16AwgM9Vgdx6OWhYste/7Vk2w6ZUfmXzBI9qlLry
 zaQi4oCc+eRI5fSs376nq8rpiFiCINxZmaHM3L8VvXhAB9vfOgECdXiGxVXJk60O
 rlUDQ5g1Lbf4MdrQMvHtn9ub8uRK0E2uktlnmaHh5D7RTrh7+UfJyvU2RYy2dE9w
 3QGINktZkIaHIdypUShtClBZClDI7ZfK2x6oQmICow9YNkxIV6l8DO6FGuJCjPUb
 gauMgTJBmIYh9qfcSi4+R01tie3/uzT6MRi7ZvdceWvq9wCPPzWMRFqYtelAQVnq
 qX9EhcPuVM84IhJtcuhtsuWU6wlhv+1rXHKFjbtrBB8DHxps/KbQrI8p36QSNBPi
 XEgrSXjjwjYZ9TszeAMmnbU8vpNXBKoWjOj8N1jQGWZX6eMMQmg=
 =J+av
 -----END PGP SIGNATURE-----

Merge tag 'rust-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust fix from Miguel Ojeda:

  - 'hrtimer': fix future compile error when the 'impl_has_hr_timer!'
    macro starts to get called

* tag 'rust-fixes-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: time: Fix compile error in impl_has_hr_timer macro
2025-06-14 08:38:34 -07:00
Linus Torvalds
27b9989b87 9 hotfixes. 3 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels.  Only 4 are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaE0BIAAKCRDdBJ7gKXxA
 jkTMAQCTWhvZZdcEdyxo0HQbGo2pcqB4awXjire6GabBFcr1owD5AVV0OYiQNNEN
 tbOVsr+2aZBr/aXTkTy4VpOg1kin8Ak=
 =ThOY
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "9 hotfixes. 3 are cc:stable and the remainder address post-6.15 issues
  or aren't considered necessary for -stable kernels. Only 4 are for MM"

* tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: add mmap_prepare() compatibility layer for nested file systems
  init: fix build warnings about export.h
  MAINTAINERS: add Barry as a THP reviewer
  drivers/rapidio/rio_cm.c: prevent possible heap overwrite
  mm: close theoretical race where stale TLB entries could linger
  mm/vma: reset VMA iterator on commit_merge() OOM failure
  docs: proc: update VmFlags documentation in smaps
  scatterlist: fix extraneous '@'-sign kernel-doc notation
  selftests/mm: skip failed memfd setups in gup_longterm
2025-06-14 08:18:09 -07:00
Linus Torvalds
4774cfe354 SCSI fixes on 20250613
All fixes for drivers.  The core change in the error handler is simply
 to translate an ALUA specific sense code into a retry the ALUA
 components can handle and won't impact any other devices.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCaEyPIiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVXlAPoCGJTL
 EhMVta96mQGEBK2YfgBQGh87cMSNi7u3f04xawEA84UrOhakLtsVYp9Rua7k0VzL
 blmQtDCoujlsPAasaQg=
 =0N8r
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "All fixes for drivers.

  The core change in the error handler is simply to translate an ALUA
  specific sense code into a retry the ALUA components can handle and
  won't impact any other devices"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: error: alua: I/O errors for ALUA state transitions
  scsi: storvsc: Increase the timeouts to storvsc_timeout
  scsi: s390: zfcp: Ensure synchronous unit_add
  scsi: iscsi: Fix incorrect error path labels for flashnode operations
  scsi: mvsas: Fix typos in per-phy comments and SAS cmd port registers
  scsi: core: ufs: Fix a hang in the error handler
2025-06-13 16:49:39 -07:00
Linus Torvalds
25294cb8a4 drm fixes for 6.16-rc2
vc4:
 - Fix infinite EPROBE_DEFER loop in vc4 probing.
 
 amdxdna:
 - Fix amdxdna firmware size.
 
 meson:
 - modesetting fixes
 
 sitronix:
 - Kconfig fix for st7171-i2c.
 
 dma-buf:
 - Fix -EBUSY WARN_ON_ONCE in dma-buf
 
 udmabuf:
 - Use dma_sync_sgtable_for_cpu in udmabuf.
 
 xe:
 - Fix regression disallowing 64K SVM migration
 - Use a bounce buffer for WA BB
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmhMjOkACgkQDHTzWXnE
 hr6N8A/8CJBntMZGliV9P9PSVCXY621Zg4OS0SUFzW+68Iru8qCYlZVqjpiglweW
 Rf9iwMEsRfgrXooJ9EkXS9+DCrRdtVVCdwsXXLK7C3+dPOCcjgJb/7vLizVM5nv0
 YHvrTsySqwIHviOyPiiOBvxCi27O04GpcA1uojZQ6Rqzi/5WVEgNqzOkuZcbcPvi
 OAajkcGHqPfWp3LaXgmaUB61gdaq89M2odbmbDQgIVr5jsGydnQ7QHJ8TXpNpjgj
 uN37IAFW7rTpfuzZ8wfR7FZ7V5EE9kvbcLO76blA9Aeoz35M+z+DpfSFgcyYnGg5
 nw2YrQw2tJesY323Ago7x//ta7Rmg6kz9zDG1aGe6Pe4NPw8nbjDZf6ZvpOSiZ09
 9H45t8Si97P+xoQkBY9GMn71B5/s1I1HlVzPYX8XBHJt8rD5ByR4EHuY81kbwRNK
 UZrqJw7/4qOgIXbaPNQvC3zrO4oG5vFjRxON56m1KJIYt12pLgbQXBVQh/wjjmi8
 9Ci0cfXjom5WV08av/A4NX21Vp7shbjuncA2vKpfkrRkr9B/Y9woFsMMmFGGAVHT
 jJGAqCyPZi1nnVYMbcwC3Ft6WlPFUoDj15+ZprMu0Oi84kgOhpgAiBhTaclwDKwu
 3dzpOn+IBXiZAkuZyHXDfavLck0iO/2qNW58Ocg1LHK3o7sPB2E=
 =VNUR
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Quiet week, only two pull requests came my way, xe has a couple of
  fixes and then a bunch of fixes across the board, vc4 probably fixes
  the biggest problem:

  vc4:
   - Fix infinite EPROBE_DEFER loop in vc4 probing

  amdxdna:
   - Fix amdxdna firmware size

  meson:
   - modesetting fixes

  sitronix:
   - Kconfig fix for st7171-i2c

  dma-buf:
   - Fix -EBUSY WARN_ON_ONCE in dma-buf

  udmabuf:
   - Use dma_sync_sgtable_for_cpu in udmabuf

  xe:
   - Fix regression disallowing 64K SVM migration
   - Use a bounce buffer for WA BB"

* tag 'drm-fixes-2025-06-14' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/lrc: Use a temporary buffer for WA BB
  udmabuf: use sgtable-based scatterlist wrappers
  dma-buf: fix compare in WARN_ON_ONCE
  drm/sitronix: st7571-i2c: Select VIDEOMODE_HELPERS
  drm/meson: fix more rounding issues with 59.94Hz modes
  drm/meson: use vclk_freq instead of pixel_freq in debug print
  drm/meson: fix debug log statement when setting the HDMI clocks
  drm/vc4: fix infinite EPROBE_DEFER loop
  drm/xe/svm: Fix regression disallowing 64K SVM migration
  accel/amdxdna: Fix incorrect PSP firmware size
2025-06-13 16:27:27 -07:00
Jens Axboe
b62e0efd8a io_uring: run local task_work from ring exit IOPOLL reaping
In preparation for needing to shift NVMe passthrough to always use
task_work for polled IO completions, ensure that those are suitably
run at exit time. See commit:

9ce6c9875f3e ("nvme: always punt polled uring_cmd end_io work to task_work")

for details on why that is necessary.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13 15:26:17 -06:00
Jens Axboe
9ce6c9875f nvme: always punt polled uring_cmd end_io work to task_work
Currently NVMe uring_cmd completions will complete locally, if they are
polled. This is done because those completions are always invoked from
task context. And while that is true, there's no guarantee that it's
invoked under the right ring context, or even task. If someone does
NVMe passthrough via multiple threads and with a limited number of
poll queues, then ringA may find completions from ringB. For that case,
completing the request may not be sound.

Always just punt the passthrough completions via task_work, which will
redirect the completion, if needed.

Cc: stable@vger.kernel.org
Fixes: 585079b6e425 ("nvme: wire up async polling for io passthrough commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13 15:18:34 -06:00
Linus Torvalds
18531f4d1c ACPI updates for 6.16-rc2
- Update the faux device handling code in the driver core and address
    an ACPI APEI error injection driver failure that started to occur
    after switching it over to using a faux device on top of that (Dan
    Williams).
 
  - Update data types of variables passed as arguments to
    mwait_idle_with_hints() in the ACPI PAD (processor aggregator device)
    driver to match the function definition after recent changes (Uros
    Bizjak).
 
  - Fix a NULL pointer dereference in the ACPI CPPC library that occurs
    when nosmp is passed to the kernel in the command line (Yunhui Cui).
 
  - Ignore ECDT tables with an invalid ID string to prevent using an
    incorrect GPE for signaling events on some systems (Armin Wolf).
 
  - Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmhMhWMSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO11QwIAJauSuEZ6CMSB+ntXZ0WO+Sx62EKn1/w
 sC8auAtfmp7H31m1YjqJllt/n2tadJO2ZMAzMuHeVp+1LIxHnNPR6e97+8z+Xj3m
 224NUki1kG7EyEYEwtZnHVOQBue1nKxNZqQ4NHuuwIXIj2dE4GgsCEqT+vrZVmI+
 JLZWo8pMH2puAakdBkPtsdqWzTNq7lOAsigkoDvbO4Azz2GCPilgrgzOeqdOlFw8
 URwM7qhk6Wd77Zr9kyzQIRBt8LVwKIF6i13eR4CXCNzp+5O0qYlci1dBYErL/oWU
 u2D5ebQMCCKqBnNHowBr9ChM4QwmHB9YdTnx574z7kUKCjrgQOrC7hQ=
 =Jx0s
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix an ACPI APEI error injection driver failure that started to
  occur after switching it over to using a faux device, address an EC
  driver issue related to invalid ECDT tables, clean up the usage of
  mwait_idle_with_hints() in the ACPI PAD driver, add a new IRQ override
  quirk, and fix a NULL pointer dereference related to nosmp:

   - Update the faux device handling code in the driver core and address
     an ACPI APEI error injection driver failure that started to occur
     after switching it over to using a faux device on top of that (Dan
     Williams)

   - Update data types of variables passed as arguments to
     mwait_idle_with_hints() in the ACPI PAD (processor aggregator
     device) driver to match the function definition after recent
     changes (Uros Bizjak)

   - Fix a NULL pointer dereference in the ACPI CPPC library that occurs
     when nosmp is passed to the kernel in the command line (Yunhui Cui)

   - Ignore ECDT tables with an invalid ID string to prevent using an
     incorrect GPE for signaling events on some systems (Armin Wolf)

   - Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan)"

* tag 'acpi-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Use IRQ override on MACHENIKE 16P
  ACPI: EC: Ignore ECDT tables with an invalid ID string
  ACPI: CPPC: Fix NULL pointer dereference when nosmp is used
  ACPI: PAD: Update arguments of mwait_idle_with_hints()
  ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
  driver core: faux: Quiet probe failures
  driver core: faux: Suppress bind attributes
2025-06-13 13:39:15 -07:00
Linus Torvalds
f688b599d7 Power management updates for 6.16-rc2
- Implement CpuId Rust abstraction and use it to fix doctest failure
    related to the recently introduced cpumask abstraction (Viresh Kumar).
 
  - Do minor cleanups in the `# Safety` sections for cpufreq abstractions
    added recently (Viresh Kumar).
 
  - Unbreak cpupower systemd service units installation on some systems
    by adding a unitdir variable for specifying the location to install
    them (Francesco Poli).
 
  - Eliminate mwait_play_dead_cpuid_hint() again after reverting its
    elimination during the 6.16 merge window due to a problem with
    handling "dead" SMT siblings, but this time prevent leaving them in
    C1 after initialization by taking them online and back offline when
    a proper cpuidle driver for the platform has been registered (Rafael
    Wysocki).
 
  - Update data types of variables passed as arguments to
    mwait_idle_with_hints() to match the function definition
    after recent changes (Uros Bizjak).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmhMgawSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1dA0H/j7V4sfA383pZfWwegRC4MQeiW4EVdIx
 2G6d33Gsfv+zEzTSsPvtKghR4eUIldTdco04bqusMcI+qIfgdWIEBpi2tQhJU2Tt
 Bgc24Kya0n85KKNLHs60xm0WXhkAyu3TFad+4yTGXRZEmAD+O6lyUPjum+mn+gbx
 HuLE6KE9D/qzzYOU03kjCsJExBf7vv0bBNIqGqNyuFLYOaoqZd5rLhNhhxm2AkYi
 hZ5wmYBY+2SJRzwryNNHQKKmZ1jk9HGapnIVrxQ2Pjc0AhX+tRW7FI5lDDvUWtW+
 RN/226Y3OQ3JHXAj4S0K64t0ZEpiEKS2oPatjGzosNmdyI0f+CRACQs=
 =OcND
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix the cpupower utility installation, fix up the recently added
  Rust abstractions for cpufreq and OPP, restore the x86 update
  eliminating mwait_play_dead_cpuid_hint() that has been reverted during
  the 6.16 merge window along with preventing the failure caused by it
  from happening, and clean up mwait_idle_with_hints() usage in
  intel_idle:

   - Implement CpuId Rust abstraction and use it to fix doctest failure
     related to the recently introduced cpumask abstraction (Viresh
     Kumar)

   - Do minor cleanups in the `# Safety` sections for cpufreq
     abstractions added recently (Viresh Kumar)

   - Unbreak cpupower systemd service units installation on some systems
     by adding a unitdir variable for specifying the location to install
     them (Francesco Poli)

   - Eliminate mwait_play_dead_cpuid_hint() again after reverting its
     elimination during the 6.16 merge window due to a problem with
     handling "dead" SMT siblings, but this time prevent leaving them in
     C1 after initialization by taking them online and back offline when
     a proper cpuidle driver for the platform has been registered
     (Rafael Wysocki)

   - Update data types of variables passed as arguments to
     mwait_idle_with_hints() to match the function definition after
     recent changes (Uros Bizjak)"

* tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  rust: cpu: Add CpuId::current() to retrieve current CPU ID
  rust: Use CpuId in place of raw CPU numbers
  rust: cpu: Introduce CpuId abstraction
  intel_idle: Update arguments of mwait_idle_with_hints()
  cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
  cpupower: split unitdir from libdir in Makefile
  Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
  ACPI: processor: Rescan "dead" SMT siblings during initialization
  intel_idle: Rescan "dead" SMT siblings during initialization
  x86/smp: PM/hibernate: Split arch_resume_nosmt()
  intel_idle: Use subsys_initcall_sync() for initialization
2025-06-13 13:27:41 -07:00
Rafael J. Wysocki
28b069933d Merge branches 'acpi-pad', 'acpi-cppc', 'acpi-ec' and 'acpi-resource'
Merge assorted ACPI updates for 6.16-rc2:

 - Update data types of variables passed as arguments to
   mwait_idle_with_hints() in the ACPI PAD (processor aggregator device)
   driver to match the function definition after recent changes (Uros
   Bizjak).

 - Fix a NULL pointer dereference in the ACPI CPPC library that occurs
   when nosmp is passed to the kernel in the command line (Yunhui Cui).

 - Ignore ECDT tables with an invalid ID string to prevent using an
   incorrect GPE for signaling events on some systems (Armin Wolf).

 - Add a new IRQ override quirk for MACHENIKE 16P (Wentao Guan).

* acpi-pad:
  ACPI: PAD: Update arguments of mwait_idle_with_hints()

* acpi-cppc:
  ACPI: CPPC: Fix NULL pointer dereference when nosmp is used

* acpi-ec:
  ACPI: EC: Ignore ECDT tables with an invalid ID string

* acpi-resource:
  ACPI: resource: Use IRQ override on MACHENIKE 16P
2025-06-13 21:55:30 +02:00
Rafael J. Wysocki
dd3581853c Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.16-rc2:

 - Update data types of variables passed as arguments to
   mwait_idle_with_hints() to match the function definition
   after recent changes (Uros Bizjak).

 - Eliminate mwait_play_dead_cpuid_hint() again after reverting its
   elimination during the merge window due to a problem with handling
   "dead" SMT siblings, but this time prevent leaving them in C1 after
   initialization by taking them online and back offline when a proper
   cpuidle driver for the platform has been registered (Rafael Wysocki).

* pm-cpuidle:
  intel_idle: Update arguments of mwait_idle_with_hints()
  Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
  ACPI: processor: Rescan "dead" SMT siblings during initialization
  intel_idle: Rescan "dead" SMT siblings during initialization
  x86/smp: PM/hibernate: Split arch_resume_nosmt()
  intel_idle: Use subsys_initcall_sync() for initialization
2025-06-13 21:28:07 +02:00
Rafael J. Wysocki
ea2867608b Merge branch 'pm-tools'
Merge a cpupower utility fix for 6.16-rc2 that unbreaks systemd service
units installation on some sysems (Francesco Poli).

* pm-tools:
  cpupower: split unitdir from libdir in Makefile
2025-06-13 21:25:38 +02:00
Linus Torvalds
02adc1490e spi: Fies for v6.16
A collection of driver specific fixes, most minor apart from the OMAP
 ones which disable some recent performance optimisations in some
 non-standard cases where we could start driving the bus incorrectly.
 
 The change to the stm32-ospi driver to use the newer reset APIs is a fix
 for interactions with other IP sharing the same reset line in some SoCs.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmhMXiEACgkQJNaLcl1U
 h9DfLwf9F7yAsLh7eGYPCqjDB365LaAuroOWuM5hB74cTFis9WJpcEI1pk9GdaD0
 m7TAkc+cxo0KBvpe0/t0+oqSBkUdcZ4ASXZp4zYnsj9dGhBcmlS0szHIOZWknMxZ
 E2pzHWj6p4/+zntWa+CCiiVGHz0PV3I9Oq3V1kI1EqyAZXc9uf5hj0hITVulJ4ih
 8+Y+927MpJ2dis8CaHeubDfNxnwJlCLS5GFwZaEhTXWp6IttxjH5KmeZu2Wtahdw
 ZcqUDLPYxdzoirDoGvdGBTHc0NMF843WD9wFWb29BzuyPLZYGAwGXDhGtHqSaGAo
 6HB5529nuKDAC1XNyRN6NA4bQosMaw==
 =ShaP
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A collection of driver specific fixes, most minor apart from the OMAP
  ones which disable some recent performance optimisations in some
  non-standard cases where we could start driving the bus incorrectly.

  The change to the stm32-ospi driver to use the newer reset APIs is a
  fix for interactions with other IP sharing the same reset line in some
  SoCs"

* tag 'spi-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine
  spi: stm32-ospi: clean up on error in probe()
  spi: stm32-ospi: Make usage of reset_control_acquire/release() API
  spi: offload: check offload ops existence before disabling the trigger
  spi: spi-pci1xxxx: Fix error code in probe
  spi: loongson: Fix build warnings about export.h
  spi: omap2-mcspi: Disable multi-mode when the previous message kept CS asserted
  spi: omap2-mcspi: Disable multi mode when CS should be kept asserted after message
2025-06-13 11:01:44 -07:00
Linus Torvalds
601dddb6c5 regulator: Fix for v6.16
One minor fix for a leak in the DT parsing code in the max20086 driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmhMXpMACgkQJNaLcl1U
 h9DxfAf/Sk4lxMl2JHbjk+TYXz5Q+Y04fpGYMCnV1u64kq9yxB2NKBjMomS6YysY
 qsueW1caCJM/D/8MUqkU6mbQxGCt6V8eKWjVxcjfPOEM/lWCP+FfsMOdu9rXij1s
 bhR4fI7jmaDYNjwaDCnyK7F0/TxUj5GiOPV3d+5C3gQ6VXFB+wTs1jzalvszP7WP
 3doM/U8NZH7gg7zztvKW8l89d8OWCc7L83dW/ecrAAMX+jbi6JC8qT5VTbKkdSfs
 Y1dPrTloRIj+z1n72YY6RLsqD8qO5P5KW2wy1qbnqFEZx82s0xFcAgcCkkeK3lnG
 He8R+qqfZvjL1o+shdlCHthr+R82Bw==
 =bJgB
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "One minor fix for a leak in the DT parsing code in the max20086 driver"

* tag 'regulator-fix-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: max20086: Fix refcount leak in max20086_parse_regulators_dt()
2025-06-13 11:00:19 -07:00
Oleg Nesterov
f90fff1e15 posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del()
If an exiting non-autoreaping task has already passed exit_notify() and
calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent
or debugger right after unlock_task_sighand().

If a concurrent posix_cpu_timer_del() runs at that moment, it won't be
able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or
lock_task_sighand() will fail.

Add the tsk->exit_state check into run_posix_cpu_timers() to fix this.

This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because
exit_task_work() is called before exit_notify(). But the check still
makes sense, task_work_add(&tsk->posix_cputimers_work.work) will fail
anyway in this case.

Cc: stable@vger.kernel.org
Reported-by: Benoît Sevens <bsevens@google.com>
Fixes: 0bdd2ed4138e ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-13 10:55:49 -07:00
Linus Torvalds
3ca933aad0 tracing fix for 6.16:
- Do not free "head" variable in filter_free_subsystem_filters()
 
   The first error path jumps to "free_now" label but first frees the newly
   allocated "head" variable. But the "free_now" code checks this variable,
   and if it is not NULL, it will iterate the list. As this list variable
   was already initialized, the "free_now" code will not do anything as it
   is empty. But freeing it will cause a UAF bug. The error path should
   simply jump to the "free_now" label and leave the "head" variable alone.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaExH8RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qs+DAQDWBAAmviDcMb+e+9uZi8rR8+Aj+j5S
 efPe/g4D2otl5QD/T7u5TZFrSOZfl4Gv9Z1ZWhKj+xfw3FiphODrzHdcsgo=
 =eP9r
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

 - Do not free "head" variable in filter_free_subsystem_filters()

   The first error path jumps to "free_now" label but first frees the
   newly allocated "head" variable. But the "free_now" code checks this
   variable, and if it is not NULL, it will iterate the list. As this
   list variable was already initialized, the "free_now" code will not
   do anything as it is empty. But freeing it will cause a UAF bug.

   The error path should simply jump to the "free_now" label and leave
   the "head" variable alone.

* tag 'trace-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Do not free "head" on error path of filter_free_subsystem_filters()
2025-06-13 10:51:11 -07:00
Linus Torvalds
dde6379705 ARM:
- Rework of system register accessors for system registers that are
   directly writen to memory, so that sanitisation of the in-memory
   value happens at the correct time (after the read, or before the
   write). For convenience, RMW-style accessors are also provided.
 
 - Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
   which was always broken.
 
 x86:
 
 - Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to pass
   only the "untouched" addresses and flipping the shared/private bit
   in the implementation.
 
 - Disable SEV-SNP support on initialization failure
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhMVEQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNCYwgAoqYz+RTcIx6EwHD/kN/9maUcQVl1
 MMmXqfF2jQYmTvk7ocEW1qwx/SV0kB+H9LOThV8SWTvVxDbNqAaUWRDz+wcz3zaO
 6/sUwz4dtU4XaTgxYhhB82lsPtJHyM+FM+bNL4rFFnrA1tZ93wRsMEeryZ5h960V
 C1Bc+PLBdpj+S3gQGvxeMxnG/n0oOAcecUqQa3ViIOKSfZEc/11+BjjvfvkYqExq
 s206faKSqor8xVXUbgtOk3LZreIExj/mD+pwMiUBvG0H0g4wnaG7Arc41QCFMowF
 4l4sQVMWFZiKQvQZSfdQOeNsXcepWw0qISK7UeoWpLnpM78uUfCS6iG1rA==
 =Hc3G
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Rework of system register accessors for system registers that are
     directly writen to memory, so that sanitisation of the in-memory
     value happens at the correct time (after the read, or before the
     write). For convenience, RMW-style accessors are also provided.

   - Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
     which was always broken.

  x86:

   - Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to
     pass only the "untouched" addresses and flipping the shared/private
     bit in the implementation.

   - Disable SEV-SNP support on initialization failure

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
  KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY
  KVM: SEV: Disable SEV-SNP support on initialization failure
  KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases
  KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases
  KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases
  KVM: arm64: selftests: Fix help text for arch_timer_edge_cases
  KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
  KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
  KVM: arm64: Add RMW specific sysreg accessor
  KVM: arm64: Add assignment-specific sysreg accessor
2025-06-13 10:05:31 -07:00
Jens Axboe
26ec15e4b0 io_uring/kbuf: don't truncate end buffer for multiple buffer peeks
If peeking a bunch of buffers, normally io_ring_buffers_peek() will
truncate the end buffer. This isn't optimal as presumably more data will
be arriving later, and hence it's better to stop with the last full
buffer rather than truncate the end buffer.

Cc: stable@vger.kernel.org
Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Reported-by: Christian Mazakas <christian.mazakas@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13 11:01:49 -06:00
Linus Torvalds
ad6159087f This push fixes a broken self-test in hkdf (new regression).
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmhI8cUACgkQxycdCkmx
 i6djRxAAyPbJxvSFwBgGEh4EO+hMugezC74EAbat46Cs8PzMQjLEWEbOug2ukXbk
 W+C1jRzSn3d3/j2glmjlmLUDHyY0OgGt0/is/X/DcPa/WynexvGMWkM8Umf+vwoV
 G9jO0YdgtnXK/HnaZBKB138YXeSdj37iiA9vIIW4VcZuDZbn2CFer/XxMiN8A1Do
 lDF69O9gEqbV/lGIUZdqJRqgaD+fiJAqRXSFp3SF44jx4Q2aCE07eqctLdAkxn9j
 l5x3FYAfr2CSMzqebcdwibwH+11toVi5LyDBBSWHtmKdo1C4j8m4oMIkafDXHfCx
 AM8EqrLdbaotftH5WhC3scoO6mbYVRhA/GnGp4nY/6+fVdwKbxy6Z2U22fo5+VVp
 e390GhpAHH1fyBEpapHgndqGqNz5zCtlgRil3ySzp9M6rt+GpFxEeIPv2IPg8POo
 Svw4TFyubrjUPlsY88JlKnsVEBbL0L8sRWsdDfxbZVOPmUQmdUjETxQ1ShCfmw5J
 1w/W0PT8J95Qfm8t93pRFf5ZW6Uyot7R7J0N5oyeQJLT2wx5uPsH+Oaur7V+a5ZS
 JZNWkuTubr8Muy6kMIVsJ5tOoFKQ5uSW85BKHwSBJgCeULkhOaeupU8VXwanySMu
 6ftqLPLzWkrTqlX5BwjjsbwxijFn82FBkfcMy0WAQm3Oo/bFv3I=
 =IqUE
 -----END PGP SIGNATURE-----

Merge tag 'v6.16-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix a broken self-test in hkdf (new regression)"

* tag 'v6.16-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hkdf - move to late_initcall
2025-06-13 09:59:29 -07:00
Linus Torvalds
36df6f734a bcachefs fixes for 6.16-rc2
As usual, highlighting the ones users have been noticing:
 
 - Fix a small issue with has_case_insensitive not being propagated on
   snapshot creation; this led to fsck errors, which we're harmless
   because we're not using this flag yet (it's for overlayfs +
   casefolding).
 
 - Log the error being corrected in the journal when we're doing fsck
   repair: this was one of the "lessons learned" from the i_nlink 0 ->
   subvolume deletion bug, where reconstructing what had happened by
   analyzing the journal was a bit more difficult than it needed to be.
 
 - Don't schedule btree node scan to run in the superblock: this fixes a
   regression from the 6.16 recovery passes rework, and let to it running
   unnecessarily.
 
   The real issue here is that we don't have online, "self healing" style
   topology repair yet: topology repair currently has to run before we go
   RW, which means that we may schedule it unnecessarily after a
   transient error. This will be fixed in the future.
 
 - We now track, in btree node flags, the reason it was scheduled to be
   rewritten. We discovered a deadlock in recovery when many btree nodes
   need to be rewritten because they're degraded: fully fixing this will
   take some work but it's now easier to see what's going on.
 
   For the bug report where this came up, a device had been kicked RO due
   to transient errors: manually setting it back to RW was sufficient to
   allow recovery to succeed.
 
 - Mark a few more fsck errors as autofix: as a reminder to users, please
   do keep reporting cases where something needs to be repaired and is
   not repaired automatically (i.e. cases where -o fix_errors or fsck -y
   is required).
 
 - rcu_pending.c now works with PREEMPT_RT
 
 - 'bcachefs device add', then umount, then remount wasn't working - we
   now emit a uevent so that the new device's new superblock is correctly
   picked up
 
 - Assorted repair fixes: btree node scan will no longer incorrectly
   update sb->version_min,
 
 - Assorted syzbot fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmhLiCcACgkQE6szbY3K
 bnZrAA/8D7N5iwNP7LA3APij7LUj3VAzqX1WG0S0LAWGxYOjrRMn8gdrPt9W5CAO
 aAQlddIijggwJAv7CQdxChaR4IuagNJVsjarJ8//Toa6mUs/5cj570Dl+voNyXmC
 /vgZBjN6hP917QVxbHGfa0TJUkmPbkbR2eILfTB95q8goDmyA9OFJOx3Je3G36x3
 vuCvucAXmovMkcb7F41XBp6wHWBSvsttkLsKb79jLCGroBPN1FZRHCPlZWZotyh8
 H7QCRkC1WbUP+jFd/xfjU2O8BrvmPt4PzqaXl8IS23KWVRtYjBFMIO15LTU6qnzV
 QvmbYZhCJJ822kkvCkOny8QY6I97U84kGFPqZXrGPPChf55ne3QTmicRKgJzxo5r
 35n940l5zPE5RL/c77RulkEqfArf2cZ4vzDRUymX7cZBk7QFTkdrjINAUNd0dUsx
 kMLG6Ej77i1eiQbKWnH+jxiGuslStdOknpIdNCc/mnJflU7MfTkY+W8iQwQHdH6f
 Yh4rS6S6xLxq4ZlJQBagUAdBdeCNnybg0Voxim+uoy/nF+nnRupJJfvBWoJdSxxT
 PVhfT8VNUMOEiJannPRhqTzgVHBKVIMS1xxw3L4SeLqIW/JZcEeNWlNVmgPdiWu+
 X7LJJ5GcuxXnXOo+dxJtKTG5bf/PvtfPaD7gv+PK7QuXybrreL0=
 =hPjd
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "As usual, highlighting the ones users have been noticing:

   - Fix a small issue with has_case_insensitive not being propagated on
     snapshot creation; this led to fsck errors, which we're harmless
     because we're not using this flag yet (it's for overlayfs +
     casefolding).

   - Log the error being corrected in the journal when we're doing fsck
     repair: this was one of the "lessons learned" from the i_nlink 0 ->
     subvolume deletion bug, where reconstructing what had happened by
     analyzing the journal was a bit more difficult than it needed to
     be.

   - Don't schedule btree node scan to run in the superblock: this fixes
     a regression from the 6.16 recovery passes rework, and let to it
     running unnecessarily.

     The real issue here is that we don't have online, "self healing"
     style topology repair yet: topology repair currently has to run
     before we go RW, which means that we may schedule it unnecessarily
     after a transient error. This will be fixed in the future.

   - We now track, in btree node flags, the reason it was scheduled to
     be rewritten. We discovered a deadlock in recovery when many btree
     nodes need to be rewritten because they're degraded: fully fixing
     this will take some work but it's now easier to see what's going
     on.

     For the bug report where this came up, a device had been kicked RO
     due to transient errors: manually setting it back to RW was
     sufficient to allow recovery to succeed.

   - Mark a few more fsck errors as autofix: as a reminder to users,
     please do keep reporting cases where something needs to be repaired
     and is not repaired automatically (i.e. cases where -o fix_errors
     or fsck -y is required).

   - rcu_pending.c now works with PREEMPT_RT

   - 'bcachefs device add', then umount, then remount wasn't working -
     we now emit a uevent so that the new device's new superblock is
     correctly picked up

   - Assorted repair fixes: btree node scan will no longer incorrectly
     update sb->version_min,

   - Assorted syzbot fixes"

* tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs: (23 commits)
  bcachefs: Don't trace should_be_locked unless changing
  bcachefs: Ensure that snapshot creation propagates has_case_insensitive
  bcachefs: Print devices we're mounting on multi device filesystems
  bcachefs: Don't trust sb->nr_devices in members_to_text()
  bcachefs: Fix version checks in validate_bset()
  bcachefs: ioctl: avoid stack overflow warning
  bcachefs: Don't pass trans to fsck_err() in gc_accounting_done
  bcachefs: Fix leak in bch2_fs_recovery() error path
  bcachefs: Fix rcu_pending for PREEMPT_RT
  bcachefs: Fix downgrade_table_extra()
  bcachefs: Don't put rhashtable on stack
  bcachefs: Make sure opts.read_only gets propagated back to VFS
  bcachefs: Fix possible console lock involved deadlock
  bcachefs: mark more errors autofix
  bcachefs: Don't persistently run scan_for_btree_nodes
  bcachefs: Read error message now prints if self healing
  bcachefs: Only run 'increase_depth' for keys from btree node csan
  bcachefs: Mark need_discard_freespace_key_bad autofix
  bcachefs: Update /dev/disk/by-uuid on device add
  bcachefs: Add more flags to btree nodes for rewrite reason
  ...
2025-06-13 09:49:07 -07:00
Bagas Sanjaya
db3dfae1a2 Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists
Stephen Rothwell reports htmldocs warning on ublk docs:

Documentation/block/ublk.rst:414: ERROR: Unexpected indentation. [docutils]

Fix the warning by separating sublists of auto buffer registration
fallback behavior from their appropriate parent list item.

Fixes: ff20c516485e ("ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250612132638.193de386@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20250613023857.15971-1-bagasdotme@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13 09:25:42 -06:00
Jason Gunthorpe
f9705d66fa iommu/tegra: Fix incorrect size calculation
This driver uses a mixture of ways to get the size of a PTE,
tegra_smmu_set_pde() did it as sizeof(*pd) which became wrong when pd
switched to a struct tegra_pd.

Switch pd back to a u32* in tegra_smmu_set_pde() so the sizeof(*pd)
returns 4.

Fixes: 50568f87d1e2 ("iommu/terga: Do not use struct page as the handle for as->pd memory")
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Closes: https://lore.kernel.org/all/62e7f7fe-6200-4e4f-ad42-d58ad272baa6@tecnico.ulisboa.pt/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Link: https://lore.kernel.org/r/0-v1-da7b8b3d57eb+ce-iommu_terga_sizeof_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-13 17:02:31 +02:00