twx-linux/drivers
Michael Liang ec00ea5645 nvme-tcp: fix premature queue removal and I/O failover
[ Upstream commit 77e40bbce93059658aee02786a32c5c98a240a8a ]

This patch addresses a data corruption issue observed in nvme-tcp during
testing.

In an NVMe native multipath setup, when an I/O timeout occurs, all
inflight I/Os are canceled almost immediately after the kernel socket is
shut down. These canceled I/Os are reported as host path errors,
triggering a failover that succeeds on a different path.

However, at this point, the original I/O may still be outstanding in the
host's network transmission path (e.g., the NIC's TX queue). From the
user-space application's perspective, the I/O has completed because it
was acknowledged on the other path, so the associated buffer may be
reused for new I/O requests.

Because nvme-tcp enables zero-copy by default in the transmission path,
the reused buffer's new contents can be sent to the original target in
place of the original data, ultimately causing data corruption.

We can reproduce this data corruption by injecting a delay on one path
and triggering an I/O timeout.

To prevent this issue, this change ensures that all inflight
transmissions are fully completed from the host's perspective before
returning from queue stop. To handle concurrent I/O timeouts from
multiple namespaces under the same controller, always wait in queue stop
regardless of the queue's state.

This aligns with the behavior of queue stopping in other NVMe fabric
transports.
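
The race described above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not the kernel change itself:
the names TxQueue, stop_queue and failover_scenario are hypothetical, the
"NIC" is a plain list, and draining the list stands in for sleeping until the
socket's pending transmit data has left the host.

```python
class TxQueue:
    """Toy zero-copy transmit queue (hypothetical, not a kernel structure)."""

    def __init__(self):
        self.pending = []  # references to caller buffers, never copies
        self.wire = []     # payloads that actually left the host

    def send_zero_copy(self, buf: bytearray) -> None:
        # Zero-copy: only a reference to the caller's buffer is queued.
        self.pending.append(buf)

    def drain_one(self) -> None:
        # The buffer contents are read at transmit time, not at queue time.
        if self.pending:
            buf = self.pending.pop(0)
            self.wire.append(bytes(buf))

    def inflight(self) -> int:
        return len(self.pending)


def stop_queue(q: TxQueue, wait: bool, max_polls: int = 50) -> bool:
    """Stop the queue; with wait=True, poll until nothing is in flight.

    Returns True if the queue is fully drained on return.
    """
    polls = 0
    while wait and q.inflight() and polls < max_polls:
        q.drain_one()  # stands in for sleeping while the NIC drains
        polls += 1
    return q.inflight() == 0


def failover_scenario(wait: bool) -> list:
    """Model: timeout on path A, failover to path B, buffer reuse."""
    q = TxQueue()
    buf = bytearray(b"ORIGINAL")
    q.send_zero_copy(buf)   # I/O queued on the original path
    stop_queue(q, wait)     # timeout handling stops the queue
    buf[:] = b"REUSED!!"    # I/O completed on the other path; app reuses buffer
    q.drain_one()           # delayed transmit on the original path, if any
    return q.wire


if __name__ == "__main__":
    print("no wait:", failover_scenario(wait=False))
    print("wait   :", failover_scenario(wait=True))
```

In this model, stopping without waiting lets the delayed transmit pick up the
reused buffer contents, while waiting in queue stop ensures the original
payload is the only thing sent on the original path. The bounded poll count is
an illustrative choice so the wait cannot spin forever.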

Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09 09:44:01 +02:00
accel accel/qaic: Fix integer overflow in qaic_validate_req() 2025-03-28 21:59:54 +01:00
accessibility
acpi ACPI PPTT: Fix coding mistakes in a couple of sizeof() calls 2025-05-02 07:50:58 +02:00
amba
android
ata ata: libata-scsi: Fix ata_msense_control_ata_feature() 2025-05-02 07:50:48 +02:00
atm
auxdisplay auxdisplay: hd44780: Fix an API misuse in hd44780.c 2025-05-02 07:50:38 +02:00
base driver core: fix potential NULL pointer dereference in dev_uevent() 2025-05-02 07:51:03 +02:00
bcma
block loop: aio inherit the ioprio of original request 2025-05-02 07:51:01 +02:00
bluetooth Bluetooth: btusb: avoid NULL pointer dereference in skb_dequeue() 2025-05-09 09:43:57 +02:00
bus bus: mhi: host: Fix race between unprepare and queue_buf 2025-04-25 10:45:26 +02:00
cache
cdrom
cdx cdx: Fix possible UAF error in driver_override_show() 2025-03-13 12:58:36 +01:00
char char: misc: register chrdev region with all possible minors 2025-05-02 07:50:49 +02:00
clk clk: check for disabled clock-provider in of_clk_get_hw_from_clkspec() 2025-05-02 07:50:53 +02:00
clocksource clocksource/drivers/stm32-lptimer: Use wakeup capable instead of init wakeup 2025-04-25 10:45:25 +02:00
comedi comedi: jr3_pci: Fix synchronous deletion of timer 2025-05-02 07:51:03 +02:00
connector
counter counter: microchip-tcb-capture: Fix undefined counter channel state on probe 2025-04-07 10:06:37 +02:00
cpufreq cpufreq: Fix setting policy limits when frequency tables are used 2025-05-09 09:43:53 +02:00
cpuidle cpuidle: riscv-sbi: fix device node release in early exit of for_each_possible_cpu 2025-01-17 13:36:16 +01:00
crypto crypto: ccp - Add support for PCI device 0x1134 2025-05-02 07:50:52 +02:00
cxl cxl/core/regs.c: Skip Memory Space Enable check for RCD and RCH Ports 2025-05-02 07:50:47 +02:00
dax
dca
devfreq
dio
dma dmaengine: dmatest: Fix dmatest waiting less when interrupted 2025-05-02 07:50:55 +02:00
dma-buf udmabuf: fix a buf size overflow issue during udmabuf creation 2025-05-02 07:50:57 +02:00
edac EDAC/altera: Set DDR and SDMMC interrupt mask before registration 2025-05-09 09:43:50 +02:00
eisa
extcon
firewire
firmware efi/libstub: Bump up EFI_MMAP_NR_SLACK_SLOTS to 32 2025-04-25 10:45:55 +02:00
fpga
fsi
gnss
gpio gpiolib: of: Move Atmel HSMCI quirk up out of the regulator comment 2025-05-02 07:50:59 +02:00
gpu drm/i915/pxp: fix undefined reference to `intel_pxp_gsccs_is_ready_for_sessions' 2025-05-09 09:43:56 +02:00
greybus
hid HID: pidff: Fix null pointer dereference in pidff_find_fields 2025-04-25 10:45:12 +02:00
hsi HSI: ssi_protocol: Fix use after free vulnerability in ssi_protocol Driver Due to Race Condition 2025-04-25 10:45:38 +02:00
hte
hv Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio() 2025-03-22 12:50:38 -07:00
hwmon hwmon: (nct6775-core) Fix out of bounds access for NCT679{8,9} 2025-04-10 14:37:38 +02:00
hwspinlock
hwtracing coresight-etm4x: add isb() before reading the TRCSTATR 2025-04-10 14:37:32 +02:00
i2c i2c: imx-lpi2c: Fix clock count when probe defers 2025-05-09 09:43:50 +02:00
i3c i3c: Add NULL pointer check in i3c_master_queue_ibi() 2025-04-25 10:45:28 +02:00
idle intel_idle: Handle older CPUs, which stop the TSC in deeper C states, correctly 2025-03-07 16:45:49 +01:00
iio iio: adc: ad7768-1: Fix conversion result sign 2025-05-02 07:50:39 +02:00
infiniband qibfs: fix _another_ leak 2025-05-02 07:50:56 +02:00
input Input: i8042 - swap old quirk combination with new quirk for more devices 2025-03-22 12:50:46 -07:00
interconnect
iommu iommu: Handle race with default domain setup 2025-05-09 09:43:56 +02:00
ipack
irqchip irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOs 2025-05-09 09:43:51 +02:00
isdn
leds leds: rgb: leds-qcom-lpg: Fix calculation of best period Hi-Res PWMs 2025-04-25 10:45:28 +02:00
macintosh
mailbox mailbox: pcc: Always clear the platform ack interrupt first 2025-05-02 07:50:54 +02:00
mcb mcb: fix a double free bug in chameleon_parse_gdd() 2025-05-02 07:50:47 +02:00
md dm: always update the array size in realloc_argv on success 2025-05-09 09:43:52 +02:00
media media: vimc: skip .s_stream() for stopped entities 2025-05-02 07:50:37 +02:00
memory memory: omap-gpmc: drop no compatible check 2025-04-10 14:37:38 +02:00
memstick memstick: rtsx_usb_ms: Fix slab-use-after-free in rtsx_usb_ms_drv_remove 2025-04-07 10:06:37 +02:00
message
mfd mfd: ene-kb3930: Fix a potential NULL pointer dereference 2025-04-25 10:45:28 +02:00
misc objtool, lkdtm: Obfuscate the do_nothing() pointer 2025-05-02 07:50:56 +02:00
mmc mmc: renesas_sdhi: Fix error handling in renesas_sdhi_probe 2025-05-09 09:43:51 +02:00
most
mtd mtd: rawnand: Add status chack in r852_ready() 2025-04-25 10:45:29 +02:00
mux
net bnxt_en: Fix ethtool -d byte order for 32-bit values 2025-05-09 09:44:01 +02:00
nfc
ntb ntb_hw_amd: Add NTB PCI ID for new gen CPU 2025-05-02 07:50:56 +02:00
nubus
nvdimm
nvme nvme-tcp: fix premature queue removal and I/O failover 2025-05-09 09:44:01 +02:00
nvmem nvmem: imx-ocotp-ele: fix MAC address byte order 2025-02-27 04:10:47 -08:00
of of: resolver: Fix device node refcount leakage in of_resolve_phandles() 2025-05-02 07:50:41 +02:00
opp OPP: OF: Fix an OF node leak in _opp_add_static_v2() 2025-02-08 09:51:55 +01:00
parisc
parport
pci PCI: imx6: Skip controller_id generation logic for i.MX7D 2025-05-09 09:43:55 +02:00
pcmcia
peci
perf perf: arm_pmu: Don't disable counter in armpmu_add() 2025-04-25 10:45:10 +02:00
phy phy: freescale: imx8m-pcie: assert phy reset and perst in power off 2025-04-25 10:45:36 +02:00
pinctrl pinctrl: renesas: rza2: Fix potential NULL pointer dereference 2025-05-02 07:50:52 +02:00
platform platform/x86/intel-uncore-freq: Fix missing uncore sysfs during CPU hotplug 2025-05-09 09:43:53 +02:00
pmdomain pmdomain: imx8mp-blk-ctrl: add missing loop break condition 2025-01-23 17:21:17 +01:00
pnp
power power: supply: max77693: Fix wrong conversion of charge input threshold value 2025-04-10 14:37:31 +02:00
powercap powercap: call put_device() on an error path in powercap_register_control_type() 2025-03-22 12:50:40 -07:00
pps pps: Fix a use-after-free 2025-02-08 09:52:38 +01:00
ps3
ptp ptp: ocp: fix start time alignment in ptp_ocp_signal_set 2025-04-25 10:45:44 +02:00
pwm pwm: fsl-ftm: Handle clk_get_rate() returning 0 2025-04-25 10:45:21 +02:00
rapidio rapidio: fix an API misues when rio_add_net() fails 2025-03-13 12:58:27 +01:00
ras
regulator objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc() 2025-05-02 07:50:56 +02:00
remoteproc remoteproc: qcom_q6v5_mss: Handle platforms with one power domain 2025-04-10 14:37:31 +02:00
reset reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC 2025-04-07 10:06:36 +02:00
rpmsg
rtc rtc: pcf85063: do a SW reset if POR failed 2025-05-02 07:50:57 +02:00
s390 s390/tty: Fix a potential memory leak bug 2025-05-02 07:50:53 +02:00
sbus
scsi scsi: pm80xx: Set phy_attached to zero when device is gone 2025-05-02 07:51:01 +02:00
sh
siox
slimbus slimbus: messaging: Free transaction ID in delayed interrupt scenario 2025-03-13 12:58:37 +01:00
soc soc: qcom: ice: introduce devm_of_qcom_ice_get 2025-05-02 07:50:37 +02:00
soundwire soundwire: slave: fix an OF node reference leak in soundwire slave device 2025-04-10 14:37:32 +02:00
spi spi: tegra114: Don't fail set_cs_timing when delays are zero 2025-05-09 09:43:51 +02:00
spmi
ssb
staging staging: rtl8723bs: select CONFIG_CRYPTO_LIB_AES 2025-04-10 14:37:34 +02:00
target scsi: target: spc: Fix RSOC parameter data header size 2025-04-25 10:45:14 +02:00
tc
tee tee: optee: Fix supplicant wait loop 2025-02-27 04:10:51 -08:00
thermal thermal/drivers/rockchip: Add missing rk3328 mapping entry 2025-04-25 10:45:32 +02:00
thunderbolt thunderbolt: Scan retimers after device router has been enumerated 2025-05-02 07:50:55 +02:00
tty serial: sifive: lock port in startup()/shutdown() callbacks 2025-05-02 07:50:49 +02:00
ufs scsi: ufs: exynos: Ensure pre_link() executes before exynos_ufs_phy_init() 2025-05-02 07:51:01 +02:00
uio
usb usb: host: xhci-plat: mvebu: use ->quirks instead of ->init_quirk() func 2025-05-02 07:50:55 +02:00
vdpa vdpa/mlx5: Fix oversized null mkey longer than 32bit 2025-04-25 10:45:27 +02:00
vfio Revert "vfio/platform: check the bounds of read/write syscalls" 2025-02-21 13:57:27 +01:00
vhost vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint 2025-04-10 14:37:32 +02:00
video backlight: led_bl: Hold led_access lock when calling led_sysfs_disable() 2025-04-25 10:45:30 +02:00
virt drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl 2025-03-13 12:58:37 +01:00
virtio
vlynq
w1
watchdog watchdog: rti_wdt: Fix an OF node leak in rti_wdt_probe() 2025-02-08 09:52:25 +01:00
xen xen: Change xen-acpi-processor dom0 dependency 2025-05-02 07:50:58 +02:00
zorro
Kconfig
Makefile