Commit Graph

1454 Commits

Author SHA1 Message Date
Jakub Kicinski
854e9bf5c5 mlx5-fixes-2024-09-25
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmb0b3IACgkQSD+KveBX
 +j5nQAf/Z9HoBKQfTtw4Ke8OQCmYEiri20D8p47c6XVXB6g7tZ+d2EphvtGmgN8V
 NifxP5qcovFR7pXX2qaJRzeCmJ9J2hn4mo/cXY0EycNmR02a6kD5QFLvbmCZcFP5
 VpkLAtTu9QyjmI4BkrX5wyMeXOzTX/UxG3aDcsuUf9eyuTYfu/6GEGfTFUMX9vpJ
 P3V2grBQHDU0M+CuigLtpPynBfTHot+/p7zWu+mGRGINFhVYeXkMzqwgeGiFpfya
 hjfGaKs6gFzlEGxMQoU2wigEtZnIYQr7BgrPBxzw78FgkGCx8YShoh5fCKb33OjS
 0j7jwy6gcEAlZwmuuiTPpQ4TOlM27Q==
 =6+q6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2024-09-25

* tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice
  net/mlx5e: SHAMPO, Fix overflow of hd_per_wq
  net/mlx5: HWS, changed E2BIG error to a negative return code
  net/mlx5: HWS, fixed double-free in error flow of creating SQ
  net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc
  net/mlx5e: Fix NULL deref in mlx5e_tir_builder_alloc()
  net/mlx5: Added cond_resched() to crdump collection
  net/mlx5: Fix error path in multi-packet WQE transmit
====================

Link: https://patch.msgid.link/20240925202013.45374-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02 17:14:53 -07:00
Yevgeny Kliteynik
19da17010a net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc
Fixing the wrong size of a field in hca_cap_2.
The bug was introduced by adding new fields for HWS
and not fixing the reserved field size.

Fixes: 34c626c3004a ("net/mlx5: Added missing mlx5_ifc definition for HW Steering")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-25 13:15:45 -07:00
Linus Torvalds
54d7e8190e RDMA v6.12 merge window
Usual collection of small improvements and fixes:
 
 - Bug fixes and minor improvments in cxgb4, siw, mlx5, rxe, efa, rts, hfi,
   erdma, hns, irdma
 
 - Code cleanups/typos/etc. Tidy alloc_ordered_workqueue() calls
 
 - Multipath PCI for mlx5
 
 - Variable size work queue, SRQ changes, and relaxed ordering for new bnxt HW
 
 - New ODP fault resolution FW protocol in mlx5
 
 - New "rdma monitor" netlink mechanism
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZvGZowAKCRCFwuHvBreF
 YcvbAP9abSxZte3zzG1ZJ/6BShSCGJvu4RMMMQI6wNJWZZiJ5wEA18MdaWzGFS8O
 BzP48Z/0VGsd2MOfNX4JeyYIs7SNYQA=
 =FXLo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual collection of small improvements and fixes, nothing especially
  stands out to me here.

  The new multipath PCI feature is a sign of things to come, I think we
  will see more of this in the next 10 years. Broadcom and HNS continue
  to update their drivers for their new HW generations.

  Summary:

   - Bug fixes and minor improvments in cxgb4, siw, mlx5, rxe, efa, rts,
     hfi, erdma, hns, irdma

   - Code cleanups/typos/etc. Tidy alloc_ordered_workqueue() calls

   - Multipath PCI for mlx5

   - Variable size work queue, SRQ changes, and relaxed ordering for new
     bnxt HW

   - New ODP fault resolution FW protocol in mlx5

   - New 'rdma monitor' netlink mechanism"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
  RDMA/bnxt_re: Remove the unused variable en_dev
  RDMA/nldev: Add missing break in rdma_nl_notify_err_msg()
  RDMA/irdma: fix error message in irdma_modify_qp_roce()
  RDMA/cxgb4: Added NULL check for lookup_atid
  RDMA/hns: Fix ah error counter in sw stat not increasing
  RDMA/bnxt_re: Recover the device when FW error is detected
  RDMA/bnxt_re: Group all operations under add_device and remove_device
  RDMA/bnxt_re: Use the aux device for L2 ULP callbacks
  RDMA/bnxt_re: Change aux driver data to en_info to hold more information
  RDMA/nldev: Expose whether RDMA monitoring is supported
  RDMA/nldev: Add support for RDMA monitoring
  RDMA/mlx5: Use IB set_netdev and get_netdev functions
  RDMA/device: Remove optimization in ib_device_get_netdev()
  RDMA/mlx5: Initialize phys_port_cnt earlier in RDMA device creation
  RDMA/mlx5: Obtain upper net device only when needed
  RDMA/mlx5: Check RoCE LAG status before getting netdev
  RDMA/mlx5: Consider the query_vuid cap for data_direct
  net/mlx5: Handle memory scheme ODP capabilities
  RDMA/mlx5: Add implicit MR handling to ODP memory scheme
  RDMA/mlx5: Add handling for memory scheme page fault events
  ...
2024-09-24 11:48:00 -07:00
Chiara Meiohas
8d159eb211 RDMA/mlx5: Use IB set_netdev and get_netdev functions
The IB layer provides a common interface to store and get net
devices associated to an IB device port (ib_device_set_netdev()
and ib_device_get_netdev()).
Previously, mlx5_ib stored and managed the associated net devices
internally.

Replace internal net device management in mlx5_ib with
ib_device_set_netdev() when attaching/detaching  a net device and
ib_device_get_netdev() when retrieving the net device.

Export ib_device_get_netdev().

For mlx5 representors/PFs/VFs and lag creation we replace the netdev
assignments with the IB set/get netdev functions.

In active-backup mode lag the active slave net device is stored in the
lag itself. To assure the net device stored in a lag bond IB device is
the active slave we implement the following:
- mlx5_core: when modifying the slave of a bond we send the internal driver event
  MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE.
- mlx5_ib: when catching the event call ib_device_set_netdev()

This patch also ensures the correct IB events are sent in switchdev lag.

While at it, when in multiport eswitch mode, only a single IB device is
created for all ports. The said IB device will receive all netdev events
of its VFs once loaded, thus to avoid overwriting the mapping of PF IB
device to PF netdev, ignore NETDEV_REGISTER events if the ib device has
already been mapped to a netdev.

Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13 08:27:40 +03:00
Michael Guralnik
907936b6f4 net/mlx5: Handle memory scheme ODP capabilities
When running over new FW that supports the new memory scheme ODP, set
the cap in the FW to signal the FW we are working in the new scheme.

In the memory scheme ODP the per_transport_service capabilities are RO
for the driver so we skip their setting.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-9-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13 08:27:30 +03:00
Shay Drory
5bd877093f net/mlx5: Add NOT_READY command return status
Add a new command status MLX5_CMD_STAT_NOT_READY to handle cases
where the firmware is not ready.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20240911201757.1505453-14-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 20:50:29 -07:00
Moshe Shemesh
9947204cda net/mlx5: Add device cap for supporting hot reset in sync reset flow
New devices with new FW can support sync reset for firmware activate
using hot reset. Add capability for supporting it and add MFRL field to
query from FW which type of PCI reset method to use while handling sync
reset events.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240911201757.1505453-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 20:50:29 -07:00
Moshe Shemesh
da2f660b3b net/mlx5: fs, make get_root_namespace API function
As preparation for HW Steering support, where the function
get_root_namespace() is needed to get root FDB, make it an API function
and rename it to mlx5_get_root_namespace().

Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20240911201757.1505453-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 20:50:28 -07:00
Jakub Kicinski
46ae4d0a48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts (sort of) and no adjacent changes.

This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12 17:11:24 -07:00
Michael Guralnik
64c68385a3 RDMA/mlx5: Add new ODP memory scheme eqe format
Add new fields to support the new memory scheme page fault and extend
the token field to u64 as in the new scheme the token is 48 bit.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-4-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11 14:56:15 +03:00
Michael Guralnik
6cd9171d04 net/mlx5: Expose HW bits for Memory scheme ODP
Expose IFC bits to support the new memory scheme on demand paging.
Change the macro reading odp capabilities to be able to read from the
new IFC layout and align the code in upper layers to be compiled.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-3-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11 14:56:12 +03:00
Michael Guralnik
cef7dde883 net/mlx5: Expand mkey page size to support 6 bits
Protect the usage of the 6th bit with the relevant capability to ensure
we are using the new page sizes with FW that supports the bit extension.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11 14:56:07 +03:00
Carolina Jubran
452ef7f860 net/mlx5: Add missing masks and QoS bit masks for scheduling elements
Add the missing masks for supported element types and Transmit
Scheduling Arbiter (TSAR) types in scheduling elements.

Also, add the corresponding bit masks for these types in the QoS
capabilities of a NIC scheduler.

Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 12:39:57 -07:00
Yevgeny Kliteynik
00b9f0daef net/mlx5: Added missing definitions in preparation for HW Steering
As part of preparation for HWS, added missing definitions
in qp.h and fs_core.h:

 - FS_FT_FDB_RX/TX table types that are used by HWS in addition
   to an existing FS_FT_FDB
 - MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE that is used by HWS to
   require fence in WQE

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:04 -07:00
Yevgeny Kliteynik
34c626c300 net/mlx5: Added missing mlx5_ifc definition for HW Steering
Add mlx5_ifc definitions that are required for HWS support.

Note that due to change in the mlx5_ifc_flow_table_context_bits
structure that now includes both SWS and HWS bits in a union,
this patch also includes small change in one of SWS files that
was required for compilation.

Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-09-09 11:10:04 -07:00
Yishai Hadas
c772a2c690 net/mlx5: Add IFC related stuff for data direct
Add IFC related stuff for data direct.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/82da7f578a567909bb5858a64ba844fe4cc298fa.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-08-08 10:53:48 +03:00
Rahul Rameshbabu
7e45c1e9ed net/mlx5: Add support for MTPTM and MTCTR registers
Make Management Precision Time Measurement (MTPTM) register and Management
Cross Timestamp (MTCTR) register usable in mlx5 driver.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20240730134055.1835261-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-05 16:44:44 -07:00
Linus Torvalds
f4f92db439 virtio: features, fixes, cleanups
Several new features here:
 
 - Virtio find vqs API has been reworked
   (required to fix the scalability issue we have with
    adminq, which I hope to merge later in the cycle)
 
 - vDPA driver for Marvell OCTEON
 
 - virtio fs performance improvement
 
 - mlx5 migration speedups
 
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaXjQQPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpnIsH/jVNqAQbe/vaBQdNMdnsA+P9A9unLbYRxYCQ
 tN73mQRIXKtnZHBRAEbMGq52HPYg8HlN2HJSgyNo6I6t8VD+PiOco7m+3GpmqEcW
 aXPOPl0BAbVoDgyutxRuuodP8Z61lBx0mG6iOxpzTXOPGlpQqtPCFHO8YnodqnPf
 tMix/5uAqgZKV2siCbw5DtzwEc0gDHU8qsD0/nyoS5nBDF9yh/ardr5P/qiyFDQH
 atCNYTOhIFU83pLAaw0fpCGbkt7gxf+5RpWVx3wkYww+/MwvYhsveRvQyaGbBz3n
 WDtET3SOtVTta98OAGIKCq/2z8f6mYXBP7vXapBgnJG3vwS/poQ=
 =LYua
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "Several new features here:

   - Virtio find vqs API has been reworked (required to fix the
     scalability issue we have with adminq, which I hope to merge later
     in the cycle)

   - vDPA driver for Marvell OCTEON

   - virtio fs performance improvement

   - mlx5 migration speedups

  Fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits)
  virtio: rename virtio_find_vqs_info() to virtio_find_vqs()
  virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers
  virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()
  virtio_balloon: convert to use virtio_find_vqs_info()
  virtiofs: convert to use virtio_find_vqs_info()
  scsi: virtio_scsi: convert to use virtio_find_vqs_info()
  virtio_net: convert to use virtio_find_vqs_info()
  virtio_crypto: convert to use virtio_find_vqs_info()
  virtio_console: convert to use virtio_find_vqs_info()
  virtio_blk: convert to use virtio_find_vqs_info()
  virtio: rename find_vqs_info() op to find_vqs()
  virtio: remove the original find_vqs() op
  virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly
  virtio: convert find_vqs() op implementations to find_vqs_info()
  virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
  virtio: introduce virtio_queue_info struct and find_vqs_info() config op
  virtio: make virtio_find_single_vq() call virtio_find_vqs()
  virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
  caif_virtio: use virtio_find_single_vq() for single virtqueue finding
  vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
  ...
2024-07-19 11:57:55 -07:00
Linus Torvalds
3d51520954 RDMA v6.11 merge window
Usual collection of small improvements and fixes:
 
 - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe, hf1,
   qib, ocrdma
 
 - bnxt_re support for MSN, which is a new retransmit logic
 
 - Initial mana support for RC qps
 
 - Use after free bug and cleanups in iwcm
 
 - Reduce resource usage in mlx5 when RDMA verbs features are not used
 
 - New verb to drain shared recieve queues, similar to normal recieve
   queues. This is necessary to allow ULPs a clean shutdown. Used in the
   iscsi rdma target
 
 - mlx5 support for more than 16 bits of doorbell indexes
 
 - Doorbell moderation support for bnxt_re
 
 - IB multi-plane support for mlx5
 
 - New EFA adaptor PCI IDs
 
 - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't rename
   the device
 
 - A collection of hns bugs
 
 - Fix long standing bug in bnxt_re with incorrect endian handling of
   immediate data
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZpfvKQAKCRCFwuHvBreF
 YXomAP46gZpGv5mlMOAXePRuKq6glNZWl3pVuwuycnlmjQcEUQD/dhQbJz0rZKBr
 swuibPo83bFacfXJL7Wxd48m4G3EfgI=
 =1eXu
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual collection of small improvements and fixes:

   - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe,
     hf1, qib, ocrdma

   - bnxt_re support for MSN, which is a new retransmit logic

   - Initial mana support for RC qps

   - Use after free bug and cleanups in iwcm

   - Reduce resource usage in mlx5 when RDMA verbs features are not used

   - New verb to drain shared recieve queues, similar to normal recieve
     queues. This is necessary to allow ULPs a clean shutdown. Used in
     the iscsi rdma target

   - mlx5 support for more than 16 bits of doorbell indexes

   - Doorbell moderation support for bnxt_re

   - IB multi-plane support for mlx5

   - New EFA adaptor PCI IDs

   - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't
     rename the device

   - A collection of hns bugs

   - Fix long standing bug in bnxt_re with incorrect endian handling of
     immediate data"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits)
  IB/hfi1: Constify struct flag_table
  RDMA/mana_ib: Set correct device into ib
  bnxt_re: Fix imm_data endianness
  RDMA: Fix netdev tracker in ib_device_set_netdev
  RDMA/hns: Fix mbx timing out before CMD execution is completed
  RDMA/hns: Fix insufficient extend DB for VFs.
  RDMA/hns: Fix undifined behavior caused by invalid max_sge
  RDMA/hns: Fix shift-out-bounds when max_inline_data is 0
  RDMA/hns: Fix missing pagesize and alignment check in FRMR
  RDMA/hns: Fix unmatch exception handling when init eq table fails
  RDMA/hns: Fix soft lockup under heavy CEQE load
  RDMA/hns: Check atomic wr length
  RDMA/ocrdma: Don't inline statistics functions
  RDMA/core: Introduce "name_assign_type" for an IB device
  RDMA/qib: Fix truncation compilation warnings in qib_verbs.c
  RDMA/qib: Fix truncation compilation warnings in qib_init.c
  RDMA/efa: Add EFA 0xefa3 PCI ID
  RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
  net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
  RDMA/mlx5: Add plane index support when querying PTYS registers
  ...
2024-07-19 09:51:33 -07:00
Jakub Kicinski
dd3cd3ca69 aux-sysfs-irqs
Shay Says:
 ==========
 Introduce auxiliary bus IRQs sysfs
 
 Today, PCI PFs and VFs, which are anchored on the PCI bus, display their
 IRQ information in the <pci_device>/msi_irqs/<irq_num> sysfs files.  PCI
 subfunctions (SFs) are similar to PFs and VFs and these SFs are anchored
 on the auxiliary bus. However, these PCI SFs lack such IRQ information
 on the auxiliary bus, leaving users without visibility into which IRQs
 are used by the SFs. This absence makes it impossible to debug
 situations and to understand the source of interrupts/SFs for
 performance tuning and debug.
 
 Additionally, the SFs are multifunctional devices supporting RDMA,
 network devices, clocks, and more, similar to their peer PCI PFs and
 VFs. Therefore, it is desirable to have SFs' IRQ information available
 at the bus/device level.
 
 To overcome the above limitations, this short series extends the
 auxiliary bus to display IRQ information in sysfs, similar to that of
 PFs and VFs.
 
 It adds an 'irqs' directory under the auxiliary device and includes an
 <irq_num> sysfs file within it.
 
 For example:
 $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
 50  51  52  53  54  55  56  57  58
 
 Patch summary:
 patch-1 adds auxiliary bus to support irqs used by auxiliary device
 patch-2 mlx5 driver using exposing irqs for PCI SF devices via auxiliary
         bus
 
 ==========
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmaQTCYACgkQSD+KveBX
 +j7nRAgAhyi8mD93AjpoXX8onbK3ZyPnGwLToCs0NT3EzT0BIwNvDovQp4rhcs16
 3zVwvW+twVsbMuPYTpPVgcynpL6N0K/CoW+ubDGZaRIaf0nDmh4MY1wY/EUsVj8R
 FbeTi5L+9MyKvFbtO5d4cW1q7M0XVD3uR8Wle6PwvXZ1gcM59vsR1eml25NLTC8B
 Z9F9WKG+dFAni0ll/IL837Se3QQapRXtJQ3g6XbIcpXiMqgIrHZ9FyY0LvuWlQq4
 LsIPKh7RySATmAYXwwpsnfdrilvvMHsyjlAoeNHEJBsAUY+kpOIFFi6J5EB+/oyo
 jhBhlc4Al0vUXis9jGTysO7mVYVOUQ==
 =DTxS
 -----END PGP SIGNATURE-----

Merge tag 'aux-sysfs-irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
aux-sysfs-irqs

Shay Says:
==========
Introduce auxiliary bus IRQs sysfs

Today, PCI PFs and VFs, which are anchored on the PCI bus, display their
IRQ information in the <pci_device>/msi_irqs/<irq_num> sysfs files.  PCI
subfunctions (SFs) are similar to PFs and VFs and these SFs are anchored
on the auxiliary bus. However, these PCI SFs lack such IRQ information
on the auxiliary bus, leaving users without visibility into which IRQs
are used by the SFs. This absence makes it impossible to debug
situations and to understand the source of interrupts/SFs for
performance tuning and debug.

Additionally, the SFs are multifunctional devices supporting RDMA,
network devices, clocks, and more, similar to their peer PCI PFs and
VFs. Therefore, it is desirable to have SFs' IRQ information available
at the bus/device level.

To overcome the above limitations, this short series extends the
auxiliary bus to display IRQ information in sysfs, similar to that of
PFs and VFs.

It adds an 'irqs' directory under the auxiliary device and includes an
<irq_num> sysfs file within it.

For example:
$ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
50  51  52  53  54  55  56  57  58

Patch summary:
patch-1 adds auxiliary bus to support irqs used by auxiliary device
patch-2 mlx5 driver using exposing irqs for PCI SF devices via auxiliary
        bus
==========

* tag 'aux-sysfs-irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Expose SFs IRQs
  driver core: auxiliary bus: show auxiliary device IRQs
  RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded
  net/mlx5: Reimplement write combining test
====================

Link: https://patch.msgid.link/20240711213140.256997-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15 12:42:45 -07:00
Daniel Jurgens
63c6e08eac net/mlx5: IFC updates for SF max IO EQs
Expose a new cap sf_eq_usage. The vhca_resource_manager can write this
cap, indicating the SF driver should use max_num_eqs_24b to determine
how many EQs to use.

Will be used in the next patch, to indicate to the SF driver from the PF
that the user has set the max io eqs via devlink. So the SF driver can
later query the proper max eq value from the new cap.

devlink port function set pci/0000:08:00.0/32768 max_io_eqs 32

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13 15:44:16 -07:00
Dragos Tatulea
cdc3c7eaae vdpa/mlx5: Add support for modifying the VQ features field
This is done in preparation for the pre-creation of hardware virtqueues
at device add time.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Message-Id: <20240626-stage-vdpa-vq-precreate-v2-11-560c491078df@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 08:42:46 -04:00
Dragos Tatulea
f70080c5bc vdpa/mlx5: Add support for modifying the virtio_version VQ field
This is done in preparation for the pre-creation of hardware virtqueues
at device add time.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Message-Id: <20240626-stage-vdpa-vq-precreate-v2-10-560c491078df@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 08:42:45 -04:00
Jakub Kicinski
76ed626479 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/phy/aquantia/aquantia.h
  219343755eae ("net: phy: aquantia: add missing include guards")
  61578f679378 ("net: phy: aquantia: add support for PHY LEDs")

drivers/net/ethernet/wangxun/libwx/wx_hw.c
  bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx")
  b501d261a5b3 ("net: txgbe: add FDIR ATR support")
https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/

include/linux/mlx5/mlx5_ifc.h
  048a403648fc ("net/mlx5: IFC updates for changing max EQs")
  99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO")
https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/

Adjacent changes:

drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
  4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference")
  3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard")

include/net/mac80211.h
  816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP")
  5a009b42e041 ("wifi: mac80211: track changes in AP's TPE")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04 14:16:11 -07:00
Mark Zhang
7a2210a57d RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
Supports per-plane port counters by querying PPCNT register with the
"extended port counters" group, as the query_vport_counter command
doesn't support plane ports.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/06ffb582d67159b7def4654c8272d3d6e8bd2f2f.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:38:05 +03:00
Mark Zhang
c6b6677d85 net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
This patch adds new fields to support multi-plane and the extend port
counters group. Actual support will be added in the next patch.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/70221cdd79aad0e21cbf385d9567e3ebffbc5137.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:38:05 +03:00
Mark Zhang
3b43399b29 RDMA/mlx5: Add plane index support when querying PTYS registers
Support the new "plane_ind" field when querying port PTYS registers.
This is needed when querying the rate of a plane port.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/1f703c36306aa46917fcd88eadbb23b3e380d526.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:38:05 +03:00
Mark Zhang
2a5db20fa5 RDMA/mlx5: Add support to multi-plane device and port
When multi-plane is supported, a logical port, which is aggregation of
multiple physical plane ports, is exposed for data transmission.
Compared with a normal mlx5 IB port, this logical port supports all
functionalities except Subnet Management.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:10:15 +03:00
Mark Zhang
65528cfb21 net/mlx5: mlx5_ifc update for multi-plane support
Add new fields to support mlx5 multi-plane feature. Actual support will
be added in following patches.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/36a74a1b1d2b7b59c99cda4abad1794ddde30230.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:10:15 +03:00
Daniel Jurgens
048a403648 net/mlx5: IFC updates for changing max EQs
Expose new capability to support changing the number of EQs available
to other functions.

Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28 12:58:10 +01:00
Or Har-Toov
0c5275bf75 RDMA/mlx5: Use sq timestamp as QP timestamp when RoCE is disabled
When creating a QP, one of the attributes is TS format (timestamp).
In some devices, we have a limitation that all QPs should have the same
ts_format. The ts_format is chosen based on the device's capability.
The qp_ts_format cap resides under the RoCE caps table, and the
cap will be 0 when RoCE is disabled. So when RoCE is disabled, the
value that should be queried is sq_ts_format under HCA caps.

Consider the case when the system supports REAL_TIME_TS format (0x2),
some QPs are created with REAL_TIME_TS as ts_format, and afterwards
RoCE gets disabled. When trying to construct a new QP, we can't use
the qp_ts_format, that is queried from the RoCE caps table, Since it
leads to passing 0x0 (FREE_RUNNING_TS) as the value of the qp_ts_format,
which is different than the ts_format of the previously allocated
QPs REAL_TIME_TS format (0x2).

Thus, to resolve this, read the sq_ts_format, which also reflect
the supported ts format for the QP when RoCE is disabled.

Fixes: 4806f1e2fee8 ("net/mlx5: Set QP timestamp mode to default")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Link: https://lore.kernel.org/r/32801966eb767c7fd62b8dea3b63991d5fbfe213.1718554199.git.leon@kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26 10:53:29 -03:00
Patrisious Haddad
b339e0a39d RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded
The req_transport_retries_exceeded counter shows the number of times
requester detected transport retries exceed error.

The req_rnr_retries_exceeded counter show the number of times the
requester detected RNR NAKs retries exceed error.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/250466af94f4989d638fab168e246035530e912f.1718301543.git.leon@kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16 18:53:23 +03:00
Jianbo Liu
d98995b4bf net/mlx5: Reimplement write combining test
The test of write combining was added before in mlx5_ib driver. It
opens UD QP and posts NOP WQEs, and uses BlueFlame doorbell. When
BlueFlame is used, WQEs get written directly to a PCI BAR of the
device (in addition to memory) so that the device handles them without
having to access memory.

In this test, the WQEs written in memory are different from the ones
written to the BlueFlame which request CQE update. By checking the
completion reports posted on CQ, we can know if BlueFlame succeeds or
not. The write combining must be supported if BlueFlame succeeds as
its register is written using write combining.

This patch reimplements the test in the same way, but using a pair of
SQ and CQ only. It is moved to mlx5_core as a general feature used by
both mlx5_core and mlx5_ib.

Besides, save write combine test result of the PCI function, so that
its thousands of child functions such as SF can query without paying
the time and resource penalty by itself. The test function is called
only after failing to get the cached result. With this enhancement,
all thousands of SFs of the PF attached to same driver no longer need
to perform WC check explicitly, which is already done in the system.
This saves several commands per SF, thereby speeds up SF creation and
also saves completion EQ creation.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16 18:33:59 +03:00
Rahul Rameshbabu
296eaab825 net/mlx5e: Support SWP-mode offload L4 csum calculation
Calculate the pseudo-header checksum for both IPSec transport mode and
IPSec tunnel mode for mlx5 devices that do not implement a pure hardware
checksum offload for L4 checksum calculation. Introduce a capability bit
that identifies such mlx5 devices.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 18:53:24 -07:00
Cosmin Ratiu
e575d3a6dd net/mlx5: Correct TASR typo into TSAR
TSAR is the correct spelling (Transmit Scheduling ARbiter).

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 18:53:23 -07:00
Yoray Zack
99be56171f net/mlx5e: SHAMPO, Re-enable HW-GRO
Add back HW-GRO to the reported features.

As the current implementation of HW-GRO uses KSMs with a
specific fixed buffer size (256B) to map its headers buffer,
we reported the feature only if the NIC is supporting KSM and
the minimum value for buffer size is below the requested one.

iperf3 bandwidth comparison:
+---------+--------+--------+-----------+
| streams | SW GRO | HW GRO | Unit      |
|---------+--------+--------+-----------|
| 1       | 36     | 42     | Gbits/sec |
| 4       | 34     | 39     | Gbits/sec |
| 8       | 31     | 35     | Gbits/sec |
+---------+--------+--------+-----------+

A downstream patch will add skb fragment coalescing which will improve
performance considerably.

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 20:20:46 -07:00
Yoray Zack
758191c9ea net/mlx5e: SHAMPO, Use KSMs instead of KLMs
KSM Mkey is KLM Mkey with a fixed buffer size. Due to this fact,
it is a faster mechanism than KLM.

SHAMPO feature used KLMs Mkeys for memory mappings of its headers buffer.
As it used KLMs with the same buffer size for each entry,
we can use KSMs instead.

This commit changes the Mkeys that map the SHAMPO headers buffer
from KLMs to KSMs.

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 20:20:46 -07:00
Gal Pressman
1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c700 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Jakub Kicinski
654de42f3f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.10 net-next PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-14 10:53:19 -07:00
Parav Pandit
db5944e16c net/mlx5: Remove unused msix related exported APIs
MSIX irq allocation and free APIs are no longer
in use. Hence, remove the dead code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20240512124306.740898-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 16:35:47 -07:00
Shay Drory
e0e6adfe8c net/mlx5: Enable 8 ports LAG
This patch adds to mlx5 drivers support for 8 ports HCAs.
Starting with ConnectX-8 HCAs with 8 ports are possible.

As most driver parts aren't affected by such configuration most driver
code is unchanged.

Specially the only affected areas are:
- Lag
- Multiport E-Switch
- Single FDB E-Switch

All of the above are already factored in generic way, and LAG and VF LAG
are tested, so all that left is to change a #define and remove checks
which are no longer needed.
However, Multiport E-Switch is not tested yet, so it is left untouched.

This patch will allow to create hardware LAG/VF LAG when all 8 ports are
added to the same bond device.

for example, In order to activate the hardware lag a user can execute
the following:

ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0
ip link set eth6 master bond0
ip link set eth7 master bond0
ip link set eth8 master bond0
ip link set eth9 master bond0

Where eth2, eth3, eth4, eth5, eth6, eth7, eth8 and eth9 are the PFs of
the same HCA.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512124306.740898-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 16:35:47 -07:00
Akiva Goldberger
485d65e135 net/mlx5: Add a timeout to acquire the command queue semaphore
Prevent forced completion handling on an entry that has not yet been
assigned an index, causing an out of bounds access on idx = -22.
Instead of waiting indefinitely for the sem, blocking flow now waits for
index to be allocated or a sem acquisition timeout before beginning the
timer for FW completion.

Kernel log example:
mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion

Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:38:33 -07:00
Rahul Rameshbabu
445a25f6e1 net/mlx5e: Support updating coalescing configuration without resetting channels
When CQE mode or DIM state is changed, gracefully reconfigure channels to
handle new configuration. Previously, would create new channels that would
reflect the changes rather than update the original channels.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu
eca1e8a628 net/mlx5e: Use DIM constants for CQ period mode parameter
Use core DIM CQ period mode enum values for the CQ parameter for the period
mode. Translate the value to the specific mlx5 device constant for the
selected period mode when creating a CQ. Avoid needing to translate mlx5
device constants to DIM constants for core DIM functionality.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Cosmin Ratiu
4aafb8ab2a net/mlx5e: Support FEC settings for 100G/lane modes
This consists of:
1. Expose the 100G/lane capability bit in the PCAM reg.
2. Expose the per link mode FEC capability masks in the PPLM reg.
3. Set the overrides according to ethtool parameters.
FEC for new modes is set if and only if the PCAM 100G/lane capability is
advertised and the capability mask for a given link mode reports that it
can accept the requested FEC mode.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240404173357.123307-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05 21:54:40 -07:00
Jianbo Liu
c788d79cfa net/mlx5: Skip pages EQ creation for non-page supplier function
Page events are not issued by device on the function if
page_request_disable is set, so no need to create pages EQ.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240402133043.56322-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:59 -07:00
Jianbo Liu
137f3d50ad net/mlx5: Support matching on l4_type for ttc_table
Replace matching on TCP and UDP protocols with new l4_type field which
is parsed by steering for ttc_table. It is enabled by the
outer_l4_type or inner_l4_type bits in nic_rx or port_sel flow table
capabilities and used only if pcc_ifa2 bit in HCA capabilities is set.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240402133043.56322-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:59 -07:00
Gal Pressman
30f8d23814 net/mlx5: Convert uintX_t to uX
In the kernel, the preferred types are uX.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240402133043.56322-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:58 -07:00
Linus Torvalds
4138f02288 VFIO updates for v6.9-rc1
- Add warning in unlikely case that device is not captured with
    driver_override. (Kunwu Chan)
 
  - Error handling improvements in mlx5-vfio-pci to detect firmware
    tracking object error states, logging of firmware error syndrom,
    and releasing of firmware resources in aborted migration sequence.
    (Yishai Hadas)
 
  - Correct an un-alphabetized VFIO MAINTAINERS entry.
    (Alex Williamson)
 
  - Make the mdev_bus_type const and also make the class struct const
    for a couple of the vfio-mdev sample drivers. (Ricardo B. Marliere)
 
  - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
    Grace-Hopper superchip.  During initialization of the chip-to-chip
    interconnect in this hardware module, the PCI BARs of the device
    become unused in favor of a faster, coherent mechanism for exposing
    device memory.  This driver primarily changes the VFIO
    representation of the device to masquerade this coherent aperture
    to replace the physical PCI BARs for userspace drivers.  This also
    incorporates use of a new vma flag allowing KVM to use write
    combining attributes for uncached device memory.  (Ankit Agrawal)
 
  - Reset fixes and cleanups for the pds-vfio-pci driver.  Save and
    restore files were previously leaked if the device didn't pass
    through an error state, this is resolved and later re-fixed to
    prevent access to the now freed files.  Reset handling is also
    refactored to remove the complicated deferred reset mechanism.
    (Brett Creeley)
 
  - Remove some references to pl330 in the vfio-platform amba
    driver. (Geert Uytterhoeven)
 
  - Remove twice redundant and ugly code to unpin incidental pins
    of the zero-page. (Alex Williamson)
 
  - Deferred reset logic is also removed from the hisi-acc-vfio-pci
    driver as a simplification. (Shameer Kolothum)
 
  - Enforce that mlx5-vfio-pci devices must support PRE_COPY and
    remove resulting unnecessary code.  There is no device firmware
    that has been available publicly without this support.
    (Yishai Hadas)
 
  - Switch over to using the .remove_new callback for vfio-platform
    in support of the broader transition for a void remove function.
    (Uwe Kleine-König)
 
  - Resolve multiple issues in interrupt code for VFIO bus drivers
    that allow calling eventfd_signal() on a NULL context.  This
    also remove a potential race in INTx setup on certain hardware
    for vfio-pci, races with various mechanisms to mask INTx, and
    leaked virqfds in vfio-platform. (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmXzesgbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiA4oQAKU3Z6h8oQXaMsc2nKip
 NnOtrrKw2jIohEGw01uRUf8q9uhLeLE0bidrDETion812/Lyv7M/aDlLIK4nvDvt
 AAFwL1iAKbVYTomIIWQckCwki5gBp3I+1vAQekJn4qXe7B9GohNz9cl9fNLVpcNd
 X3rWUVB5LVOvSzI+o6Ueqau+XFOMxpndr9VX4zbknIa0Th49EoYGYWPAYjzN4YyV
 GVSIWJHbtpAAHsL46jc7HmCeAtsVVkW/qHPInerSPCxabiQ+i0LSnlM16j6xXjK1
 9SvJi7+FCRGTvF3Ql2sWTK65glEbQ0xBzwSIs0L3AuKHsRISGbCHP1wymriJr5K7
 +asIM18HNLfmH/BAksbrd2M5gys8/xO9+7xIzTaYlZyTNM99Zu7d/u0B3AjYemG/
 Me3N86E2cl9Xc3NV6UEX8L1/pPpg6jKiOcZ6V9pGycUMyOTJS36FJT8Czr/jemtA
 /y6HOBpjE1gMACkk63P8GQaLMnQs7glSAEg2e++MvUVIW5END7usyLrSDr87Ysoa
 O0deH5FNSW6QAbGV+PRAmaacPvGF5B8BppYm/lnxHmPf0+saVEYOSbdFoSpK4l5H
 8f79fCtzpY8in5EDIOGJvu4u5ZWWoS/5dFLr0Teyj4vyK//PLmBv0gWoGRJV6aqj
 BrJ1f9NQ0+lVTIj/jTQ5uzlC
 =m5jA
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Add warning in unlikely case that device is not captured with
   driver_override (Kunwu Chan)

 - Error handling improvements in mlx5-vfio-pci to detect firmware
   tracking object error states, logging of firmware error syndrom, and
   releasing of firmware resources in aborted migration sequence (Yishai
   Hadas)

 - Correct an un-alphabetized VFIO MAINTAINERS entry (Alex Williamson)

 - Make the mdev_bus_type const and also make the class struct const for
   a couple of the vfio-mdev sample drivers (Ricardo B. Marliere)

 - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
   Grace-Hopper superchip. During initialization of the chip-to-chip
   interconnect in this hardware module, the PCI BARs of the device
   become unused in favor of a faster, coherent mechanism for exposing
   device memory. This driver primarily changes the VFIO representation
   of the device to masquerade this coherent aperture to replace the
   physical PCI BARs for userspace drivers. This also incorporates use
   of a new vma flag allowing KVM to use write combining attributes for
   uncached device memory (Ankit Agrawal)

 - Reset fixes and cleanups for the pds-vfio-pci driver. Save and
   restore files were previously leaked if the device didn't pass
   through an error state, this is resolved and later re-fixed to
   prevent access to the now freed files. Reset handling is also
   refactored to remove the complicated deferred reset mechanism (Brett
   Creeley)

 - Remove some references to pl330 in the vfio-platform amba driver
   (Geert Uytterhoeven)

 - Remove twice redundant and ugly code to unpin incidental pins of the
   zero-page (Alex Williamson)

 - Deferred reset logic is also removed from the hisi-acc-vfio-pci
   driver as a simplification (Shameer Kolothum)

 - Enforce that mlx5-vfio-pci devices must support PRE_COPY and remove
   resulting unnecessary code. There is no device firmware that has been
   available publicly without this support (Yishai Hadas)

 - Switch over to using the .remove_new callback for vfio-platform in
   support of the broader transition for a void remove function (Uwe
   Kleine-König)

 - Resolve multiple issues in interrupt code for VFIO bus drivers that
   allow calling eventfd_signal() on a NULL context. This also remove a
   potential race in INTx setup on certain hardware for vfio-pci, races
   with various mechanisms to mask INTx, and leaked virqfds in
   vfio-platform (Alex Williamson)

* tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio: (29 commits)
  vfio/fsl-mc: Block calling interrupt handler without trigger
  vfio/platform: Create persistent IRQ handlers
  vfio/platform: Disable virqfds on cleanup
  vfio/pci: Create persistent INTx handler
  vfio: Introduce interface to flush virqfd inject workqueue
  vfio/pci: Lock external INTx masking ops
  vfio/pci: Disable auto-enable of exclusive INTx IRQ
  vfio/pds: Refactor/simplify reset logic
  vfio/pds: Make sure migration file isn't accessed after reset
  vfio/platform: Convert to platform remove callback returning void
  vfio/mlx5: Enforce PRE_COPY support
  vfio/mbochs: make mbochs_class constant
  vfio/mdpy: make mdpy_class constant
  hisi_acc_vfio_pci: Remove the deferred_reset logic
  Revert "vfio/type1: Unpin zero pages"
  vfio/nvgrace-gpu: Convey kvm to map device memory region as noncached
  vfio: amba: Rename pl330_ids[] to vfio_amba_ids[]
  vfio/pds: Always clear the save/restore FDs on reset
  vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
  vfio/pci: rename and export range_intersect_range
  ...
2024-03-15 13:21:13 -07:00
Jakub Kicinski
d7e14e5344 Support Multi-PF netdev (Socket Direct)
This series adds support for combining multiple devices (PFs) of the
 same port under one netdev instance. Passing traffic through different
 devices belonging to different NUMA sockets saves cross-numa traffic and
 allows apps running on the same netdev from different numas to still
 feel a sense of proximity to the device and achieve improved
 performance.
 
 We achieve this by grouping PFs together, and creating the netdev only
 once all group members are probed. Symmetrically, we destroy the netdev
 once any of the PFs is removed.
 
 The channels are distributed between all devices, a proper configuration
 would utilize the correct close numa when working on a certain app/cpu.
 
 We pick one device to be a primary (leader), and it fills a special
 role.  The other devices (secondaries) are disconnected from the network
 in the chip level (set to silent mode). All RX/TX traffic is steered
 through the primary to/from the secondaries.
 
 Currently, we limit the support to PFs only, and up to two devices
 (sockets).
 
 V6:
 - Address documentation comments from Jakub.
 
 V5:
  - Address documentation comments from Przemek Kitszel.
 
 V4:
  - Improve documentation for better user observability and understanding
    of the feature, in terms of queues and their expected NUMA/CPU/IRQ
    affinity.
 
 V3:
  - Fix documentation per Jakubs feedback.
  - Fix typos
  - Link new documentation in the networking index.rst
 
 V2:
  - Add documentation in a new patch.
  - Add debugfs in a new patch.
  - Add mlx5_ifc bit for MPIR cap check and use it before query.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmXpfYgACgkQSD+KveBX
 +j5jIAf/VGIX/UQttq74MzK9pWgJNKtf7l8aSYtZuKXx68pmpr+25DfsxbKEeVfy
 KzjvGFx5peoKisWILyaljQXSn7snmSqOsQf/IwDzmsmF/2ZTDyf6NPC6gND0bIjJ
 Uu6cJ2T6Sa9ktg+ANz/gLDvGBBfPqSYTYIXrJnNQKsnW6nV8mDvy4WVf6etvCxOi
 rMjfcqwNijf3GPTJd/qkaWhwneDG2AFWd5HzdORpNh6iuv8Cbc9aNhWgAPh18o7v
 VWuAiFraTgaz6jj2H/NfziAk4ZrtVsCqhaFjJe3eLO+MCk/bZ/SizsAcR61JLkjL
 pFqh5wqxA6v+5YJm4zVatZqPLIt4gQ==
 =GZBa
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Support Multi-PF netdev (Socket Direct)

This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.

We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.

The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.

We pick one device to be a primary (leader), and it fills a special
role.  The other devices (secondaries) are disconnected from the network
in the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.

Currently, we limit the support to PFs only, and up to two devices
(sockets).

* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  Documentation: networking: Add description for multi-pf netdev
  net/mlx5: Enable SD feature
  net/mlx5e: Block TLS device offload on combined SD netdev
  net/mlx5e: Support per-mdev queue counter
  net/mlx5e: Support cross-vhca RSS
  net/mlx5e: Let channels be SD-aware
  net/mlx5e: Create EN core HW resources for all secondary devices
  net/mlx5e: Create single netdev per SD group
  net/mlx5: SD, Add debugfs
  net/mlx5: SD, Add informative prints in kernel log
  net/mlx5: SD, Implement steering for primary and secondaries
  net/mlx5: SD, Implement devcom communication and primary election
  net/mlx5: SD, Implement basic query and instantiation
  net/mlx5: SD, Introduce SD lib
  net/mlx5: Add MPIR bit in mcam_access_reg
====================

Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 20:45:17 -08:00