I tested commit 43042b90cae1 ("svcrdma: Reduce Receive doorbell
rate") with mlx4 (IB) and software iWARP and didn't find any
issues. However, I recently got my hardware iWARP setup (FastLinQ)
back online, and it's crashing hard on this commit (confirmed
via bisect).

The failure mode is complex.
- After a connection is established, the first Receive completes
normally.
- But the second and third Receives have garbage in their Receive
buffers. The server responds with ERR_VERS as a result.
- When the client tears down the connection to retry, a couple
of posted Receives flush twice, and that corrupts the recv_ctxt
free list.
- __svc_rdma_free then faults or loops infinitely while destroying
the xprt's recv_ctxts.

Since 43042b90cae1 ("svcrdma: Reduce Receive doorbell rate") does
not fix a bug but is a scalability enhancement, it's safe and
appropriate to revert it while working on a replacement.

Fixes: 43042b90cae1 ("svcrdma: Reduce Receive doorbell rate")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>