Message-Id: <20211228090325.27263-1-dust.li@linux.alibaba.com>
Date: Tue, 28 Dec 2021 17:03:23 +0800
From: Dust Li <dust.li@...ux.alibaba.com>
To: Karsten Graul <kgraul@...ux.ibm.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
Wen Gu <guwen@...ux.alibaba.com>,
Tony Lu <tonylu@...ux.alibaba.com>
Subject: [PATCH net 0/2] net/smc: fix kernel panic caused by race of smc_sock

This patchset fixes the race between smc_release, triggered by
close(2), and cdc_handle, triggered by the underlying RDMA device.
The race occurs because the smc_connection may have been released
before the pending tx CDC messages get their CQEs. To fix this, I
add a counter that tracks how many pending WRs we have posted
through the smc_connection, and only release the smc_connection
once there are no pending WRs left on the connection.

The first patch prevents posting a WR on a QP that is not in the RTS
state. This patch is needed because if we post a WR on a QP that
is not in the RTS state, ib_post_send() may succeed but no CQE will
be returned, which would confuse the counter tracking the pending
WRs.
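
To illustrate the idea behind the first patch, here is a minimal
user-space sketch, not the kernel code itself. The names
(link_model, link_sendable, link_post_send), the reduced set of QP
states, and the choice of -ENOLINK as the error are all assumptions
made for illustration. The point is only that a send is refused up
front unless the QP is in RTS, so the caller never accounts for a WR
whose completion would never arrive.

```c
#include <errno.h>
#include <stdbool.h>

/* Reduced, illustrative set of QP states (hypothetical model). */
enum qp_state_model { QP_RESET, QP_INIT, QP_RTR, QP_RTS, QP_ERROR };

/* Hypothetical stand-in for a link; not the kernel's struct smc_link. */
struct link_model {
	enum qp_state_model qp_state;
	int posted;	/* WRs actually handed to the (modelled) device */
};

/* Only a QP in RTS is guaranteed to produce a CQE for a posted WR. */
static bool link_sendable(const struct link_model *lnk)
{
	return lnk->qp_state == QP_RTS;
}

/*
 * Refuse to post when the link is not ready, so that the caller's
 * pending-WR accounting is never incremented for a WR that will not
 * complete. The error code is an assumption of this sketch.
 */
static int link_post_send(struct link_model *lnk)
{
	if (!link_sendable(lnk))
		return -ENOLINK;
	lnk->posted++;
	return 0;
}
```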

The second patch adds a counter to track how many WRs were posted
through the smc_connection, and no longer resets the QP when a link
is destroyed, so that the counter does not leak.
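
The counting scheme can be sketched in user space roughly as
follows. This is a simplified model under stated assumptions, not
the patch itself: conn_model and its fields and helpers are
hypothetical names, C11 atomics stand in for the kernel's atomic
helpers, and the close path is modelled as a deferred free rather
than a blocking wait. The flags are plain bools, so the model is
only meant to be driven from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for smc_connection; field names are illustrative. */
struct conn_model {
	atomic_int tx_pending;	/* WRs posted whose CQE has not arrived yet */
	bool close_requested;	/* close(2) has been called */
	bool freed;		/* the connection has really been released */
};

static void conn_init(struct conn_model *c)
{
	atomic_init(&c->tx_pending, 0);
	c->close_requested = false;
	c->freed = false;
}

/* Release only when close was requested and nothing is in flight. */
static void conn_try_free(struct conn_model *c)
{
	if (c->close_requested && atomic_load(&c->tx_pending) == 0)
		c->freed = true;
}

/* Account for a WR once it has been posted successfully. */
static void conn_wr_posted(struct conn_model *c)
{
	atomic_fetch_add(&c->tx_pending, 1);
}

/* Completion path: the CQE for one posted WR has arrived. */
static void conn_wr_completed(struct conn_model *c)
{
	int prev = atomic_fetch_sub(&c->tx_pending, 1);

	assert(prev > 0);	/* a completion without a matching post is a bug */
	conn_try_free(c);
}

/* Close path: the release is deferred while WRs are still pending. */
static void conn_close(struct conn_model *c)
{
	c->close_requested = true;
	conn_try_free(c);
}
```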

Dust Li (2):
  net/smc: don't send CDC/LLC message if link not ready
  net/smc: fix kernel panic caused by race of smc_sock

 net/smc/smc.h      |  5 +++++
 net/smc/smc_cdc.c  | 52 +++++++++++++++++++++-------------------------
 net/smc/smc_cdc.h  |  2 +-
 net/smc/smc_core.c | 27 ++++++++++++++++++------
 net/smc/smc_core.h |  6 ++++++
 net/smc/smc_ib.c   |  4 ++--
 net/smc/smc_ib.h   |  1 +
 net/smc/smc_llc.c  |  2 +-
 net/smc/smc_wr.c   | 45 +++++----------------------------------
 net/smc/smc_wr.h   |  5 ++---
 10 files changed, 68 insertions(+), 81 deletions(-)

--
2.19.1.3.ge56e4f7