Message-ID: <cc0678b3-304c-0841-db15-ffb2117b63f6@linux.alibaba.com>
Date: Wed, 2 Mar 2022 19:44:15 +0800
From: "D. Wythe" <alibuda@...ux.alibaba.com>
To: kgraul@...ux.ibm.com
Cc: kuba@...nel.org, davem@...emloft.net, netdev@...r.kernel.org,
linux-s390@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCH net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
On 2022/3/1 at 9:17 PM, D. Wythe wrote:
> From: "D. Wythe" <alibuda@...ux.alibaba.com>
>
> Removing a connection from the link group is not synchronous with handling
> SMC_LLC_DELETE_RKEY, which means that even if the number of connections is
> less than SMC_RMBS_PER_LGR_MAX, a new connection is not guaranteed to
> register its rtoken successfully later; in other words, the rtoken entry may
> not have been released yet. This causes an unexpected
> SMC_CLC_DECL_ERR_REGRMB to be reported, and the SMC connection then has
> to fall back to TCP.
>
> We found that the main cause of the problem is the following execution
> sequence:
>
> Server Conn A:           Server Conn B:           Client Conn B:
>
> smc_lgr_unregister_conn
>                          smc_lgr_register_conn
>                          smc_clc_send_accept  ->
>                                                   smc_rtoken_add
> smcr_buf_unuse
>                  ->      Client Conn A:
>                          smc_rtoken_delete
>
> smc_lgr_unregister_conn() makes the current link available for assignment
> to a new incoming connection while smcr_buf_unuse() has not executed yet,
> which means that smc_rtoken_add may fail because no rtoken_entry is free.
> Reversing their execution order avoids this problem.
>
> Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
> Signed-off-by: D. Wythe <alibuda@...ux.alibaba.com>
> ---
> net/smc/smc_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 2f321d2..c9c3a68 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn)
>  			cancel_work_sync(&conn->abort_work);
>  	}
>  	if (!list_empty(&lgr->list)) {
> -		smc_lgr_unregister_conn(conn);
>  		smc_buf_unuse(conn, lgr); /* allow buffer reuse */
> +		smc_lgr_unregister_conn(conn);
>  	}
>  
>  	if (!lgr->conns_num)
I have two patches for this issue and missed one here; I'll post it in a
v2 series.
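
For illustration only, the ordering problem described in the quoted patch
can be modelled in user space roughly as below. This is a toy sketch, not
kernel code: every toy_* name and TOY_MAX_SLOTS are hypothetical stand-ins,
the slot limit is shrunk to 2, and toy_buf_unuse() is assumed to be the
step that eventually frees the peer's rtoken entry.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SLOTS 2	/* hypothetical stand-in for SMC_RMBS_PER_LGR_MAX */

enum toy_outcome {
	TOY_OTHER_LGR,		/* B not admitted here; harmless */
	TOY_OK,			/* B admitted and rtoken added */
	TOY_REGRMB_ERR,		/* B admitted but no rtoken entry free */
};

struct toy_lgr {
	int server_conns;	/* connections registered on the server */
	int client_rtokens;	/* rtoken entries in use on the client */
};

/* Server side: admit a connection only while a conn slot is free. */
static bool toy_register_conn(struct toy_lgr *lgr)
{
	if (lgr->server_conns >= TOY_MAX_SLOTS)
		return false;
	lgr->server_conns++;
	return true;
}

static void toy_unregister_conn(struct toy_lgr *lgr)
{
	assert(lgr->server_conns > 0);
	lgr->server_conns--;
}

/* Client side: adding an rtoken needs a free entry. */
static bool toy_rtoken_add(struct toy_lgr *lgr)
{
	if (lgr->client_rtokens >= TOY_MAX_SLOTS)
		return false;
	lgr->client_rtokens++;
	return true;
}

/* Assumption: releasing the buffer is what frees the peer's rtoken entry. */
static void toy_buf_unuse(struct toy_lgr *lgr)
{
	assert(lgr->client_rtokens > 0);
	lgr->client_rtokens--;
}

/*
 * Free connection A on a full link group while connection B arrives
 * between the two teardown steps. unuse_first selects the patched order.
 */
enum toy_outcome toy_conn_b_outcome(bool unuse_first)
{
	struct toy_lgr lgr = { TOY_MAX_SLOTS, TOY_MAX_SLOTS };
	enum toy_outcome res = TOY_OTHER_LGR;

	if (unuse_first)
		toy_buf_unuse(&lgr);
	else
		toy_unregister_conn(&lgr);

	/* B shows up inside the window between the two steps. */
	if (toy_register_conn(&lgr))
		res = toy_rtoken_add(&lgr) ? TOY_OK : TOY_REGRMB_ERR;

	if (unuse_first)
		toy_unregister_conn(&lgr);
	else
		toy_buf_unuse(&lgr);
	return res;
}
```

In this model, toy_conn_b_outcome(false) (the old order) lets B register
and then fail to add its rtoken, while toy_conn_b_outcome(true) (the
patched order) keeps B out of the link group until A's entry has been
released.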