Message-ID: <20211230040022.GC55356@linux.alibaba.com>
Date: Thu, 30 Dec 2021 12:00:22 +0800
From: "dust.li" <dust.li@...ux.alibaba.com>
To: Karsten Graul <kgraul@...ux.ibm.com>,
Wen Gu <guwen@...ux.alibaba.com>, davem@...emloft.net,
kuba@...nel.org
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, tonylu@...ux.alibaba.com
Subject: Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R
link access and clear
On Wed, Dec 29, 2021 at 01:51:27PM +0100, Karsten Graul wrote:
>On 28/12/2021 16:13, Wen Gu wrote:
>> We encountered some crashes caused by the race between SMC-R
>> link access and link clear triggered by link group termination
>> in abnormal case, like port error.
>
>Without to dig deeper into this, there is already a refcount for links, see smc_wr_tx_link_hold().
>In smc_wr_free_link() there are waits for the refcounts to become zero.
>
>Why do you need to introduce another refcounting instead of using the existing?
>And if you have a good reason, do we still need the existing refcounting with your new
>implementation?
>
>Maybe its enough to use the existing refcounting in the other functions like smc_llc_flow_initiate()?
>
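For readers following the thread, the hold / wait-for-zero idea discussed above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct demo_link` and the `demo_link_*` helpers are hypothetical names, not the kernel's `struct smc_link` or the actual smc_wr_tx_link_hold() / smc_wr_free_link() code, and C11 atomics stand in for whatever primitive the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a link; not the kernel's struct smc_link. */
struct demo_link {
	atomic_int  refs;	/* users currently accessing the link */
	atomic_bool active;	/* cleared once teardown has started */
};

static void demo_link_init(struct demo_link *l)
{
	atomic_init(&l->refs, 0);
	atomic_init(&l->active, true);
}

/* Try to take a reference; fails once teardown has begun. */
static bool demo_link_hold(struct demo_link *l)
{
	/* Increment first, then re-check, so teardown cannot miss us. */
	atomic_fetch_add(&l->refs, 1);
	if (!atomic_load(&l->active)) {
		atomic_fetch_sub(&l->refs, 1);
		return false;
	}
	return true;
}

/* Drop a reference taken by a successful demo_link_hold(). */
static void demo_link_put(struct demo_link *l)
{
	int old = atomic_fetch_sub(&l->refs, 1);

	assert(old > 0);
}

/*
 * Begin teardown: refuse new holders and report whether the link is
 * already idle. A real implementation would sleep until refs reaches
 * zero before freeing any resources.
 */
static bool demo_link_clear_begin(struct demo_link *l)
{
	atomic_store(&l->active, false);
	return atomic_load(&l->refs) == 0;
}
```

In this sketch every access path calls demo_link_hold() before touching the link and demo_link_put() afterwards, and the clear path only frees resources once demo_link_clear_begin() (or a later re-check) sees a zero count.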
>Btw: it is interesting what kind of crashes you see, we never met them in our setup.
We are trying to use SMC + RDMA to boost application performance.
We now have a product in the cloud called ERDMA which can be used
inside virtual machines.
We are testing SMC with link down/up under short-flow workloads, since
in a cloud environment the RDMA device may be plugged in and out
frequently, and there are many different applications, some of which
have very short flows.
>Its great to see you evaluating SMC in a cloud environment!
Thanks! We are trying to use SMC to boost performance for cloud
applications, and we hope SMC can be more generic and widely used.