Message-ID: <095c6e45-dd9e-1809-ae51-224679783241@linux.alibaba.com>
Date: Wed, 5 Jan 2022 16:27:51 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Karsten Graul <kgraul@...ux.ibm.com>, davem@...emloft.net,
kuba@...nel.org
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
tonylu@...ux.alibaba.com
Subject: Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link
group access and termination
Thanks for your reply.
On 2022/1/3 6:36 pm, Karsten Graul wrote:
> On 31/12/2021 10:44, Wen Gu wrote:
>> On 2021/12/29 8:56 pm, Karsten Graul wrote:
>>> On 28/12/2021 16:13, Wen Gu wrote:
>>>> We encountered some crashes caused by the race between the access
>>>> and the termination of link groups.
>> What do you think about it?
>>
>
> Hi Wen,
>
> thank you, and I also wish you and your family a happy New Year!
>
> Thanks for your detailed explanation, you convinced me of your idea to use
> a reference counting! I think its a good solution for the various problems you describe.
>
> I am still thinking that even if you saw no problems when conn->lgr is not NULL when the lgr
> is already terminated there should be more attention on the places where conn->lgr is checked.
Thank you for the reminder. I agree with the concern.
The checks of conn->lgr should be improved to avoid potential issues we haven't found yet.
> For example, in smc_cdc_get_slot_and_msg_send() there is a check for !conn->lgr with the intention
> to avoid working with a terminated link group.
> Should all checks for !conn->lgr be now replaced by the check for conn->freed ?? Does this make sense?
In my humble opinion, we can replace the !conn->lgr checks with !conn->alert_token_local.
When an SMC connection is successfully registered to a link group by smc_lgr_register_conn(),
conn->alert_token_local is set to a non-zero value. From that point on, conn->lgr is ready to be used.
When the link group is terminated, conn->alert_token_local is reset to zero in smc_lgr_unregister_conn(),
meaning that the link group the connection was registered to shouldn't be used anymore.
So I think checking conn->alert_token_local has the same effect as checking conn->lgr:
it identifies whether the link group pointed to by conn->lgr is still healthy and usable.
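To illustrate the idea, here is a rough userspace sketch, not the actual kernel code. The structs are
reduced to the two fields discussed above, and mock_register_conn(), mock_unregister_conn(),
conn_lgr_usable() and mock_send() are hypothetical names made up for this example. It assumes that,
with reference counting, conn->lgr can stay non-NULL after the link group is terminated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields relevant to this discussion are modelled).
 */
struct smc_link_group {
	int placeholder;
};

struct smc_connection {
	struct smc_link_group *lgr;
	uint32_t alert_token_local;	/* non-zero only while registered */
};

/* Mimics the effect of smc_lgr_register_conn(): token becomes non-zero. */
static void mock_register_conn(struct smc_connection *conn,
			       struct smc_link_group *lgr, uint32_t token)
{
	assert(token != 0);	/* zero is reserved for "not registered" */
	conn->lgr = lgr;
	conn->alert_token_local = token;
}

/* Mimics the effect of smc_lgr_unregister_conn(): token is reset to zero.
 * conn->lgr is deliberately left untouched here, to model a pointer that
 * is kept alive by a reference count after termination.
 */
static void mock_unregister_conn(struct smc_connection *conn)
{
	conn->alert_token_local = 0;
}

/* Hypothetical helper: the check proposed in place of a bare !conn->lgr. */
static bool conn_lgr_usable(const struct smc_connection *conn)
{
	return conn->lgr && conn->alert_token_local;
}

/* Hypothetical caller modelled loosely on a send path that must not work
 * with a terminated link group.
 */
static int mock_send(const struct smc_connection *conn)
{
	if (!conn_lgr_usable(conn))
		return -EPIPE;
	return 0;
}
```

With this sketch, mock_send() returns -EPIPE before registration and again after
mock_unregister_conn(), even though conn->lgr is still non-NULL in the latter case.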
What do you think about it? :)
Thanks,
Wen Gu