Message-ID: <747c3399-4e6f-0353-95bf-6b6f3a0f5f60@linux.alibaba.com>
Date: Thu, 6 Jan 2022 21:02:34 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Karsten Graul <kgraul@...ux.ibm.com>, davem@...emloft.net,
kuba@...nel.org
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
tonylu@...ux.alibaba.com
Subject: Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link
group access and termination
Thanks for your reply.
On 2022/1/5 8:03 pm, Karsten Graul wrote:
> On 05/01/2022 09:27, Wen Gu wrote:
>> On 2022/1/3 6:36 pm, Karsten Graul wrote:
>>> On 31/12/2021 10:44, Wen Gu wrote:
>>>> On 2021/12/29 8:56 pm, Karsten Graul wrote:
>>>>> On 28/12/2021 16:13, Wen Gu wrote:
>>>>>> We encountered some crashes caused by the race between the access
>>>>>> and the termination of link groups.
>> So I think checking conn->alert_token_local has the same effect with checking conn->lgr to
>> identify whether the link group pointed by conn->lgr is still healthy and able to be used.
>
> Yeah that sounds like a good solution for that! So is it now guaranteed that conn->lgr is always
> set and this check can really be removed completely, or should there be a new helper that checks
> conn->lgr and the alert_token, like smc_lgr_valid() ?
In my humble opinion, a connection can be in one of the following three states with
respect to conn->lgr if we remove 'conn->lgr = NULL' from smc_lgr_unregister_conn().
1. conn->lgr == NULL and conn->alert_token_local is zero
   This means that the connection has never been registered in a link group. conn->lgr
   clearly cannot be used.
2. conn->lgr != NULL and conn->alert_token_local is non-zero
   This means that the connection is registered in a link group, and conn->lgr is safe to access.
3. conn->lgr != NULL but conn->alert_token_local is zero
   This means that the connection was registered in a link group before, but has since been
   unregistered from it. conn->lgr shouldn't be used anymore.
So I am trying the following approach:
1) Introduce a new helper smc_conn_lgr_state() to check the three stages mentioned above.
    enum smc_conn_lgr_state {
            SMC_CONN_LGR_ORPHAN,    /* conn was never registered in a link group */
            SMC_CONN_LGR_VALID,     /* conn is registered in a link group now */
            SMC_CONN_LGR_INVALID,   /* conn was registered in a link group, but now
                                       is unregistered from it and conn->lgr should
                                       not be used any more */
    };
2) Replace the current conn->lgr check with the new helper.
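As a rough illustration of the idea, a minimal userspace sketch of such a helper might look like the following. It assumes the state can be derived from conn->lgr and conn->alert_token_local alone; the struct definitions here are simplified stand-ins for illustration, not the real kernel structures, and the helper in the patch under test may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the kernel's link group type (illustration only). */
struct smc_link_group;

/* Simplified stand-in holding only the two fields discussed above. */
struct smc_connection {
	struct smc_link_group *lgr;
	uint32_t alert_token_local;
};

enum smc_conn_lgr_state {
	SMC_CONN_LGR_ORPHAN,	/* conn was never registered in a link group */
	SMC_CONN_LGR_VALID,	/* conn is registered in a link group now */
	SMC_CONN_LGR_INVALID,	/* conn was registered before, but is now
				 * unregistered; conn->lgr must not be used */
};

/* Hypothetical sketch: classify a connection by lgr pointer and alert token. */
static inline enum smc_conn_lgr_state
smc_conn_lgr_state(const struct smc_connection *conn)
{
	if (!conn->lgr)
		return SMC_CONN_LGR_ORPHAN;	/* stage 1: never registered */
	if (conn->alert_token_local)
		return SMC_CONN_LGR_VALID;	/* stage 2: registered now */
	return SMC_CONN_LGR_INVALID;		/* stage 3: unregistered again */
}
```

With something like this, a call site that currently tests 'if (!conn->lgr)' would instead test whether smc_conn_lgr_state(conn) is SMC_CONN_LGR_VALID before touching conn->lgr. Note that this sketch says nothing about locking or memory ordering between the two fields, which the real change would still need to handle.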
These changes are being tested now.
What do you think about it? :)
Thanks,
Wen Gu