Message-ID: <095c6e45-dd9e-1809-ae51-224679783241@linux.alibaba.com>
Date:   Wed, 5 Jan 2022 16:27:51 +0800
From:   Wen Gu <guwen@...ux.alibaba.com>
To:     Karsten Graul <kgraul@...ux.ibm.com>, davem@...emloft.net,
        kuba@...nel.org
Cc:     linux-s390@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
        tonylu@...ux.alibaba.com
Subject: Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link
 group access and termination

Thanks for your reply.

On 2022/1/3 6:36 pm, Karsten Graul wrote:
> On 31/12/2021 10:44, Wen Gu wrote:
>> On 2021/12/29 8:56 pm, Karsten Graul wrote:
>>> On 28/12/2021 16:13, Wen Gu wrote:
>>>> We encountered some crashes caused by the race between the access
>>>> and the termination of link groups.
>> What do you think about it?
>>
> 
> Hi Wen,
> 
> thank you, and I also wish you and your family a happy New Year!
> 
> Thanks for your detailed explanation, you convinced me of your idea to use
> reference counting! I think it's a good solution for the various problems you describe.
> 
> I still think that, even if you saw no problems when conn->lgr is not NULL although the lgr
> is already terminated, the places where conn->lgr is checked deserve more attention.

Thank you for the reminder. I agree with the concern.

This should be improved to avoid potential issues that we haven't found yet.

> For example, in smc_cdc_get_slot_and_msg_send() there is a check for !conn->lgr with the intention
> to avoid working with a terminated link group.
> Should all checks for !conn->lgr now be replaced by a check for conn->freed? Does this make sense?

In my humble opinion, we can replace !conn->lgr with !conn->alert_token_local.

If an SMC connection is successfully registered to a link group by smc_lgr_register_conn(),
conn->alert_token_local is set to a non-zero value. From this moment on, conn->lgr is ready to be used.

If the link group is terminated, conn->alert_token_local is reset to zero in smc_lgr_unregister_conn(),
meaning that the link group the connection was registered to shouldn't be used anymore.

So I think checking conn->alert_token_local has the same effect as checking conn->lgr when it comes to
identifying whether the link group pointed to by conn->lgr is still healthy and usable.
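
To illustrate the idea, here is a minimal user-space sketch, not the actual kernel code. All sketch_* names and struct layouts are hypothetical stand-ins; only the field names lgr and alert_token_local come from the discussion above. It assumes that, with reference counting, conn->lgr can stay non-NULL after the link group is terminated, so the token is what tells whether the link group may still be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the link group; not the kernel's definition. */
struct sketch_lgr {
	uint32_t last_token;	/* last token handed out by this link group */
	int refcnt;		/* references held by connections */
};

/* Hypothetical stand-in for the connection. */
struct sketch_conn {
	struct sketch_lgr *lgr;		/* may stay non-NULL after termination */
	uint32_t alert_token_local;	/* non-zero only while registered */
};

/* Models registration: take a reference and assign a non-zero token. */
static void sketch_register_conn(struct sketch_conn *conn,
				 struct sketch_lgr *lgr)
{
	uint32_t token = ++lgr->last_token;

	if (token == 0)		/* never hand out 0, it means "unregistered" */
		token = ++lgr->last_token;
	lgr->refcnt++;
	conn->lgr = lgr;
	conn->alert_token_local = token;
}

/* Models unregistration on termination: the token is cleared, but the
 * pointer is deliberately left in place because a reference is still held.
 */
static void sketch_unregister_conn(struct sketch_conn *conn)
{
	conn->alert_token_local = 0;
}

/* Models the final release of the connection's reference. */
static void sketch_conn_put_lgr(struct sketch_conn *conn)
{
	if (conn->lgr) {
		conn->lgr->refcnt--;
		conn->lgr = NULL;
	}
}

/* The proposed check: test the token instead of testing !conn->lgr. */
static bool sketch_conn_lgr_usable(const struct sketch_conn *conn)
{
	return conn->alert_token_local != 0;
}
```

In this model, after sketch_unregister_conn() the pointer conn->lgr is still set, so a !conn->lgr check would wrongly treat the link group as usable, while sketch_conn_lgr_usable() returns false.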

What do you think about it? :)

Thanks,
Wen Gu
