[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3e41f49d-abec-34b4-283b-7ad4bbff3b41@linux.alibaba.com>
Date: Sun, 8 Oct 2023 16:22:55 +0800
From: "D. Wythe" <alibuda@...ux.alibaba.com>
To: Wenjia Zhang <wenjia@...ux.ibm.com>,
Alexandra Winter <wintera@...ux.ibm.com>
Cc: jaka@...ux.ibm.com, kgraul@...ux.ibm.com, kuba@...nel.org,
davem@...emloft.net, netdev@...r.kernel.org, linux-s390@...r.kernel.org,
linux-rdma@...r.kernel.org
Subject: Re: [PATCH net] net/smc: fix panic smc_tcp_syn_recv_sock() while
closing listen socket
On 10/6/23 2:14 AM, Wenjia Zhang wrote:
>
>
> On 26.09.23 11:06, D. Wythe wrote:
>>
>>
>> On 9/26/23 3:18 PM, Alexandra Winter wrote:
>>>
>>> On 26.09.23 05:00, D. Wythe wrote:
>>>> You are right. The key point is how to ensure the valid of smc sock
>>>> during the life time of clc sock, If so, READ_ONCE is good
>>>> enough. Unfortunately, I found that there are no such guarantee,
>>>> so it's still a life-time problem.
>>> Did you discover a scenario, where clc sock could live longer than
>>> smc sock?
>>> Wouldn't that be a dangerous scenario in itself? I still have some
>>> hope that the lifetime of an smc socket is by design longer
>>> than that of the corresponding tcp socket.
>>
>>
>> Hi Alexandra,
>>
>> Yes there is. Considering scenario:
>>
>> tcp_v4_rcv(skb)
>>
>> /* req sock */
>> reqsk = _inet_lookup_skb(skb)
>>
>> /* listen sock */
>> sk = reqsk(reqsk)->rsk_listener;
>> sock_hold(sk);
>> tcp_check_req(sk)
>>
>>
>> smc_release /*
>> release smc listen sock */
>> __smc_release
>> smc_close_active() /* smc_sk->sk_state = SMC_CLOSED; */
>> if
>> (smc_sk->sk_state == SMC_CLOSED)
>> smc_clcsock_release();
>> sock_release(clcsk); /* close clcsock */
>> sock_put(sk); /* might not the final refcnt */
>>
>> sock_put(smc_sk) /* might be the final refcnt of smc_sock */
>>
>> syn_recv_sock(sk...)
>> /* might be the final refcnt of tcp listen sock */
>> sock_put(sk);
>>
>> Fortunately, this scenario only affects smc_syn_recv_sock and
>> smc_hs_congested, as other callbacks already have locks to protect smc,
>> which can guarantee that the sk_user_data is either NULL (set in
>> smc_close_active) or valid under the lock.
>> I'm kind of confused with this scenario. How could the
> smc_clcsock_release()->sock_release(clcsk) happen?
> Because the syn_recv_sock happens short prior to accept(), that means
> that the &smc->tcp_listen_work is already triggered but the real
> accept() is still not happening. At this moment, the incoming
> connection is being added into the accept queue. Thus, if the
> sk->sk_state is changed from SMC_LISTEN to SMC_CLOSED in
> smc_close_active(), there is still
> "flush_work(&smc->tcp_listen_work);" after that. That ensures the
> smc_clcsock_release() should not happen, if smc_clcsock_accept() is
> not finished. Do you think that the execution of the
> &smc->tcp_listen_work is already done? Or am I missing something?
>
Hi wenjia,
Sorry for late reply, we have just returned from vacation.
The smc_clcsock_release here release the listen clcsock rather than the
child clcsock.
So the flush_work might not be helpful for this scenario.
Best wishes,
D. Wythe
>>> Considering the const, maybe
>>>> we need to do :
>>>>
>>>> 1. hold a refcnt of smc_sock for syn_recv_sock to keep smc sock
>>>> valid during life time of clc sock
>>>> 2. put the refcnt of smc_sock in sk_destruct in tcp_sock to release
>>>> the very smc sock .
>>>>
>>>> In that way, we can always make sure the valid of smc sock during
>>>> the life time of clc sock. Then we can use READ_ONCE rather
>>>> than lock. What do you think ?
>>> I am not sure I fully understand the details what you propose to do.
>>> And it is not only syn_recv_sock(), right?
>>> You need to consider all relations between smc socks and tcp socks;
>>> fallback to tcp, initial creation, children of listen sockets,
>>> variants of shutdown, ... Preferrably a single simple mechanism
>>> covers all situations. Maybe there is such a mechanism already today?
>>> (I don't think clcsock->sk->sk_user_data or sk_callback_lock provide
>>> this general coverage)
>>> If we really have a gap, a general refcnt'ing on smc sock could be a
>>> solution, but needs to be designed carefully.
>>
>> You are right , we need designed it with care, we will try the
>> referenced solutions internally first, and I will also send some RFCs
>> so that everyone can track the latest progress
>> and make it can be all agreed.
>>> Many thanks to you and the team to help make smc more stable and
>>> robust.
>>
>> Our pleasure 😁. The stability of smc is important to us too.
>>
>> Best wishes,
>> D. Wythe
>>
>>
Powered by blists - more mailing lists