Message-ID: <a86a0aff-803d-478c-b26b-d42cb5301070@linux.ibm.com>
Date: Wed, 11 Oct 2023 14:39:22 +0200
From: Wenjia Zhang <wenjia@...ux.ibm.com>
To: "D. Wythe" <alibuda@...ux.alibaba.com>,
Alexandra Winter <wintera@...ux.ibm.com>
Cc: jaka@...ux.ibm.com, kgraul@...ux.ibm.com, kuba@...nel.org,
davem@...emloft.net, netdev@...r.kernel.org,
linux-s390@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCH net] net/smc: fix panic smc_tcp_syn_recv_sock() while
closing listen socket
On 05.10.23 20:14, Wenjia Zhang wrote:
>>
>>
>> On 26.09.23 11:06, D. Wythe wrote:
>>>
>>>
>>> On 9/26/23 3:18 PM, Alexandra Winter wrote:
>>>>
>>>> On 26.09.23 05:00, D. Wythe wrote:
>>>>> You are right. The key point is how to ensure the validity of the smc
>>>>> sock during the lifetime of the clc sock. If that were guaranteed,
>>>>> READ_ONCE would be good enough. Unfortunately, I found that there is no
>>>>> such guarantee, so it's still a lifetime problem.
>>>> Did you discover a scenario, where clc sock could live longer than
>>>> smc sock?
>>>> Wouldn't that be a dangerous scenario in itself? I still have some
>>>> hope that the lifetime of an smc socket is by design longer
>>>> than that of the corresponding tcp socket.
>>>
>>>
>>> Hi Alexandra,
>>>
>>> Yes, there is. Consider this scenario:
>>>
>>> tcp_v4_rcv(skb)
>>>
>>> /* req sock */
>>> reqsk = _inet_lookup_skb(skb)
>>>
>>> /* listen sock */
>>> sk = reqsk(reqsk)->rsk_listener;
>>> sock_hold(sk);
>>> tcp_check_req(sk)
>>>
>>>
>>> smc_release /* release smc listen sock */
>>> __smc_release
>>> smc_close_active() /* smc_sk->sk_state = SMC_CLOSED; */
>>> if (smc_sk->sk_state == SMC_CLOSED)
>>> smc_clcsock_release();
>>> sock_release(clcsk); /* close clcsock */
>>> sock_put(sk); /* might not be the final refcnt */
>>>
>>> sock_put(smc_sk) /* might be the final refcnt of smc_sock */
>>>
>>> syn_recv_sock(sk...)
>>> /* might be the final refcnt of tcp listen sock */
>>> sock_put(sk);
>>>
>>> Fortunately, this scenario only affects smc_syn_recv_sock and
>>> smc_hs_congested, as other callbacks already have locks to protect smc,
>>> which can guarantee that the sk_user_data is either NULL (set in
>>> smc_close_active) or valid under the lock.
>> I'm kind of confused by this scenario. How could the
>> smc_clcsock_release()->sock_release(clcsk) happen?
>> Because syn_recv_sock happens shortly before accept(), that means
>> that &smc->tcp_listen_work has already been triggered but the real
>> accept() has not happened yet. At this moment, the incoming connection
>> is being added to the accept queue. Thus, if sk->sk_state is
>> changed from SMC_LISTEN to SMC_CLOSED in smc_close_active(), there is
>> still "flush_work(&smc->tcp_listen_work);" after that. That ensures that
>> smc_clcsock_release() should not happen if smc_clcsock_accept() has not
>> finished. Do you think that the execution of &smc->tcp_listen_work
>> is already done? Or am I missing something?
> Hi wenjia,
>
> Sorry for the late reply, we have just returned from vacation.
>
> The smc_clcsock_release here releases the listen clcsock rather than
> the child clcsock.
> So the flush_work might not be helpful for this scenario.
>
> Best wishes,
> D. Wythe
It seems that I missed some mails these days :-( I just saw your answer.
Maybe I didn't describe my thoughts clearly. The following data flow is your
scenario, right?
(sk_state == SMC_LISTEN)|
tcp_check_req()         | smc_release()
                        | ->__smc_release()
                        | -> smc_close_active()
                        | -> sk->sk_state = SMC_CLOSED;
                        | -> ...
                        | -> smc->clcsock->sk->sk_user_data = NULL;
                        | -> ...
                        |*1) -> flush_work(&smc->tcp_listen_work);
                        |*4)
                        | -> smc_clcsock_accept()
                        | -> kernel_accept()
                        | -> inet_csk_accept()
                        |*5)
                        | if (sk->sk_state == SMC_CLOSED)
                        |*3)-> smc_clcsock_release()
-> syn_recv_sock() *2)  |
                        |
                        v
My question is how smc_clcsock_release() could happen after
syn_recv_sock().
IMO, syn_recv_sock() should be called during
&smc->tcp_listen_work, which belongs to the lsmc (listen smc). And
in smc_clcsock_accept(), lsmc->clcsock, as the listening socket, is
still used to accept a new connection. If &smc->tcp_listen_work
has not finished, *1) will wait for it to finish. That can only happen in
the following situations:
*4) sk_state is SMC_CLOSED, so no connection is accepted.
*5) The old sk_state was SMC_LISTEN and the TCP accept succeeded, but the
current sk_state is SMC_CLOSED. Thus, no new smc connection is created.
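To make the ordering argument above concrete, here is a minimal userspace sketch in C. It is only a model under stated assumptions: every model_* name is hypothetical and not kernel API, the listen work is reduced to two steps, and the flush_work() analog simply runs the remaining steps synchronously. It covers only the work/flush/release ordering (*1, *3, *4, *5); it does not model any caller that runs outside the work item.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model. The names echo the mail, but nothing here
 * is kernel API; every model_* identifier is made up for illustration. */
enum model_state { MODEL_LISTEN, MODEL_CLOSED };

struct model_lsmc {
	enum model_state sk_state;
	int work_step;           /* 0 = queued, 1 = TCP accept done, 2 = finished */
	bool tcp_accept_ok;      /* kernel_accept() analog succeeded */
	bool clcsock_released;   /* *3) has run */
	bool used_after_release; /* work ran on an already released clcsock */
	int smc_conns;           /* new smc connections created by the work */
};

/* One step of the listen work (tcp_listen_work analog). */
static void model_work_step(struct model_lsmc *l)
{
	if (l->clcsock_released)
		l->used_after_release = true;

	if (l->work_step == 0) {
		if (l->sk_state == MODEL_CLOSED) {  /* case *4): nothing accepted */
			l->work_step = 2;
			return;
		}
		l->tcp_accept_ok = true;            /* TCP accept succeeds */
		l->work_step = 1;
	} else if (l->work_step == 1) {
		if (l->sk_state != MODEL_CLOSED)    /* case *5) skips this when closed */
			l->smc_conns++;
		l->work_step = 2;
	}
}

/* flush_work() analog, *1): wait until the work has finished. */
static void model_flush_work(struct model_lsmc *l)
{
	while (l->work_step != 2)
		model_work_step(l);
}

/* Close path analog: mark closed, flush the work, then release, *3). */
static void model_release(struct model_lsmc *l)
{
	l->sk_state = MODEL_CLOSED;
	model_flush_work(l);
	if (l->sk_state == MODEL_CLOSED)
		l->clcsock_released = true;
}

/* Run the work for steps_before_close steps, then close the listen socket. */
static struct model_lsmc model_run(int steps_before_close)
{
	struct model_lsmc l = { .sk_state = MODEL_LISTEN };

	for (int i = 0; i < steps_before_close; i++)
		model_work_step(&l);
	model_release(&l);
	return l;
}
```

Calling model_run(0) corresponds to case *4) and model_run(1) to case *5). In both runs of this model the work finishes before the release and no new smc connection is created.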
What do you think? Please let me know if there is a flaw in my reasoning.
Thanks,
Wenjia