[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5nkibw33isxiw57jmoaadizo3m2p76ve6zioumlu2z2nh5lwck@xodwiv56zrou>
Date: Mon, 13 Jan 2025 16:01:19 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: Michal Luczaj <mhal@...x.co>
Cc: netdev@...r.kernel.org, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
Luigi Leonardi <leonardi@...hat.com>, "David S. Miller" <davem@...emloft.net>,
Wongi Lee <qwerty@...ori.io>, Eugenio PĂ©rez <eperezma@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Eric Dumazet <edumazet@...gle.com>, kvm@...r.kernel.org,
Paolo Abeni <pabeni@...hat.com>, Stefan Hajnoczi <stefanha@...hat.com>,
Jason Wang <jasowang@...hat.com>, Simon Horman <horms@...nel.org>, Hyunwoo Kim <v4bel@...ori.io>,
Jakub Kicinski <kuba@...nel.org>, virtualization@...ts.linux.dev,
Bobby Eshleman <bobby.eshleman@...edance.com>, stable@...r.kernel.org
Subject: Re: [PATCH net v2 1/5] vsock/virtio: discard packets if the
transport changes
On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
>On 1/13/25 12:05, Stefano Garzarella wrote:
>> On Mon, Jan 13, 2025 at 11:12:52AM +0100, Michal Luczaj wrote:
>>> On 1/13/25 10:07, Stefano Garzarella wrote:
>>>> On Mon, 13 Jan 2025 at 09:57, Stefano Garzarella <sgarzare@...hat.com> wrote:
>>>>> On Sun, Jan 12, 2025 at 11:42:30PM +0100, Michal Luczaj wrote:
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> So, if I get this right:
>>>>>> 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
>>>>>> 2. transport->release() calls vsock_remove_bound() without checking if sk
>>>>>> was bound and moved to bound list (refcnt=1)
>>>>>> 3. vsock_bind() assumes sk is in unbound list and before
>>>>>> __vsock_insert_bound(vsock_bound_sockets()) calls
>>>>>> __vsock_remove_bound() which does:
>>>>>> list_del_init(&vsk->bound_table); // nop
>>>>>> sock_put(&vsk->sk); // refcnt=0
>>>>>>
>>>>>> The following fixes things for me. I'm just not certain that's the only
>>>>>> place where transport destruction may lead to an unbound socket being
>>>>>> removed from the unbound list.
>>>>>>
>>>>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>>>>> index 7f7de6d88096..0fe807c8c052 100644
>>>>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>>>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>>>>> @@ -1303,7 +1303,8 @@ void virtio_transport_release(struct vsock_sock *vsk)
>>>>>>
>>>>>> if (remove_sock) {
>>>>>> sock_set_flag(sk, SOCK_DONE);
>>>>>> - virtio_transport_remove_sock(vsk);
>>>>>> + if (vsock_addr_bound(&vsk->local_addr))
>>>>>> + virtio_transport_remove_sock(vsk);
>>>>>
>>>>> I don't get this fix, virtio_transport_remove_sock() calls
>>>>> vsock_remove_sock()
>>>>> vsock_remove_bound()
>>>>> if (__vsock_in_bound_table(vsk))
>>>>> __vsock_remove_bound(vsk);
>>>>>
>>>>>
>>>>> So, should already avoid this issue, no?
>>>>
>>>> I got it wrong, I see now what are you trying to do, but I don't think
>>>> we should skip virtio_transport_remove_sock() entirely, it also purge
>>>> the rx_queue.
>>>
>>> Isn't rx_queue empty-by-definition in case of !__vsock_in_bound_table(vsk)?
>>
>> It could be.
>>
>> But I see some other issues:
>> - we need to fix also in the other transports, since they do the same
>
>Ahh, yes, VMCI and Hyper-V would need that, too.
>
>> - we need to check delayed cancel work too that call
>> virtio_transport_remove_sock()
>
>That's the "I'm just not certain" part. As with rx_queue, I though delayed
>cancel can only happen for a bound socket. So, per architecture, no need to
>deal with that here, right?
I think so.
>
>> An alternative approach, which would perhaps allow us to avoid all this,
>> is to re-insert the socket in the unbound list after calling release()
>> when we deassign the transport.
>>
>> WDYT?
>
>If we can't keep the old state (sk_state, transport, etc) on failed
>re-connect() then reverting back to initial state sounds, uhh, like an
>option :) I'm not sure how well this aligns with (user's expectations of)
>good ol' socket API, but maybe that train has already left.
We really want to behave as similar as possible with the other sockets,
like AF_INET, so I would try to continue toward that train.
>
>Another possibility would be to simply brick the socket on failed (re)connect.
>
I see, though, this is not the behavior of AF_INET for example, right?
Do you have time to investigate/fix this problem?
If not, I'll try to look into it in the next few days, maybe next week.
Thanks,
Stefano
Powered by blists - more mailing lists