[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <10799332-b349-451b-a12b-efed43683357@openvpn.net>
Date: Mon, 2 Dec 2024 00:39:56 +0100
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>,
Sergey Ryazanov <ryazanov.s.a@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Donald Hunter <donald.hunter@...il.com>,
Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
On 29/11/2024 17:10, Sabrina Dubroca wrote:
> 2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
>> On 15.11.2024 17:02, Antonio Quartulli wrote:
>>> On 11/11/2024 02:54, Sergey Ryazanov wrote:
>>> [...]
>>>>> + skb_reset_transport_header(skb);
>>>>> + skb_probe_transport_header(skb);
>>>>> + skb_reset_inner_headers(skb);
>>>>> +
>>>>> + memset(skb->cb, 0, sizeof(skb->cb));
>>>>
>>>> Why do we need to zero the control buffer here?
>>>
>>> To avoid the next layer to assume the cb is clean while it is not.
>>> Other drivers do the same as well.
>>
>> AFAIR, there is no convention to clean the control buffer before the handing
>> over. The common practice is a bit opposite, programmer shall not assume
>> that the control buffer has been zeroed.
>>
>> Not a big deal to clean it here, we just can save some CPU cycles avoiding
>> it.
>>
>>> I think this was recommended by Sabrina as well.
>>
>> Curious. It's macsec that does not zero it, or I've not understood how it
>> was done.
>
> I only remember discussing a case [1] where one function within ovpn
> was expecting a cleared skb->cb to behave correctly but the caller did
> not clear it. In general, as you said, clearing cb "to be nice to
> other layers" is not expected. Sorry if some comments I made were
> confusing.
No problem at all.
I misunderstood some statement and went the wrong route.
Thanks a lot Sergey for pointing this out.
I am only clearing the cb before usage as required by internal assumptions.
>
> [1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog
>
>
>>>>> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
>>>>> +{
>>>>> + struct ovpn_socket *ovpn_sock;
>>>>> +
>>>>> + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) !=
>>>>> UDP_ENCAP_OVPNINUDP))
>>>>> + return NULL;
>>>>> +
>>>>> + ovpn_sock = rcu_dereference_sk_user_data(sk);
>>>>> + if (unlikely(!ovpn_sock))
>>>>> + return NULL;
>>>>> +
>>>>> + /* make sure that sk matches our stored transport socket */
>>>>> + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
>>>>> + return NULL;
>>>>> +
>>>>> + return ovpn_sock->ovpn;
>>>>
>>>> Now, returning of this pointer is safe. But the following TCP
>>>> transport support calls the socket release via a scheduled work.
>>>> What extends socket lifetime and makes it possible to receive a UDP
>>>> packet way after the interface private data release. Is it correct
>>>> assumption?
>>>
>>> Sorry you lost me when sayng "following *TCP* transp[ort support calls".
>>> This function is invoked only in UDP context.
>>> Was that a typ0?
>>
>> Yeah, you are right. The question sounds like a riddle. I should eventually
>> stop composing emails at midnight. Let me paraphrase it.
>>
>> The potential issue is tricky since we create it patch-by-patch.
>>
>> Up to this patch the socket releasing procedure looks solid and reliable.
>> E.g. the P2P netdev destroying:
>>
>> ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>> ovpn_peer_release_p2p
>> ovpn_peer_del_p2p
>> ovpn_peer_put
>> ovpn_peer_release_kref
>> ovpn_peer_release
>> ovpn_socket_put
>> ovpn_socket_release_kref
>> ovpn_socket_detach
>> ovpn_udp_socket_detach
>> setup_udp_tunnel_sock
>> netdev_run_todo
>> rcu_barrier <- no running ovpn_udp_encap_recv after this point
>
> It's more the synchronize_net in unregister_netdevice_many_notify?
> rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu
> waits for rcu_read_lock sections (see the comments for rcu_barrier and
> synchronize_rcu in kernel/rcu/tree.c).
>
>> free_netdev
>>
>> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be
>> spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will
>> be done. All good.
>>
>> Then, the following patch 'ovpn: implement TCP transport' disjoin
>> ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket
>> detach function call:
>>
>> ovpn_socket_release_kref
>> ovpn_socket_schedule_release
>> schedule_work(&sock->work)
>>
>> And long time after the socket will be actually detached:
>>
>> ovpn_socket_release_work
>> ovpn_socket_detach
>> ovpn_udp_socket_detach
>> setup_udp_tunnel_sock
>>
>> And until this detaching will take a place, UDP handler can call
>> ovpn_udp_encap_recv() whatever number of times.
>>
>> So, we can end up with this scenario:
>>
>> ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>> ovpn_peer_release_p2p
>> ovpn_peer_del_p2p
>> ovpn_peer_put
>> ovpn_peer_release_kref
>> ovpn_peer_release
>> ovpn_socket_put
>> ovpn_socket_release_kref
>> ovpn_socket_schedule_release
>> schedule_work(&sock->work)
>> netdev_run_todo
>> rcu_barrier
>> free_netdev
>>
>> ovpn_udp_encap_recv <- called for an incoming UDP packet
>> ovpn_from_udp_sock <- returns pointer to freed memory
>> // Any access to ovpn pointer is the use-after-free
>>
>> ovpn_socket_release_work <- kernel finally ivoke the work
>> ovpn_socket_detach
>> ovpn_udp_socket_detach
>> setup_udp_tunnel_sock
>>
>> To address the issue, I see two possible solutions:
>> 1. flush the workqueue somewhere before the netdev release
>> 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach
>
> Going with #2, we could fully split detach into a synchronous part and
> async part (with async not needed for UDP). detach_sync clears the
> pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so
> that no more packets will be sent through the ovpn driver.
>
> Related to that topic, I'm not sure what's keeping a reference on the
> peer to guarantee it doesn't get freed before we're done with
> peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this
> tcp stuff should move from the peer to ovpn_socket?
Good point.
It may make sense to move everything to ovpn_socket and avoid this extra
dependency on the peer, while it is not needed at all.
I will play with it and see what comes out.
Thanks!
Regards,
--
Antonio Quartulli
OpenVPN Inc.
Powered by blists - more mailing lists