Message-ID: <a4a537df-900b-43e6-bcc2-5049036b1ca2@openvpn.net>
Date: Wed, 27 Nov 2024 02:40:02 +0100
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sergey Ryazanov <ryazanov.s.a@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Donald Hunter <donald.hunter@...il.com>,
sd@...asysnail.net, Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
On 26/11/2024 09:49, Antonio Quartulli wrote:
[...]
>>
>> The potential issue is tricky since we create it patch-by-patch.
>>
>> Up to this patch the socket release procedure looks solid and
>> reliable. E.g. the P2P netdev destruction:
>>
>> ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>>  ovpn_peer_release_p2p
>>   ovpn_peer_del_p2p
>>    ovpn_peer_put
>>     ovpn_peer_release_kref
>>      ovpn_peer_release
>>       ovpn_socket_put
>>        ovpn_socket_release_kref
>>         ovpn_socket_detach
>>          ovpn_udp_socket_detach
>>           setup_udp_tunnel_sock
>> netdev_run_todo
>>  rcu_barrier <- no running ovpn_udp_encap_recv after this point
>>  free_netdev
>>
>> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv()
>> will be spawned. And after the rcu_barrier() all running
>> ovpn_udp_encap_recv() will be done. All good.
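
For reference, a minimal sketch of what the detach step above boils down
to, assuming the plain udp_tunnel API (the exact body in the patch may
differ):

  #include <net/udp_tunnel.h>

  /* Re-running setup_udp_tunnel_sock() with a zeroed config clears
   * udp_sk(sk)->encap_rcv, so no new ovpn_udp_encap_recv() call can
   * start once this returns.
   */
  static void ovpn_udp_socket_detach(struct socket *sock)
  {
          struct udp_tunnel_sock_cfg cfg = { };

          setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
  }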
>>
>
> ok
>
>> Then, the following patch 'ovpn: implement TCP transport' decouples
>> ovpn_socket_release_kref() from ovpn_socket_detach() by scheduling the
>> socket detach call:
>>
>> ovpn_socket_release_kref
>>  ovpn_socket_schedule_release
>>   schedule_work(&sock->work)
>>
>> And only a long time later will the socket actually be detached:
>>
>> ovpn_socket_release_work
>>  ovpn_socket_detach
>>   ovpn_udp_socket_detach
>>    setup_udp_tunnel_sock
>>
>> And until this detach takes place, the UDP handler can keep calling
>> ovpn_udp_encap_recv() any number of times.
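
In other words, the deferred path has roughly this shape (a sketch only;
the struct layout and freeing details are assumptions based on the call
chain above):

  #include <linux/kref.h>
  #include <linux/workqueue.h>

  static void ovpn_socket_release_work(struct work_struct *work)
  {
          struct ovpn_socket *sock = container_of(work, struct ovpn_socket,
                                                  work);

          ovpn_socket_detach(sock->sock);
          kfree(sock);            /* actual freeing details assumed */
  }

  static void ovpn_socket_release_kref(struct kref *kref)
  {
          struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
                                                  refcount);

          /* only queues the detach; it runs later, in process context,
           * with nothing pinning the netdev in between
           */
          schedule_work(&sock->work);
  }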
>>
>> So, we can end up with this scenario:
>>
>> ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>>  ovpn_peer_release_p2p
>>   ovpn_peer_del_p2p
>>    ovpn_peer_put
>>     ovpn_peer_release_kref
>>      ovpn_peer_release
>>       ovpn_socket_put
>>        ovpn_socket_release_kref
>>         ovpn_socket_schedule_release
>>          schedule_work(&sock->work)
>> netdev_run_todo
>>  rcu_barrier
>>  free_netdev
>>
>> ovpn_udp_encap_recv <- called for an incoming UDP packet
>>  ovpn_from_udp_sock <- returns pointer to freed memory
>>  // Any access to the ovpn pointer is a use-after-free
>>
>> ovpn_socket_release_work <- kernel finally invokes the work
>>  ovpn_socket_detach
>>   ovpn_udp_socket_detach
>>    setup_udp_tunnel_sock
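
To make the "pointer to freed memory" part concrete, here is a sketch of
the lookup the encap handler performs (the helper name comes from the
trace above, its body is an assumption):

  #include <net/sock.h>

  static struct ovpn_priv *ovpn_from_udp_sock(struct sock *sk)
  {
          struct ovpn_socket *ovpn_sock = rcu_dereference_sk_user_data(sk);

          if (unlikely(!ovpn_sock))
                  return NULL;

          /* ovpn_sock->ovpn points into the netdev private area, which
           * free_netdev() has already released in the scenario above
           */
          return ovpn_sock->ovpn;
  }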
>>
>> To address the issue, I see two possible solutions:
>> 1. flush the workqueue somewhere before the netdev release
>
> yes! This is what I was missing. This will also solve the "how can the
> module wait for all workers to be done before unloading?" problem.
>
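A minimal sketch of option 1, assuming ovpn used its own workqueue for
the deferred releases (the workqueue and its field name here are
hypothetical):

  #include <linux/workqueue.h>

  /* To be called before the netdev is released (e.g. from the dellink
   * path): drain all pending ovpn_socket_release_work() instances.
   */
  static void ovpn_socket_flush_releases(struct ovpn_priv *ovpn)
  {
          flush_workqueue(ovpn->release_wq);
  }
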
Actually there might be an even simpler solution: each ovpn_socket will
hold a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP).
I can simply increase the refcounter of those objects while they are
referenced by the socket and decrease it when the socket is fully
released (in the detach() function called by the worker).
This way the netdev cannot be released until all sockets (and all peers)
are gone.
This approach doesn't require any local workqueue or any other special
coordination, as we'll just force the whole cleanup to happen in a
specific order.
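
A minimal sketch of that idea (helper and field names are assumptions,
not the final code; pinning the ovpn_priv is expressed as holding the
netdev, since the priv area lives inside the netdev):

  #include <linux/netdevice.h>
  #include <linux/in.h>

  /* On attach, the socket pins the object it serves. */
  static int ovpn_socket_hold_owner(struct ovpn_socket *sock,
                                    struct ovpn_peer *peer)
  {
          if (sock->sock->sk->sk_protocol == IPPROTO_TCP) {
                  if (!ovpn_peer_hold(peer))      /* TCP: pin the peer */
                          return -ENODEV;
                  sock->peer = peer;
          } else {
                  netdev_hold(peer->ovpn->dev, &sock->dev_tracker,
                              GFP_KERNEL);        /* UDP: pin the netdev */
                  sock->ovpn = peer->ovpn;
          }
          return 0;
  }

  /* Called at the very end of the release worker, after
   * ovpn_socket_detach() has run, so the netdev (and the peers) can only
   * disappear once the deferred cleanup is fully done.
   */
  static void ovpn_socket_put_owner(struct ovpn_socket *sock)
  {
          if (sock->peer)
                  ovpn_peer_put(sock->peer);
          else
                  netdev_put(sock->ovpn->dev, &sock->dev_tracker);
  }
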
Does it make sense?
Regards,
--
Antonio Quartulli
OpenVPN Inc.