Message-ID: <Z0m_lNTOAV7yL9wo@hog>
Date: Fri, 29 Nov 2024 14:20:20 +0100
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: Sergey Ryazanov <ryazanov.s.a@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Donald Hunter <donald.hunter@...il.com>,
Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote:
> On 26/11/2024 09:49, Antonio Quartulli wrote:
> [...]
> > >
> > > The potential issue is tricky since it is introduced patch-by-patch.
> > >
> > > Up to this patch the socket releasing procedure looks solid and
> > > reliable. E.g. the P2P netdev destroying:
> > >
> > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
> > > ovpn_peer_release_p2p
> > > ovpn_peer_del_p2p
> > > ovpn_peer_put
> > > ovpn_peer_release_kref
> > > ovpn_peer_release
> > > ovpn_socket_put
> > > ovpn_socket_release_kref
> > > ovpn_socket_detach
> > > ovpn_udp_socket_detach
> > > setup_udp_tunnel_sock
> > > netdev_run_todo
> > > rcu_barrier <- no running ovpn_udp_encap_recv after this point
> > > free_netdev
> > >
> > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv()
> > > will be spawned, and after the rcu_barrier() all already-running
> > > ovpn_udp_encap_recv() calls will have completed. All good.
> > >
> >
> > ok
> >
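To make that guarantee concrete, the receive/detach pair implied by the
trace looks roughly like this (a sketch only; helpers and fields that do
not appear in the trace are guesses, not the actual ovpn code):

    #include <net/sock.h>        /* rcu_dereference_sk_user_data() */
    #include <net/udp_tunnel.h>  /* setup_udp_tunnel_sock() */

    static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
    {
            struct ovpn_priv *ovpn;

            rcu_read_lock();
            /* state installed on the socket by setup_udp_tunnel_sock()
             * at attach time
             */
            ovpn = rcu_dereference_sk_user_data(sk);
            if (!ovpn) {
                    rcu_read_unlock();
                    kfree_skb(skb);
                    return 0;       /* consumed */
            }
            /* ... decrypt and deliver through ovpn->dev ... */
            rcu_read_unlock();
            return 0;
    }

    static void ovpn_udp_socket_detach(struct socket *sock)
    {
            /* zeroed cfg: no encap_rcv, no sk_user_data */
            struct udp_tunnel_sock_cfg cfg = { };

            setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
    }

With the detach done synchronously from ovpn_socket_put(), the
rcu_barrier() in netdev_run_todo() only has to wait for handlers that were
already running, so free_netdev() cannot pull the state from under them.
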
> > > Then, the following patch 'ovpn: implement TCP transport' decouples
> > > ovpn_socket_release_kref() from ovpn_socket_detach() by scheduling
> > > the socket detach function call:
> > >
> > > ovpn_socket_release_kref
> > > ovpn_socket_schedule_release
> > > schedule_work(&sock->work)
> > >
> > > And a long time later the socket is actually detached:
> > >
> > > ovpn_socket_release_work
> > > ovpn_socket_detach
> > > ovpn_udp_socket_detach
> > > setup_udp_tunnel_sock
> > >
> > > And until this detach takes place, the UDP handler can call
> > > ovpn_udp_encap_recv() any number of times.
> > >
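Concretely, the deferral described above amounts to something like this
(a sketch following the traces; the struct layout and the final kfree()
are assumptions on my side):

    struct ovpn_socket {                    /* layout assumed */
            struct socket *sock;
            struct kref refcount;
            struct work_struct work;
    };

    static void ovpn_socket_detach(struct ovpn_socket *sock);

    static void ovpn_socket_release_work(struct work_struct *work)
    {
            struct ovpn_socket *ovpn_sock =
                    container_of(work, struct ovpn_socket, work);

            ovpn_socket_detach(ovpn_sock);  /* may sleep, hence the work */
            kfree(ovpn_sock);
    }

    static void ovpn_socket_schedule_release(struct ovpn_socket *sock)
    {
            INIT_WORK(&sock->work, ovpn_socket_release_work);
            schedule_work(&sock->work);
    }

    static void ovpn_socket_release_kref(struct kref *kref)
    {
            struct ovpn_socket *sock =
                    container_of(kref, struct ovpn_socket, refcount);

            /* the actual detach now runs asynchronously, later */
            ovpn_socket_schedule_release(sock);
    }
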
> > > So, we can end up with this scenario:
> > >
> > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
> > > ovpn_peer_release_p2p
> > > ovpn_peer_del_p2p
> > > ovpn_peer_put
> > > ovpn_peer_release_kref
> > > ovpn_peer_release
> > > ovpn_socket_put
> > > ovpn_socket_release_kref
> > > ovpn_socket_schedule_release
> > > schedule_work(&sock->work)
> > > netdev_run_todo
> > > rcu_barrier
> > > free_netdev
> > >
> > > ovpn_udp_encap_recv <- called for an incoming UDP packet
> > > ovpn_from_udp_sock <- returns a pointer to freed memory
> > > // any access to the ovpn pointer is a use-after-free
> > >
> > > ovpn_socket_release_work <- the kernel finally invokes the work
> > > ovpn_socket_detach
> > > ovpn_udp_socket_detach
> > > setup_udp_tunnel_sock
> > >
> > > To address the issue, I see two possible solutions:
> > > 1. flush the workqueue somewhere before the netdev release
> >
> > yes! This is what I was missing. This will also solve the "how can the
> > module wait for all workers to be done before unloading?" question.
> >
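If the release work went onto a dedicated workqueue (queue_work() instead
of schedule_work()), the unregister path could simply drain it before the
netdev is torn down; a rough sketch, where signatures and names beyond
those in the traces are made up:

    /* allocated with alloc_workqueue() at module init; destroy_workqueue()
     * at module exit drains it, which also covers the unload question
     */
    static struct workqueue_struct *ovpn_release_wq;

    static void ovpn_handle_netdev_unregister(struct ovpn_priv *ovpn)
    {
            /* drops the peers, which put the last socket refs and queue
             * the detach work on ovpn_release_wq
             */
            ovpn_peer_release_p2p(ovpn);

            /* wait for every pending detach, so that by the time
             * free_netdev() runs no ovpn_udp_encap_recv() can still look
             * up the freed state
             */
            flush_workqueue(ovpn_release_wq);
    }
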
>
> Actually there might be an even simpler solution: each ovpn_socket will hold
> a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP).
> I can simply increase the refcounter of those objects while they are
> referenced by the socket and decrease it when the socket is fully released
> (in the detach() function called by the worker).
>
> This way the netdev cannot be released until all sockets (and all peers)
> are gone.
>
> This approach doesn't require any local workqueue or any other special
> coordination as we'll just force the whole cleanup to happen in a specific
> order.
>
> Does it make sense?
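
If I read the proposal right, it would look roughly like this (field and
helper names beyond the ones in the traces are assumptions):

    /* the socket pins the object its encap callback dereferences and
     * only drops it at the end of the detach, i.e. from the worker
     */
    static void ovpn_socket_hold_deps(struct ovpn_socket *ovpn_sock)
    {
            if (ovpn_sock->peer) {
                    /* TCP: pin the peer */
                    kref_get(&ovpn_sock->peer->refcount);
            } else {
                    /* UDP: pin the netdev, and with it the ovpn_priv */
                    dev_hold(ovpn_sock->ovpn->dev);
            }
    }

    static void ovpn_socket_put_deps(struct ovpn_socket *ovpn_sock)
    {
            /* called from ovpn_socket_detach(), in the worker */
            if (ovpn_sock->peer)
                    ovpn_peer_put(ovpn_sock->peer);
            else
                    dev_put(ovpn_sock->ovpn->dev);
    }
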
This dependency between refcounts worries me. I'm already having a
hard time remembering how all objects interact together.
And since ovpn_peer_release already calls ovpn_socket_put, you'd get a
refcount loop if ovpn_socket now also has a ref on the peer, no?
--
Sabrina