Message-ID: <Z0m_lNTOAV7yL9wo@hog>
Date: Fri, 29 Nov 2024 14:20:20 +0100
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: Sergey Ryazanov <ryazanov.s.a@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Donald Hunter <donald.hunter@...il.com>,
Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote:
> On 26/11/2024 09:49, Antonio Quartulli wrote:
> [...]
> > >
> > > The potential issue is tricky since it is introduced patch-by-patch.
> > >
> > > Up to this patch the socket releasing procedure looks solid and
> > > reliable. E.g. the P2P netdev destroying:
> > >
> > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
> > > ovpn_peer_release_p2p
> > > ovpn_peer_del_p2p
> > > ovpn_peer_put
> > > ovpn_peer_release_kref
> > > ovpn_peer_release
> > > ovpn_socket_put
> > > ovpn_socket_release_kref
> > > ovpn_socket_detach
> > > ovpn_udp_socket_detach
> > > setup_udp_tunnel_sock
> > > netdev_run_todo
> > > rcu_barrier <- no running ovpn_udp_encap_recv after this point
> > > free_netdev
> > >
> > > After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv()
> > > will be spawned, and after the rcu_barrier() all already-running
> > > ovpn_udp_encap_recv() calls will have completed. All good.
> > >
> >
> > ok
> >
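To make that guarantee concrete, the receive/detach pair implied by the
trace looks roughly like this (a sketch only; helpers and fields that do
not appear in the trace are guesses, not the actual ovpn code):

    #include <net/sock.h>        /* rcu_dereference_sk_user_data() */
    #include <net/udp_tunnel.h>  /* setup_udp_tunnel_sock() */

    static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
    {
            struct ovpn_priv *ovpn;

            rcu_read_lock();
            /* state installed on the socket by setup_udp_tunnel_sock()
             * at attach time
             */
            ovpn = rcu_dereference_sk_user_data(sk);
            if (!ovpn) {
                    rcu_read_unlock();
                    kfree_skb(skb);
                    return 0;       /* consumed */
            }
            /* ... decrypt and deliver through ovpn->dev ... */
            rcu_read_unlock();
            return 0;
    }

    static void ovpn_udp_socket_detach(struct socket *sock)
    {
            /* zeroed cfg: no encap_rcv, no sk_user_data */
            struct udp_tunnel_sock_cfg cfg = { };

            setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
    }

With the detach done synchronously from ovpn_socket_put(), the
rcu_barrier() in netdev_run_todo() only has to wait for handlers that were
already running, so free_netdev() cannot pull the state from under them.
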
> > > Then, the following patch 'ovpn: implement TCP transport' decouples
> > > ovpn_socket_release_kref() from ovpn_socket_detach() by scheduling
> > > the socket detach function call:
> > >
> > > ovpn_socket_release_kref
> > > ovpn_socket_schedule_release
> > > schedule_work(&sock->work)
> > >
> > > And a long time later the socket is actually detached:
> > >
> > > ovpn_socket_release_work
> > > ovpn_socket_detach
> > > ovpn_udp_socket_detach
> > > setup_udp_tunnel_sock
> > >
> > > And until this detach takes place, the UDP handler can call
> > > ovpn_udp_encap_recv() any number of times.
> > >
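Concretely, the deferral described above amounts to something like this
(a sketch following the traces; the struct layout and the final kfree()
are assumptions on my side):

    struct ovpn_socket {                    /* layout assumed */
            struct socket *sock;
            struct kref refcount;
            struct work_struct work;
    };

    static void ovpn_socket_detach(struct ovpn_socket *sock);

    static void ovpn_socket_release_work(struct work_struct *work)
    {
            struct ovpn_socket *ovpn_sock =
                    container_of(work, struct ovpn_socket, work);

            ovpn_socket_detach(ovpn_sock);  /* may sleep, hence the work */
            kfree(ovpn_sock);
    }

    static void ovpn_socket_schedule_release(struct ovpn_socket *sock)
    {
            INIT_WORK(&sock->work, ovpn_socket_release_work);
            schedule_work(&sock->work);
    }

    static void ovpn_socket_release_kref(struct kref *kref)
    {
            struct ovpn_socket *sock =
                    container_of(kref, struct ovpn_socket, refcount);

            /* the actual detach now runs asynchronously, later */
            ovpn_socket_schedule_release(sock);
    }
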
> > > So, we can end up with this scenario:
> > >
> > > ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
> > > ovpn_peer_release_p2p
> > > ovpn_peer_del_p2p
> > > ovpn_peer_put
> > > ovpn_peer_release_kref
> > > ovpn_peer_release
> > > ovpn_socket_put
> > > ovpn_socket_release_kref
> > > ovpn_socket_schedule_release
> > > schedule_work(&sock->work)
> > > netdev_run_todo
> > > rcu_barrier
> > > free_netdev
> > >
> > > ovpn_udp_encap_recv <- called for an incoming UDP packet
> > > ovpn_from_udp_sock <- returns a pointer to freed memory
> > > // any access to the ovpn pointer is a use-after-free
> > >
> > > ovpn_socket_release_work <- the kernel finally invokes the work
> > > ovpn_socket_detach
> > > ovpn_udp_socket_detach
> > > setup_udp_tunnel_sock
> > >
> > > To address the issue, I see two possible solutions:
> > > 1. flush the workqueue somewhere before the netdev release
> >
> > yes! This is what I was missing. This will also solve the "how can the
> > module wait for all workers to be done before unloading?" question.
> >
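If the release work went onto a dedicated workqueue (queue_work() instead
of schedule_work()), the unregister path could simply drain it before the
netdev is torn down; a rough sketch, where signatures and names beyond
those in the traces are made up:

    /* allocated with alloc_workqueue() at module init; destroy_workqueue()
     * at module exit drains it, which also covers the unload question
     */
    static struct workqueue_struct *ovpn_release_wq;

    static void ovpn_handle_netdev_unregister(struct ovpn_priv *ovpn)
    {
            /* drops the peers, which put the last socket refs and queue
             * the detach work on ovpn_release_wq
             */
            ovpn_peer_release_p2p(ovpn);

            /* wait for every pending detach, so that by the time
             * free_netdev() runs no ovpn_udp_encap_recv() can still look
             * up the freed state
             */
            flush_workqueue(ovpn_release_wq);
    }
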
>
> Actually there might be an even simpler solution: each ovpn_socket will hold
> a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP).
> I can simply increase the refcounter of those objects while they are
> referenced by the socket and decrease it when the socket is fully released
> (in the detach() function called by the worker).
>
> This way the netdev cannot be released until all sockets (and all peers)
> are gone.
>
> This approach doesn't require any local workqueue or any other special
> coordination as we'll just force the whole cleanup to happen in a specific
> order.
>
> Does it make sense?
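
If I read the proposal right, it would look roughly like this (field and
helper names beyond the ones in the traces are assumptions):

    /* the socket pins the object its encap callback dereferences and
     * only drops it at the end of the detach, i.e. from the worker
     */
    static void ovpn_socket_hold_deps(struct ovpn_socket *ovpn_sock)
    {
            if (ovpn_sock->peer) {
                    /* TCP: pin the peer */
                    kref_get(&ovpn_sock->peer->refcount);
            } else {
                    /* UDP: pin the netdev, and with it the ovpn_priv */
                    dev_hold(ovpn_sock->ovpn->dev);
            }
    }

    static void ovpn_socket_put_deps(struct ovpn_socket *ovpn_sock)
    {
            /* called from ovpn_socket_detach(), in the worker */
            if (ovpn_sock->peer)
                    ovpn_peer_put(ovpn_sock->peer);
            else
                    dev_put(ovpn_sock->ovpn->dev);
    }
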
This dependency between refcounts worries me. I'm already having a
hard time remembering how all objects interact together.
And since ovpn_peer_release already calls ovpn_socket_put, you'd get a
refcount loop if ovpn_socket now also has a ref on the peer, no?
--
Sabrina