Message-ID: <Z0nng5uN6dlQrQEa@hog>
Date: Fri, 29 Nov 2024 17:10:43 +0100
From: Sabrina Dubroca <sd@...asysnail.net>
To: Sergey Ryazanov <ryazanov.s.a@...il.com>
Cc: Antonio Quartulli <antonio@...nvpn.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Donald Hunter <donald.hunter@...il.com>,
Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
> On 15.11.2024 17:02, Antonio Quartulli wrote:
> > On 11/11/2024 02:54, Sergey Ryazanov wrote:
> > [...]
> > > > + skb_reset_transport_header(skb);
> > > > + skb_probe_transport_header(skb);
> > > > + skb_reset_inner_headers(skb);
> > > > +
> > > > + memset(skb->cb, 0, sizeof(skb->cb));
> > >
> > > Why do we need to zero the control buffer here?
> >
> > To avoid the next layer assuming the cb is clean when it is not.
> > Other drivers do the same as well.
>
> AFAIR, there is no convention to clear the control buffer before handing
> it over. The common practice is rather the opposite: a programmer shall
> not assume that the control buffer has been zeroed.
>
> It is not a big deal to clear it here, but we can save some CPU cycles
> by avoiding it.
>
> > I think this was recommended by Sabrina as well.
>
> Curious. Either macsec does not zero it, or I have not understood how it
> was done.
I only remember discussing a case [1] where one function within ovpn
was expecting a cleared skb->cb to behave correctly but the caller did
not clear it. In general, as you said, clearing cb "to be nice to
other layers" is not expected. Sorry if some comments I made were
confusing.
[1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog
> > > > +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
> > > > +{
> > > > + struct ovpn_socket *ovpn_sock;
> > > > +
> > > > + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) !=
> > > > UDP_ENCAP_OVPNINUDP))
> > > > + return NULL;
> > > > +
> > > > + ovpn_sock = rcu_dereference_sk_user_data(sk);
> > > > + if (unlikely(!ovpn_sock))
> > > > + return NULL;
> > > > +
> > > > + /* make sure that sk matches our stored transport socket */
> > > > + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
> > > > + return NULL;
> > > > +
> > > > + return ovpn_sock->ovpn;
> > >
> > > Now, returning this pointer is safe. But the following TCP
> > > transport support calls the socket release via a scheduled work.
> > > That extends the socket lifetime and makes it possible to receive a
> > > UDP packet well after the interface private data has been released.
> > > Is that a correct assumption?
> >
> > Sorry, you lost me when saying "following *TCP* transport support calls".
> > This function is invoked only in a UDP context.
> > Was that a typo?
>
> Yeah, you are right. The question sounds like a riddle. I should eventually
> stop composing emails at midnight. Let me paraphrase it.
>
> The potential issue is tricky since it is created patch by patch.
>
> Up to this patch the socket releasing procedure looks solid and reliable.
> E.g. the P2P netdev destroying:
>
> ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
> ovpn_peer_release_p2p
> ovpn_peer_del_p2p
> ovpn_peer_put
> ovpn_peer_release_kref
> ovpn_peer_release
> ovpn_socket_put
> ovpn_socket_release_kref
> ovpn_socket_detach
> ovpn_udp_socket_detach
> setup_udp_tunnel_sock
> netdev_run_todo
> rcu_barrier <- no running ovpn_udp_encap_recv after this point
It's more the synchronize_net in unregister_netdevice_many_notify?
rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu
waits for rcu_read_lock sections (see the comments for rcu_barrier and
synchronize_rcu in kernel/rcu/tree.c).
> free_netdev
>
> After the setup_udp_tunnel_sock() call, no new ovpn_udp_encap_recv() will
> be spawned. And after the rcu_barrier(), all running ovpn_udp_encap_recv()
> calls will have completed. All good.
>
> Then, the following patch 'ovpn: implement TCP transport' decouples
> ovpn_socket_release_kref() from ovpn_socket_detach() by scheduling the
> socket detach call:
>
> ovpn_socket_release_kref
> ovpn_socket_schedule_release
> schedule_work(&sock->work)
>
> And a long time later the socket is actually detached:
>
> ovpn_socket_release_work
> ovpn_socket_detach
> ovpn_udp_socket_detach
> setup_udp_tunnel_sock
>
> And until this detach takes place, the UDP handler can call
> ovpn_udp_encap_recv() any number of times.
>
> So, we can end up with this scenario:
>
> ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
> ovpn_peer_release_p2p
> ovpn_peer_del_p2p
> ovpn_peer_put
> ovpn_peer_release_kref
> ovpn_peer_release
> ovpn_socket_put
> ovpn_socket_release_kref
> ovpn_socket_schedule_release
> schedule_work(&sock->work)
> netdev_run_todo
> rcu_barrier
> free_netdev
>
> ovpn_udp_encap_recv <- called for an incoming UDP packet
> ovpn_from_udp_sock <- returns pointer to freed memory
> // Any access to ovpn pointer is the use-after-free
>
> ovpn_socket_release_work <- kernel finally invokes the work
> ovpn_socket_detach
> ovpn_udp_socket_detach
> setup_udp_tunnel_sock
>
> To address the issue, I see two possible solutions:
> 1. flush the workqueue somewhere before the netdev release
> 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach
Going with #2, we could fully split detach into a synchronous part and
an async part (with the async part not needed for UDP). detach_sync
clears the pointers (CBs, strp_stop(), ovpn_sock->ovpn,
setup_udp_tunnel_sock) so that no more packets will be sent through the
ovpn driver.
Related to that topic, I'm not sure what's keeping a reference on the
peer to guarantee it doesn't get freed before we're done with
peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this
tcp stuff should move from the peer to ovpn_socket?
--
Sabrina