Message-ID: <f0d7cbd7-6bc2-4034-b912-17c3a1959021@openvpn.net>
Date: Tue, 13 May 2025 11:19:29 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Sabrina Dubroca <sd@...asysnail.net>, Al Viro <viro@...iv.linux.org.uk>,
Qingfang Deng <dqfext@...il.com>, Gert Doering <gert@...enie.muc.de>
Subject: Re: [PATCH net-next 10/10] ovpn: ensure sk is still valid during cleanup

On 13/05/2025 10:21, Paolo Abeni wrote:
>
>
> On 5/13/25 3:37 AM, Jakub Kicinski wrote:
>> On Fri, 9 May 2025 16:26:20 +0200 Antonio Quartulli wrote:
>>> In case of UDP peer timeout, an openvpn client (userspace)
>>> performs the following actions:
>>> 1. receives the peer deletion notification (reason=timeout)
>>> 2. closes the socket
>>>
>>> Upon 1. we have the following:
>>> - ovpn_peer_keepalive_work()
>>> - ovpn_socket_release()
>>> - synchronize_rcu()
>>> At this point, 2. gets a chance to complete and ovpn_sock->sock->sk
>>> becomes NULL. ovpn_socket_release() will then attempt dereferencing it,
>>> resulting in the following crash log:
>>
>> What runs where is a bit unclear to me. Specifically I'm not sure what
>> runs the code under the "if (released)" branch of ovpn_socket_release()
>> if the user closes the socket. Because you now return without a WARN().
>>
>>> @@ -75,13 +76,14 @@ void ovpn_socket_release(struct ovpn_peer *peer)
>>> if (!sock)
>>> return;
>>>
>>> - /* sanity check: we should not end up here if the socket
>>> - * was already closed
>>> + /* sock->sk may be released concurrently, therefore we
>>> + * first attempt grabbing a reference.
>>> + * if sock->sk is NULL it means it is already being
>>> + * destroyed and we don't need any further cleanup
>>> */
>>> - if (!sock->sock->sk) {
>>> - DEBUG_NET_WARN_ON_ONCE(1);
>>> + sk = sock->sock->sk;
>>> + if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
>>
>> How is sk protected from getting reused here?
>> refcount_inc_not_zero() still needs the underlying object to be allocated.
>> I don't see any locking here, and code says this function may sleep so
>> it can't be called under RCU, either.
>
> I agree this still looks racy. When the socket close runs, nobody else
> should have access/reference to the 'struct socket'. I'm under the
> impression that ovpn_socket should acquire references to the underlying
> fd instead of keeping its own refcount.

This is what we were originally doing, but since the socket is not a
"kernel socket", increasing the refcount was preventing us from
understanding when the socket was supposed to be destroyed (because ovpn
itself was still holding a ref).

Hence we switched to this model where we get notified about the socket
going away via close()/destroy() call.

I think ovpn_socket should coordinate access to its sock member and
nullify it during destroy (which is invoked by sk_common_release()).
At that point no other part of the code will have a chance to access it.

I am gonna play with this idea right now.
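
Roughly, the idea could look like the userspace sketch below. This is only
an illustration and not the actual patch: every fake_* name is a
hypothetical stand-in for the kernel structures, and a pthread mutex stands
in for whatever lock ends up serializing access to the sock member.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; not the kernel definition. */
struct fake_sk {
	int refcnt;
};

/* Hypothetical stand-in for struct ovpn_socket. */
struct fake_ovpn_socket {
	pthread_mutex_t lock;	/* serializes every access to ->sk */
	struct fake_sk *sk;	/* NULL once destroy has run */
};

void fake_ovpn_socket_init(struct fake_ovpn_socket *os, struct fake_sk *sk)
{
	assert(os != NULL);
	pthread_mutex_init(&os->lock, NULL);
	os->sk = sk;
}

/* Models the destroy notification: clear the pointer under the lock,
 * so that no later reader can observe a socket that is going away.
 */
void fake_ovpn_socket_destroy(struct fake_ovpn_socket *os)
{
	pthread_mutex_lock(&os->lock);
	os->sk = NULL;
	pthread_mutex_unlock(&os->lock);
}

/* Models the release path: touch sk only while holding the lock and
 * only if destroy has not cleared it yet.
 * Returns 1 if cleanup ran on a live sk, 0 if the socket was already gone.
 */
int fake_ovpn_socket_release(struct fake_ovpn_socket *os)
{
	int cleaned = 0;

	pthread_mutex_lock(&os->lock);
	if (os->sk) {
		os->sk->refcnt--;	/* placeholder for the detach work */
		cleaned = 1;
	}
	pthread_mutex_unlock(&os->lock);

	return cleaned;
}
```

With this shape, a release that races with close() either completes its
cleanup before the pointer is cleared or sees NULL and returns, so it never
dereferences a stale sk. The sketch says nothing about which kernel lock is
appropriate given that ovpn_socket_release() may sleep.
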
>
> Side note: the ovpn_socket refcount release/detach path looks wrong, at
> least in case of an UDP socket, as ovpn_udp_socket_detach() calls
> setup_udp_tunnel_sock() which in turns will try to _increment_ various
> core counters, instead of decreasing them (i.e. udp_encap_enable should
> be wrongly accounted after that call).

You're right.
I had the impression I needed to "undo" the setup.
I see now that the encap key is decremented in the UDP sock destroy,
right after having called my implementation of .destroy().

I'll drop the call to setup_udp_tunnel_sock() with empty config then.

Regards,

>
> /P
>
--
Antonio Quartulli
OpenVPN Inc.