netdev - Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn

Message-ID: <7254c556-8fe9-484c-9dc8-f55c30b11776@openvpn.net>
Date: Thu, 9 May 2024 15:44:26 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
 Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
 Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object

On 09/05/2024 15:04, Sabrina Dubroca wrote:
[..]
>>>> +	struct workqueue_struct *events_wq;
>>>> +	struct ovpn_peer __rcu *peer;
>>>>    	struct list_head dev_list;
>>>>    };
>>>> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
>>>> new file mode 100644
>>>> index 000000000000..2948b7320d47
>>>> --- /dev/null
>>>> +++ b/drivers/net/ovpn/peer.c
>>> [...]
>>>> +/**
>>>> + * ovpn_peer_free - release private members and free peer object
>>>> + * @peer: the peer to free
>>>> + */
>>>> +static void ovpn_peer_free(struct ovpn_peer *peer)
>>>> +{
>>>> +	ovpn_bind_reset(peer, NULL);
>>>> +
>>>> +	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
>>>
>>> Could you pass a destructor to ptr_ring_cleanup instead of all these WARNs?
>>
>> hmm but if we remove the WARNs then we lose the possibility to catch
>> potential bugs, no? rings should definitely be empty at this point.
> 
> Ok, I haven't looked deep enough into how all the parts interact to
> understand that. The refcount bump around the tx_ring loop in
> ovpn_encrypt_work() takes care of that? Maybe worth a comment "$RING
> should be empty at this point because of XYZ" (for each of the rings).

Yeah, all piped skbs will be processed before exiting.
Ok, will add a comment.

> 
>> Or you think I should just not care and free any potentially remaining item?
> 
> Whether you WARN or not, any remaining item is going to be leaked. I'd
> go with WARN (or maybe DEBUG_NET_WARN_ON_ONCE) and free remaining
> items. It should never happen but seems easy to deal with, so why not
> handle it?

Sure, passing consume_skb as destructor to ptr_ring_cleanup should be 
enough.
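
To make the "warn, then free whatever is left" idea concrete, here is a
minimal userspace sketch. It is only a model: struct model_ring and the
model_ring_* helpers are hypothetical stand-ins, not the kernel's ptr_ring
API, and the fprintf() stands in for a WARN. In the driver the destructor
would be something that consumes an skb; here any void (*)(void *) works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> /* free() is a convenient destructor in userspace */

/* Hypothetical stand-in for a pointer ring; NOT the kernel's struct ptr_ring. */
#define MODEL_RING_SIZE 8

struct model_ring {
	void *slots[MODEL_RING_SIZE];
	size_t head;  /* next slot to consume */
	size_t tail;  /* next slot to produce */
	size_t count; /* number of queued items */
};

/* Queue one non-NULL item; returns 0 on success, -1 if the ring is full. */
static int model_ring_produce(struct model_ring *r, void *item)
{
	assert(r != NULL && item != NULL);
	if (r->count == MODEL_RING_SIZE)
		return -1;
	r->slots[r->tail] = item;
	r->tail = (r->tail + 1) % MODEL_RING_SIZE;
	r->count++;
	return 0;
}

/* Dequeue the oldest item, or return NULL if the ring is empty. */
static void *model_ring_consume(struct model_ring *r)
{
	void *item;

	assert(r != NULL);
	if (r->count == 0)
		return NULL;
	item = r->slots[r->head];
	r->head = (r->head + 1) % MODEL_RING_SIZE;
	r->count--;
	return item;
}

/*
 * Complain if the ring is unexpectedly non-empty, then pass every leftover
 * item to the destructor so nothing is leaked.
 * Returns the number of leftover items that were found.
 */
static size_t model_ring_cleanup(struct model_ring *r, void (*destroy)(void *))
{
	size_t leftover = 0;
	void *item;

	assert(r != NULL);
	if (r->count != 0)
		fprintf(stderr, "warning: ring not empty at cleanup (%zu items)\n",
			r->count);

	while ((item = model_ring_consume(r)) != NULL) {
		if (destroy)
			destroy(item);
		leftover++;
	}
	return leftover;
}
```

With this shape the warning still flags the bug, but the leftover items are
released instead of leaked, e.g. model_ring_cleanup(&ring, free).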

> 
>>>> +void ovpn_peer_release(struct ovpn_peer *peer)
>>>> +{
>>>> +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
>>>> +}
>>>> +
>>>> +/**
>>>> + * ovpn_peer_delete_work - work scheduled to release peer in process context
>>>> + * @work: the work object
>>>> + */
>>>> +static void ovpn_peer_delete_work(struct work_struct *work)
>>>> +{
>>>> +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
>>>> +					      delete_work);
>>>> +	ovpn_peer_release(peer);
>>>
>>> Does call_rcu really need to run in process context?
>>
>> Reason for switching to process context is that we have to invoke
>> ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and the
>> latter requires a reference to the peer.
> 
> I'm confused. When you say "requires a reference to the peer", do you
> mean accessing fields of the peer object? I don't see why this
> requires ovpn_nl_notify_del_peer to run from process context.

ovpn_nl_notify_del_peer sends a netlink message to userspace and I was 
under the impression that it may block/sleep, no?
For this reason I assumed it must be executed in process context.

> 
>> For this reason I thought it would be safe to have ovpn_nl_notify_del_peer
>> and call_rcu invoked by the same context.
>>
>> If I invoke call_rcu in ovpn_peer_release_kref, how can I be sure that the
>> peer hasn't been free'd already when ovpn_nl_notify_del_peer is executed?
> 
> Put the ovpn_nl_notify_del_peer call before the call_rcu, it will
> access the peer and then once that's done call_rcu will do its job?

If ovpn_nl_notify_del_peer is allowed to run outside of process context,
then I totally agree.

Will test again.
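
The ordering suggested above (notify first, defer the free afterwards) can
be sketched in userspace as follows. Everything here is a hypothetical
model: model_defer() and model_run_deferred() only imitate "free later"
and are not RCU, and model_notify_del_peer() just records the peer id. The
sketch says nothing about whether the real notification may sleep, which
is the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_DEFERRED 4

/* Hypothetical stand-in for the peer object. */
struct model_peer {
	unsigned int id;
};

/* One pending "free later" callback; a crude imitation of deferred freeing. */
struct model_deferred {
	void (*fn)(void *);
	void *arg;
};

static struct model_deferred model_pending[MODEL_MAX_DEFERRED];
static size_t model_pending_count;
static unsigned int model_last_notified_id;

/* Queue fn(arg) to run later; returns 0 on success, -1 if the queue is full. */
static int model_defer(void (*fn)(void *), void *arg)
{
	if (model_pending_count == MODEL_MAX_DEFERRED)
		return -1;
	model_pending[model_pending_count].fn = fn;
	model_pending[model_pending_count].arg = arg;
	model_pending_count++;
	return 0;
}

/* Run every queued callback; returns how many were run. */
static size_t model_run_deferred(void)
{
	size_t i, ran = model_pending_count;

	for (i = 0; i < ran; i++)
		model_pending[i].fn(model_pending[i].arg);
	model_pending_count = 0;
	return ran;
}

/* Stand-in for the userspace notification: it only reads the peer. */
static void model_notify_del_peer(const struct model_peer *peer)
{
	model_last_notified_id = peer->id;
}

static void model_peer_free(void *arg)
{
	free(arg);
}

/*
 * Notify while the peer is still guaranteed to be valid, and only then
 * hand it over for deferred freeing. On failure the caller keeps ownership.
 */
static int model_peer_release(struct model_peer *peer)
{
	assert(peer != NULL);
	model_notify_del_peer(peer);
	return model_defer(model_peer_free, peer);
}
```

Because the notification runs before the object is handed to the deferred
free, it cannot observe a freed peer, whatever context the release is
called from.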

> 
> 
>>>> +/**
>>>> + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
>>>> + * @peer: the peer to delete
>>>> + * @reason: reason why the peer was deleted (sent to userspace)
>>>> + *
>>>> + * Return: 0 on success or a negative error code otherwise
>>>> + */
>>>> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
>>>> +			     enum ovpn_del_peer_reason reason)
>>>> +{
>>>> +	struct ovpn_peer *tmp;
>>>> +	int ret = -ENOENT;
>>>> +
>>>> +	spin_lock_bh(&peer->ovpn->lock);
>>>> +	tmp = rcu_dereference(peer->ovpn->peer);
>>>> +	if (tmp != peer)
>>>> +		goto unlock;
>>>
>>> How do we recover if all those objects got out of sync? Are we stuck
>>> with a broken peer?
>>
>> mhhh I don't fully get the scenario you are depicting.
>>
>> In P2P mode there is only one peer stored (its reference is saved in ovpn->peer).
>>
>> When we want to get rid of it, we invoke ovpn_peer_del_p2p().
>> The check we are performing here is just about being sure that we are
>> removing the exact peer we requested to remove (and not some other peer that
>> was still floating around for some reason).
> 
> But it's the right peer because it's the one the caller decided to get
> rid of.  How about DEBUG_NET_WARN_ON_ONCE(tmp != peer) and always
> releasing the peer?

Sounds good. I should force myself to use more WARN_ON for conditions
that are truly unexpected.

That said, I have a question regarding DEBUG_NET_WARN_ON_ONCE: it prints
something only if CONFIG_DEBUG_NET is enabled.
Is it enabled on standard desktop/server distributions? If not, how
are we going to get reports from users?
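
As a rough illustration of "warn once on the mismatch, but always release
the peer", here is a userspace sketch. All names are hypothetical models of
the objects discussed above. Two simplifications are assumptions of this
sketch: model_warn_on_once() uses one global flag, whereas the kernel's
*_ONCE macros are per call site, and on a mismatch the stored pointer is
left untouched, which the thread does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_peer;

/* Hypothetical stand-in for the device private data in P2P mode. */
struct model_ovpn {
	struct model_peer *peer; /* the single stored peer */
};

/* Hypothetical stand-in for the peer, with its back reference. */
struct model_peer {
	struct model_ovpn *ovpn;
	bool released;
};

static unsigned int model_warn_count;

/* Print a warning the first time cond is true; always return cond. */
static bool model_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warn_count++;
		fprintf(stderr, "warning: %s\n", what);
	}
	return cond;
}

/*
 * Release the requested peer unconditionally. A mismatch with the stored
 * peer is reported as a bug rather than used as an early exit, so the
 * peer can no longer be leaked on that path.
 */
static int model_peer_del_p2p(struct model_peer *peer)
{
	struct model_peer *tmp;

	assert(peer != NULL && peer->ovpn != NULL);
	tmp = peer->ovpn->peer;

	model_warn_on_once(tmp != peer,
			   "stored peer differs from the peer being deleted");
	if (tmp == peer)
		peer->ovpn->peer = NULL; /* assumption: only clear a matching slot */

	peer->released = true;
	return 0;
}
```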

> 
>>> And if this happens during interface deletion, aren't we leaking the
>>> peer memory here?
>>
>> at interface deletion we call
>>
>> ovpn_iface_destruct -> ovpn_peer_release_p2p ->
>> ovpn_peer_del_p2p(ovpn->peer)
>>
>> so at the last step we just ask to remove the very same peer that is
>> currently stored, which should just never fail.
> 
> But that's not what the test checks for. If ovpn->peer->ovpn != ovpn,
> the test in ovpn_peer_del_p2p will fail. That's "objects getting out
> of sync" in my previous email. The peer has a bogus back reference to
> its ovpn parent, but it's ovpn->peer nevertheless.
> 

Oh thanks for explaining that.

Ok, my assumption is that "ovpn->peer->ovpn != ovpn" can never be true.

Peers are created within the context of one ovpn object and are never 
exposed to other ovpns.

I hope it makes sense.

-- 
Antonio Quartulli
OpenVPN Inc.