lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZjzJ5Hm8hHnE7LR9@hog>
Date: Thu, 9 May 2024 15:04:36 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
	Sergey Ryazanov <ryazanov.s.a@...il.com>,
	Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
	Andrew Lunn <andrew@...n.ch>, Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object

2024-05-08, 22:31:51 +0200, Antonio Quartulli wrote:
> On 08/05/2024 18:06, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:20 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> > > index ee05b8a2c61d..b79d4f0474b0 100644
> > > --- a/drivers/net/ovpn/ovpnstruct.h
> > > +++ b/drivers/net/ovpn/ovpnstruct.h
> > > @@ -17,12 +17,19 @@
> > >    * @dev: the actual netdev representing the tunnel
> > >    * @registered: whether dev is still registered with netdev or not
> > >    * @mode: device operation mode (i.e. p2p, mp, ..)
> > > + * @lock: protect this object
> > > + * @event_wq: used to schedule generic events that may sleep and that need to be
> > > + *            performed outside of softirq context
> > > + * @peer: in P2P mode, this is the only remote peer
> > >    * @dev_list: entry for the module wide device list
> > >    */
> > >   struct ovpn_struct {
> > >   	struct net_device *dev;
> > >   	bool registered;
> > >   	enum ovpn_mode mode;
> > > +	spinlock_t lock; /* protect writing to the ovpn_struct object */
> > 
> > nit: the comment isn't really needed since you have kdoc saying the same thing
> 
> True, but checkpatch.pl (or some other script?) was still throwing a
> warning, therefore I added this comment to silence it.

Ok, then I guess the comment (and the other one below) can stay. That
sounds like a checkpatch.pl bug.

> > > +	struct workqueue_struct *events_wq;
> > > +	struct ovpn_peer __rcu *peer;
> > >   	struct list_head dev_list;
> > >   };
> > > diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
> > > new file mode 100644
> > > index 000000000000..2948b7320d47
> > > --- /dev/null
> > > +++ b/drivers/net/ovpn/peer.c
> > [...]
> > > +/**
> > > + * ovpn_peer_free - release private members and free peer object
> > > + * @peer: the peer to free
> > > + */
> > > +static void ovpn_peer_free(struct ovpn_peer *peer)
> > > +{
> > > +	ovpn_bind_reset(peer, NULL);
> > > +
> > > +	WARN_ON(!__ptr_ring_empty(&peer->tx_ring));
> > 
> > Could you pass a destructor to ptr_ring_cleanup instead of all these WARNs?
> 
> hmm but if we remove the WARNs then we lose the possibility to catch
> potential bugs, no? rings should definitely be empty at this point.

Ok, I haven't looked deep enough into how all the parts interact to
understand that. The refcount bump around the tx_ring loop in
ovpn_encrypt_work() takes care of that? Maybe worth a comment "$RING
should be empty at this point because of XYZ" (for each of the rings).

> Or you think I should just not care and free any potentially remaining item?

Whether you WARN or not, any remaining item is going to be leaked. I'd
go with WARN (or maybe DEBUG_NET_WARN_ON_ONCE) and free remaining
items. It should never happen but seems easy to deal with, so why not
handle it?

> > > +void ovpn_peer_release(struct ovpn_peer *peer)
> > > +{
> > > +	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
> > > +}
> > > +
> > > +/**
> > > + * ovpn_peer_delete_work - work scheduled to release peer in process context
> > > + * @work: the work object
> > > + */
> > > +static void ovpn_peer_delete_work(struct work_struct *work)
> > > +{
> > > +	struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
> > > +					      delete_work);
> > > +	ovpn_peer_release(peer);
> > 
> > Does call_rcu really need to run in process context?
> 
> Reason for switching to process context is that we have to invoke
> ovpn_nl_notify_del_peer (that sends a netlink event to userspace) and the
> latter requires a reference to the peer.

I'm confused. When you say "requires a reference to the peer", do you
mean accessing fields of the peer object? I don't see why this
requires ovpn_nl_notify_del_peer to to run from process context.

> For this reason I thought it would be safe to have ovpn_nl_notify_del_peer
> and call_rcu invoked by the same context.
> 
> If I invoke call_rcu in ovpn_peer_release_kref, how can I be sure that the
> peer hasn't been free'd already when ovpn_nl_notify_del_peer is executed?

Put the ovpn_nl_notify_del_peer call before the call_rcu, it will
access the peer and then once that's done call_rcu will do its job?


> > > +/**
> > > + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
> > > + * @peer: the peer to delete
> > > + * @reason: reason why the peer was deleted (sent to userspace)
> > > + *
> > > + * Return: 0 on success or a negative error code otherwise
> > > + */
> > > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> > > +			     enum ovpn_del_peer_reason reason)
> > > +{
> > > +	struct ovpn_peer *tmp;
> > > +	int ret = -ENOENT;
> > > +
> > > +	spin_lock_bh(&peer->ovpn->lock);
> > > +	tmp = rcu_dereference(peer->ovpn->peer);
> > > +	if (tmp != peer)
> > > +		goto unlock;
> > 
> > How do we recover if all those objects got out of sync? Are we stuck
> > with a broken peer?
> 
> mhhh I don't fully get the scenario you are depicting.
> 
> In P2P mode there is only peer stored (reference is saved in ovpn->peer)
> 
> When we want to get rid of it, we invoke ovpn_peer_del_p2p().
> The check we are performing here is just about being sure that we are
> removing the exact peer we requested to remove (and not some other peer that
> was still floating around for some reason).

But it's the right peer because it's the one the caller decided to get
rid of.  How about DEBUG_NET_WARN_ON_ONCE(tmp != peer) and always
releasing the peer?

> > And if this happens during interface deletion, aren't we leaking the
> > peer memory here?
> 
> at interface deletion we call
> 
> ovpn_iface_destruct -> ovpn_peer_release_p2p ->
> ovpn_peer_del_p2p(ovpn->peer)
> 
> so at the last step we just ask to remove the very same peer that is
> curently stored, which should just never fail.

But that's not what the test checks for. If ovpn->peer->ovpn != ovpn,
the test in ovpn_peer_del_p2p will fail. That's "objects getting out
of sync" in my previous email. The peer has a bogus back reference to
its ovpn parent, but it's ovpn->peer nevertheless.

-- 
Sabrina


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ