Message-ID: <CAG=2xmPCYdYBk9zc9EHt2dmGUBuXJHqnMLByac17UHOqSt2CFw@mail.gmail.com>
Date: Wed, 10 Dec 2025 16:59:25 +0100
From: Adrián Moreno <amorenoz@...hat.com>
To: Eelco Chaudron <echaudro@...hat.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>, 
	Aaron Conole <aconole@...hat.com>, Ilya Maximets <i.maximets@....org>, 
	Alexei Starovoitov <ast@...nel.org>, Jesse Gross <jesse@...ira.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	netdev@...r.kernel.org, dev@...nvswitch.org
Subject: Re: [PATCH net] net: openvswitch: Avoid needlessly taking the RTNL on
 vport destroy

On Wed, Dec 10, 2025 at 04:30:13PM +0100, Eelco Chaudron wrote:
>
>
> On 10 Dec 2025, at 16:12, Adrián Moreno wrote:
>
> > On Wed, Dec 10, 2025 at 02:28:36PM +0100, Eelco Chaudron wrote:
> >>
> >>
> >> On 10 Dec 2025, at 13:59, Toke Høiland-Jørgensen wrote:
> >>
> >>> The openvswitch teardown code will immediately call
> >>> ovs_netdev_detach_dev() in response to a NETDEV_UNREGISTER notification.
> >>> It will then start the dp_notify_work workqueue, which will later end up
> >>> calling the vport destroy() callback. This callback takes the RTNL to do
> >>> another ovs_netdev_detach_port(), which in this case is unnecessary.
> >>> This causes extra pressure on the RTNL, in some cases leading to
> >>> "unregister_netdevice: waiting for XX to become free" warnings on
> >>> teardown.
> >>>
> >>> We can straight-forwardly avoid the extra RTNL lock acquisition by
> >>> checking the device flags before taking the lock, and skip the locking
> >>> altogether if the IFF_OVS_DATAPATH flag has already been unset.
> >>>
> >>> Fixes: b07c26511e94 ("openvswitch: fix vport-netdev unregister")
> >>> Tested-by: Adrian Moreno <amorenoz@...hat.com>
> >>> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
> >>> ---
> >>>  net/openvswitch/vport-netdev.c | 11 +++++++----
> >>>  1 file changed, 7 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> >>> index 91a11067e458..519f038526f9 100644
> >>> --- a/net/openvswitch/vport-netdev.c
> >>> +++ b/net/openvswitch/vport-netdev.c
> >>> @@ -160,10 +160,13 @@ void ovs_netdev_detach_dev(struct vport *vport)
> >>>
> >>>  static void netdev_destroy(struct vport *vport)
> >>>  {
> >>> -	rtnl_lock();
> >>> -	if (netif_is_ovs_port(vport->dev))
> >>> -		ovs_netdev_detach_dev(vport);
> >>> -	rtnl_unlock();
> >>> +	if (netif_is_ovs_port(vport->dev)) {
> >>
> >> Hi Toke,
> >>
> >> Thanks for digging into this!
> >>
> >> The patch looks technically correct to me, but maybe we should add a comment here explaining why we can do it this way, i.e., why we can call netif_is_ovs_port() without the lock.
> >> For example:
> >>
> >> /* We can avoid taking the rtnl lock as the IFF_OVS_DATAPATH flag is set/cleared in either netdev_create()/netdev_destroy(), which are both called under the global ovs_lock(). */
> >>
> >> Additionally, I think the second netif_is_ovs_port() under the rtnl lock is not required due to the above.
> >
> > In the case of netdevs being unregistered outside of OVS, the
> > ovs_dp_device_notifier gets called which then runs
> > "ovs_netdev_detach_dev" only under RTNL. Locking ovs_lock() in that
> > callback would be problematic since the rest of the OVS code assumes
> > ovs_lock is nested outside of RTNL.
> >
> > So this could race with a ovs_vport_cmd_del AFAICS.
>
> Not fully sure I understand the code path you are referring to, but if it’s through ovs_dp_notify_wq()->dp_detach_port_notify()->ovs_dp_detach_port(), it takes the ovs_lock().

The codepath described by Toke is:
(netdev gets unregistered outside of OVS) ->
    dp_device_event() (under RTNL) -> ovs_netdev_detach_dev()
    (IFF_OVS_DATAPATH is cleared)

Then dp_notify_work is scheduled and it does what you mention:
ovs_dp_notify_wq() (takes ovs_mutex) -> dp_detach_port_notify()
    -> ovs_dp_detach_port() -> ovs_vport_del() -> netdev_destroy()
    (at this point netif_is_ovs_port() is false)

The first part of this codepath (NETDEV_UNREGISTER notification) happens
under RTNL, not under ovs_mutex and it manipulates vport->dev->priv_flags.

So in theory we could receive the netdev notification while we process an
ovs_vport_cmd_del() command from userspace, which also ends up calling
netdev_destroy().
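
As an illustration only, the check / lock / re-check pattern from the patch
can be modelled in userspace. This is a sketch under stated assumptions, not
kernel code: model_dev, model_rtnl, model_detach_dev(),
model_unregister_event() and model_destroy() are hypothetical names, a
pthread mutex stands in for RTNL, and an atomic bool stands in for
IFF_OVS_DATAPATH.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
struct model_dev {
	atomic_bool ovs_port;	/* models IFF_OVS_DATAPATH in priv_flags */
	int detach_count;	/* how many times the detach really ran */
};

static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER; /* models RTNL */
static int model_rtnl_acquisitions;	/* only written while holding model_rtnl */

static void model_dev_init(struct model_dev *dev)
{
	atomic_init(&dev->ovs_port, true);
	dev->detach_count = 0;
}

/* Models ovs_netdev_detach_dev(); the caller must hold model_rtnl. */
static void model_detach_dev(struct model_dev *dev)
{
	assert(atomic_load(&dev->ovs_port));	/* detaching twice would be a bug */
	atomic_store(&dev->ovs_port, false);
	dev->detach_count++;
}

/* Models the NETDEV_UNREGISTER notifier path: runs under "RTNL" only. */
static void model_unregister_event(struct model_dev *dev)
{
	pthread_mutex_lock(&model_rtnl);
	model_rtnl_acquisitions++;
	if (atomic_load(&dev->ovs_port))
		model_detach_dev(dev);
	pthread_mutex_unlock(&model_rtnl);
}

/* Models the patched netdev_destroy(): unlocked check, then re-check. */
static void model_destroy(struct model_dev *dev)
{
	if (!atomic_load(&dev->ovs_port))
		return;		/* already detached: skip the lock entirely */

	pthread_mutex_lock(&model_rtnl);
	model_rtnl_acquisitions++;
	/* The notifier may have cleared the flag since the unlocked check. */
	if (atomic_load(&dev->ovs_port))
		model_detach_dev(dev);
	pthread_mutex_unlock(&model_rtnl);
}
```

In this model, once the notifier path has cleared the flag, model_destroy()
returns without touching the mutex. The re-check under the lock covers the
window in which the notifier clears the flag between the unlocked check and
the lock acquisition.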

>
> By the way: in your testing, did you see the expected improvement, i.e., no more “unregister” delays?

I did see a reduction in the use of RTNL, which is expected. I have not
been able to reproduce the "unregister_netdevice: waiting ..." splat yet.

In such a heavily RTNL-contended scenario, I still don't know how much that
extra rtnl_lock() is slowing things down, or whether the optimization will
be enough to avoid the splat in all cases. I guess not.

I will try simulating the contention with delay-kfunc.

What I have tried so far is some manual concurrent manipulation of netdevs,
and I also ran the OVS kernel unit tests.

Thanks.
Adrián

>
> //Eelco
>
> >>
> >>> +		rtnl_lock();
> >>> +		/* check again while holding the lock */
> >>> +		if (netif_is_ovs_port(vport->dev))
> >>> +			ovs_netdev_detach_dev(vport);
> >>> +		rtnl_unlock();
> >>> +	}
> >>>
> >>>  	call_rcu(&vport->rcu, vport_netdev_free);
> >>>  }
> >>> --
> >>> 2.52.0
> >>
>

