Message-ID: <CANn89iL6CoVLdd9qVkcJ50pSU1dS7Fn4naKGcK2K6ci9Xp_cYg@mail.gmail.com>
Date: Sun, 4 Feb 2024 11:15:43 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "David S . Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
Antoine Tenart <atenart@...nel.org>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH v2 net-next 15/16] bridge: use exit_batch_rtnl() method
On Sun, Feb 4, 2024 at 6:10 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Fri, 2 Feb 2024 17:40:00 +0000 Eric Dumazet wrote:
> > exit_batch_rtnl() is called while RTNL is held,
> > and devices to be unregistered can be queued in the dev_kill_list.
> >
> > This saves one rtnl_lock()/rtnl_unlock() pair per netns
> > and one unregister_netdevice_many() call.
>
> This one appears to cause a lot of crashes in the selftests:
>
> https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-02-03--21-00&pw-n=0&pass=0
>
> Example crash:
>
> https://netdev-2.bots.linux.dev/vmksft-bonding/results/449900/vm-crash-thr0-2
> --
> pw-bot: cr
Hi Jakub, thanks for letting me know.
It seems default_device_exit_batch_rtnl() is called before
br_net_exit_batch_rtnl(), so br_dev_delete() is called twice for the
same bridge, and unregister_netdevice_queue() is called twice as well.
So the real issue is with the patch "net: convert
default_device_exit_batch() to exit_batch_rtnl method": we depended on
the fact that each rtnl_lock()/rtnl_unlock() pair committed a small
batch of device removals, so a device removed in one batch was already
gone by the time the next exit method ran.
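To make that dependency concrete, here is a minimal user-space model in C, not kernel code. The names fake_dev, queue_unregister(), commit_batch(), pass_default(), pass_bridge() and run_exit() are hypothetical stand-ins for unregister_netdevice_queue(), unregister_netdevice_many() and the two exit methods. The model assumes, as described above, that the default pass runs before the bridge pass. It only counts how many times a bridge gets queued and does not reproduce the actual crash.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a netns device list (not kernel code). */
struct fake_dev {
    const char *name;
    bool is_bridge;
    bool registered;   /* still visible in the netns device list */
    bool queued;       /* sitting in the current kill batch */
    int queue_calls;   /* times the device was queued for unregistration */
};

/* Stand-in for unregister_netdevice_queue(): no duplicate check. */
static void queue_unregister(struct fake_dev *dev)
{
    dev->queued = true;
    dev->queue_calls++;
}

/* Stand-in for unregister_netdevice_many(): commit the current batch. */
static void commit_batch(struct fake_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].queued) {
            devs[i].queued = false;
            devs[i].registered = false;
        }
    }
}

/* Hypothetical default pass: queue every device still registered. */
static void pass_default(struct fake_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].registered)
            queue_unregister(&devs[i]);
}

/* Hypothetical bridge pass: queue every bridge still registered. */
static void pass_bridge(struct fake_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].registered && devs[i].is_bridge)
            queue_unregister(&devs[i]);
}

/* commit_between == true models one lock/unlock (and commit) per pass;
 * false models a single batch committed once at the end. */
static void run_exit(struct fake_dev *devs, size_t n, bool commit_between)
{
    pass_default(devs, n);
    if (commit_between)
        commit_batch(devs, n);
    pass_bridge(devs, n);
    commit_batch(devs, n);
}
```

With commit_between set to true, br0 is queued once because the first commit already removed it from the list; with a single batch it is still registered when the bridge pass runs and gets queued a second time.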
I will rework that patch and move it to the end of the series, using
list_empty(&dev->unreg_list) to detect that a device is already queued
for removal.
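The proposed guard can be sketched in the same spirit. This is again a hypothetical user-space model: the list helpers imitate the kernel's circular list_head idiom, and fake_dev, delete_dev_once() and list_count() are illustrative names, not existing kernel functions. It assumes a device's unreg_list node points at itself while the device is not queued, so an emptiness check on that node tells whether it is already linked into a kill list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space imitation of the kernel's circular list_head idiom. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* A self-linked node means "not on any list". */
static bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail_node(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical device: unreg_list stays self-linked until queued. */
struct fake_dev {
    const char *name;
    int teardown_calls;          /* stands in for br_dev_delete()-style work */
    struct list_head unreg_list;
};

/* Hypothetical helper: skip devices already linked into a kill list,
 * so the per-device teardown runs at most once per batch. */
static bool delete_dev_once(struct fake_dev *dev, struct list_head *kill_list)
{
    if (!list_is_empty(&dev->unreg_list))
        return false;            /* already queued by an earlier pass */
    dev->teardown_calls++;
    list_add_tail_node(&dev->unreg_list, kill_list);
    return true;
}

/* Count entries on a list (used to check the kill list). */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

A second delete_dev_once() call for the same device returns false and skips the teardown, so a later exit pass can revisit devices that an earlier pass already queued without touching them twice.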