[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+DYN4j2+MGK3Sh0=YAqmCyw0arcpm2bGO3qVFkzU_B4g@mail.gmail.com>
Date: Mon, 30 Nov 2020 20:22:01 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Stephen Hemminger <stephen@...workplumber.org>,
netdev <netdev@...r.kernel.org>,
Paul Gortmaker <paul.gortmaker@...driver.com>,
Jiri Benc <jbenc@...hat.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>
Subject: Re: Correct usage of dev_base_lock in 2020
On Mon, Nov 30, 2020 at 8:03 PM Vladimir Oltean <olteanv@...il.com> wrote:
>
> On Mon, Nov 30, 2020 at 08:00:40PM +0100, Eric Dumazet wrote:
> >
> >
> > On 11/30/20 7:48 PM, Vladimir Oltean wrote:
> > > On Mon, Nov 30, 2020 at 10:14:05AM -0800, Jakub Kicinski wrote:
> > >> On Mon, 30 Nov 2020 11:41:10 +0100 Eric Dumazet wrote:
> > >>>> So dev_base_lock dates back to the Big Kernel Lock breakup back in Linux 2.4
> > >>>> (ie before my time). The time has come to get rid of it.
> > >>>>
> > >>>> The use is sysfs is because could be changed to RCU. There have been issues
> > >>>> in the past with sysfs causing lock inversions with the rtnl mutex, that
> > >>>> is why you will see some trylock code there.
> > >>>>
> > >>>> My guess is that dev_base_lock readers exist only because no one bothered to do
> > >>>> the RCU conversion.
> > >>>
> > >>> I think we did, a long time ago.
> > >>>
> > >>> We took care of all ' fast paths' already.
> > >>>
> > >>> Not sure what is needed, current situation does not bother me at all ;)
> > >>
> > >> Perhaps Vladimir has a plan to post separately about it (in that case
> > >> sorry for jumping ahead) but the initial problem was procfs which is
> > >> (hopefully mostly irrelevant by now, and) taking the RCU lock only
> > >> therefore forcing drivers to have re-entrant, non-sleeping
> > >> .ndo_get_stats64 implementations.
> > >
> > > Right, the end reason why I'm even looking at this is because I want to
> > > convert all callers of dev_get_stats to use sleepable context and not
> > > atomic. This makes it easier to gather statistics from devices that have
> > > a firmware, or off-chip devices behind a slow bus like SPI.
> > >
> > > Like Jakub pointed out, some places call dev_get_stats while iterating
> > > through the list of network interfaces - one would be procfs, but not
> > > only. These callers are pure readers, so they use RCU protection. But
> > > that gives us atomic context when calling dev_get_stats. The naive
> > > solution is to convert all those callers to hold the RTNL mutex, which
> > > is the writer-side protection for the network interface lists, and which
> > > is sleepable. In fact I do have a series of 8 patches where I get that
> > > done. But there are some weirder cases, such as the bonding driver,
> > > where I need to do this:
> > >
> > > -----------------------------[cut here]-----------------------------
> > > From 369a0e18a2446cda8ff52d72c02aa144ae6687ec Mon Sep 17 00:00:00 2001
> > > From: Vladimir Oltean <vladimir.oltean@....com>
> > > Date: Mon, 30 Nov 2020 02:39:46 +0200
> > > Subject: [PATCH] net: bonding: retrieve device statistics under RTNL, not RCU
> > >
> > > In the effort of making .ndo_get_stats64 be able to sleep, we need to
> > > ensure the callers of dev_get_stats do not use atomic context.
> > >
> > > The bonding driver uses an RCU read-side critical section to ensure the
> > > integrity of the list of network interfaces, because the driver iterates
> > > through all net devices in the netns to find the ones which are its
> > > configured slaves. We still need some protection against an interface
> > > registering or deregistering, and the writer-side lock, the RTNL mutex,
> > > is fine for that, because it offers sleepable context.
> > >
> > > We are taking the RTNL this way (checking for rtnl_is_locked first)
> > > because the RTNL is not guaranteed to be held by all callers of
> > > ndo_get_stats64, in fact there will be work in the future that will
> > > avoid as much RTNL-holding as possible.
> > >
> > > Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> > > ---
> > > drivers/net/bonding/bond_main.c | 18 +++++++-----------
> > > include/net/bonding.h | 1 -
> > > 2 files changed, 7 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > > index e0880a3840d7..1d44534e95d2 100644
> > > --- a/drivers/net/bonding/bond_main.c
> > > +++ b/drivers/net/bonding/bond_main.c
> > > @@ -3738,21 +3738,17 @@ static void bond_get_stats(struct net_device *bond_dev,
> > > struct rtnl_link_stats64 *stats)
> > > {
> > > struct bonding *bond = netdev_priv(bond_dev);
> > > + bool rtnl_locked = rtnl_is_locked();
> > > struct rtnl_link_stats64 temp;
> > > struct list_head *iter;
> > > struct slave *slave;
> > > - int nest_level = 0;
> > >
> > > + if (!rtnl_locked)
> > > + rtnl_lock();
> >
> > Gosh, do not do that.
> >
> > Convert the bonding ->stats_lock to a mutex instead.
> >
> > Adding more reliance to RTNL is not helping cases where
> > access to stats should not be blocked by other users of RTNL (which can be abused)
>
> I can't, Eric. The bond_for_each_slave() macro needs protection against
> net devices being registered and unregistered.
And ?
A bonding device can absolutely maintain a private list, ready for
bonding ndo_get_stats() use, regardless
of register/unregister logic.
bond_for_each_slave() is simply a macro, you can replace it by something else.
Powered by blists - more mailing lists