netdev - Re: Correct usage of dev_base

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201130190348.ayg7yn5fieyr4ksy@skbuf>
Date:   Mon, 30 Nov 2020 21:03:48 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Stephen Hemminger <stephen@...workplumber.org>,
        netdev <netdev@...r.kernel.org>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Jiri Benc <jbenc@...hat.com>,
        Or Gerlitz <ogerlitz@...lanox.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>
Subject: Re: Correct usage of dev_base_lock in 2020

On Mon, Nov 30, 2020 at 08:00:40PM +0100, Eric Dumazet wrote:
>
>
> On 11/30/20 7:48 PM, Vladimir Oltean wrote:
> > On Mon, Nov 30, 2020 at 10:14:05AM -0800, Jakub Kicinski wrote:
> >> On Mon, 30 Nov 2020 11:41:10 +0100 Eric Dumazet wrote:
> >>>> So dev_base_lock dates back to the Big Kernel Lock breakup back in Linux 2.4
> >>>> (ie before my time). The time has come to get rid of it.
> >>>>
> >>>> The use is sysfs is because could be changed to RCU. There have been issues
> >>>> in the past with sysfs causing lock inversions with the rtnl mutex, that
> >>>> is why you will see some trylock code there.
> >>>>
> >>>> My guess is that dev_base_lock readers exist only because no one bothered to do
> >>>> the RCU conversion.
> >>>
> >>> I think we did, a long time ago.
> >>>
> >>> We took care of all ' fast paths' already.
> >>>
> >>> Not sure what is needed, current situation does not bother me at all ;)
> >>
> >> Perhaps Vladimir has a plan to post separately about it (in that case
> >> sorry for jumping ahead) but the initial problem was procfs which is
> >> (hopefully mostly irrelevant by now, and) taking the RCU lock only
> >> therefore forcing drivers to have re-entrant, non-sleeping
> >> .ndo_get_stats64 implementations.
> >
> > Right, the end reason why I'm even looking at this is because I want to
> > convert all callers of dev_get_stats to use sleepable context and not
> > atomic. This makes it easier to gather statistics from devices that have
> > a firmware, or off-chip devices behind a slow bus like SPI.
> >
> > Like Jakub pointed out, some places call dev_get_stats while iterating
> > through the list of network interfaces - one would be procfs, but not
> > only. These callers are pure readers, so they use RCU protection. But
> > that gives us atomic context when calling dev_get_stats. The naive
> > solution is to convert all those callers to hold the RTNL mutex, which
> > is the writer-side protection for the network interface lists, and which
> > is sleepable. In fact I do have a series of 8 patches where I get that
> > done. But there are some weirder cases, such as the bonding driver,
> > where I need to do this:
> >
> > -----------------------------[cut here]-----------------------------
> > From 369a0e18a2446cda8ff52d72c02aa144ae6687ec Mon Sep 17 00:00:00 2001
> > From: Vladimir Oltean <vladimir.oltean@....com>
> > Date: Mon, 30 Nov 2020 02:39:46 +0200
> > Subject: [PATCH] net: bonding: retrieve device statistics under RTNL, not RCU
> >
> > In the effort of making .ndo_get_stats64 be able to sleep, we need to
> > ensure the callers of dev_get_stats do not use atomic context.
> >
> > The bonding driver uses an RCU read-side critical section to ensure the
> > integrity of the list of network interfaces, because the driver iterates
> > through all net devices in the netns to find the ones which are its
> > configured slaves. We still need some protection against an interface
> > registering or deregistering, and the writer-side lock, the RTNL mutex,
> > is fine for that, because it offers sleepable context.
> >
> > We are taking the RTNL this way (checking for rtnl_is_locked first)
> > because the RTNL is not guaranteed to be held by all callers of
> > ndo_get_stats64, in fact there will be work in the future that will
> > avoid as much RTNL-holding as possible.
> >
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> > ---
> >  drivers/net/bonding/bond_main.c | 18 +++++++-----------
> >  include/net/bonding.h           |  1 -
> >  2 files changed, 7 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index e0880a3840d7..1d44534e95d2 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -3738,21 +3738,17 @@ static void bond_get_stats(struct net_device *bond_dev,
> >  			   struct rtnl_link_stats64 *stats)
> >  {
> >  	struct bonding *bond = netdev_priv(bond_dev);
> > +	bool rtnl_locked = rtnl_is_locked();
> >  	struct rtnl_link_stats64 temp;
> >  	struct list_head *iter;
> >  	struct slave *slave;
> > -	int nest_level = 0;
> >
> > +	if (!rtnl_locked)
> > +		rtnl_lock();
>
> Gosh, do not do that.
>
> Convert the bonding ->stats_lock to a mutex instead.
>
> Adding more reliance to RTNL is not helping cases where
> access to stats should not be blocked by other users of RTNL (which can be abused)

I can't, Eric. The bond_for_each_slave() macro needs protection against
net devices being registered and unregistered.