netdev - Re: [PATCH net-next v2 2/3] net: dsa: add Arrow SpeedChips XRS700x driver

Message-ID: <20201127103503.5cda7f24@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
Date:   Fri, 27 Nov 2020 10:35:03 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     George McCollister <george.mccollister@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        "open list:OPEN FIRMWARE AND..." <devicetree@...r.kernel.org>
Subject: Re: [PATCH net-next v2 2/3] net: dsa: add Arrow SpeedChips XRS700x
 driver

On Fri, 27 Nov 2020 00:05:00 +0200 Vladimir Oltean wrote:
> On Thu, Nov 26, 2020 at 01:07:12PM -0600, George McCollister wrote:
> > On Thu, Nov 26, 2020 at 11:56 AM Vladimir Oltean <olteanv@...il.com> wrote:  
> > > On Thu, Nov 26, 2020 at 03:24:18PM +0200, Vladimir Oltean wrote:  
> > > > On Wed, Nov 25, 2020 at 08:25:11PM -0600, George McCollister wrote:  
> > > > > > > +     {XRS_RX_UNDERSIZE_L, "rx_undersize"},
> > > > > > > +     {XRS_RX_FRAGMENTS_L, "rx_fragments"},
> > > > > > > +     {XRS_RX_OVERSIZE_L, "rx_oversize"},
> > > > > > > +     {XRS_RX_JABBER_L, "rx_jabber"},
> > > > > > > +     {XRS_RX_ERR_L, "rx_err"},
> > > > > > > +     {XRS_RX_CRC_L, "rx_crc"},  
> > > > > >
> > > > > > As Vladimir already mentioned to you the statistics which have
> > > > > > corresponding entries in struct rtnl_link_stats64 should be reported
> > > > > > the standard way. The infra for DSA may not be in place yet, so best
> > > > > > if you just drop those for now.  
> > > > >
> > > > > Okay, that clears it up a bit. Just drop these 6? I'll read through
> > > > > that thread again and try to make sense of it.  
> > > >
> > > > I feel that I should ask. Do you want me to look into exposing RMON
> > > > interface counters through rtnetlink (I've never done anything like that
> > > > before either, but there's a beginning for everything), or are you going
> > > > to?  
> > >
> > > So I started to add .ndo_get_stats64 based on the hardware counters, but
> > > I already hit the first roadblock, as described by the wise words of
> > > Documentation/networking/statistics.rst:
> > >
> > > | The `.ndo_get_stats64` callback can not sleep because of accesses
> > > | via `/proc/net/dev`. If driver may sleep when retrieving the statistics
> > > | from the device it should do so periodically asynchronously and only return
> > > | a recent copy from `.ndo_get_stats64`. Ethtool interrupt coalescing interface
> > > | allows setting the frequency of refreshing statistics, if needed.

I should probably also have mentioned here that, unlike most NDOs,
.ndo_get_stats64 is called without the rtnl lock held at all.

> > > Unfortunately, I feel this is almost unacceptable for a DSA driver that
> > > more often than not needs to retrieve these counters from a slow and
> > > bottlenecked bus (SPI, I2C, MDIO etc). Periodic readouts are not an
> > > option, because the only periodic interval that would not put absurdly
> > > high pressure on the limited SPI bandwidth would be a readout interval
> > > that gives you very old counters.  

What's a high interval? It's not uncommon to refresh the stats once a
second even in high performance NICs.

> > Indeed it seems ndo_get_stats64() usually gets data over something
> > like a local or PCIe bus or from software. I had a brief look to see
> > if I could find another driver that was getting the stats over a slow
> > bus and didn't notice anything. If you haven't already you might do a
> > quick grep and see if anything pops out to you.
> >  
> > >
> > > What exactly is it that incurs the atomic context? I cannot seem to
> > > figure out from this stack trace:  
> >
> > I think something in fs/seq_file.c is taking an rcu lock.  
> 
> Not quite. It _is_ the RCU read-side lock that's taken, but it's taken
> locally from dev_seq_start in net/core/net-procfs.c. The reason is that
> /proc/net/dev iterates through all interfaces from the current netns,
> and it is precisely that that creates atomic context. You used to need
> to hold the rwlock_t dev_base_lock, but now you can also "get away" with
> the RCU read-side lock. Either way, both are atomic context, so it
> doesn't help.
> 
> commit c6d14c84566d6b70ad9dc1618db0dec87cca9300
> Author: Eric Dumazet <eric.dumazet@...il.com>
> Date:   Wed Nov 4 05:43:23 2009 -0800
> 
>     net: Introduce for_each_netdev_rcu() iterator
> 
>     Adds RCU management to the list of netdevices.
> 
>     Convert some for_each_netdev() users to RCU version, if
>     it can avoid read_lock-ing dev_base_lock
> 
>     Ie:
>             read_lock(&dev_base_lock);
>             for_each_netdev(net, dev)
>                     some_action();
>             read_unlock(&dev_base_lock);
> 
>     becomes :
> 
>             rcu_read_lock();
>             for_each_netdev_rcu(net, dev)
>                     some_action();
>             rcu_read_unlock();
> 
> 
>     Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
> 
> So... yeah. As long as this kernel interface exists, it needs to run in
> atomic context, by construction. Great.
>
> > I suppose it doesn't really matter though since the documentation says
> > we can't sleep.  
> 
> You're talking, I suppose, about these words of wisdom in
> Documentation/filesystems/seq_file.rst?
> 
> | However, the seq_file code (by design) will not sleep between the calls
> | to start() and stop(), so holding a lock during that time is a
> | reasonable thing to do. The seq_file code will also avoid taking any
> | other locks while the iterator is active.
> 
> It _doesn't_ say that you can't sleep between start() and stop(), right?
> It just says that if you want to keep the seq_file iterator atomic, the
> seq_file code is not sabotaging you by sleeping. But you still could
> sleep if you wanted to.
> 
> Back to the statistics counters.
> 
> How accurate do the counters in /proc/net/dev need to be? What programs
> consume those? Could they be more out of date than the ones retrieved
> through rtnetlink?

ifconfig does for sure.

> I'm thinking that maybe we could introduce another ndo, something like
> .ndo_get_stats64_blocking, that could be called from all places except
> from net/core/net-procfs.c. That one could still call the non-blocking
> variant. Then, depending on the answer to the question "how inaccurate
> could we reasonably leave /proc/net/dev", we could:
> - just return zeroes there
> - return the counters cached from the last blocking call

I'd rather not introduce divergent behavior like that.

Is the periodic refresh really that awful? We're mostly talking error
counters here so every second or every few seconds should be perfectly
fine.
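
For illustration, the periodic-refresh pattern described in
statistics.rst could be sketched roughly as below. This is a userspace
model under stated assumptions, not driver code: struct port, the two
counters, and all port_*() names are hypothetical, the atomic_flag
stands in for a kernel spinlock, and the "hw" member simulates
registers that would really sit behind SPI/I2C/MDIO. In a driver the
refresh would presumably run from a periodic worker, and the getter
would back .ndo_get_stats64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of counters; the names are illustrative only. */
struct port_stats {
	uint64_t rx_crc_errors;
	uint64_t rx_length_errors;
};

struct port {
	atomic_flag lock;        /* stands in for a kernel spinlock */
	struct port_stats cache; /* last snapshot taken from the hardware */
	struct port_stats hw;    /* simulated registers behind a slow bus */
};

static void port_init(struct port *p)
{
	memset(&p->cache, 0, sizeof(p->cache));
	memset(&p->hw, 0, sizeof(p->hw));
	atomic_flag_clear(&p->lock);
}

static void port_lock(struct port *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		; /* spin; the critical sections below only copy a struct */
}

static void port_unlock(struct port *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Slow path: the bus access may sleep, so it happens outside the lock.
 * Meant to be called periodically, e.g. once a second.
 */
static void port_refresh_stats(struct port *p)
{
	struct port_stats fresh;

	assert(p);
	fresh = p->hw; /* simulated slow register read */

	port_lock(p);
	p->cache = fresh;
	port_unlock(p);
}

/* Fast path: never touches the bus, only copies the cached snapshot,
 * so it does not sleep.
 */
static void port_get_stats(struct port *p, struct port_stats *out)
{
	port_lock(p);
	*out = p->cache;
	port_unlock(p);
}
```

The reader only ever sees counters as old as the last refresh, which is
the trade-off being discussed: the refresh interval bounds both the
staleness and the load on the bus.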

> > It does seem to me that this is something that needs to be sorted out
> > at the subsystem level and that this driver has been "caught in the
> > crossfire". Any guidance on how we could proceed with this driver and
> > revisit this when we have answers to these questions at the subsystem
> > level would be appreciated if substantial time will be required to
> > work this out.  
> 
> Now seriously, who isn't caught in the crossfire here? Let's do some
> brainstorming and it will be quick and painless.