Message-ID: <20240105142732.1903bc70@meshulam.tesarici.cz>
Date: Fri, 5 Jan 2024 14:27:32 +0100
From: Petr Tesařík <petr@...arici.cz>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Alexandre Torgue <alexandre.torgue@...s.st.com>, Jose Abreu
 <joabreu@...opsys.com>, "David S. Miller" <davem@...emloft.net>, Jakub
 Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Maxime
 Coquelin <mcoquelin.stm32@...il.com>, Chen-Yu Tsai <wens@...e.org>, Jernej
 Skrabec <jernej.skrabec@...il.com>, Samuel Holland <samuel@...lland.org>,
 "open list:STMMAC ETHERNET DRIVER" <netdev@...r.kernel.org>, "moderated
 list:ARM/STM32 ARCHITECTURE" <linux-stm32@...md-mailman.stormreply.com>,
 "moderated list:ARM/STM32 ARCHITECTURE"
 <linux-arm-kernel@...ts.infradead.org>, open list
 <linux-kernel@...r.kernel.org>, "open list:ARM/Allwinner sunXi SoC support"
 <linux-sunxi@...ts.linux.dev>, Jiri Pirko <jiri@...nulli.us>
Subject: Re: [PATCH] net: stmmac: protect statistics updates with a spinlock

Hi Eric,

yeah, it's me again...

On Fri, 5 Jan 2024 12:14:47 +0100
Petr Tesařík <petr@...arici.cz> wrote:

> On Fri, 5 Jan 2024 11:48:19 +0100
> Eric Dumazet <edumazet@...gle.com> wrote:
> 
> > On Fri, Jan 5, 2024 at 11:34 AM Petr Tesařík <petr@...arici.cz> wrote:  
> > >
> > > On Fri, 5 Jan 2024 10:58:42 +0100
> > > Eric Dumazet <edumazet@...gle.com> wrote:
> > >    
> > > > On Fri, Jan 5, 2024 at 10:16 AM Petr Tesarik <petr@...arici.cz> wrote:    
> > > > >
> > > > > Add a spinlock to fix race conditions while updating Tx/Rx statistics.
> > > > >
> > > > > As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
> > > > > u64_stats_sync must ensure mutual exclusion, or one seqcount update could
> > > > > be lost on 32-bit platforms, thus blocking readers forever.
> > > > >
> > > > > > Such lockups have actually been observed on 32-bit Arm after stmmac_xmit()
> > > > > on one core raced with stmmac_napi_poll_tx() on another core.
> > > > >
> > > > > Signed-off-by: Petr Tesarik <petr@...arici.cz>    
> > > >
> > > > This is going to add more costs to 64bit platforms ?    
> > >
> > > Yes, it adds a (hopefully not too contended) spinlock and in most
> > > places an interrupt disable/enable pair.
> > >
> > > FWIW the race condition is also present on 64-bit platforms, resulting
> > > in inaccurate statistic counters. I can understand if you consider it a
> > > mild annoyance, not worth fixing.
> > >    
> > > > It seems to me that the same syncp can be used from two different
> > > > threads : hard irq and napi poller...    
> > >
> > > Yes, that's exactly the scenario that locks up my system.
> > >    
> > > > At this point, I do not see why you keep linux/u64_stats_sync.h if you
> > > > decide to go for a spinlock...    
> > >
> > > > The spinlock does not have to be taken on the reader side, so the
> > > seqcounter still adds some value.
> > >    
> > > > Alternative would use atomic64_t fields for the ones where there is no
> > > > mutual exclusion.
> > > >
> > > > RX : napi poll is definitely safe (protected by an atomic bit)
> > > > TX : each TX queue is also safe (protected by an atomic exclusion for
> > > > non LLTX drivers)
> > > >
> > > > This leaves the fields updated from hardware interrupt context ?    
> > >
> > > I'm afraid I don't have enough network-stack-foo to follow here.
> > >
> > > My issue on 32 bit is that stmmac_xmit() may be called directly from
> > > process context while another core runs the TX napi on the same channel
> > > (in interrupt context). I didn't observe any race on the RX path, but I
> > > believe it's possible with NAPI busy polling.
> > >
> > > In any case, I don't see the connection with LLTX. Maybe you want to
> > > say that the TX queue is safe for stmmac (because it is a non-LLTX
> > > driver), but might not be safe for LLTX drivers?    
> > 
> > LLTX drivers (mostly virtual drivers like tunnels...) can have multiple cpus
> > running ndo_start_xmit() concurrently. So any use of a 'shared syncp'
> > would be a bug.
> > These drivers usually use per-cpu stats, to avoid races and false
> > sharing anyway.
> > 
> > I think you should split the structures into two separate groups, each
> > guarded with its own syncp.
> > 
> > No extra spinlocks, no extra costs on 64bit arches...
> > 
> > If TX completion can run in parallel with ndo_start_xmit(), then
> > clearly we have to split stmmac_txq_stats in two halves:  
> 
> Oh, now I get it. Yes, that's much better, indeed.
> 
> I mean, the counters have never been consistent (due to the race on the
> writer side), and nobody is concerned. So, there is no value in taking
> a consistent snapshot in stmmac_get_ethtool_stats().
> 
> I'm going to rework and retest my patch. Thank you for pointing me in
> the right direction!
> 
> Petr T
> 
> > Also please note the conversion from u64 to u64_stats_t  
> 
> Noted. IIUC this will in turn close the update race on 64-bit by using
> an atomic type and on 32-bit by using a seqlock. Clever.
> 
> Petr T
> 
> > Very partial patch, only to show the split and new structure :
> > 
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
> > > index e3f650e88f82f927f0dcf95748fbd10c14c30cbe..702bceea5dc8c875a80f5e3a92b7bb058f373eda 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/common.h
> > +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> > @@ -60,16 +60,22 @@
> >  /* #define FRAME_FILTER_DEBUG */
> > 
> >  struct stmmac_txq_stats {
> > -       u64 tx_bytes;
> > -       u64 tx_packets;
> > -       u64 tx_pkt_n;
> > -       u64 tx_normal_irq_n;
> > -       u64 napi_poll;
> > -       u64 tx_clean;
> > -       u64 tx_set_ic_bit;
> > -       u64 tx_tso_frames;
> > -       u64 tx_tso_nfrags;
> > -       struct u64_stats_sync syncp;
> > +/* First part, updated from ndo_start_xmit(), protected by tx queue lock */
> > +       struct u64_stats_sync syncp_tx;
> > +       u64_stats_t tx_bytes;
> > +       u64_stats_t tx_packets;
> > +       u64_stats_t tx_pkt_n;
> > +       u64_stats_t tx_tso_frames;
> > +       u64_stats_t tx_tso_nfrags;
> > +
> > +/* Second part, updated from TX completion (protected by NAPI poll logic) */
> > +       struct u64_stats_sync syncp_tx_completion;
> > +       u64_stats_t napi_poll;
> > +       u64_stats_t tx_clean;
> > +       u64_stats_t tx_set_ic_bit;

Unfortunately, this field is also updated from ndo_start_xmit():

	if (set_ic)
		txq_stats->tx_set_ic_bit++;

I feel it would be a shame to introduce a spinlock just for this one
update. But I think the field could be converted to an atomic64_t.
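
Roughly what I have in mind, as a userspace C11 sketch rather than driver
code. The struct and helper names are made up for illustration, and
<stdatomic.h> merely stands in for the kernel's atomic64_t,
atomic64_inc() and atomic64_read():

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the single counter that is updated from
 * both the xmit path and the TX completion path. */
struct model_txq_stats {
	atomic_uint_fast64_t tx_set_ic_bit;
};

static void model_stats_init(struct model_txq_stats *st)
{
	atomic_init(&st->tx_set_ic_bit, 0);
}

/* Safe to call from either path without any lock: the increment is a
 * single atomic read-modify-write, so no update can be lost. */
static void model_count_set_ic(struct model_txq_stats *st)
{
	atomic_fetch_add_explicit(&st->tx_set_ic_bit, 1,
				  memory_order_relaxed);
}

/* Reader side: one atomic load, no retry loop needed for one field. */
static uint64_t model_read_set_ic(struct model_txq_stats *st)
{
	return atomic_load_explicit(&st->tx_set_ic_bit,
				    memory_order_relaxed);
}
```

Relaxed ordering is assumed to be enough here because the value is only
a statistic and does not order any other memory access.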

Which raises a question: Why aren't all stat counters simply atomic64_t? There
is no guarantee that the reader side takes a consistent snapshot
(except on 32-bit). So, why do we even bother with u64_stats_sync?

Is it merely because u64_stats_add() should be cheaper than
atomic64_add()? Or is there anything else I'm missing? If yes, does it
invalidate my proposal to convert tx_set_ic_bit to an atomic64_t?
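
For reference, this is how I understand the 32-bit lockup, replayed by
hand in a small userspace model. It is not the kernel's u64_stats_sync
code; all names are invented, and the non-atomic sequence increment is
split into a load and a store so the bad interleaving can be written
down explicitly:

```c
#include <stdbool.h>

/* Model of the sequence counter only; odd means "writer in progress". */
struct model_syncp {
	unsigned int seq;
};

/* The two halves of a plain, non-atomic "seq++". */
static unsigned int model_load(const struct model_syncp *s)
{
	return s->seq;
}

static void model_store(struct model_syncp *s, unsigned int v)
{
	s->seq = v;
}

/* A reader may use its snapshot only while no writer is in progress. */
static bool model_reader_can_proceed(const struct model_syncp *s)
{
	return (s->seq & 1) == 0;
}

/* Writers A and B are properly serialized: begin/end, begin/end. */
static unsigned int model_serialized_writers(void)
{
	struct model_syncp s = { .seq = 0 };

	model_store(&s, model_load(&s) + 1);	/* A begin */
	model_store(&s, model_load(&s) + 1);	/* A end */
	model_store(&s, model_load(&s) + 1);	/* B begin */
	model_store(&s, model_load(&s) + 1);	/* B end */
	return s.seq;
}

/* Writers A and B overlap in "begin": both load the same value, so
 * one of the two increments is lost. */
static unsigned int model_racy_writers(void)
{
	struct model_syncp s = { .seq = 0 };
	unsigned int a, b;

	a = model_load(&s);			/* A begin: load */
	b = model_load(&s);			/* B begin: load */
	model_store(&s, a + 1);			/* A begin: store */
	model_store(&s, b + 1);			/* B begin: store, same value */

	model_store(&s, model_load(&s) + 1);	/* A end */
	model_store(&s, model_load(&s) + 1);	/* B end */
	return s.seq;
}
```

With serialized writers the counter ends up even, so readers proceed.
With the overlapping "begin" it ends up odd with no writer left to fix
it, so model_reader_can_proceed() stays false from then on.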

Petr T
