netdev - Re: [PATCH] net: stmmac: protect statistics updates with a spinlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKWSemsKmfsLjupwWBnyeKjtHH+mZjTzYiJT4G=xyUrNQ@mail.gmail.com>
Date: Fri, 5 Jan 2024 15:28:41 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Petr Tesařík <petr@...arici.cz>
Cc: Alexandre Torgue <alexandre.torgue@...s.st.com>, Jose Abreu <joabreu@...opsys.com>, 
	"David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Maxime Coquelin <mcoquelin.stm32@...il.com>, Chen-Yu Tsai <wens@...e.org>, 
	Jernej Skrabec <jernej.skrabec@...il.com>, Samuel Holland <samuel@...lland.org>, 
	"open list:STMMAC ETHERNET DRIVER" <netdev@...r.kernel.org>, 
	"moderated list:ARM/STM32 ARCHITECTURE" <linux-stm32@...md-mailman.stormreply.com>, 
	"moderated list:ARM/STM32 ARCHITECTURE" <linux-arm-kernel@...ts.infradead.org>, 
	open list <linux-kernel@...r.kernel.org>, 
	"open list:ARM/Allwinner sunXi SoC support" <linux-sunxi@...ts.linux.dev>, Jiri Pirko <jiri@...nulli.us>
Subject: Re: [PATCH] net: stmmac: protect statistics updates with a spinlock

On Fri, Jan 5, 2024 at 3:26 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Jan 5, 2024 at 2:27 PM Petr Tesařík <petr@...arici.cz> wrote:
> >
> > Hi Eric,
> >
> > yeah, it's me again...
> >
> > On Fri, 5 Jan 2024 12:14:47 +0100
> > Petr Tesařík <petr@...arici.cz> wrote:
> >
> > > On Fri, 5 Jan 2024 11:48:19 +0100
> > > Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > > On Fri, Jan 5, 2024 at 11:34 AM Petr Tesařík <petr@...arici.cz> wrote:
> > > > >
> > > > > On Fri, 5 Jan 2024 10:58:42 +0100
> > > > > Eric Dumazet <edumazet@...gle.com> wrote:
> > > > >
> > > > > > On Fri, Jan 5, 2024 at 10:16 AM Petr Tesarik <petr@...arici.cz> wrote:
> > > > > > >
> > > > > > > Add a spinlock to fix race conditions while updating Tx/Rx statistics.
> > > > > > >
> > > > > > > As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
> > > > > > > u64_stats_sync must ensure mutual exclusion, or one seqcount update could
> > > > > > > be lost on 32-bit platforms, thus blocking readers forever.
> > > > > > >
> > > > > > > Such lockups have been actually observed on 32-bit Arm after stmmac_xmit()
> > > > > > > on one core raced with stmmac_napi_poll_tx() on another core.
> > > > > > >
> > > > > > > Signed-off-by: Petr Tesarik <petr@...arici.cz>
> > > > > >
> > > > > > This is going to add more costs to 64bit platforms ?
> > > > >
> > > > > Yes, it adds a (hopefully not too contended) spinlock and in most
> > > > > places an interrupt disable/enable pair.
> > > > >
> > > > > FWIW the race condition is also present on 64-bit platforms, resulting
> > > > > in inaccurate statistic counters. I can understand if you consider it a
> > > > > mild annoyance, not worth fixing.
> > > > >
> > > > > > It seems to me that the same syncp can be used from two different
> > > > > > threads : hard irq and napi poller...
> > > > >
> > > > > Yes, that's exactly the scenario that locks up my system.
> > > > >
> > > > > > At this point, I do not see why you keep linux/u64_stats_sync.h if you
> > > > > > decide to go for a spinlock...
> > > > >
> > > > > The spinlock does not havce to be taken on the reader side, so the
> > > > > seqcounter still adds some value.
> > > > >
> > > > > > Alternative would use atomic64_t fields for the ones where there is no
> > > > > > mutual exclusion.
> > > > > >
> > > > > > RX : napi poll is definitely safe (protected by an atomic bit)
> > > > > > TX : each TX queue is also safe (protected by an atomic exclusion for
> > > > > > non LLTX drivers)
> > > > > >
> > > > > > This leaves the fields updated from hardware interrupt context ?
> > > > >
> > > > > I'm afraid I don't have enough network-stack-foo to follow here.
> > > > >
> > > > > My issue on 32 bit is that stmmac_xmit() may be called directly from
> > > > > process context while another core runs the TX napi on the same channel
> > > > > (in interrupt context). I didn't observe any race on the RX path, but I
> > > > > believe it's possible with NAPI busy polling.
> > > > >
> > > > > In any case, I don't see the connection with LLTX. Maybe you want to
> > > > > say that the TX queue is safe for stmmac (because it is a non-LLTX
> > > > > driver), but might not be safe for LLTX drivers?
> > > >
> > > > LLTX drivers (mostly virtual drivers like tunnels...) can have multiple cpus
> > > > running ndo_start_xmit() concurrently. So any use of a 'shared syncp'
> > > > would be a bug.
> > > > These drivers usually use per-cpu stats, to avoid races and false
> > > > sharing anyway.
> > > >
> > > > I think you should split the structures into two separate groups, each
> > > > guarded with its own syncp.
> > > >
> > > > No extra spinlocks, no extra costs on 64bit arches...
> > > >
> > > > If TX completion can run in parallel with ndo_start_xmit(), then
> > > > clearly we have to split stmmac_txq_stats in two halves:
> > >
> > > Oh, now I get it. Yes, that's much better, indeed.
> > >
> > > I mean, the counters have never been consistent (due to the race on the
> > > writer side), and nobody is concerned. So, there is no value in taking
> > > a consistent snapshot in stmmac_get_ethtool_stats().
> > >
> > > I'm going to rework and retest my patch. Thank you for pointing me in
> > > the right direction!
> > >
> > > Petr T
> > >
> > > > Also please note the conversion from u64 to u64_stats_t
> > >
> > > Noted. IIUC this will in turn close the update race on 64-bit by using
> > > an atomic type and on 32-bit by using a seqlock. Clever.
> > >
> > > Petr T
> > >
> > > > Very partial patch, only to show the split and new structure :
> > > >
> > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h
> > > > b/drivers/net/ethernet/stmicro/stmmac/common.h
> > > > index e3f650e88f82f927f0dcf95748fbd10c14c30cbe..702bceea5dc8c875a80f5e3a92b7bb058f373eda
> > > > 100644
> > > > --- a/drivers/net/ethernet/stmicro/stmmac/common.h
> > > > +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
> > > > @@ -60,16 +60,22 @@
> > > >  /* #define FRAME_FILTER_DEBUG */
> > > >
> > > >  struct stmmac_txq_stats {
> > > > -       u64 tx_bytes;
> > > > -       u64 tx_packets;
> > > > -       u64 tx_pkt_n;
> > > > -       u64 tx_normal_irq_n;
> > > > -       u64 napi_poll;
> > > > -       u64 tx_clean;
> > > > -       u64 tx_set_ic_bit;
> > > > -       u64 tx_tso_frames;
> > > > -       u64 tx_tso_nfrags;
> > > > -       struct u64_stats_sync syncp;
> > > > +/* First part, updated from ndo_start_xmit(), protected by tx queue lock */
> > > > +       struct u64_stats_sync syncp_tx;
> > > > +       u64_stats_t tx_bytes;
> > > > +       u64_stats_t tx_packets;
> > > > +       u64_stats_t tx_pkt_n;
> > > > +       u64_stats_t tx_tso_frames;
> > > > +       u64_stats_t tx_tso_nfrags;
> > > > +
> > > > +/* Second part, updated from TX completion (protected by NAPI poll logic) */
> > > > +       struct u64_stats_sync syncp_tx_completion;
> > > > +       u64_stats_t napi_poll;
> > > > +       u64_stats_t tx_clean;
> > > > +       u64_stats_t tx_set_ic_bit;
> >
> > Unfortunately, this field is also updated from ndo_start_xmit():
> >
> > 4572)     if (set_ic)
> > 4573)             txq_stats->tx_set_ic_bit++;
> >
> > I feel it would be a shame to introduce a spinlock just for this one
> > update. But I think the field could be converted to an atomic64_t.
> >
> > Which raises a question: Why aren't all stat counters simply atomic64_t? There
> > is no guarantee that the reader side takes a consistent snapshot
> > (except on 32-bit). So, why do we even bother with u64_stats_sync?
>
> This infra was added to have no _lock_ overhead on 64bit arches.
>
> If a counter must be updated from multiple cpus (regardless of 32/64
> bit kernel), then use atomic64_t

Or use two different u64_stats_t (each guarded by a different syncp),
then fold them at stats gathering time.

>
> >
> > Is it merely because u64_stats_add() should be cheaper than
> > atomic64_add()? Or is there anything else I'm missing? If yes, does it
> > invalidate my proposal to convert tx_set_ic_bit to an atomic64_t?
>
> atomic64_add() is expensive, especially in contexts where updates are
> already guarded by a spinlock or something.