linux-kernel - Re: [PATCH net v3] net: stmmac: protect updates of 64-bit statistics counters

Message-ID: <20240228120308.48d6a9c2@meshulam.tesarici.cz>
Date: Wed, 28 Feb 2024 12:03:08 +0100
From: Petr Tesařík <petr@...arici.cz>
To: "Linux regression tracking (Thorsten Leemhuis)"
 <regressions@...mhuis.info>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>, Eric
 Dumazet <edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>,
 Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>, Jisheng
 Zhang <jszhang@...nel.org>, Alexandre Torgue
 <alexandre.torgue@...s.st.com>, Jose Abreu <joabreu@...opsys.com>, Maxime
 Coquelin <mcoquelin.stm32@...il.com>, Chen-Yu Tsai <wens@...e.org>, Jernej
 Skrabec <jernej.skrabec@...il.com>, Samuel Holland <samuel@...lland.org>,
 "open list:STMMAC ETHERNET DRIVER" <netdev@...r.kernel.org>, "moderated
 list:ARM/STM32 ARCHITECTURE" <linux-stm32@...md-mailman.stormreply.com>,
 "moderated list:ARM/STM32 ARCHITECTURE"
 <linux-arm-kernel@...ts.infradead.org>, open list
 <linux-kernel@...r.kernel.org>, "open list:ARM/Allwinner sunXi SoC support"
 <linux-sunxi@...ts.linux.dev>, Marc Haber <mh+netdev@...schlus.de>, Andrew
 Lunn <andrew@...n.ch>, Florian Fainelli <f.fainelli@...il.com>,
 stable@...r.kernel.org, alexis.lothore@...tlin.com, Guenter Roeck
 <linux@...ck-us.net>
Subject: Re: [PATCH net v3] net: stmmac: protect updates of 64-bit
 statistics counters

On Wed, 28 Feb 2024 07:19:56 +0100
"Linux regression tracking (Thorsten Leemhuis)" <regressions@...mhuis.info> wrote:

> Net maintainers, chiming in here, as it seems handling this regression
> stalled.

Indeed, I was too busy with sandbox mode...

> On 13.02.24 16:52, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 4:26 PM Guenter Roeck <linux@...ck-us.net> wrote:  
> >> On Tue, Feb 13, 2024 at 03:51:35PM +0100, Eric Dumazet wrote:  
> >>> On Tue, Feb 13, 2024 at 3:29 PM Jisheng Zhang <jszhang@...nel.org> wrote:  
> >>>> On Sun, Feb 11, 2024 at 08:30:21PM -0800, Guenter Roeck wrote:  
> >>>>> On Sat, Feb 03, 2024 at 08:09:27PM +0100, Petr Tesarik wrote:  
> >>>>>> As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
> >>>>>> u64_stats_sync must ensure mutual exclusion, or one seqcount update could
> >>>>>> be lost on 32-bit platforms, thus blocking readers forever. Such lockups
> >>>>>> have been observed in real world after stmmac_xmit() on one CPU raced with
> >>>>>> stmmac_napi_poll_tx() on another CPU.
> >>>>>>
> >>>>>> To fix the issue without introducing a new lock, split the statistics into
> >>>>>> three parts:
> >>>>>>
> >>>>>> 1. fields updated only under the tx queue lock,
> >>>>>> 2. fields updated only during NAPI poll,
> >>>>>> 3. fields updated only from interrupt context.
> >>>>>>
> >>>>>> Updates to fields in the first two groups are already serialized through
> >>>>>> other locks. It is sufficient to split the existing struct u64_stats_sync
> >>>>>> so that each group has its own.
> >>>>>>
> >>>>>> Note that tx_set_ic_bit is updated from both contexts. Split this counter
> >>>>>> so that each context gets its own, and calculate their sum to get the total
> >>>>>> value in stmmac_get_ethtool_stats().
> >>>>>>
> >>>>>> For the third group, multiple interrupts may be processed by different CPUs
> >>>>>> at the same time, but interrupts on the same CPU will not nest. Move fields
> >>>>>> from this group to a newly created per-cpu struct stmmac_pcpu_stats.
> >>>>>>
> >>>>>> Fixes: 133466c3bbe1 ("net: stmmac: use per-queue 64 bit statistics where necessary")
> >>>>>> Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
> >>>>>> Cc: stable@...r.kernel.org
> >>>>>> Signed-off-by: Petr Tesarik <petr@...arici.cz>  
> >>>>>
> >>>>> This patch results in a lockdep splat. Backtrace and bisect results attached.
> >>>>>
> >>>>> ---
> >>>>> [   33.736728] ================================
> >>>>> [   33.736805] WARNING: inconsistent lock state
> >>>>> [   33.736953] 6.8.0-rc4 #1 Tainted: G                 N
> >>>>> [   33.737080] --------------------------------
> >>>>> [   33.737155] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> >>>>> [   33.737309] kworker/0:2/39 [HC1[1]:SC0[2]:HE0:SE0] takes:
> >>>>> [   33.737459] ef792074 (&syncp->seq#2){?...}-{0:0}, at: sun8i_dwmac_dma_interrupt+0x9c/0x28c
> >>>>> [   33.738206] {HARDIRQ-ON-W} state was registered at:
> >>>>> [   33.738318]   lock_acquire+0x11c/0x368
> >>>>> [   33.738431]   __u64_stats_update_begin+0x104/0x1ac
> >>>>> [   33.738525]   stmmac_xmit+0x4d0/0xc58  
> >>>>
> >>>> Interesting lockdep splat...
> >>>> stmmac_xmit() operates on txq_stats->q_syncp, while
> >>>> sun8i_dwmac_dma_interrupt() operates on the per-CPU priv->xstats.pcpu_stats.
> >>>> They are different syncp instances, so how does the lockdep splat happen?  
> >>>
> >>> Right, I do not see anything obvious yet.  
> >>
> >> Wild guess: I think it may be saying that due to
> >>
> >>         inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> >>
> >> the critical code may somehow be interrupted and, while handling the
> >> interrupt, try to acquire the same lock again.  
> > 
> > This should not happen, the 'syncp' are different. They have different
> > lockdep classes.
> > 
> > One is exclusively used from hard irq context.
> > 
> > The second one is only used from BH context.  
> 
> Alexis Lothoré hit this now as well, see yesterday's report in this
> thread; apart from that nothing seems to have happened for two weeks now.
> The change recently made it to some stable/longterm kernels, too. Makes
> me wonder:
> 
> What's the plan forward here? Is this considered to be a false positive?

Although my system has run stable for a couple of months, I am hesitant
to call it a false positive.

> Or a real problem?

That's what I think. But I would have to learn a lot about the network
stack to understand what exactly happens here.

It may go faster if somebody else on the Cc can give me a hint where to
start looking based on the lockdep warning.

Petr T