Message-ID: <YEXicy6+9MksdLZh@hirez.programming.kicks-ass.net>
Date:   Mon, 8 Mar 2021 09:38:11 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Ahmed S. Darwish" <a.darwish@...utronix.de>
Cc:     Jakub Kicinski <kuba@...nel.org>, erhard_f@...lbox.org,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: seqlock lockdep false positives?

On Sun, Mar 07, 2021 at 10:20:08AM +0100, Ahmed S. Darwish wrote:
> Hi Jakub,
> 
> On Wed, Mar 03, 2021 at 04:40:35PM -0800, Jakub Kicinski wrote:
> > Hi Ahmed!
> >
> > Erhard is reporting a lockdep splat in drivers/net/ethernet/realtek/8139too.c
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=211575
> >
> > I can't quite grasp how that happens; it looks like it's the Rx
> > lock/syncp on one side and the Tx lock on the other side :S
> >
> > ================================
> > WARNING: inconsistent lock state
> > 5.12.0-rc1-Pentium4 #2 Not tainted
> > --------------------------------
> > inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > c113c804 (&syncp->seq#2){?.-.}-{0:0}, at: rtl8139_poll+0x251/0x350
> > {IN-HARDIRQ-W} state was registered at:
> >   lock_acquire+0x239/0x2c5
> >   do_write_seqcount_begin_nested.constprop.0+0x1a/0x1f
> >   rtl8139_interrupt+0x346/0x3cb
> 
> That's really weird.
> 
> The only way I can see this happening is lockdep mistakenly treating
> both "tx_stats->syncp.seq" and "rx_stats->syncp.seq" as the same lockdep
> class key... somehow.
> 
> It is claiming that the softirq code path at rtl8139_poll() is acquiring
> the *tx*_stats sequence counter. But at rtl8139_poll(), I can only see
> the *rx*_stats sequence counter getting acquired.
> 
> I've re-checked where tx/rx stats sequence counters are initialized, and
> I see:
> 
>   static struct net_device *rtl8139_init_board(struct pci_dev *pdev)
>   {
> 	...
> 	u64_stats_init(&tp->rx_stats.syncp);
> 	u64_stats_init(&tp->tx_stats.syncp);
> 	...
>   }
> 
> which means they should have different lockdep class keys.  The
> u64_stats sequence counters are also initialized way before any IRQ
> handlers are registered.

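Indeed, that's one area where inlines are very much not equivalent to
macros. A static variable in an inline function isn't guaranteed to be a
single instance, but you very much do not get one per call site. Since
u64_stats_init() is an inline function wrapping seqcount_init(), the
static lockdep key declared in there is shared by every caller, so
rx_stats and tx_stats end up in the same lockdep class.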
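
The inline-versus-macro difference can be sketched in plain C. The names
here (demo_key, demo_lock, demo_lock_init, demo_wrapper_init) are
hypothetical stand-ins and not kernel APIs; the macro only mimics the
pattern of declaring a static key at the expansion site, which is assumed
to be what seqcount_init() does when lockdep is enabled.

```c
/* Hypothetical stand-in for a lockdep class key; illustration only. */
struct demo_key { char dummy; };

/* Hypothetical lock that remembers which key it was initialized with. */
struct demo_lock { const struct demo_key *key; };

/* Macro initializer: every expansion site declares its own static key. */
#define demo_lock_init(lock)				\
	do {						\
		static struct demo_key key_;		\
		(lock)->key = &key_;			\
	} while (0)

/*
 * Inline wrapper: the macro is expanded exactly once, inside this
 * function, so all callers of the wrapper share that single static key.
 */
static inline void demo_wrapper_init(struct demo_lock *lock)
{
	demo_lock_init(lock);
}

/* Returns 1 if two locks initialized via the wrapper share one key. */
static int wrapper_keys_shared(void)
{
	struct demo_lock rx, tx;

	demo_wrapper_init(&rx);
	demo_wrapper_init(&tx);
	return rx.key == tx.key;
}

/* Returns 1 if two locks initialized via the macro get distinct keys. */
static int macro_keys_distinct(void)
{
	struct demo_lock rx, tx;

	demo_lock_init(&rx);	/* expansion site 1: its own static key */
	demo_lock_init(&tx);	/* expansion site 2: a different static key */
	return rx.key != tx.key;
}
```

In this sketch both helpers return 1: going through the inline wrapper
collapses rx and tx onto one key, while using the macro directly keeps
them apart, which is the effect the rx_stats/tx_stats splat suggests.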

Something like the below ought to be the right fix I think.

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index c6abb79501b3..e81856c0ba13 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -115,12 +115,13 @@ static inline void u64_stats_inc(u64_stats_t *p)
 }
 #endif
 
+#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
+#define u64_stats_init(syncp)	seqcount_init(&(syncp)->seq)
+#else
 static inline void u64_stats_init(struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	seqcount_init(&syncp->seq);
-#endif
 }
+#endif
 
 static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
 {
