lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240507024258.07980f55@yea>
Date: Tue, 7 May 2024 02:42:58 +0200
From: Erhard Furtner <erhard_f@...lbox.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
 linux-kernel@...r.kernel.org
Subject: Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370
 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel
 v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

On Mon, 6 May 2024 07:26:45 -0700
Jakub Kicinski <kuba@...nel.org> wrote:

> On Sun, 5 May 2024 23:27:13 +0200 Erhard Furtner wrote:
> > > On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:    
> > > > With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> > > > net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> > > > CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> > > > warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> > > > Happens more often than not (6-7 out of 10 times booting):      
> > > 
> > > Could you try with LOCKDEP enabled?
> > > I wonder if irqs_disabled() behaves differently than we expect.    
> > 
> > Ok, after a few tries I got a "BUG: spinlock wrong CPU on CPU#0, swapper/0/1" LOCKDEP hit. But this does not happen every time when I get the netpoll_send WARNING:  
> 
> Oh, can you try deleting the gem_poll_controller() function?
> Unhook it from ndo_poll_controller and remove it completely.

Ok, this is the resulting diff:

diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 9bd1df8308d2..d3a2fbb14140 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -949,17 +949,6 @@ static irqreturn_t gem_interrupt(int irq, void *dev_id)
        return IRQ_HANDLED;
 }
 
-#ifdef CONFIG_NET_POLL_CONTROLLER
-static void gem_poll_controller(struct net_device *dev)
-{
-       struct gem *gp = netdev_priv(dev);
-
-       disable_irq(gp->pdev->irq);
-       gem_interrupt(gp->pdev->irq, dev);
-       enable_irq(gp->pdev->irq);
-}
-#endif
-
 static void gem_tx_timeout(struct net_device *dev, unsigned int txqueue)
 {
        struct gem *gp = netdev_priv(dev);
@@ -2839,9 +2828,6 @@ static const struct net_device_ops gem_netdev_ops = {
        .ndo_change_mtu         = gem_change_mtu,
        .ndo_validate_addr      = eth_validate_addr,
        .ndo_set_mac_address    = gem_set_mac_address,
-#ifdef CONFIG_NET_POLL_CONTROLLER
-       .ndo_poll_controller    = gem_poll_controller,
-#endif
 };
 
 static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)


And indeed without gem_poll_controller() I don't hit the "WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING: CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x44" or the according lockdep bug at boot!

Re-booted the machine about 20 times without anything suspicious showing up in the dmesg. With the unpatched kernel I got the WARNING at the 2nd reboot.

What I still get with 'modprobe -v dev_addr_lists_test', even with gem_poll_controller() removed is:

[...]
KTAP version 1
1..1
    KTAP version 1
    # Subtest: dev-addr-list-test
    # module: dev_addr_lists_test
    1..6

====================================
WARNING: kunit_try_catch/1770 still has locks held!
6.9.0-rc6-PMacG4-dirty #5 Tainted: G        W        N
------------------------------------
1 lock held by kunit_try_catch/1770:
 #0: c0dbfce4 (rtnl_mutex){....}-{3:3}, at: dev_addr_test_init+0xbc/0xc8 [dev_addr_lists_test]

stack backtrace:
CPU: 0 PID: 1770 Comm: kunit_try_catch Tainted: G        W        N 6.9.0-rc6-PMacG4-dirty #5
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[f3749ef0] [c07c2bec] dump_stack_lvl+0x80/0xac (unreliable)
[f3749f10] [c004fe64] do_exit+0x5b4/0x834
[f3749f60] [c006d848] kthread_complete_and_exit+0x0/0x28
[f3749f80] [c006d870] kthread+0x0/0xe8
[f3749fa0] [bebf0cf4] kunit_try_catch_run+0x0/0x15c [kunit]
[f3749fc0] [c006d954] kthread+0xe4/0xe8
[f3749ff0] [c0015304] start_kernel_thread+0x10/0x14
    ok 1 dev_addr_test_basic
    ok 2 dev_addr_test_sync_one
    ok 3 dev_addr_test_add_del
    ok 4 dev_addr_test_del_main
    ok 5 dev_addr_test_add_set
    ok 6 dev_addr_test_add_excl
# dev-addr-list-test: pass:6 fail:0 skip:0 total:6
# Totals: pass:6 fail:0 skip:0 total:6
ok 1 dev-addr-list-test
[...]

Regards,
Erhard

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ