Message-ID: <20091125213546.GA6168@rhlx01.hs-esslingen.de>
Date:	Wed, 25 Nov 2009 22:35:46 +0100
From:	Andreas Mohr <andi@...as.de>
To:	davem@...emloft.net
Cc:	zambrano@...adcom.com, dave@...dillows.org, mb@...sch.de,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll
	error message

Hello all,

My OpenWrt MIPSEL ASUS WL-500gP v2 kept locking up while I was debugging
non-working drivers (USB, ftdi_sio, usb_audio) with kobject debugging
turned on.
It turns out that a network driver used for netconsole must not let its
interrupt handler emit an error message on every interrupt, since each
message that goes out over netconsole triggers yet another interrupt
(see comment in patch).

See
http://bugzilla.kernel.org/show_bug.cgi?id=14691
for background information.

Tested on 2.6.30.9 and working: normal debug messages now manage to come
through, too. The box still locked up, but only due to excessive _normal_
log messages such as kobject logging, which is _expected_ and
user-correctable.

checkpatch.pl'd, applies cleanly to -rc8.
Likely -stable candidate material.

r8169 (and perhaps other netconsole-capable drivers?)
might need such a correction, too, if the following code is similarly
problematic:
	if (likely(napi_schedule_prep(&tp->napi)))
		__napi_schedule(&tp->napi);
	else if (netif_msg_intr(tp)) {
		printk(KERN_INFO "%s: interrupt %04x in poll\n",
		       dev->name, status);
	}
BTW: this patch should be seen as separate from the r8169 interrupt handler
rework discussed in "r8169: avoid losing MSI interrupts". Even with a
"working" interrupt handler (i.e. one that closes the "new interrupt" race
window, which the one in b44 currently probably doesn't), such error messages
create a fatal feedback loop when netconsole is deployed.
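
To illustrate the feedback loop, here is a minimal userspace sketch (not
kernel code). All names in it (rl_state, rl_allow, simulate_storm) are
hypothetical stand-ins, and the model is deliberately crude: one tick per
IRQ, and every message that actually gets printed raises exactly one new
IRQ on the same device. The window/burst limiter only mimics the general
idea behind printk_ratelimit(); the values used in the usage note below
(10 messages per 5000 ticks) are an assumption chosen to resemble what I
believe the kernel default to be (10 messages per 5 seconds).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a printk ratelimit state. */
struct rl_state {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages let through in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be emitted at tick 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* window expired: start a new one */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/*
 * Model the loop: one IRQ arrives while the poll is already scheduled.
 * Every message that is actually printed goes out via netconsole and
 * raises one more IRQ. Returns the number of IRQs handled before the
 * loop dies out or max_ticks is reached.
 * rs == NULL models the unpatched, unlimited printk().
 */
static long simulate_storm(struct rl_state *rs, long max_ticks)
{
	long pending = 1, handled = 0;

	assert(max_ticks >= 0);
	for (long now = 0; now < max_ticks && pending > 0; now++) {
		pending--;
		handled++;
		if (rs == NULL || rl_allow(rs, now))
			pending++;	/* printed message -> TX -> new IRQ */
	}
	return handled;
}
```

With rs == NULL, simulate_storm() never runs out of pending IRQs and only
stops at max_ticks, which is the lockup case. With a state initialised as
{ .interval = 5000, .burst = 10 }, the loop dies out once the burst is
used up, because the suppressed message no longer generates traffic.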

Thanks,

Signed-off-by: Andreas Mohr <andi@...as.de>

--- linux-2.6.30.9/drivers/net/b44.c.orig	2009-11-25 20:49:48.000000000 +0100
+++ linux-2.6.30.9/drivers/net/b44.c	2009-11-25 21:26:34.000000000 +0100
@@ -913,8 +913,18 @@
 			__b44_disable_ints(bp);
 			__napi_schedule(&bp->napi);
 		} else {
-			printk(KERN_ERR PFX "%s: Error, poll already scheduled\n",
-			       dev->name);
+			/* netconsole fix!!:
+			 * without ratelimiting, this message would:
+			 * immediately find its way out to b44 netconsole ->
+			 * new IRQ -> re-encounter failed napi_schedule_prep()
+			 * -> error message -> ad nauseam -> box LOCKUP.
+			 * See thread "r8169: avoid losing MSI interrupts"
+			 * for further inspiration. */
+			if (printk_ratelimit())
+				printk(KERN_ERR PFX
+					"%s: Error, poll already scheduled; "
+					"istat 0x%x, imask 0x%x\n",
+						dev->name, istat, imask);
 		}
 
 irq_ack:
