Message-ID: <20070329194604.GH4892@waste.org>
Date:	Thu, 29 Mar 2007 14:46:04 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	Mariusz Kozłowski <m.kozlowski@...land.pl>
Cc:	Adrian Bunk <bunk@...sta.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, john stultz <johnstul@...ibm.com>,
	David Miller <davem@...emloft.net>
Subject: Re: 2.6.21-rc5-mm1

On Thu, Mar 29, 2007 at 08:55:25PM +0200, Mariusz Kozłowski wrote:
> > > > > > > 	I run 2.6.21-rc4-mm1 with no hangs for a week.
> > > > > > > Then 2.6.21-rc5-mm1 showed up, so I switched to it. Unfortunately
> > > > > > > today my laptop hung twice in a way similar to what is described here:
> > > > > > > 
> > > > > > > http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#1165
> > > > > > 
> > > > > > It's not good that we went backwards between those two releases.
> > > > > > 
> > > > > > > The difference is that it happened when I closed the lid in my laptop.
> > > > > > > When I reopened it the box was frozen (ACPI?). Again disk I/O was dead
> > > > > > > so nothing was found in syslog.
> > > > > > 
> > > > > > Adrian, does this look like any of the bugs which you're monitoring?
> > > > > >...
> > > > > 
> > > > > Is it also present in 2.6.21-rc5?
> > > > 
> > > > Don't know. I usually test the -mm series. Will test tomorrow and let you know
> > > > after some reasonable uptime.
> > > > 
> > > > > Is it also present with CONFIG_NO_HZ=n?
> > > > 
> > > > Don't know. Did not try recently. Will let you know.
> > > > 
> > > > It takes time as these hangs are not easy to trigger. With 2.6.21-rc2-mm1
> > > > it was easy -> push the system and watch it die in minutes. With
> > > > 2.6.21-rc5-mm1 it takes hours (3 hangs in ~15 hours) and not sure how to
> > > > trigger it. It just happens from time to time.
> > > 
> > > Ok. CONFIG_NO_HZ=n and uptime ~12 hours, netconsole loaded, and no hangs
> > > ... until I moved my laptop. The same scenario happened yesterday during the
> > > last hang. I started playing and repeating some steps which I naturally do when
> > > I move my laptop.
> > > 
> > > An hour later I came to this -> steps to hang 2.6.21-rc5-mm1 on my laptop:
> > > 1. boot the system and login as root
> > > 2. load netconsole (insmod netconsole.ko netconsole=blah... blah...)
> > > 3. unplug the cable (link goes down)
> > > 4. system is frozen until forced to reboot
> > > 
> > > This is verified and repeatable _every_ single time I tried. Unfortunately
> > > the last thing seen on the screen before system is frozen is 'eth0: link down'.
> > > So my guess is that when hunting for hangs I found something else that can hang
> > > my laptop (netconsole that is).
> > 
> > Which NIC do you have? Odds are that the driver is doing that printk
> > while holding a driver-internal spinlock that causes it to deadlock
> > when netpoll tries to send that message out.
> 
> 8139too was in use. This is from lspci:
> 
> 00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
>         Subsystem: Sony Corporation Unknown device 8158
>         Flags: bus master, medium devsel, latency 64, IRQ 11
>         I/O ports at 9c00 [size=256]
>         Memory at f0404c00 (32-bit, non-prefetchable) [size=512]
>         Capabilities: [50] Power Management version 2

Yep, that's a known problem with this driver:

http://lkml.org/lkml/2007/2/17/222

This was a known theoretical problem with netconsole from the start
but I'm actually not aware of any other drivers that have run into it.
My preferred fix is to create a netconsole_disable/enable API for
drivers that have this sort of reentrancy problem.
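
To make the reentrancy concrete, here is a rough userspace model of the
problem and of the disable/enable idea. It is only a sketch: every name in
it (model_xmit, model_netconsole_disable, and so on) is made up for
illustration, none of it is real kernel or 8139too code, and the proposed
API does not exist yet. A C11 atomic_flag stands in for the driver's
internal spinlock, and the nested acquire uses a trylock so the model
reports the deadlock instead of actually spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: all names here are hypothetical, not kernel symbols. */
static atomic_flag drv_lock = ATOMIC_FLAG_INIT; /* stands in for the driver's private spinlock */
static bool console_enabled = true;             /* state toggled by the hypothetical disable/enable API */
static int deferred_msgs;                       /* messages skipped while the console was disabled */

static bool model_trylock(void) { return !atomic_flag_test_and_set(&drv_lock); }
static void model_lock(void)    { while (atomic_flag_test_and_set(&drv_lock)) { /* spin */ } }
static void model_unlock(void)  { atomic_flag_clear(&drv_lock); }

/* Models the driver's transmit path, which needs the driver lock. */
static bool model_xmit(const char *msg)
{
    if (!model_trylock())
        return false;           /* a real spinlock would spin here forever */
    printf("sent: %s\n", msg);
    model_unlock();
    return true;
}

/* Models printk -> netconsole -> netpoll -> driver transmit. */
static bool model_printk(const char *msg)
{
    if (!console_enabled) {
        deferred_msgs++;        /* message does not go out over the network */
        return true;
    }
    return model_xmit(msg);
}

/* Hypothetical shape of the proposed API. */
static void model_netconsole_disable(void) { console_enabled = false; }
static void model_netconsole_enable(void)
{
    assert(!console_enabled);   /* catch unbalanced enable calls */
    console_enabled = true;
}

/* Models a link-change handler that logs while holding the driver lock.
 * Returns false if the nested transmit would have deadlocked. */
static bool model_link_down(bool guard)
{
    bool ok;

    model_lock();
    if (guard)
        model_netconsole_disable();
    ok = model_printk("eth0: link down");
    if (guard)
        model_netconsole_enable();
    model_unlock();
    return ok;
}
```

In this model, model_link_down(false) returns false (the message path
re-enters the driver and finds the lock already held), while
model_link_down(true) returns true. The cost in the guarded case is that
the message is not sent over the network while the console is disabled;
whether to drop or queue it is left open here.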

-- 
Mathematics is the supreme nostalgia of our time.