Message-ID: <20090717141552.GA3532@localhost.localdomain>
Date: Fri, 17 Jul 2009 10:15:52 -0400
From: Neil Horman <nhorman@...driver.com>
To: David Hill <hilld@...arystorm.net>
Cc: Andrew Morton <akpm@...ux-foundation.org>, netdev@...r.kernel.org,
bugzilla-daemon@...zilla.kernel.org,
bugme-daemon@...zilla.kernel.org
Subject: Re: [Bugme-new] [Bug 13553] New: When NETCONSOLE is enabled
in kernel, computer crashes after 120 seconds (approx)
On Fri, Jul 17, 2009 at 01:55:44AM -0400, David Hill wrote:
> Hi back,
> Look at bug 13219. I'm not sure the bug is related to NETCONSOLE.
> It may be in the NIC drivers, in the miidiag/ethtool tools, or in
> something else.
> The behavior of the system is random.
>
> I attached the NMI stack trace... but for the kdump, I need to read a
> bit more about it, and I think I'll need to patch the kernel... will I?
>
> Thanks again,
>
> Dave
>
Neither of the logs you attached to the associated bugs seems to include the NMI
lockup backtrace. As for kdump, no, you won't need to patch the kernel, but
depending on which kernel you're using, you may need to build it with
CONFIG_KEXEC and CONFIG_CRASH_DUMP turned on.
Neil
>
> ----- Original Message ----- From: "David Hill" <hilld@...arystorm.net>
> To: "Neil Horman" <nhorman@...driver.com>; "Andrew Morton"
> <akpm@...ux-foundation.org>
> Cc: <netdev@...r.kernel.org>; <bugzilla-daemon@...zilla.kernel.org>;
> <bugme-daemon@...zilla.kernel.org>
> Sent: Thursday, July 16, 2009 1:42 AM
> Subject: Re: [Bugme-new] [Bug 13553] New: When NETCONSOLE is enabled
> in kernel, computer crashes after 120 seconds (approx)
>
>
>> Will try that in the next few days... sorry for the delay. I was on
>> vacation for the last 2 weeks and thus, out of town :D
>>
>>
>>
>> ----- Original Message ----- From: "Neil Horman"
>> <nhorman@...driver.com>
>> To: "Andrew Morton" <akpm@...ux-foundation.org>
>> Cc: <netdev@...r.kernel.org>; <bugzilla-daemon@...zilla.kernel.org>;
>> <bugme-daemon@...zilla.kernel.org>; <hilld@...arystorm.net>
>> Sent: Tuesday, June 23, 2009 9:05 PM
>> Subject: Re: [Bugme-new] [Bug 13553] New: When NETCONSOLE is enabled
>> in kernel, computer crashes after 120 seconds (approx)
>>
>>
>>> On Tue, Jun 23, 2009 at 02:07:43PM -0700, Andrew Morton wrote:
>>>>
>>>> (switched to email. Please respond via emailed reply-to-all, not via the
>>>> bugzilla web interface).
>>>>
>>>> On Wed, 17 Jun 2009 01:55:54 GMT
>>>> bugzilla-daemon@...zilla.kernel.org wrote:
>>>>
>>>> > http://bugzilla.kernel.org/show_bug.cgi?id=13553
>>>> >
>>>> > Summary: When NETCONSOLE is enabled in kernel, computer crashes
>>>> > after 120 seconds (approx)
>>>> > Product: Networking
>>>> > Version: 2.5
>>>> > Kernel Version: 2.6.29.4, 2.6.30
>>>> > Platform: All
>>>> > OS/Version: Linux
>>>> > Tree: Mainline
>>>> > Status: NEW
>>>> > Severity: high
>>>> > Priority: P1
>>>> > Component: Other
>>>> > AssignedTo: acme@...stprotocols.net
>>>> > ReportedBy: hilld@...arystorm.net
>>>> > Regression: No
>>>> >
>>>> >
>>>>
>>>> > 00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
>>>> > 00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
>>>> > 00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
>>>> > 00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
>>>> > 00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
>>>> > 00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
>>>> > 00:0b.0 SCSI storage controller: Adaptec AIC-7896U2/7897U2
>>>> > 00:0b.1 SCSI storage controller: Adaptec AIC-7896U2/7897U2
>>>> > 00:0d.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
>>>> > 00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
>>>> > 01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 RL/VR AGP
>>>> >
>>>> > ------- Comment #2 From David Hill 2009-06-17 02:55:56 -------
>>>> >
>>>> > With NETCONSOLE enabled, if I type:
>>>> > ethtool -s eth1 speed 100 duplex full autoneg on
>>>> >
>>>> > the computer freezes with kernel 2.6.29.4 and 2.6.30...
>>>> >
>>>> > I can reproduce it anytime you want.
>>>> >
>>>>
>>>> Interesting. I wonder what the significance is of the 120 seconds. I
>>>> see no such timers in e100.c. Does the networking core have timers on
>>>> such intervals?
>>>>
>>> My guess is the 120 seconds has less to do with the driver, and more to do
>>> with some other periodic event in the kernel that triggers a message getting
>>> written to the console, which in turn triggers whatever deadlock it is that's
>>> getting hit here. I imagine we could diagnose it pretty quickly if a stack
>>> trace or vmcore could be captured on this. David, can you enable the NMI
>>> watchdog on this system to trigger a panic after a deadlock? Then if you
>>> could enable a second serial console, or set up kdump to capture a vmcore on
>>> this system, we should be able to figure out what's going on. My guess is
>>> that in the e100 driver we're taking a lock in the ethtool set path, then
>>> calling printk, which winds up recursing into the driver and trying to take
>>> the same lock again. A stack trace will tell us for certain.
>>>
>>> Regards
>>> Neil
>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@...r.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>> --
>>> This message has been scanned for viruses and
>>> dangerous content by MailScanner, and is
>>> believed to be clean.
>>>
>>>
>>>
>>
>