Message-ID: <48FDFB97.1050208@suse.de>
Date:	Tue, 21 Oct 2008 17:56:07 +0200
From:	Stefan Assmann <sassmann@...e.de>
To:	"M. Vefa Bicakci" <bicave@...eronline.com>
Cc:	Sven-Thorsten Dietrich <sdietrich@...ell.com>,
	Olaf Dabrunz <odabrunz@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: Regression in 2.6.27: "irq 18: nobody cared" on Toshiba Satellite A100

M. Vefa Bicakci wrote:
> Stefan Assmann wrote:
>> M. Vefa Bicakci wrote:
>>> Sven-Thorsten Dietrich wrote:
>>>> On Sun, 2008-10-19 at 10:06 -0400, M. Vefa Bicakci wrote:
>>>>> Hello,
>>>>>
>>>>> As you might guess from the subject line, since I started to use 2.6.27-rcX
>>>>> series, I began to get "irq 18: nobody cared" messages in dmesg. Currently I am
>>>>> using 2.6.27.2 with Sidux on this laptop, which is a Toshiba Satellite A100.
>>>>> I have reproduced this problem with vanilla and sidux's kernels.
>>>>>
>>>> Can you provide the contents of /proc/interrupts? 
>> Could you provide the following:
>> - output of lspci -nn
>> - dmesg output with kernel commandline option apic=debug
> 
> The dmesg output with "apic=debug" is appended to this e-mail. Please note that
> since the regression needs quite a few hours with the computer doing nothing to
> show itself, this dmesg output does not include the "nobody cared" message. If
> you need the dmesg output to contain the "nobody cared" message, then please let
> me know.

No, that is not necessary for now. I wanted to know how many IO-APICs
are present in your system, and there is only one. So this is not a
routing problem between multiple IO-APICs.

To gather more information, I have a few more things to suggest:
1. try the noapic option
2. try the irqpoll option
3. try the latest 2.6.26 kernel to verify that this was introduced with
2.6.27

I know this takes some time to reproduce, so please try the following
patch. It lowers the unhandled-interrupt threshold from 99900 to 999,
which might trigger the problem sooner.

--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -200,7 +200,7 @@ void note_interrupt(unsigned int irq, st
 		return;

 	desc->irq_count = 0;
-	if (unlikely(desc->irqs_unhandled > 99900)) {
+	if (unlikely(desc->irqs_unhandled > 999)) {
 		/*
 		 * The interrupt is stuck
 		 */

> 
> Here's the output of "lspci -nn":
> 
> === 8< ===
> 00:00.0 Host bridge [0600]: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub [8086:27a0] (rev 03)
> 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)
> 00:02.1 Display controller [0380]: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller [8086:27a6] (rev 03)
> 00:1b.0 Audio device [0403]: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller [8086:27d8] (rev 02)
> 00:1c.0 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 [8086:27d0] (rev 02)
> 00:1c.1 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 [8086:27d2] (rev 02)
> 00:1c.2 PCI bridge [0604]: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 [8086:27d4] (rev 02)
> 00:1d.0 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 [8086:27c8] (rev 02)
> 00:1d.1 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 [8086:27c9] (rev 02)
> 00:1d.2 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 [8086:27ca] (rev 02)
> 00:1d.3 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 [8086:27cb] (rev 02)
> 00:1d.7 USB Controller [0c03]: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller [8086:27cc] (rev 02)
> 00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev e2)
> 00:1f.0 ISA bridge [0601]: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge [8086:27b9] (rev 02)
> 00:1f.2 IDE interface [0101]: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller [8086:27c4] (rev 02)
> 00:1f.3 SMBus [0c05]: Intel Corporation 82801G (ICH7 Family) SMBus Controller [8086:27da] (rev 02)
> 05:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)
> 07:06.0 CardBus bridge [0607]: Texas Instruments PCIxx12 Cardbus Controller [104c:8039]
> 07:06.1 FireWire (IEEE 1394) [0c00]: Texas Instruments PCIxx12 OHCI Compliant IEEE 1394 Host Controller [104c:803a]
> 07:06.2 Mass storage controller [0180]: Texas Instruments 5-in-1 Multimedia Card Reader (SD/MMC/MS/MS PRO/xD) [104c:803b]
> 07:06.3 SD Host controller [0805]: Texas Instruments PCIxx12 SDA Standard Compliant SD Host Controller [104c:803c]
> 07:08.0 Ethernet controller [0200]: Intel Corporation PRO/100 VE Network Connection [8086:1092] (rev 02)
> === >8 ===
> 
>  
>>> My computer is currently in the "nobody cared" state. Here are the current
>>> contents of /proc/interrupts:
>>>
>>> --- 8< ---
>>>            CPU0       CPU1       
>>>   0:   45249492      60399   IO-APIC-edge      timer
>>>   1:      25451          0   IO-APIC-edge      i8042
>>>   8:          1          0   IO-APIC-edge      rtc0
>>>   9:      36514          0   IO-APIC-fasteoi   acpi
>>>  12:    1147983       2103   IO-APIC-edge      i8042
>>>  14:     170245          0   IO-APIC-edge      ata_piix
>>>  15:     558085        819   IO-APIC-edge      ata_piix
>>>  16:        508          0   IO-APIC-fasteoi   uhci_hcd:usb5, i915@pci:0000:00:02.0
>>>  17:       1353          0   IO-APIC-fasteoi   firewire_ohci
>>>  18:     300158          1   IO-APIC-fasteoi   uhci_hcd:usb4, tifm_7xx1, yenta
>>>  19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
>>>  20:      26606          2   IO-APIC-fasteoi   eth0
>>>  22:    3206279          1   IO-APIC-fasteoi   HDA Intel
>>>  23:          3          0   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb2
>>> 220:    2105545          0   PCI-MSI-edge      iwl3945
>>> NMI:          0          0   Non-maskable interrupts
>>> LOC:    5971997   27874747   Local timer interrupts
>>> RES:     938710    1791498   Rescheduling interrupts
>>> CAL:     138135     180813   function call interrupts
>>> TLB:      48455      64413   TLB shootdowns
>>> TRM:          0          0   Thermal event interrupts
>>> SPU:          0          0   Spurious interrupts
>>> ERR:          0
>>> MIS:          0
>>> --- >8 ---
>> Nothing unusual at first glance. How long did the system run?
> 
> The computer had been booted at 15:24 on October 18th. I got the "nobody cared"
> message at 05:30 (am) on October 19th. The contents of "/proc/interrupts" that
> are quoted above were generated at about 12:40 (afternoon) on October 19th.
> 
> There is one more thing I would like to add. Last night, before going to sleep,
> I wrote a simple bash script which, every two seconds, recorded the contents
> of "/proc/interrupts" to a directory into a "ramfs" mount-point. (I chose "ramfs"
> because I thought that "ramfs" would not interfere with the "swapper" process
> which is shown as the reason in all of the "nobody cared" messages.)
> 
> Interestingly, when I woke up today, the dmesg contents did *not* contain any
> "nobody cared" messages. So I hit Ctrl-C and ended the execution of the script.
> I then left the computer alone and went on to do other things. And guess what,
> about four to five hours after I ended the script, I got the "nobody cared" message.
> So it looks like the computer really needs to be doing "nothing" in order to get
> this "nobody cared" message.

I'm not sure it's related to doing "nothing"; it's more likely to be
a coincidence. Try the patch I mentioned earlier and see if that gets
you to the problem sooner.

> 
> Unfortunately, all of this happened without the "apic=debug" command line option.
> Tonight, I am going to leave the computer on with the "apic=debug" command line
> option and without anything running.
> 
> Finally, I would like to say that I appreciate your help.

You're welcome!

> 
> Regards,
> 
> M. Vefa Bicakci
> 
> Note: dmesg output with "apic=debug" follows:
[snip dmesg]

  Stefan

-- 
Stefan Assmann          | SUSE LINUX Products GmbH
Software Engineer       | Maxfeldstr. 5, D-90409 Nuernberg
Mail : sassmann@...e.de | GF: Markus Rex, HRB 16746 (AG Nuernberg)
