Date:	Sun, 4 Dec 2011 14:36:32 +0100
From:	Jeroen Van den Keybus <jeroen.vandenkeybus@...il.com>
To:	Clemens Ladisch <clemens@...isch.de>
Cc:	"Huang, Shane" <Shane.Huang@....com>,
	Borislav Petkov <bp@...64.org>,
	"Nguyen, Dong" <Dong.Nguyen@....com>, linux-kernel@...r.kernel.org
Subject: Re: Unhandled IRQs on AMD E-450

> You previously said that unloading e1000 made things better.  Did this
> affect both IRQs 16 and 19?

No, this only affects IRQ 19. IRQ 16 usually dies within 15min..2hrs.

> Can you check if this problem (on either 16 or 19) happens when you are
> not using the e1000 port (i.e., unplugged)?

The problem occurs both with the e1000 idle (unplugged) and under heavy
usage (plugged). Time to failure is of the same order of magnitude in
both cases (i.e. 1..30 minutes). So far, IRQ 19 has never been disabled
with the e1000 removed. The e1000 driver shipped with Ubuntu isn't
particularly recent (7.3.21-k8-NAPI). Before I suspected a kernel
problem, I had already tried version 8.0.35, compiled from source
obtained from Intel. Exactly the same result: IRQ 19 gets banned.

> The /proc/interrupts doesn't show e1000, but lspci does.  ...?

You are right. I took that lspci after removing e1000, sorry for the
confusion. Please see the new /proc/interrupts below.

> Does the problem occur without fglrx?

Good question. I'll try that immediately. Stand by.

> To get the AHCI interrupt away from IRQ 19, try the patch below.
> (But please don't show that ugly hack to any AMD guy. :)

I'll try that next too.

>> Is there any way of obtaining more output such as IO-APIC register
>> states to verify that it is indeed a stuck IRQ input line and not an
>> unsuccessful EOI ack ?

> In theory, lspci's "Status: ... INTx+" shows an active interrupt line.

OK. In that case, going by the lspci output taken from a failed system,
none of the listed devices has INTx+.
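
For checking this across many lspci captures, a minimal Python sketch
follows. It scans `lspci -vv` style text and reports the devices whose
Status line carries the INTx+ flag. The function name is made up for
illustration, and the sample input is synthetic (it is not output from
the machine discussed in this thread).

```python
def devices_with_intx_asserted(lspci_vv_text):
    """Return the bus addresses of devices whose Status line shows INTx+."""
    asserted = []
    current = None  # bus address of the device block being scanned
    for line in lspci_vv_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device block, e.g. "00:11.0 ..."
            current = line.split()[0]
        elif (current and line.strip().startswith("Status:")
              and "INTx+" in line.split()):
            # Whole-token match, so "DisINTx+" on the Control line is ignored
            asserted.append(current)
    return asserted


# Synthetic sample for illustration only (addresses and flags are invented).
SAMPLE = """
00:11.0 SATA controller: (synthetic example device)
    Control: I/O+ Mem+ BusMaster+ DisINTx-
    Status: Cap+ 66MHz+ INTx-
05:00.0 Ethernet controller: (synthetic example device)
    Control: I/O+ Mem+ BusMaster+ DisINTx-
    Status: Cap+ 66MHz- INTx+
"""

if __name__ == "__main__":
    print(devices_with_intx_asserted(SAMPLE))
```

An empty result corresponds to the situation described above: no listed
device reports an asserted INTx line.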


Thanks,


J.


$ cat /proc/interrupts (with e1000 (eth1) still loaded; this dump was
taken after IRQ 19 was killed)

           CPU0       CPU1
  0:         45         26   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  5:          0          0   IO-APIC-edge      parport0
  7:          1          0   IO-APIC-edge
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          1          3   IO-APIC-edge      i8042
 16:        121        559   IO-APIC-fasteoi   firewire_ohci, hda_intel
 17:          3        110   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:          0          4   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:     198169      11097   IO-APIC-fasteoi   ahci, eth1
 40:       3601         71   PCI-MSI-edge      eth0
 41:          0          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
 44:          4        298   PCI-MSI-edge      hda_intel
 45:          0          3   PCI-MSI-edge      fglrx[0]@PCI:0:1:0
NMI:          0          0   Non-maskable interrupts
LOC:     231521     231457   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:      37942      34198   Rescheduling interrupts
CAL:        256        225   Function call interrupts
TLB:        309        243   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         26         26   Machine check polls
ERR:          1
MIS:          0
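
To compare several such dumps, a minimal Python sketch follows. It
parses text in the /proc/interrupts format shown above and lists the
IRQ lines that have more than one handler attached. The names
parse_interrupts and shared_lines are made up for illustration, the
sample is a subset of the dump above, and the parser assumes each IRQ
sits on a single line (lines wrapped by a mail client need rejoining
first).

```python
def parse_interrupts(text):
    """Map IRQ number -> (per-CPU counts, chip type, handler names)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpu]]
        chip = fields[ncpu] if len(fields) > ncpu else ""
        names = " ".join(fields[ncpu + 1:]).split(",")
        handlers = [n.strip() for n in names if n.strip()]
        table[int(label)] = (counts, chip, handlers)
    return table


def shared_lines(table):
    """Return only the IRQs that have two or more handlers."""
    return {irq: h for irq, (_, _, h) in table.items() if len(h) > 1}


# Subset of the dump above, used as sample input.
SAMPLE = """
           CPU0       CPU1
 16:        121        559   IO-APIC-fasteoi   firewire_ohci, hda_intel
 19:     198169      11097   IO-APIC-fasteoi   ahci, eth1
 40:       3601         71   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
ERR:          1
"""

if __name__ == "__main__":
    for irq, handlers in sorted(shared_lines(parse_interrupts(SAMPLE)).items()):
        print(irq, handlers)
```

On the sample this reports IRQs 16 and 19 as shared, which are the two
lines discussed in this thread.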