Date:	Sat, 10 Dec 2011 18:58:50 +0100
From:	Clemens Ladisch <clemens@...isch.de>
To:	Jeroen Van den Keybus <jeroen.vandenkeybus@...il.com>
CC:	"Huang, Shane" <Shane.Huang@....com>,
	Borislav Petkov <bp@...64.org>,
	"Nguyen, Dong" <Dong.Nguyen@....com>, linux-kernel@...r.kernel.org
Subject: Re: Unhandled IRQs on AMD E-450

Jeroen Van den Keybus wrote:
> [...]
> - CPU services the IRQ, and does at least one (slow) PCI read to have
> the device deassert its IRQ line. In practice, more PCI read/writes
> are needed, requiring the bridge to do some PCIe traffic generation.
> - Bridge sees the IRQ line transition and signals Deassert. This
> message has only a few usecs to arrive at the I/O-APIC.
> - _However_ the CPU has by and large already handled the IRQ and gets
> interrupted again before the Deassert ever gets out. The resulting PCI
> bus traffic further delays the Deassert message (due to e.g. PCIe
> transmit credit exhaustion).
>
> My idea is that if we would not immediately hammer the bridge with
> PCIe transactions, the Deassert message may eventually arrive ?

PCIe messages are somewhat ordered; posted memory writes are allowed,
but IIRC a read transaction serializes all previous and following
transactions, assuming that all involved devices work correctly.

> Also, is there any control by Linux of the credits issued ?

I don't think these can be controlled by software.  The hardware is
supposed to get them correct.

> I therefore patched the polling system by detecting a stuck IRQ
> already after 10 unserviced IRQs. Then the polling system will take
> over for 50 cycles (5 seconds), after which the IRQ is reenabled.
>
> [ 1607.941232] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 1613.040185] Reenabling IRQ.
> [ 1908.541558] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 1913.640088] Reenabling IRQ.
> [ 2319.361659] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 2324.460064] Reenabling IRQ.
> [ 2782.285470] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 2787.384222] Reenabling IRQ.
> [ 3485.689347] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 3490.788079] Reenabling IRQ.
> [ 3810.336883] irq 19: nobody cared (try booting with the "irqpoll" option)

So the IRQ _does_ get unstuck eventually; I didn't expect that.

So either the ASM1083 delays its Deassert messages, or it is just way
too slow to react to changes in its PCI interrupt line inputs.

I'd guess that you can make the polling time shorter; a few milliseconds
should be enough.
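
To make the scheme under discussion concrete, here is a minimal
user-space sketch of the fallback logic as described in the quoted
text: after 10 unserviced IRQs the line is treated as stuck, polling
takes over for 50 cycles, and then the IRQ is reenabled.  This is not
the actual patch and not kernel code; all names (irq_watch, irq_event,
poll_tick, POLL_PERIOD_MS) are hypothetical, and counting
_consecutive_ unserviced IRQs is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds taken from the description above; the names are made up. */
#define STUCK_THRESHOLD 10   /* unserviced IRQs before the line counts as stuck */
#define POLL_CYCLES     50   /* polling cycles before the IRQ is reenabled */
#define POLL_PERIOD_MS  100  /* assumed: 50 cycles * 100 ms = 5 s */

enum irq_mode { IRQ_ENABLED, IRQ_POLLING };

struct irq_watch {
    enum irq_mode mode;
    unsigned int unhandled;   /* consecutive IRQs that no handler claimed */
    unsigned int polls_left;  /* polling cycles remaining */
};

/*
 * Call once per interrupt; 'handled' says whether any handler claimed it.
 * Returns true when this event causes the switch to polling mode.
 */
static bool irq_event(struct irq_watch *w, bool handled)
{
    assert(w != NULL);
    if (w->mode != IRQ_ENABLED)
        return false;           /* line is masked while polling */
    if (handled) {
        w->unhandled = 0;       /* a serviced IRQ resets the count */
        return false;
    }
    if (++w->unhandled < STUCK_THRESHOLD)
        return false;
    w->mode = IRQ_POLLING;      /* stuck: mask the line and start polling */
    w->polls_left = POLL_CYCLES;
    return true;
}

/*
 * Call once per polling period (every POLL_PERIOD_MS).
 * Returns true when the polling window ends and the IRQ is reenabled.
 */
static bool poll_tick(struct irq_watch *w)
{
    assert(w != NULL);
    if (w->mode != IRQ_POLLING)
        return false;
    if (--w->polls_left > 0)
        return false;
    w->mode = IRQ_ENABLED;      /* give the interrupt line another chance */
    w->unhandled = 0;
    return true;
}
```

In this sketch, shortening the polling time just means lowering
POLL_PERIOD_MS (or POLL_CYCLES) so that the window lasts milliseconds
rather than seconds.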


Your patch might be useful to others afflicted with this chip.  Could
you publish it?


Regards,
Clemens