linux-kernel - Re: AMD 8132 parity issue causes interrupt storms

Message-ID:  <499F4E56.4050305@gmail.com>
Date:	Fri, 20 Feb 2009 18:44:06 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	linux-kernel@...r.kernel.org
Subject:  Re: AMD 8132 parity issue causes interrupt storms

Mr. Berkley Shands wrote:
> It seems that the 8132 should be blacklisted :-)
> 
> INT-A will be asserted forever if any channel sees a parity error.
> This can be blocked by several means:
> 
> 1) setpci -s <bus address of 8132> 5.b=05   /* disable interrupts from 
> the bridge */
> This is the "I don't see you" method.
> 
> Shouldn't the interrupt handler (is there one?) trap and clear this?
> Shouldn't the kernel at least report this error and reset those bits?

What's enabling this interrupt generation? Interrupting on parity errors 
is not part of the PCI spec. Unless there's some driver that's set up to 
handle these interrupts, whoever's enabling them shouldn't be.

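For reference, a rough sketch of what the quoted "setpci ... 5.b=05" write
touches. It assumes the standard PCI header layout, where the 16-bit Command
register sits at config offset 0x04, so byte 5 is its upper half. The bit
names are the standard ones; the helper name decode_config_byte_write is made
up for illustration, and nothing here is specific to the 8132.

```python
# Bits of the standard 16-bit PCI Command register (config offset 0x04).
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    3: "Special Cycles",
    4: "Memory Write and Invalidate",
    5: "VGA Palette Snoop",
    6: "Parity Error Response",
    7: "Reserved (formerly stepping control)",
    8: "SERR# Enable",
    9: "Fast Back-to-Back Enable",
    10: "Interrupt Disable",
}


def decode_config_byte_write(offset, value):
    """Hypothetical helper: name the Command bits set by a byte write.

    Only the two bytes of the Command register (0x04 and 0x05) are handled.
    """
    if offset not in (0x04, 0x05):
        raise ValueError("only the two Command register bytes are handled")
    shift = 8 * (offset - 0x04)          # byte 5 holds Command bits 8..15
    word = (value & 0xFF) << shift
    return [
        name
        for bit, name in sorted(COMMAND_BITS.items())
        if shift <= bit < shift + 8 and (word >> bit) & 1
    ]


# The write from the quoted message: byte offset 5, value 0x05.
print(decode_config_byte_write(0x05, 0x05))
```

Under that assumption the write sets Interrupt Disable (bit 10) and SERR#
Enable (bit 8). Because it is a whole-byte write, it also zeroes every other
bit in the upper byte of Command.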
> 
> All,
> 
> OK, here's what I know so far.  The interrupt storm is coming from the 
> parity error detector in the 8132.  The parity error is reported in two 
> locations using sticky bits:
> 
> 0x1c bits 31 and 24
>   Here there seems to be some differentiation between which party 
> detected the parity error.  The 8132 spec is pretty vague here (see page 
> 75) but it looks like the 8132 is detecting a parity error from the HBA, 
> not the other way around.
> 0x80 bit 0
>   Here it just states that someone asserted the PERR_L signal, no 
> distinction on who did it.
> 
> All these bits are write-one-to-clear.  If 0x80 bit 0 is cleared, the 
> storm stops.  Clearly the OS does not know how to handle these 
> conditions and the error flag is left on while the interrupt is 
> continuously handled.
> 
> One way to handle this is to set 0x48 bit 19 to 0.  This prevents the 
> 8132 from interrupting when 0x80 bit 0 is set.
> 
> A much better way to handle this is to have the interrupt handler 
> actually check the error bits on the 8132 when it is called.  This would 
> slow down the interrupt handler, but actually give us a much better 
> visibility into this problem (when, where and how often this happens).  
> The irritating thing here is that this is chipset dependent.  The 
> interrupt handler would have to know what PCI-X chipset it was talking 
> through to know how to handle this (way to go AMD).
> 
> The really odd thing is that the parity error is reported through INTA 
> on the 8132.  The spec claims that fatal errors (the category they put 
> PERR in) go to INTB while hot plug conditions trigger INTA.  Masking off 
> fatal errors in the IOAPIC turns off the storm too.  I have no idea why 
> this is showing up on INTA.
> 
> Berkley
> 
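
The two workarounds described in the quoted message (clear the sticky
write-one-to-clear bits, or mask the interrupt via 0x48 bit 19) can be
sketched against a toy register model. This is only an illustration: the
offsets and bit positions are taken from the message as written and have not
been checked against the 8132 datasheet, and FakeBridge, ack_parity_error and
mask_perr_interrupt are hypothetical names, not kernel APIs.

```python
# Offsets and bits as described in the quoted message (unverified).
REG_STATUS_1C = 0x1C
REG_CTRL_48 = 0x48
REG_ERR_80 = 0x80
PARITY_BITS_1C = (1 << 31) | (1 << 24)   # sticky parity status bits
PERR_BIT_80 = 1 << 0                     # "someone asserted PERR_L"
PERR_INT_ENABLE_48 = 1 << 19             # interrupt on 0x80 bit 0


class FakeBridge:
    """Toy model of three 32-bit config registers with W1C status bits."""

    def __init__(self):
        self.regs = {REG_STATUS_1C: 0, REG_CTRL_48: PERR_INT_ENABLE_48, REG_ERR_80: 0}
        self.w1c = {REG_STATUS_1C: PARITY_BITS_1C, REG_ERR_80: PERR_BIT_80}

    def read32(self, off):
        return self.regs[off]

    def write32(self, off, val):
        w1c = self.w1c.get(off, 0)
        old = self.regs[off]
        # W1C bits: writing 1 clears, writing 0 leaves the bit alone.
        # All other bits simply take the written value.
        self.regs[off] = ((old & w1c & ~val) | (val & ~w1c)) & 0xFFFFFFFF

    def interrupt_asserted(self):
        # Assumed behaviour: interrupt while the PERR flag is set and enabled.
        return bool(
            self.regs[REG_ERR_80] & PERR_BIT_80
            and self.regs[REG_CTRL_48] & PERR_INT_ENABLE_48
        )


def ack_parity_error(dev):
    """Report which sticky bits were set, then clear them.

    Writing the value just read back preserves ordinary read/write fields
    and writes 1 to exactly the W1C bits that are currently set.
    """
    seen = {}
    for off, mask in ((REG_STATUS_1C, PARITY_BITS_1C), (REG_ERR_80, PERR_BIT_80)):
        cur = dev.read32(off)
        seen[off] = cur & mask
        if seen[off]:
            dev.write32(off, cur)
    return seen


def mask_perr_interrupt(dev):
    """Read-modify-write: clear 0x48 bit 19, leaving the error flag set."""
    dev.write32(REG_CTRL_48, dev.read32(REG_CTRL_48) & ~PERR_INT_ENABLE_48)


dev = FakeBridge()
dev.regs[REG_STATUS_1C] |= PARITY_BITS_1C   # simulate a latched parity error
dev.regs[REG_ERR_80] |= PERR_BIT_80
print(dev.interrupt_asserted())             # storm condition in this model
ack_parity_error(dev)
print(dev.interrupt_asserted())
```

In this model the acknowledge path both stops the interrupt and returns which
bits were latched, which is the visibility argument made in the message.
Masking bit 19 stops the interrupt but leaves the error flag set and
unreported. One caveat for real hardware: a register may hold other W1C bits
besides the ones named here, and writing back the value read would clear
those as well.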
