Date: Mon, 16 Sep 2013 17:22:09 +0100
From: Russell King - ARM Linux <linux@....linux.org.uk>
To: Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>
Cc: Willy Tarreau <w@....eu>, Andrew Lunn <andrew@...n.ch>,
    Jason Cooper <jason@...edaemon.net>, netdev@...r.kernel.org,
    Ethan Tuttle <ethan@...antuttle.com>,
    Ezequiel Garcia <ezequiel.garcia@...e-electrons.com>,
    Gregory Clément <gregory.clement@...e-electrons.com>,
    linux-arm-kernel@...ts.infradead.org
Subject: Re: mvneta: oops in __rcu_read_lock on mirabox

On Mon, Sep 16, 2013 at 05:51:52PM +0200, Thomas Petazzoni wrote:
> Willy, Ethan,
>
> On Mon, 16 Sep 2013 08:50:47 +0200, Willy Tarreau wrote:
>
> > I'm currently testing on 3.11.1 (which I had here) and am not getting
> > any issue after 50M packets. My kernel is running in thumb mode and
> > without SMP.
> >
> > Ethan, we'll need your config I guess.
>
> Can both of you also report the U-Boot version you're using, and the
> SoC revision (it's visible in the U-Boot output). Maybe Globalscale is
> shipping Mirabox with a different version of the bootloader, or some
> hardware difference, that is causing problems? (I'm just speculating
> here, but another user already reported having issues with his Mirabox,
> and Russell King analyzed the oops as very likely being hardware
> problems).

One seemed to be a single bit error in an instruction inside the kernel
image. The other was what seems to be an impossible abort.

I still don't see how we could end up with a prefetch abort inside
memset() due to the kernel domain being inaccessible, but still be able
to get an oops out, especially when we dump out the memory for the
faulting instruction by accessing that memory via that apparently
inaccessible domain, while the code which dumps that memory is itself
running under this apparently inaccessible domain. If the domain
containing the kernel really was inaccessible, the system would be
completely dead.

The only possibility I can come up with is that the abort was caused by
something spurious happening at the hardware level, either corrupting
the instruction TLB (corrupting the domain index stored in the I-TLB) or
causing other CPU control hardware to spuriously generate that fault.
As the domain field in the L1 page table entries covers bit 8, and the
single bit error in the instruction was also in bit 8, maybe there's a
design weakness on data line bit 8 causing marginal operation.

To add to this, the abort given in this report gives an IFSR value of
0x409, which equates to "Synchronous parity error on memory access" in
ARMv7. The other value (0x400) equates to "TLB conflict abort" which
can only happen with LPAE support enabled... So this is just getting
more weird!
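
For anyone wanting to check the two IFSR values by hand, here is a rough sketch of the decode. It assumes the ARMv7-A short-descriptor IFSR layout (FS[3:0] in bits [3:0], FS[4] in bit 10) and a domain field in bits [8:5] of a first-level descriptor. The function and table names are made up for illustration, and the table only holds the two encodings mentioned in this thread, not the full list.

```python
# Assumed layout (ARMv7-A, short-descriptor format):
#   IFSR: FS[3:0] = bits [3:0], FS[4] = bit 10
#   L1 descriptor: domain = bits [8:5]
# Partial table: only the two fault encodings discussed in this thread.
FS_NAMES = {
    0b10000: "TLB conflict abort",
    0b11001: "Synchronous parity error on memory access",
}

L1_DOMAIN_MASK = 0xF << 5  # bits [8:5] of a first-level descriptor


def ifsr_fault_status(ifsr):
    """Reassemble the 5-bit FS field from a short-descriptor IFSR value."""
    return (((ifsr >> 10) & 1) << 4) | (ifsr & 0xF)


def describe_ifsr(ifsr):
    """Name the fault if it is one of the two in the partial table."""
    fs = ifsr_fault_status(ifsr)
    return FS_NAMES.get(fs, "FS=0b{:05b} (not in this partial table)".format(fs))


def bit_in_domain_field(bit):
    """True if the given bit number falls inside the L1 domain field."""
    return bool((1 << bit) & L1_DOMAIN_MASK)
```

Calling describe_ifsr(0x409) and describe_ifsr(0x400) gives the two fault names quoted above, and bit_in_domain_field(8) shows that bit 8 is the top bit of the domain field.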