netdev - Re: mvneta: oops in __rcu_read

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130916182450.639084c6@skate>
Date:	Mon, 16 Sep 2013 18:24:50 +0200
From:	Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
Cc:	Willy Tarreau <w@....eu>, Andrew Lunn <andrew@...n.ch>,
	Jason Cooper <jason@...edaemon.net>, netdev@...r.kernel.org,
	Ethan Tuttle <ethan@...antuttle.com>,
	Ezequiel Garcia <ezequiel.garcia@...e-electrons.com>,
	Gregory Clément 
	<gregory.clement@...e-electrons.com>,
	linux-arm-kernel@...ts.infradead.org
Subject: Re: mvneta: oops in __rcu_read_lock on mirabox

Russell,

On Mon, 16 Sep 2013 17:22:09 +0100, Russell King - ARM Linux wrote:

> One seemed to be a single bit error in an instruction inside the kernel
> image.  The other was what seems to be an impossible abort.
> 
> I still don't see how we could end up with a prefetch abort inside memset()
> due to the kernel domain being inaccessible, but still be able to get
> an oops out, especially when we dump out the memory for the faulting
> instruction by accessing that memory via that apparantly inaccessible
> domain while running the code which dumps that memory also under this
> apparantly inaccessible domain.  If the domain containing the kernel
> really was inaccessible, the system would be completely dead.
> 
> The only possibilities I can come up with for that is that abort was
> caused by something spurious happening at the hardware level causing
> corruption of the instruction TLB (corrupting the domain index stored
> in the I-TLB) or other CPU control hardware causing it to spuriously
> generate that fault.
> 
> As the domain field in the page table L1 entries covers bit 8, and the
> single bit error with the instruction was also bit 8, maybe there's a
> design weakness on data line bit 8 causing marginal operation.
> 
> To add to this, the abort given in this report gives an IFSR value of
> 0x409, which equates to "Synchronous parity error on memory access"
> in ARMv7.  The other value (0x400) equates to "TLB conflict abort"
> which can only happen with LPAE support enabled...  So this is just
> getting more weird!

Could this be caused by bitflips in the RAM due to bad timings, or
overheating or that kind of things?

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html