lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191112160855.GA22025@arrakis.emea.arm.com>
Date:   Tue, 12 Nov 2019 16:08:57 +0000
From:   Catalin Marinas <catalin.marinas@....com>
To:     Vincent Whitchurch <vincent.whitchurch@...s.com>
Cc:     torvalds@...ux-foundation.org, axboe@...nel.dk,
        linux@...linux.org.uk, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, will@...nel.org,
        Vincent Whitchurch <rabinv@...s.com>,
        Richard Earnshaw <Richard.Earnshaw@....com>
Subject: Re: [PATCH v2] buffer: Fix I/O error due to ARM read-after-read
 hazard

On Tue, Nov 12, 2019 at 02:02:44PM +0100, Vincent Whitchurch wrote:
> On my dual-core ARM Cortex-A9, reading from squashfs (over
> dm-verity/ubi/mtd) in a loop for hundreds of hours invariably results in
> a read failure in squashfs_read_data().  The errors occur because the
> buffer_uptodate() check fails after wait_on_buffer().  Further debugging
> shows that the bh was in fact uptodate and that there is no actual I/O
> error in the lower layers.
> 
> The problem is caused by the read-after-read hazards in the ARM
> Cortex-A9 MPCore (erratum #761319, see [1]).  The code generated by the
> compiler for the combination of the wait_on_buffer() and
> buffer_uptodate() calls reads the flags value twice from memory (see the
> excerpt of the assembly below).  The new value of the BH_Lock flag is
> seen but the new value of BH_Uptodate is not even though both the bits
> are read from the same memory location.
> 
>  27c:	9d08      	ldr	r5, [sp, #32]
>  27e:	2400      	movs	r4, #0
>  280:	e006      	b.n	290 <squashfs_read_data+0x290>
>  282:	6803      	ldr	r3, [r0, #0]
>  284:	07da      	lsls	r2, r3, #31
>  286:	f140 810d 	bpl.w	4a4 <squashfs_read_data+0x4a4>
>  28a:	3401      	adds	r4, #1
>  28c:	42bc      	cmp	r4, r7
>  28e:	da08      	bge.n	2a2 <squashfs_read_data+0x2a2>
>  290:	f855 0f04 	ldr.w	r0, [r5, #4]!
>  294:	6803      	ldr	r3, [r0, #0]
>  296:	0759      	lsls	r1, r3, #29
>  298:	d5f3      	bpl.n	282 <squashfs_read_data+0x282>
>  29a:	f7ff fffe 	bl	0 <__wait_on_buffer>
> 
> Work around this problem by adding a DMB between the two reads of
> bh->flags, as recommended in the ARM document.  With this barrier, no
> failures have been seen in more than 5000 hours of the same test.
> 
> [1] http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf

I thought we were going to fix the compiler. I found an old thread here:

https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00714.html

Also cc'ing Richard Earnshaw as he may been involved in the gcc
discussion at the time.

While you can add some barrier here, there may be other cases where this
can go wrong.

-- 
Catalin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ