[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20191112130244.16630-1-vincent.whitchurch@axis.com>
Date:   Tue, 12 Nov 2019 14:02:44 +0100
From:   Vincent Whitchurch <vincent.whitchurch@...s.com>
To:     torvalds@...ux-foundation.org
Cc:     axboe@...nel.dk, linux@...linux.org.uk,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        will@...nel.org, catalin.marinas@....com,
        Vincent Whitchurch <rabinv@...s.com>
Subject: [PATCH v2] buffer: Fix I/O error due to ARM read-after-read hazard
On my dual-core ARM Cortex-A9, reading from squashfs (over
dm-verity/ubi/mtd) in a loop for hundreds of hours invariably results in
a read failure in squashfs_read_data().  The errors occur because the
buffer_uptodate() check fails after wait_on_buffer().  Further debugging
shows that the bh was in fact uptodate and that there is no actual I/O
error in the lower layers.
The problem is caused by the read-after-read hazards in the ARM
Cortex-A9 MPCore (erratum #761319, see [1]).  The code generated by the
compiler for the combination of the wait_on_buffer() and
buffer_uptodate() calls reads the flags value twice from memory (see the
excerpt of the assembly below).  The new value of the BH_Lock flag is
seen but the new value of BH_Uptodate is not even though both the bits
are read from the same memory location.
 27c:	9d08      	ldr	r5, [sp, #32]
 27e:	2400      	movs	r4, #0
 280:	e006      	b.n	290 <squashfs_read_data+0x290>
 282:	6803      	ldr	r3, [r0, #0]
 284:	07da      	lsls	r2, r3, #31
 286:	f140 810d 	bpl.w	4a4 <squashfs_read_data+0x4a4>
 28a:	3401      	adds	r4, #1
 28c:	42bc      	cmp	r4, r7
 28e:	da08      	bge.n	2a2 <squashfs_read_data+0x2a2>
 290:	f855 0f04 	ldr.w	r0, [r5, #4]!
 294:	6803      	ldr	r3, [r0, #0]
 296:	0759      	lsls	r1, r3, #29
 298:	d5f3      	bpl.n	282 <squashfs_read_data+0x282>
 29a:	f7ff fffe 	bl	0 <__wait_on_buffer>
Work around this problem by adding a DMB between the two reads of
bh->flags, as recommended in the ARM document.  With this barrier, no
failures have been seen in more than 5000 hours of the same test.
[1] http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@...s.com>
---
v2: Reword commit message.
 include/linux/buffer_head.h | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 7b73ef7f902d..4ef909a91f8c 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -352,6 +352,14 @@ static inline void wait_on_buffer(struct buffer_head *bh)
 	might_sleep();
 	if (buffer_locked(bh))
 		__wait_on_buffer(bh);
+
+#ifdef CONFIG_ARM
+	/*
+	 * Work around ARM Cortex-A9 MPcore Read-after-Read Hazards (erratum
+	 * 761319).
+	 */
+	smp_rmb();
+#endif
 }
 
 static inline int trylock_buffer(struct buffer_head *bh)
-- 
2.20.0
Powered by blists - more mailing lists
 
