Message-ID: <20210318201506.GU3420@casper.infradead.org>
Date: Thu, 18 Mar 2021 20:15:06 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Eric Whitney <enwlinux@...il.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu
Subject: Re: generic/418 regression seen on 5.12-rc3
On Thu, Mar 18, 2021 at 02:16:13PM -0400, Eric Whitney wrote:
> As mentioned in today's ext4 concall, I've seen generic/418 fail from time to
> time when run on 5.12-rc3 and 5.12-rc1 kernels. This first occurred when
> running the 1k test case using kvm-xfstests. I was then able to bisect the
> failure to a patch that landed in the -rc1 merge window:
>
> (bd8a1f3655a7) mm/filemap: support readpage splitting a page
Thanks for letting me know. This failure is new to me.
I don't understand it: this patch changes buffered reads from waiting
on a locked page with the refcount held to waiting with the refcount
dropped, then starting the lookup from scratch once the page is
unlocked. I find it hard to believe this introduces a /new/ failure.
Either it makes an existing failure easier to hit, or there's a subtle
bug in the retry logic that I'm not seeing.
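For clarity, the shape of the change (an illustrative sketch only --
the helpers are the real kernel ones, but the control flow here is
simplified from what the filemap code actually does):

	/* Old behaviour: sleep for the page lock with a refcount held. */
	page = find_get_page(mapping, index);	/* elevates the refcount */
	if (page)
		lock_page(page);	/* may sleep while we still hold the ref */

	/*
	 * New behaviour: drop our ref before sleeping, then start the
	 * lookup from scratch once the page is unlocked, since it may
	 * have been split or truncated in the meantime.
	 */
retry:
	page = find_get_page(mapping, index);
	if (page && !trylock_page(page)) {
		put_and_wait_on_page_locked(page, TASK_KILLABLE);
		goto retry;
	}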
> Typical test output resulting from a failure looks like:
>
> QA output created by 418
> +cmpbuf: offset 0: Expected: 0x1, got 0x0
> +[6:0] FAIL - comparison failed, offset 3072
> +diotest -w -b 512 -n 8 -i 4 failed at loop 0
> Silence is golden
> ...
>
> I've also been able to reproduce the failure on -rc3 in the 4k test case as
> well. The failure frequency there was 10 out of 100 runs. It was anywhere
> from 2 to 8 failures out of 100 runs in the 1k case.
>
> So, the failure isn't dependent upon block size less than page size.
That's a good data point. I'll take a look at g/418 and see if I can
figure out what race we're hitting. Nice that it happens so often.
I suppose I could get you to put some debugging in -- maybe dumping the
page if we hit a contended case, then again if we're retrying?
I presume it doesn't always happen at the same offset or anything
convenient like that.
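Concretely, something like this (hand-waving and untested; the
surrounding code is the same simplified sketch as above, and the
'contended' flag exists only for this debug hack):

	bool contended = false;		/* hypothetical debug-only flag */

retry:
	page = find_get_page(mapping, index);
	if (contended && page)
		dump_page(page, "buffered read: retrying lookup");
	if (page && !trylock_page(page)) {
		contended = true;
		dump_page(page, "buffered read: lock contended");
		put_and_wait_on_page_locked(page, TASK_KILLABLE);
		goto retry;
	}

That would at least tell us whether the page we find on the retry is
the same one we contended on, and what state it's in each time.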