Message-ID: <effc0ec7-cf9d-44dc-aee5-563942242522@meta.com>
Date: Tue, 17 Sep 2024 11:36:51 +0200
From: Chris Mason <clm@...a.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Chinner <david@...morbit.com>, Jens Axboe <axboe@...nel.dk>,
Christian Theune <ct@...ingcircus.io>, linux-mm@...ck.org,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Daniel Dao <dqminh@...udflare.com>, regressions@...ts.linux.dev,
regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
folios since Dec 2021 (any kernel from 6.1 upwards)
On 9/17/24 5:32 AM, Matthew Wilcox wrote:
> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>> trying to bash on the ENOMEM for readahead case. There's a GFP_NOWARN
>> on those, and our systems do run pretty short on ram, so it feels right
>> at least. We'll see.
>
> I've been running with some variant of this patch the whole way across
> the Atlantic, and not hit any problems. But maybe with the right
> workload ...?
>
> There are two things being tested here. One is whether we have a
> cross-linked node (ie a node that's in two trees at the same time).
> The other is whether the slab allocator is giving us a node that already
> contains non-NULL entries.
>
> If you could throw this on top of your kernel, we might stand a chance
> of catching the problem sooner. If it is one of these problems and not
> something weirder.
>
I was able to corrupt the xarray once, hitting a crash during
unmount. It wasn't the XFS filesystem I was actually hammering, so I
guess that tells us something, but it took ~3 hours of stress runs,
so it's not really useful.

I'll try with your patch as well.

-chris