Message-ID: <Zurfz7CNeyxGrfRr@casper.infradead.org>
Date: Wed, 18 Sep 2024 15:12:31 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Chris Mason <clm@...a.com>, Jens Axboe <axboe@...nel.dk>,
Dave Chinner <david@...morbit.com>,
Christian Theune <ct@...ingcircus.io>, linux-mm@...ck.org,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Daniel Dao <dqminh@...udflare.com>, regressions@...ts.linux.dev,
regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
folios since Dec 2021 (any kernel from 6.1 upwards)
On Wed, Sep 18, 2024 at 03:51:39PM +0200, Linus Torvalds wrote:
> On Wed, 18 Sept 2024 at 15:35, Matthew Wilcox <willy@...radead.org> wrote:
> >
> > Oh god, that's it.
> >
> > there should have been an xas_reset() after calling xas_split_alloc().
>
> I think it is worse than that.
>
> Even *without* an xas_split_alloc(), I think the old code was wrong,
> because it drops the xas lock without doing the xas_reset.

That's actually OK.  The first time around the loop, we haven't walked the
tree, so we start from the top as you'd expect.  The only other reason to
go around the loop again is that memory allocation failed for a node, and
in that case we call xas_nomem() and that (effectively) calls xas_reset().

So in terms of the expected API for xa_state users, it would be consistent
for xas_split_alloc() to call xas_reset().

You might argue that this API is too subtle, but it was intended to
be easy to use.  The problem was that xas_split_alloc() got added much
later and I forgot to maintain the invariant that makes it work as well
as be easy to use.
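
The invariant described above can be illustrated with a small userspace toy
model. This is only a sketch and not kernel code: every toy_* name is
hypothetical, the "lock" exists only in comments, and a generation counter
stands in for the tree being modified while the lock is dropped. It assumes
that a cursor which has already walked the tree holds a cached position that
is only safe to reuse if nothing changed in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the tree: 'generation' changes whenever it is modified. */
struct toy_tree {
    int generation;
};

/* Toy walk cursor (hypothetical, only loosely modelled on xa_state). */
struct toy_cursor {
    bool restart;          /* true: the next walk starts from the top */
    int cached_generation; /* tree generation seen when position was cached */
};

static void toy_cursor_init(struct toy_cursor *c)
{
    c->restart = true;     /* a fresh cursor has not walked the tree yet */
    c->cached_generation = -1;
}

static void toy_reset(struct toy_cursor *c)
{
    c->restart = true;
}

/* Walk under the imaginary lock; true if the position matches the tree. */
static bool toy_walk(struct toy_cursor *c, const struct toy_tree *t)
{
    assert(c != NULL && t != NULL);
    if (c->restart) {
        c->cached_generation = t->generation;
        c->restart = false;
    }
    return c->cached_generation == t->generation;
}

/* Runs with the lock dropped and resets the cursor before returning. */
static void toy_alloc_with_reset(struct toy_cursor *c)
{
    toy_reset(c);
}

/* Runs with the lock dropped but leaves the cursor's cached position alone. */
static void toy_alloc_without_reset(struct toy_cursor *c)
{
    (void)c;
}

/* Walk, drop the lock, let the tree change, retry; is the retry valid? */
static bool toy_retry_is_valid(bool helper_resets)
{
    struct toy_tree tree = { .generation = 0 };
    struct toy_cursor cur;

    toy_cursor_init(&cur);
    (void)toy_walk(&cur, &tree);   /* first pass starts from the top */

    /* lock dropped here */
    if (helper_resets)
        toy_alloc_with_reset(&cur);
    else
        toy_alloc_without_reset(&cur);
    tree.generation++;             /* concurrent modification while unlocked */

    /* lock retaken here; go around the loop again */
    return toy_walk(&cur, &tree);
}
```

In this model, toy_retry_is_valid(true) corresponds to a retry path whose
helper resets the cursor, and toy_retry_is_valid(false) to a helper that
forgets to, so the second walk reuses a stale position.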