[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <15f15df9-ec90-486a-a784-effb8b2cb292@meta.com>
Date: Tue, 24 Sep 2024 21:17:04 +0200
From: Chris Mason <clm@...a.com>
To: Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Chinner <david@...morbit.com>,
Christian Theune <ct@...ingcircus.io>, linux-mm@...ck.org,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Daniel Dao <dqminh@...udflare.com>, regressions@...ts.linux.dev,
regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
folios since Dec 2021 (any kernel from 6.1 upwards)
On 9/20/24 3:54 PM, Chris Mason wrote:
[ ... ]
> xas_split_alloc() does the allocation and also shoves an entry into some of
> the slots. When the tree changes, the entry we've stored is wildly
> wrong, but xas_reset() doesn't undo any of that. So when we actually
> use the xas->xa_alloc nodes we've setup, they are pointing to the
> wrong things.
>
> Which is probably why the commits in 6.10 added this:
>
> /* entry may have changed before we re-acquire the lock */
> if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
> xas_destroy(&xas);
> alloced_order = 0;
> }
>
> The only way to undo the work done by xas_split_alloc() is to call
> xas_destroy().
>
> To prove this theory, I tried making a minimal version that also
> called destroy, but it all ended up less minimal than the code
> that's actually in 6.10. I've got a long test going now with
> an extra cond_resched() to make the race bigger, and a printk of victory.
>
> It hasn't fired yet, and I need to hop on an airplane, so I'll just leave
> it running for now. But long story short, I think we should probably
> just tag all of these for stable:
>
> https://lore.kernel.org/all/20240415171857.19244-2-ryncsn@gmail.com/T/#mdb85922624c39ea7efb775a044af4731890ff776
>
> Also, Willy's proposed changes to xas_split_alloc() seem like a good
> idea.
A few days of load later and some extra printks, it turns out that
taking the writer lock in __filemap_add_folio() makes us dramatically
more likely to just return EEXIST than go into the xas_split_alloc() dance.
With the changes in 6.10, we only get into that xas_destroy() case above
when the conflicting entry is a shadow entry, so I changed my repro to
use memory pressure instead of fadvise.
I also added a schedule_timeout(1) after the split alloc, and with all
of that I'm able to consistently make the xas_destroy() case trigger
without causing any system instability. Kairui Song's patches do seem
to have fixed things nicely.
-chris
Powered by blists - more mailing lists