[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z_7KTEKEzC9Fh2rn@bombadil.infradead.org>
Date: Tue, 15 Apr 2025 14:06:20 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: Christian Brauner <brauner@...nel.org>, tytso@....edu,
adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
riel@...riel.com, dave@...olabs.net, willy@...radead.org,
hannes@...xchg.org, oliver.sang@...el.com, david@...hat.com,
axboe@...nel.dk, hare@...e.de, david@...morbit.com,
djwong@...nel.org, ritesh.list@...il.com,
linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
linux-mm@...ck.org, gost.dev@...sung.com, p.raghav@...sung.com,
da.gomez@...sung.com,
syzbot+f3c6fda1297c748a7076@...kaller.appspotmail.com
Subject: Re: [PATCH v2 1/8] migrate: fix skipping metadata buffer heads on
migration
On Tue, Apr 15, 2025 at 06:23:54PM +0200, Jan Kara wrote:
> On Tue 15-04-25 08:47:51, Luis Chamberlain wrote:
> > On Tue, Apr 15, 2025 at 11:05:38AM +0200, Christian Brauner wrote:
> > > On Mon, Apr 14, 2025 at 03:19:33PM -0700, Luis Chamberlain wrote:
> > > > On Mon, Apr 14, 2025 at 02:09:46PM -0700, Luis Chamberlain wrote:
> > > > > On Thu, Apr 10, 2025 at 02:05:38PM +0200, Jan Kara wrote:
> > > > > > > @@ -859,12 +862,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
> > > > > > > }
> > > > > > > bh = bh->b_this_page;
> > > > > > > } while (bh != head);
> > > > > > > + spin_unlock(&mapping->i_private_lock);
> > > > > >
> > > > > > No, you've just broken all simple filesystems (like ext2) with this patch.
> > > > > > You can reduce the spinlock critical section only after providing
> > > > > > alternative way to protect them from migration. So this should probably
> > > > > > happen at the end of the series.
> > > > >
> > > > > So you're OK with this spin lock move with the other series in place?
> > > > >
> > > > > And so we punt the hard-to-reproduce corruption issue as future work
> > > > > to do? Becuase the other alternative for now is to just disable
> > > > > migration for jbd2:
> > > > >
> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > index 1dc09ed5d403..ef1c3ef68877 100644
> > > > > --- a/fs/ext4/inode.c
> > > > > +++ b/fs/ext4/inode.c
> > > > > @@ -3631,7 +3631,6 @@ static const struct address_space_operations ext4_journalled_aops = {
> > > > > .bmap = ext4_bmap,
> > > > > .invalidate_folio = ext4_journalled_invalidate_folio,
> > > > > .release_folio = ext4_release_folio,
> > > > > - .migrate_folio = buffer_migrate_folio_norefs,
> > > > > .is_partially_uptodate = block_is_partially_uptodate,
> > > > > .error_remove_folio = generic_error_remove_folio,
> > > > > .swap_activate = ext4_iomap_swap_activate,
> > > >
> > > > BTW I ask because.. are your expectations that the next v3 series also
> > > > be a target for Linus tree as part of a fix for this spinlock
> > > > replacement?
> > >
> > > Since this is fixing potential filesystem corruption I will upstream
> > > whatever we need to do to fix this. Ideally we have a minimal fix to
> > > upstream now and a comprehensive fix and cleanup for v6.16.
> >
> > Despite our efforts we don't yet have an agreement on how to fix the
> > ext4 corruption, becuase Jan noted the buffer_meta() check in this patch
> > is too broad and would affect other filesystems (I have yet to
> > understand how, but will review).
> >
> > And so while we have agreement we can remove the spin lock to fix the
> > sleeping while atomic incurred by large folios for buffer heads by this
> > patch series, the removal of the spin lock would happen at the end of
> > this series.
> >
> > And so the ext4 corruption is an existing issue as-is today, its
> > separate from the spin lock removal goal to fix the sleeping while
> > atomic..
>
> I agree. Ext4 corruption problems are separate from sleeping in atomic
> issues.
Glad that's clear.
> > However this series might be quite big for an rc2 or rc3 fix for that spin
> > lock removal issue. It should bring in substantial performance benefits
> > though, so it might be worthy to consider. We can re-run tests with the
> > adjustment to remove the spin lock until the last patch in this series.
> >
> > The alternative is to revert the spin lock addition commit for Linus'
> > tree, ie commit ebdf4de5642fb6 ("mm: migrate: fix reference check race
> > between __find_get_block() and migration") and note that it in fact does
> > not fix the ext4 corruption as we've noted, and in fact causes an issue
> > with sleeping while atomic with support for large folios for buffer
> > heads. If we do that then we punt this series for the next development
> > window, and it would just not have the spin lock removal on the last
> > patch.
>
> Well, the commit ebdf4de5642fb6 is 6 years old. At that time there were no
> large folios (in fact there were no folios at all ;)) in the page cache and
> it does work quite well (I didn't see a corruption report from real users
> since then).
It is still a work around.
> So I don't like removing that commit because it makes a
> "reproducible with a heavy stress test" problem become a "reproduced by
> real world workloads" problem.
So how about just patch 2 and 8 in this series, with the spin lock
removal happening on the last patch for Linus tree?
Luis
Powered by blists - more mailing lists