Message-ID: <CAHc6FU6VgQDO7HT5f4S_4f=9hczKGRDQ6SbQ5kNHMi4i-6rxVA@mail.gmail.com>
Date: Thu, 28 Apr 2022 19:38:51 +0200
From: Andreas Gruenbacher <agruenba@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: cluster-devel <cluster-devel@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] gfs2 fix
On Thu, Apr 28, 2022 at 7:09 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Thu, Apr 28, 2022 at 6:27 AM Andreas Gruenbacher <agruenba@...hat.com> wrote:
> >
> > The data corruption we've been getting unfortunately didn't have to do
> > with lock contention (we already knew that); it still occurs. I'm
> > running out of ideas on what to try there.
>
> Hmm.
>
> I don't see the bug, but I do have a suggestion on something to try.
>
> In particular, you said the problem started with commit 00bfe02f4796
> ("gfs2: Fix mmap + page fault deadlocks for buffered I/O").

Yes, but note that it's gfs2_file_buffered_write() that fails. When
the pagefault_disable/enable() around iomap_file_buffered_write() is
removed, the corruption goes away.

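For anyone following along, here is a rough userspace sketch of the
pattern in question: copy with page faults disabled, accept a short
result, fault the missing pages in with faults enabled, then retry from
where the copy stopped. This is only a simulation under stated
assumptions. Every sim_* name is hypothetical, "residency" is a plain
flag per page, and none of this is the actual gfs2 or iomap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4   /* tiny "pages" keep the example easy to trace */
#define SIM_NPAGES    4

/* Hypothetical user buffer: each page is either resident or not. */
struct sim_user_buf {
	char data[SIM_PAGE_SIZE * SIM_NPAGES];
	bool resident[SIM_NPAGES];
};

/*
 * Stand-in for a copy done with page faults disabled: it stops at the
 * first non-resident page and returns the number of bytes copied,
 * which may be less than len.
 */
static size_t sim_copy_nofault(char *dst, const struct sim_user_buf *src,
			       size_t pos, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		size_t off = pos + copied;
		size_t chunk = SIM_PAGE_SIZE - off % SIM_PAGE_SIZE;

		if (!src->resident[off / SIM_PAGE_SIZE])
			break;		/* would have faulted: short copy */
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(dst + copied, src->data + off, chunk);
		copied += chunk;
	}
	return copied;
}

/* Stand-in for faulting the source pages in with faults enabled. */
static void sim_fault_in(struct sim_user_buf *src, size_t pos, size_t len)
{
	size_t last = (pos + len - 1) / SIM_PAGE_SIZE;

	for (size_t p = pos / SIM_PAGE_SIZE; p <= last; p++)
		src->resident[p] = true;
}

/*
 * Write loop: on a short copy, keep the bytes that did arrive, fault
 * the rest in, and retry at the new offset. Mishandling the offset or
 * length after a short copy is the kind of bug that corrupts data.
 */
static size_t sim_buffered_write(char *file, struct sim_user_buf *src,
				 size_t len, unsigned *retries)
{
	size_t done = 0;

	*retries = 0;
	while (done < len) {
		done += sim_copy_nofault(file + done, src, done, len - done);
		if (done < len) {
			sim_fault_in(src, done, len - done);
			(*retries)++;
		}
	}
	assert(done == len);
	return done;
}
```

The point of the sketch is only that the caller has to carry the
partial progress across the retry; the real code additionally has to
drop and retake locks around the fault-in step.
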
> And to me, I see two main things that are going on
>
> (a) the obvious "calling generic IO functions with pagefault disabled" thing
>
> (b) the "allow demotion" thing
>
> And I wonder if you could at least pinpoint which of the cases it is
> that triggers it.
>
> So I'd love to see you try three things:
>
> (1) just remove the "allow demotion" cases.
>
> This will re-introduce the deadlock the commit is trying to fix,
> but that's such a special case that I assume you can run your
> test-suite that shows the problem even without that fix in place?
>
> This would just pinpoint whether it's due to some odd locking issue or not.
>
> Honestly, from how you describe the symptoms, I don't think (1) is the
> cause, but I think making sure is good.
>
> It sounds much more likely that it's one of those generic vfs
> functions that screws up when a page fault happens and it gets a
> partial result instead of handling the fault.

The test should run just fine without allowing demotion. I'll try (1),
but I don't expect the outcome to change.

> Which gets us to
>
> (2) remove the pagefault_disable/enable() around just the
> generic_file_read_iter() case in gfs2_file_read_iter().
>
> and
>
> (3) finally, remove the pagefault_disable/enable() around the
> iomap_file_buffered_write() case in gfs2_file_buffered_write()
>
> Yeah, yeah, you say it's just the read that fails, but humor me on
> (3), just in case it's an earlier write in your test-suite and the
> read just then uncovered it.
>
> But I put it as (3) so that you'd do the obvious (2) case first, and
> narrow it down (i.e. if (1) still shows the bug, then do (2), and if
> that fixes the bug it will be fairly well pinpointed to
> generic_file_read_iter()).

As mentioned above, we already did (3) and it did help: the corruption
goes away. I'll do (1) now, and then (2).

> Looking around, gfs2 is the only thing that obviously calls
> generic_file_read_iter() with pagefaults disabled, so it does smell
> like filemap_read() might have some issue, but the only thing that
> does is basically that
>
> copied = copy_folio_to_iter(folio, offset, bytes, iter);
>
> which should just become copy_page_to_iter_iovec(), which you'd hope
> would get things right.
>
> But it would be good to just narrow things down a bit.
>
> I'll look at that copy_page_to_iter_iovec() some more regardless, but
> doing that "let's double-check it's not something else" would be good.

We've actually been running most of our experiments on a 5.14-based
kernel with a plethora of backports, so pre-folio. Sorry I forgot to
mention that. I'll reproduce with mainline as well.

Thanks,
Andreas