[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wi7o+fHYBTuCQQdHD112YHQtO21Y3+wxNYypjdo8feKFg@mail.gmail.com>
Date: Tue, 26 Apr 2022 11:31:01 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andreas Gruenbacher <agruenba@...hat.com>
Cc: cluster-devel <cluster-devel@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] gfs2 fix
On Tue, Apr 26, 2022 at 7:54 AM Andreas Gruenbacher <agruenba@...hat.com> wrote:
>
> Also, note that we're fighting with a rare case of data corruption that
> seems to have been introduced by commit 00bfe02f4796 ("gfs2: Fix mmap +
> page fault deadlocks for buffered I/O"; merged in v5.16). The
> corruption seems to occur in gfs2_file_buffered_write() when
> fault_in_iov_iter_readable() is involved. We do end up with data that's
> written at an offset, as if after a fault-in, the position in the iocb
> got out of sync with the position in the iov_iter. The user buffer the
> iov_iter points at isn't page aligned, so the corruption also isn't page
> aligned. The code all seems correct and we couldn't figure out what's
> going on so far.
So this may be stupid, but looking at gfs2_file_direct_write(), I see
what looks like two different obvious bugs.
This looks entirely wrong:
if (ret > 0)
read = ret;
because this code is *repeated*.
I think it should use "read += ret;" so that if we have a few
successful reads, they add up.
And then at the end:
if (ret < 0)
return ret;
return read;
looks wrong too, since maybe the *last* iteration failed, but an
earlier succeeded, so I think it should be
if (read)
return read;
return ret;
But hey, what do I know. I say "looks like obvious bugs", but I don't
really know the code, and I may just be completely full of sh*t.
The reason I think I'm full of sh*t is that you say that the problem
occurs in gfs2_file_buffered_write(), not in that
gfs2_file_direct_write() case.
And the *buffered* case seems to get both of those "obvious bugs" right.
So I must be wrong, but I have to say, that looks odd to me.
Now hopefully somebody who knows the code will explain to me why I'm
wrong, and in the process go "Duh, but.." and see what's up.
Linus
Powered by blists - more mailing lists