lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHc6FU48681X8aUK+g7UUN7q5b6rkVBzTP7h_zbE4XqZYAiw3g@mail.gmail.com>
Date:   Tue, 26 Apr 2022 23:27:44 +0200
From:   Andreas Gruenbacher <agruenba@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     cluster-devel <cluster-devel@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] gfs2 fix

On Tue, Apr 26, 2022 at 8:31 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Tue, Apr 26, 2022 at 7:54 AM Andreas Gruenbacher <agruenba@...hat.com> wrote:
> >
> > Also, note that we're fighting with a rare case of data corruption that
> > seems to have been introduced by commit 00bfe02f4796 ("gfs2: Fix mmap +
> > page fault deadlocks for buffered I/O"; merged in v5.16).  The
> > corruption seems to occur in gfs2_file_buffered_write() when
> > fault_in_iov_iter_readable() is involved.  We do end up with data that's
> > written at an offset, as if after a fault-in, the position in the iocb
> > got out of sync with the position in the iov_iter.  The user buffer the
> > iov_iter points at isn't page aligned, so the corruption also isn't page
> > aligned.  The code all seems correct and we couldn't figure out what's
> > going on so far.
>
> So this may be stupid, but looking at gfs2_file_direct_write(), I see
> what looks like two different obvious bugs.
>
> This looks entirely wrong:
>
>         if (ret > 0)
>                 read = ret;
>
> because this code is *repeated*.
>
> I think it should use "read += ret;" so that if we have a few
> successful reads, they add up.

Btrfs has a comment in that place that reads:

    /* No increment (+=) because iomap returns a cumulative value. */

That's so that we can complete the tail of an asynchronous write
asynchronously after a failed page fault and subsequent fault-in.

> And then at the end:
>
>         if (ret < 0)
>                 return ret;
>         return read;
>
> looks wrong too, since maybe the *last* iteration failed, but an
> earlier succeeded, so I think it should be
>
>         if (read)
>                 return read;
>         return ret;
>
> But hey, what do I know. I say "looks like obvious bugs", but I don't
> really know the code, and I may just be completely full of sh*t.

That would be great, but applications don't seem to be able to cope
with short direct writes, so we must turn partial failure into total
failure here. There's at least one xfstest that checks for that as
well.

> The reason I think I'm full of sh*t is that you say that the problem
> occurs in gfs2_file_buffered_write(), not in that
> gfs2_file_direct_write() case.

Right, we're having that issue with buffered writes.

> And the *buffered* case seems to get both of those "obvious bugs" right.
>
> So I must be wrong, but I have to say, that looks odd to me.
>
> Now hopefully somebody who knows the code will explain to me why I'm
> wrong, and in the process go "Duh, but.." and see what's up.

Thanks for pointing out the places that seem odd to you. You'll not be
the only one.

Andreas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ