Date:   Wed, 4 May 2022 19:52:29 +0200
From:   Andreas Gruenbacher <agruenba@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Christoph Hellwig <hch@...radead.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Dave Chinner <dchinner@...hat.com>,
        cluster-devel <cluster-devel@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] gfs2 fix

On Wed, May 4, 2022 at 12:42 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Tue, May 3, 2022 at 2:35 PM Andreas Gruenbacher <agruenba@...hat.com> wrote:
> >
> > More testing still ongoing, but the following patch seems to fix the
> > data corruption.
>
> Fingers crossed.

It turns out that crossing fingers wasn't enough and we still get
corruption, but less frequently than before. We're going in the right
direction.

My working theory is that this is due to a subtle bug in the hole
punching done by gfs2_iomap_end() to get rid of unused blocks. With
the test case that fails, gfs2_iomap_end() is punching holes way more
often than I was expecting, and way more often than it should.
Remember that the test case is doing 32-MiB writes of a user buffer
that usually isn't entirely in memory. The first
iomap_file_buffered_write() allocates filesystem blocks for the entire
buffer, and when it finds that it could only do a partial write, it
frees a large part of those blocks. It will then call
fault_in_iov_iter_readable() for the next chunk, and the next call to
iomap_file_buffered_write() will then usually be able to write that
chunk entirely.

So it seems that we should always call fault_in_iov_iter_readable()
before calling into iomap_file_buffered_write(). This will probably
hide whatever is going wrong in punch_hole(), but we'll get to that
later ...

(Side note: the chunk size should be aligned to the page cache, not to
the iov_iter as in the current code.)

> > +               truncate_pagecache_range(inode, hstart, hend - 1);
> > +               if (hstart < hend)
> > +                       punch_hole(ip, hstart, hend - hstart);
>
> Why doesn't that "hstart < hend" condition cover both the truncate and
> the hole punch?

That was a leftover from a previous experiment in which I did the
truncate_pagecache_range() on the unaligned boundaries, which turned
out to be pointless. I'll clean that up.

Thanks,
Andreas
