Message-ID: <20211216193614.GA27676@magnolia>
Date: Thu, 16 Dec 2021 11:36:14 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH v2 19/28] iomap: Convert __iomap_zero_iter to use a folio
On Thu, Dec 09, 2021 at 09:38:03PM +0000, Matthew Wilcox wrote:
> On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> > +++ b/fs/iomap/buffered-io.c
> > @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
> >
> > static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
> > {
> > + struct folio *folio;
> > struct page *page;
> > int status;
> > - unsigned offset = offset_in_page(pos);
> > - unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> > + size_t offset, bytes;
> >
> > - status = iomap_write_begin(iter, pos, bytes, &page);
> > + status = iomap_write_begin(iter, pos, length, &page);
>
> This turned out to be buggy.  Darrick and I figured out why his tests
> were failing and mine weren't: the bug only shows up with a 4kB block
> size filesystem, and I was only testing with 1kB block sizes (at least
> on x86; I haven't figured out why 1kB block sizes pass, so I'm not
> sure what would be true on other filesystems).  iomap_write_begin()
> is not prepared to deal with a length that spans a page boundary.
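
To make the failure mode concrete, with made-up numbers for
illustration (4kB pages assumed; none of this is from the thread):

	loff_t pos = 1024;	/* not page aligned */
	u64 length = 8192;	/* spans three 4kB pages */
	unsigned offset = offset_in_page(pos);			/* 1024 */
	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);	/* 3072 */

	/*
	 * The old code passed 'bytes' (3072) to iomap_write_begin(),
	 * which stays within one page.  The v2 patch passed 'length'
	 * (8192), but iomap_write_begin() still only locks and prepares
	 * a single page, so everything past byte 3072 falls outside the
	 * page it returned.
	 */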
> So I'm replacing this patch with the following patches (whitespace
> damaged; pick them up from
> https://git.infradead.org/users/willy/linux.git/tag/refs/tags/iomap-folio-5.17c
> if you want to compile them):
>
> commit 412212960b72
> Author: Matthew Wilcox (Oracle) <willy@...radead.org>
> Date: Thu Dec 9 15:47:44 2021 -0500
>
> iomap: Allow iomap_write_begin() to be called with the full length
>
> In the future, we want write_begin to know the entire length of the
> write so that it can choose to allocate large folios. Pass the full
> length in from __iomap_zero_iter() and limit it where necessary.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index d67108489148..9270db17c435 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -968,6 +968,9 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
> struct gfs2_sbd *sdp = GFS2_SB(inode);
> unsigned int blocks;
>
> + /* gfs2 does not support large folios yet */
> + if (len > PAGE_SIZE)
> + len = PAGE_SIZE;
This is awkward -- gfs2 doesn't set the mapping flag to indicate that it
supports large folios, so it should never be asked to deal with more
than a page at a time. Shouldn't iomap_write_begin clamp its len
argument to PAGE_SIZE at the start if the mapping doesn't have the large
folios flag set?
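Something like this completely untested sketch (I'm guessing at the
accessor name -- I mean whatever ends up testing the large-folio flag
on the mapping):

	/* at the top of iomap_write_begin() */
	if (!mapping_large_folio_support(iter->inode->i_mapping) &&
	    len > PAGE_SIZE - offset_in_page(pos))
		len = PAGE_SIZE - offset_in_page(pos);

(Clamping to the end of the page containing pos rather than to a bare
PAGE_SIZE, so that an unaligned pos still can't cross a page boundary.)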
--D
> blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
> return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
> }
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8d7a67655b60..67fcd3b9928d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -632,6 +632,8 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> goto out_no_page;
> }
> folio = page_folio(page);
> + if (pos + len > folio_pos(folio) + folio_size(folio))
> + len = folio_pos(folio) + folio_size(folio) - pos;
>
> if (srcmap->type == IOMAP_INLINE)
> status = iomap_write_begin_inline(iter, page);
> @@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
> struct page *page;
> int status;
> unsigned offset = offset_in_page(pos);
> - unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>
> - status = iomap_write_begin(iter, pos, bytes, &page);
> + if (length > UINT_MAX)
> + length = UINT_MAX;
> + status = iomap_write_begin(iter, pos, length, &page);
> if (status)
> return status;
> + if (length > PAGE_SIZE - offset)
> + length = PAGE_SIZE - offset;
>
> - zero_user(page, offset, bytes);
> + zero_user(page, offset, length);
> mark_page_accessed(page);
>
> - return iomap_write_end(iter, pos, bytes, bytes, page);
> + return iomap_write_end(iter, pos, length, length, page);
> }
>
> static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>
>
> commit 78c747a1b3a1
> Author: Matthew Wilcox (Oracle) <willy@...radead.org>
> Date: Fri Nov 5 14:24:09 2021 -0400
>
> iomap: Convert __iomap_zero_iter to use a folio
>
> The zero iterator can work in folio-sized chunks instead of page-sized
> chunks. This will save a lot of page cache lookups if the file is cached
> in large folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> Reviewed-by: Christoph Hellwig <hch@....de>
> Reviewed-by: Darrick J. Wong <djwong@...nel.org>
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 67fcd3b9928d..bbde6d4f27cd 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -890,20 +890,23 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>
> static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
> {
> + struct folio *folio;
> struct page *page;
> int status;
> - unsigned offset = offset_in_page(pos);
> + size_t offset;
>
> if (length > UINT_MAX)
> length = UINT_MAX;
> status = iomap_write_begin(iter, pos, length, &page);
> if (status)
> return status;
> - if (length > PAGE_SIZE - offset)
> - length = PAGE_SIZE - offset;
> + folio = page_folio(page);
>
> - zero_user(page, offset, length);
> - mark_page_accessed(page);
> + offset = offset_in_folio(folio, pos);
> + if (length > folio_size(folio) - offset)
> + length = folio_size(folio) - offset;
> + folio_zero_range(folio, offset, length);
> + folio_mark_accessed(folio);
>
> return iomap_write_end(iter, pos, length, length, page);
> }
>
>
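For what it's worth, the payoff shows up in the caller.  Roughly this
shape -- a hand-simplified sketch of the loop in iomap_zero_iter(), not
the verbatim source -- where each call can now consume up to
folio_size() bytes instead of at most one page:

	/* simplified sketch of the loop in iomap_zero_iter() */
	do {
		s64 bytes = __iomap_zero_iter(iter, pos, length);

		if (bytes < 0)
			return bytes;
		/*
		 * 'bytes' can now be up to folio_size() - offset rather
		 * than PAGE_SIZE - offset, so a range cached in large
		 * folios takes far fewer trips around this loop.
		 */
		pos += bytes;
		length -= bytes;
		written += bytes;
	} while (length > 0);
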
> The xfstests that Darrick identified as failing all passed. Running a
> full sweep now; then I'll re-run with a 1kB filesystem to be sure that
> still passes. Then I'll send another pull request.