Message-ID: <aCsP4bW6c08h3DJv@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Mon, 19 May 2025 16:32:57 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, willy@...radead.org, tytso@....edu,
adilger.kernel@...ger.ca, jack@...e.cz, yi.zhang@...wei.com,
libaokun1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 0/8] ext4: enable large folio for regular files

On Mon, May 19, 2025 at 09:19:10AM +0800, Zhang Yi wrote:
> On 2025/5/16 19:48, Ojaswin Mujoo wrote:
> > On Mon, May 12, 2025 at 02:33:11PM +0800, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@...wei.com>
> >>
> >> Changes since v1:
> >> - Rebase the series on 6.15-rc6.
> >> - Drop the modifications to block_read_full_folio(), which are already
> >>   supported by commit b72e591f74de ("fs/buffer: remove batching from
> >>   async read").
> >> - Fine-tune patch 6 without modifying the logic.
> >>
> >> v1: https://lore.kernel.org/linux-ext4/20241125114419.903270-1-yi.zhang@huaweicloud.com/
> >>
> >> Original Description:
> >>
> >> Since almost all of the code paths in ext4 have already been converted
> >> to use folios, there isn't much additional work required to support
> >> large folios. This series completes the remaining work and enables large
> >> folios for regular files on ext4, with the exception of fsverity,
> >> fscrypt, and data=journal mode.
> >>
> >> Unlike my other series[1], which enables large folios by converting the
> >> buffered I/O path from the classic buffer_head code to iomap, this
> >> solution is based on the original buffer_head code. It primarily
> >> modifies the block offset and length calculations within a single folio
> >> in the buffer write, buffer read, zero range, writeback, and move
> >> extents paths to support large folios, and does not do any further code
> >> refactoring or optimization.
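To make the per-folio block arithmetic described above concrete, here is a
minimal standalone sketch in plain userspace C (not the patch code itself;
the 64K folio size, the sample offset, and the names folio_size, pos and
blkbits are picked here purely for illustration). It computes which file
blocks a large folio covers and where a given file offset lands inside it:

    #include <stdio.h>

    int main(void)
    {
        unsigned int blkbits = 12;                  /* 4K blocks */
        unsigned long folio_size = 64 * 1024;       /* one 64K (order-4) folio */
        unsigned long long pos = 3ULL * 64 * 1024 + 6000;  /* a file offset */

        /* File offset where the folio containing pos starts (folio-aligned). */
        unsigned long long folio_pos = pos & ~((unsigned long long)folio_size - 1);

        /* Block range covered by the folio, and the block that pos hits. */
        unsigned long long first_block = folio_pos >> blkbits;
        unsigned int blocks_per_folio = folio_size >> blkbits;
        unsigned int block_in_folio = (unsigned int)((pos - folio_pos) >> blkbits);

        printf("folio at offset %llu covers blocks %llu..%llu (%u blocks); "
               "offset %llu falls in block %u of the folio\n",
               folio_pos, first_block, first_block + blocks_per_folio - 1,
               blocks_per_folio, pos, block_in_folio);
        return 0;
    }

The point the cover letter makes is that once folios can be larger than a
page, these quantities can no longer be derived from PAGE_SIZE alone, which
is what the block offset and length calculations in the write, read, zero
range, writeback, and move extents paths had previously assumed.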
> >>
> >> This series has passed kvm-xfstests in auto mode several times and
> >> everything looks fine; any comments are welcome.
> >>
> >> About performance:
> >>
> >> I used the same test script as in my iomap series (with the mount
> >> options parameter MOUNT_OPT dropped) [2] and ran fio tests on the same
> >> machine: an Intel Xeon Gold 6240 CPU with 400GB of system RAM, a 200GB
> >> ramdisk, and a 4TB NVMe SSD. The results are compared against both the
> >> base kernel and the iomap + large folio changes.
> >>
> >> == buffer read ==
> >>
> >> base iomap+large folio base+large folio
> >> type bs IOPS BW(M/s) IOPS BW(M/s) IOPS BW(M/s)
> >> ----------------------------------------------------------------
> >> hole 4K | 576k 2253 | 762k 2975(+32%) | 747k 2918(+29%)
> >> hole 64K | 48.7k 3043 | 77.8k 4860(+60%) | 76.3k 4767(+57%)
> >> hole 1M | 2960 2960 | 4942 4942(+67%) | 4737 4738(+60%)
> >> ramdisk 4K | 443k 1732 | 530k 2069(+19%) | 494k 1930(+11%)
> >> ramdisk 64K | 34.5k 2156 | 45.6k 2850(+32%) | 41.3k 2584(+20%)
> >> ramdisk 1M | 2093 2093 | 2841 2841(+36%) | 2585 2586(+24%)
> >> nvme 4K | 339k 1323 | 364k 1425(+8%) | 344k 1341(+1%)
> >> nvme 64K | 23.6k 1471 | 25.2k 1574(+7%) | 25.4k 1586(+8%)
> >> nvme 1M | 2012 2012 | 2153 2153(+7%) | 2122 2122(+5%)
> >>
> >>
> >> == buffer write ==
> >>
> >> O: Overwrite; S: Sync; W: Writeback
> >>
> >> base iomap+large folio base+large folio
> >> type O S W bs IOPS BW(M/s) IOPS BW(M/s) IOPS BW(M/s)
> >> ----------------------------------------------------------------------
> >> cache N N N 4K | 417k 1631 | 440k 1719 (+5%) | 423k 1655 (+2%)
> >> cache N N N 64K | 33.4k 2088 | 81.5k 5092 (+144%) | 59.1k 3690 (+77%)
> >> cache N N N 1M | 2143 2143 | 5716 5716 (+167%) | 3901 3901 (+82%)
> >> cache Y N N 4K | 449k 1755 | 469k 1834 (+5%) | 452k 1767 (+1%)
> >> cache Y N N 64K | 36.6k 2290 | 82.3k 5142 (+125%) | 67.2k 4200 (+83%)
> >> cache Y N N 1M | 2352 2352 | 5577 5577 (+137%) | 4275 4276 (+82%)
> >> ramdisk N N Y 4K | 365k 1424 | 354k 1384 (-3%) | 372k 1449 (+2%)
> >> ramdisk N N Y 64K | 31.2k 1950 | 74.2k 4640 (+138%) | 56.4k 3528 (+81%)
> >> ramdisk N N Y 1M | 1968 1968 | 5201 5201 (+164%) | 3814 3814 (+94%)
> >> ramdisk N Y N 4K | 9984 39 | 12.9k 51 (+29%) | 9871 39 (-1%)
> >> ramdisk N Y N 64K | 5936 371 | 8960 560 (+51%) | 6320 395 (+6%)
> >> ramdisk N Y N 1M | 1050 1050 | 1835 1835 (+75%) | 1656 1657 (+58%)
> >> ramdisk Y N Y 4K | 411k 1609 | 443k 1731 (+8%) | 441k 1723 (+7%)
> >> ramdisk Y N Y 64K | 34.1k 2134 | 77.5k 4844 (+127%) | 66.4k 4151 (+95%)
> >> ramdisk Y N Y 1M | 2248 2248 | 5372 5372 (+139%) | 4209 4210 (+87%)
> >> ramdisk Y Y N 4K | 182k 711 | 186k 730 (+3%) | 182k 711 (0%)
> >> ramdisk Y Y N 64K | 18.7k 1170 | 34.7k 2171 (+86%) | 31.5k 1969 (+68%)
> >> ramdisk Y Y N 1M | 1229 1229 | 2269 2269 (+85%) | 1943 1944 (+58%)
> >> nvme N N Y 4K | 373k 1458 | 387k 1512 (+4%) | 399k 1559 (+7%)
> >> nvme N N Y 64K | 29.2k 1827 | 70.9k 4431 (+143%) | 54.3k 3390 (+86%)
> >> nvme N N Y 1M | 1835 1835 | 4919 4919 (+168%) | 3658 3658 (+99%)
> >> nvme N Y N 4K | 11.7k 46 | 11.7k 46 (0%) | 11.5k 45 (-1%)
> >> nvme N Y N 64K | 6453 403 | 8661 541 (+34%) | 7520 470 (+17%)
> >> nvme N Y N 1M | 649 649 | 1351 1351 (+108%) | 885 886 (+37%)
> >> nvme Y N Y 4K | 372k 1456 | 433k 1693 (+16%) | 419k 1637 (+12%)
> >> nvme Y N Y 64K | 33.0k 2064 | 74.7k 4669 (+126%) | 64.1k 4010 (+94%)
> >> nvme Y N Y 1M | 2131 2131 | 5273 5273 (+147%) | 4259 4260 (+100%)
> >> nvme Y Y N 4K | 56.7k 222 | 56.4k 220 (-1%) | 59.4k 232 (+5%)
> >> nvme Y Y N 64K | 13.4k 840 | 19.4k 1214 (+45%) | 18.5k 1156 (+38%)
> >> nvme Y Y N 1M | 714 714 | 1504 1504 (+111%) | 1319 1320 (+85%)
> >>
> >> [1] https://lore.kernel.org/linux-ext4/20241022111059.2566137-1-yi.zhang@huaweicloud.com/
> >> [2] https://lore.kernel.org/linux-ext4/3c01efe6-007a-4422-ad79-0bad3af281b1@huaweicloud.com/
> >>
> >> Thanks,
> >> Yi.
> >>
> >> Zhang Yi (8):
> >> ext4: make ext4_mpage_readpages() support large folios
> >> ext4: make regular file's buffered write path support large folios
> >> ext4: make __ext4_block_zero_page_range() support large folio
> >> ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large
> >> folio
> >> ext4: correct the journal credits calculations of allocating blocks
> >> ext4: make the writeback path support large folios
> >> ext4: make online defragmentation support large folios
> >> ext4: enable large folio for regular file
> >>
> >> fs/ext4/ext4.h | 1 +
> >> fs/ext4/ext4_jbd2.c | 3 +-
> >> fs/ext4/ext4_jbd2.h | 4 +--
> >> fs/ext4/extents.c | 5 +--
> >> fs/ext4/ialloc.c | 3 ++
> >> fs/ext4/inode.c | 72 ++++++++++++++++++++++++++++++-------------
> >> fs/ext4/move_extent.c | 11 +++----
> >> fs/ext4/readpage.c | 28 ++++++++++-------
> >> fs/jbd2/journal.c | 7 +++--
> >> include/linux/jbd2.h | 2 +-
> >> 10 files changed, 88 insertions(+), 48 deletions(-)
> >>
> >> --
> >> 2.46.1
> >
> > Hi Zhang,
> >
> > I'm currently testing the patches with 4k block size and 64k page size
> > on Power and noticed that ext4/046 is hitting a BUG at:
> >
> > [ 188.351668][ T1320] NIP [c0000000006f15a4] block_read_full_folio+0x444/0x450
> > [ 188.351782][ T1320] LR [c0000000006f15a0] block_read_full_folio+0x440/0x450
> > [ 188.351868][ T1320] --- interrupt: 700
> > [ 188.351919][ T1320] [c0000000058176e0] [c0000000007d7564] ext4_mpage_readpages+0x204/0x910
> > [ 188.352027][ T1320] [c0000000058177e0] [c0000000007a55d4] ext4_readahead+0x44/0x60
> > [ 188.352119][ T1320] [c000000005817800] [c00000000052bd80] read_pages+0xa0/0x3d0
> > [ 188.352216][ T1320] [c0000000058178a0] [c00000000052cb84] page_cache_ra_order+0x2c4/0x560
> > [ 188.352312][ T1320] [c000000005817990] [c000000000514614] filemap_readahead.isra.0+0x74/0xe0
> > [ 188.352427][ T1320] [c000000005817a00] [c000000000519fe8] filemap_get_pages+0x548/0x9d0
> > [ 188.352529][ T1320] [c000000005817af0] [c00000000051a59c] filemap_read+0x12c/0x520
> > [ 188.352624][ T1320] [c000000005817cc0] [c000000000793ae8] ext4_file_read_iter+0x78/0x320
> > [ 188.352724][ T1320] [c000000005817d10] [c000000000673e54] vfs_read+0x314/0x3d0
> > [ 188.352813][ T1320] [c000000005817dc0] [c000000000674ad8] ksys_read+0x88/0x150
> > [ 188.352905][ T1320] [c000000005817e10] [c00000000002fff4] system_call_exception+0x114/0x300
> > [ 188.353019][ T1320] [c000000005817e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> >
> > which is:
> >
> > int block_read_full_folio(struct folio *folio, get_block_t *get_block)
> > {
> > ...
> > /* This is needed for ext4. */
> > if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
> > limit = inode->i_sb->s_maxbytes;
> >
> > VM_BUG_ON_FOLIO(folio_test_large(folio), folio); <-------------
> >
> > head = folio_create_buffers(folio, inode, 0);
> > blocksize = head->b_size;
> >
> > It seems like removing this check was mistakenly left out. Without this
> > line I'm not hitting the BUG; however, it's strange that none of the x86
> > testing caught this. I can only replicate this with 4k block size and
> > 64k page size on the PowerPC architecture. I'll spend some time
> > understanding why it is not getting hit on x86 with 1k bs. (Maybe
> > ext4_mpage_readpages() is not falling back to block_read_full_folio()
> > that easily.)
> >
> > I'll continue testing with the line removed.
>
> Hi Ojaswin.
>
> Thanks for testing again. I checked, and this line has already been
> removed by commit e59e97d42b05 ("fs/buffer fs/mpage: remove large folio
> restriction").
>
> Thanks,
> Yi.

Hi Yi,

Thanks, it seems these changes came in via Christian's 6.15-rc1 vfs
branch; maybe Ted rebased recently, since I didn't see this change in the
fairly recent branch that I was testing on.

Good to see it is fixed. I have another overnight auto run going on, and
I'll update if I see any regressions.

Thanks,
Ojaswin