lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20251025032221.2905818-1-libaokun@huaweicloud.com>
Date: Sat, 25 Oct 2025 11:21:56 +0800
From: libaokun@...weicloud.com
To: linux-ext4@...r.kernel.org
Cc: tytso@....edu,
	adilger.kernel@...ger.ca,
	jack@...e.cz,
	linux-kernel@...r.kernel.org,
	kernel@...kajraghav.com,
	mcgrof@...nel.org,
	linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org,
	yi.zhang@...wei.com,
	yangerkun@...wei.com,
	chengzhihao1@...wei.com,
	libaokun1@...wei.com,
	libaokun@...weicloud.com
Subject: [PATCH 00/25] ext4: enable block size larger than page size

From: Baokun Li <libaokun1@...wei.com>

This series enables block size > page size (Large Block Size) in EXT4.

Since large folios are already supported for regular files, the required
changes are not substantial, but they are scattered across the code.
The changes primarily focus on cleaning up potential division-by-zero
errors, resolving negative left/right shifts, and correctly handling
mutually exclusive mount options.

One somewhat troublesome issue is that allocating page units greater than
order-1 with __GFP_NOFAIL in __alloc_pages_slowpath() can trigger an
unexpected WARN_ON. With LBS support, EXT4 and jbd2 may use __GFP_NOFAIL
to allocate large folios when reading metadata.

To avoid this warning, when jbd2_alloc() and grow_dev_folio() attempt to
allocate with order greater than 1, the __GFP_NOFAIL flag is not passed
down; instead, the functions retry internally to satisfy the allocation.

Patch series based on 6.18-rc2. `kvm-xfstests -c ext4/all -g auto` has
been executed with no new failures. `kvm-xfstests -c ext4/64k -g auto`
has been executed and no Oops was observed.

Here are some performance test data for your reference:

Testing EXT4 filesystems with different block sizes, measuring
single-threaded dd bandwidth for BIO/DIO with varying bs values.

Before(PAGE_SIZE=4096):

      BIO     | bs=4k    | bs=8k    | bs=16k   | bs=32k   | bs=64k
--------------|----------|----------|----------|----------|------------
 4k           | 1.5 GB/s | 2.1 GB/s | 2.8 GB/s | 3.4 GB/s | 3.8 GB/s
 8k (bigalloc)| 1.4 GB/s | 2.0 GB/s | 2.6 GB/s | 3.1 GB/s | 3.4 GB/s
 16k(bigalloc)| 1.5 GB/s | 2.0 GB/s | 2.6 GB/s | 3.2 GB/s | 3.6 GB/s
 32k(bigalloc)| 1.5 GB/s | 2.1 GB/s | 2.7 GB/s | 3.3 GB/s | 3.7 GB/s
 64k(bigalloc)| 1.5 GB/s | 2.1 GB/s | 2.8 GB/s | 3.4 GB/s | 3.8 GB/s
              
      DIO     | bs=4k    | bs=8k    | bs=16k   | bs=32k   | bs=64k
--------------|----------|----------|----------|----------|------------
 4k           | 194 MB/s | 366 MB/s | 626 MB/s | 1.0 GB/s | 1.4 GB/s
 8k (bigalloc)| 188 MB/s | 359 MB/s | 612 MB/s | 996 MB/s | 1.4 GB/s
 16k(bigalloc)| 208 MB/s | 378 MB/s | 642 MB/s | 1.0 GB/s | 1.4 GB/s
 32k(bigalloc)| 184 MB/s | 368 MB/s | 637 MB/s | 995 MB/s | 1.4 GB/s
 64k(bigalloc)| 208 MB/s | 389 MB/s | 634 MB/s | 1.0 GB/s | 1.4 GB/s

Patched(PAGE_SIZE=4096):

   BIO   | bs=4k    | bs=8k    | bs=16k   | bs=32k   | bs=64k
---------|----------|----------|----------|----------|------------
 4k      | 1.5 GB/s | 2.1 GB/s | 2.8 GB/s | 3.4 GB/s | 3.8 GB/s
 8k (LBS)| 1.7 GB/s | 2.3 GB/s | 3.2 GB/s | 4.2 GB/s | 4.7 GB/s
 16k(LBS)| 2.0 GB/s | 2.7 GB/s | 3.6 GB/s | 4.7 GB/s | 5.4 GB/s
 32k(LBS)| 2.2 GB/s | 3.1 GB/s | 3.9 GB/s | 4.9 GB/s | 5.7 GB/s
 64k(LBS)| 2.4 GB/s | 3.3 GB/s | 4.2 GB/s | 5.1 GB/s | 6.0 GB/s

   DIO   | bs=4k    | bs=8k    | bs=16k   | bs=32k   | bs=64k
---------|----------|----------|----------|----------|------------
 4k      | 204 MB/s | 355 MB/s | 627 MB/s | 1.0 GB/s | 1.4 GB/s
 8k (LBS)| 210 MB/s | 356 MB/s | 602 MB/s | 997 MB/s | 1.4 GB/s
 16k(LBS)| 191 MB/s | 361 MB/s | 589 MB/s | 981 MB/s | 1.4 GB/s
 32k(LBS)| 181 MB/s | 330 MB/s | 581 MB/s | 951 MB/s | 1.3 GB/s
 64k(LBS)| 148 MB/s | 272 MB/s | 499 MB/s | 840 MB/s | 1.3 GB/s


The results show:

 * The code changes have almost no impact on the original 4k write
   performance of ext4.
 * Compared with bigalloc, LBS improves BIO write performance by about 50%
   on average.
 * Compared with bigalloc, LBS shows degradation in DIO write performance,
   which increases as the filesystem block size grows and the test bs
   decreases, with a maximum degradation of about 30%.

The DIO regression is primarily due to the increased time spent in
crc32c_arch() within ext4_block_bitmap_csum_set() during block allocation,
as the block size grows larger. This indicates that larger filesystem block
sizes are not always better; please choose an appropriate block size based
on your I/O workload characteristics.

We are also planning further optimizations for block allocation under LBS
in the future.

Comments and questions are, as always, welcome.

Thanks,
Baokun


Baokun Li (21):
  ext4: remove page offset calculation in ext4_block_truncate_page()
  ext4: remove PAGE_SIZE checks for rec_len conversion
  ext4: make ext4_punch_hole() support large block size
  ext4: enable DIOREAD_NOLOCK by default for BS > PS as well
  ext4: introduce s_min_folio_order for future BS > PS support
  ext4: support large block size in ext4_calculate_overhead()
  ext4: support large block size in ext4_readdir()
  ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion
  ext4: add EXT4_LBLK_TO_P and EXT4_P_TO_LBLK for block/page conversion
  ext4: support large block size in ext4_mb_load_buddy_gfp()
  ext4: support large block size in ext4_mb_get_buddy_page_lock()
  ext4: support large block size in ext4_mb_init_cache()
  ext4: prepare buddy cache inode for BS > PS with large folios
  ext4: support large block size in ext4_mpage_readpages()
  ext4: support large block size in ext4_block_write_begin()
  ext4: support large block size in mpage_map_and_submit_buffers()
  ext4: support large block size in mpage_prepare_extent_to_map()
  fs/buffer: prevent WARN_ON in __alloc_pages_slowpath() when BS > PS
  jbd2: prevent WARN_ON in __alloc_pages_slowpath() when BS > PS
  ext4: add checks for large folio incompatibilities when BS > PS
  ext4: enable block size larger than page size

Zhihao Cheng (4):
  ext4: remove page offset calculation in ext4_block_zero_page_range()
  ext4: rename 'page' references to 'folio' in multi-block allocator
  ext4: support large block size in __ext4_block_zero_page_range()
  ext4: make online defragmentation support large block size

 fs/buffer.c           |  33 +++++++++-
 fs/ext4/dir.c         |   8 +--
 fs/ext4/ext4.h        |  27 ++++-----
 fs/ext4/extents.c     |   2 +-
 fs/ext4/inode.c       |  69 ++++++++++-----------
 fs/ext4/mballoc.c     | 137 ++++++++++++++++++++++--------------------
 fs/ext4/move_extent.c |  20 +++---
 fs/ext4/namei.c       |   8 +--
 fs/ext4/readpage.c    |   7 +--
 fs/ext4/super.c       |  52 ++++++++++++----
 fs/ext4/verity.c      |   2 +-
 fs/jbd2/journal.c     |  28 ++++++++-
 12 files changed, 234 insertions(+), 159 deletions(-)

-- 
2.46.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ