Message-ID: <yp4gorgjhh6c3qeopjabmknimeifhnpbz63irrrtjpplatnk4k@ycofoucc4ry3>
Date: Wed, 5 Nov 2025 11:14:01 +0100
From: Jan Kara <jack@...e.cz>
To: libaokun@...weicloud.com
Cc: linux-ext4@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
jack@...e.cz, linux-kernel@...r.kernel.org, kernel@...kajraghav.com,
mcgrof@...nel.org, linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
yi.zhang@...wei.com, yangerkun@...wei.com, chengzhihao1@...wei.com,
libaokun1@...wei.com
Subject: Re: [PATCH 25/25] ext4: enable block size larger than page size

On Sat 25-10-25 11:22:21, libaokun@...weicloud.com wrote:
> From: Baokun Li <libaokun1@...wei.com>
>
> Since block device (See commit 3c20917120ce ("block/bdev: enable large
> folio support for large logical block sizes")) and page cache (See commit
> ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page
> cache")) has the ability to have a minimum order when allocating folio,
> and ext4 has supported large folio in commit 7ac67301e82f ("ext4: enable
> large folio for regular file"), now add support for block_size > PAGE_SIZE
> in ext4.
>
> set_blocksize() -> bdev_validate_blocksize() already validates the block
> size, so ext4_load_super() does not need to perform additional checks.
>
> Here we only need to enable large folio by default when s_min_folio_order
> is greater than 0 and add the FS_LBS bit to fs_flags.
>
> In addition, mark this feature as experimental.
>
> Signed-off-by: Baokun Li <libaokun1@...wei.com>
> Reviewed-by: Zhang Yi <yi.zhang@...wei.com>
...
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 04f9380d4211..ba6cf05860ae 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5146,6 +5146,9 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
>  	if (!ext4_test_mount_flag(sb, EXT4_MF_LARGE_FOLIO))
>  		return false;
>
> +	if (EXT4_SB(sb)->s_min_folio_order)
> +		return true;
> +

But now files with the data journalling flag enabled will get large folios
possibly significantly greater than blocksize. I don't think there's a
fundamental reason why data journalling doesn't work with large folios; the
only thing that's likely going to break is that credit estimates will go
through the roof if there are too many blocks per folio. But that can be
handled by setting max folio order to be equal to min folio order when
journalling data for the inode.

It is a bit scary to be modifying max folio order in
ext4_change_inode_journal_flag() but I guess it is less scary than setting
new aops, and if we prune the whole page cache before touching the order and
inode flag, we should be safe (famous last words ;).
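
To make the arithmetic concrete, here is a minimal userspace sketch of the
clamping idea, assuming 4 KiB pages. Every toy_* name is hypothetical and
none of this is ext4 code; it only models how the number of blocks per folio,
which is what drives the credit estimate, drops to one when max folio order
is pinned to min folio order for a data-journalled inode.

```c
/* Toy model only: all names are hypothetical, nothing here is kernel API. */
#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct toy_inode {
	unsigned int blkbits;		/* log2 of the filesystem block size */
	unsigned int min_folio_order;	/* lowest folio order allowed */
	unsigned int max_folio_order;	/* highest folio order allowed */
	int journal_data;		/* nonzero when data journalling is on */
};

/* Smallest order for which one folio covers at least one block. */
static unsigned int toy_min_order(unsigned int blkbits)
{
	return blkbits > TOY_PAGE_SHIFT ? blkbits - TOY_PAGE_SHIFT : 0;
}

/* Number of filesystem blocks covered by one folio of the given order. */
static unsigned long toy_blocks_per_folio(unsigned int order,
					  unsigned int blkbits)
{
	unsigned int folio_bits = TOY_PAGE_SHIFT + order;

	return folio_bits > blkbits ? 1UL << (folio_bits - blkbits) : 1UL;
}

/*
 * Pick the folio order range for an inode. default_max_order stands in
 * for whatever upper bound the page cache would otherwise allow.
 */
static void toy_set_folio_orders(struct toy_inode *inode,
				 unsigned int default_max_order)
{
	inode->min_folio_order = toy_min_order(inode->blkbits);
	inode->max_folio_order = default_max_order;

	/* The range must never be empty. */
	if (inode->max_folio_order < inode->min_folio_order)
		inode->max_folio_order = inode->min_folio_order;

	/* Data journalling: keep each folio at the minimum size. */
	if (inode->journal_data)
		inode->max_folio_order = inode->min_folio_order;
}
```

In this model, a 64 KiB block size (blkbits = 16) with an assumed default max
order of 9 gives min order 4 and 32 blocks per folio; with journal_data set,
max order is clamped to 4 and a folio covers exactly one block.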

Honza

>  	if (!S_ISREG(inode->i_mode))
>  		return false;
>  	if (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index fdc006a973aa..4c0bd79bdf68 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5053,6 +5053,9 @@ static int ext4_check_large_folio(struct super_block *sb)
>  		return -EINVAL;
>  	}
>
> +	if (sb->s_blocksize > PAGE_SIZE)
> +		ext4_msg(sb, KERN_NOTICE, "EXPERIMENTAL bs(%lu) > ps(%lu) enabled.",
> +			 sb->s_blocksize, PAGE_SIZE);
>  	return 0;
>  }
>
> @@ -7432,7 +7435,8 @@ static struct file_system_type ext4_fs_type = {
>  	.init_fs_context = ext4_init_fs_context,
>  	.parameters = ext4_param_specs,
>  	.kill_sb = ext4_kill_sb,
> -	.fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
> +	.fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME |
> +		    FS_LBS,
> };
> MODULE_ALIAS_FS("ext4");
>
> --
> 2.46.1
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR