Message-ID: <94de227e-23c1-4089-b99c-e8fc0beae5da@huaweicloud.com>
Date: Thu, 26 Jun 2025 21:26:41 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: "D, Suneeth" <Suneeth.D@....com>, linux-ext4@...r.kernel.org
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
willy@...radead.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com,
yangerkun@...wei.com
Subject: Re: [PATCH v2 8/8] ext4: enable large folio for regular file
Hello Suneeth D!
On 2025/6/26 19:29, D, Suneeth wrote:
>
> Hello Zhang Yi,
>
> On 5/12/2025 12:03 PM, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> Except for fsverity, fscrypt, and the data=journal mode, ext4 now supports
>> large folios for regular files. Enable this feature by default. However,
>> since we cannot change the folio order limitation of mappings on active
>> inodes, setting the data=journal mode via ioctl on an active inode will
>> not take immediate effect in non-delalloc mode.
>>
>
> We run lmbench3 as part of our weekly CI for kernel performance regression testing between a stable kernel and an rc kernel. We noticed a regression of 8-12% on kernels from 6.16-rc1 all the way through 6.16-rc3. Bisecting between 6.15 and 6.16-rc1 pointed to 7ac67301e82f02b77a5c8e7377a1f414ef108b84 as the first bad commit. The machine configuration and test parameters were as follows:
>
> Model name: AMD EPYC 9754 128-Core Processor [Bergamo]
> Thread(s) per core: 2
> Core(s) per socket: 128
> Socket(s): 1
> Total online memory: 258G
>
> micro-benchmark_variant: "lmbench3-development-1-0-MMAP-50%" which has the following parameters,
>
> -> nr_thread: 1
> -> memory_size: 50%
> -> mode: development
> -> test: MMAP
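
For anyone who wants a quick stand-in for the MMAP read test while trying to
reproduce this, here is a rough sketch. It is not the lmbench3 source: I am
only assuming the test boils down to mapping a file, reading through the
mapping, and unmapping it again. The helper name mmap_read_sum() and its
signature are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from lmbench3.
 * Maps `path` read-only, reads every byte through the mapping so each
 * page is faulted in from the page cache, then unmaps it.
 * On success returns 0 and stores the byte sum in *sum and the elapsed
 * time (mmap + read + munmap) in microseconds in *usecs.
 * Returns -1 on any error or if the file is empty.
 */
int mmap_read_sum(const char *path, uint64_t *sum, double *usecs)
{
	struct timespec t0, t1;
	struct stat st;
	unsigned char *p;
	uint64_t acc = 0;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}
	/* Touch every byte; the sum keeps the loop from being optimized out. */
	for (off_t i = 0; i < st.st_size; i++)
		acc += p[i];
	munmap(p, (size_t)st.st_size);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	*sum = acc;
	*usecs = (t1.tv_sec - t0.tv_sec) * 1e6 +
		 (t1.tv_nsec - t0.tv_nsec) / 1e3;
	return 0;
}
```

Calling it on a file that lives on the ext4 filesystem under test, once per
kernel, should be enough for a first comparison. The timed region covers the
unmap as well as the read, since unmap_page_range is among the symbols in the
perf diff further down.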
>
> Stats after bisection (KPI: lmbench3.MMAP.read.latency.us):
>
> v6.15 - 97.3K
>
> v6.16-rc1 - 107.5K
>
> v6.16-rc3 - 107.4K
>
> 6.15.0-rc4badcommit - 103.5K
>
> 6.15.0-rc4badcommit_m1 (one commit before bad-commit) - 94.2K
Thanks for the report. I will try to reproduce this performance regression on
my machine and find out what caused it.
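
As a quick sanity check on the latencies quoted above, the relative change
can be worked out as below. This is only a sketch of the arithmetic; the
helper name pct_change() is made up, and the inputs are the reported values
taken as-is.

```c
/*
 * Relative change of `after` versus `before`, in percent.
 * Hypothetical helper for checking the reported numbers.
 */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the reported values, pct_change(97.3, 107.5) is about 10.5 (v6.15 vs
v6.16-rc1), pct_change(97.3, 107.4) is about 10.4 (v6.15 vs v6.16-rc3), and
pct_change(94.2, 103.5) is about 9.9 (the commit before vs the bad commit),
all within the 8-12% range mentioned in the report.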
Thanks,
Yi.
>
> I also ran the micro-benchmark with tools/testing/perf record. The following is the output of tools/testing/perf diff between the bad commit and the commit just before it.
>
> # ./perf diff perf.data.old perf.data
> No kallsyms or vmlinux with build-id da8042fb274c5e3524318e5e3afbeeef5df2055e was found
> # Event 'cycles:P'
> #
> # Baseline  Delta Abs  Shared Object      Symbol
> # ........  .........  .................  ......
> #
>                +4.34%  [kernel.kallsyms]  [k] __lruvec_stat_mod_folio
>                +3.41%  [kernel.kallsyms]  [k] unmap_page_range
>                +3.33%  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
>                +2.04%  [kernel.kallsyms]  [k] srso_alias_return_thunk
>                +2.02%  [kernel.kallsyms]  [k] srso_alias_safe_ret
>     22.22%     -1.78%  bw_mmap_rd         [.] bread
>                +1.76%  [kernel.kallsyms]  [k] __handle_mm_fault
>                +1.70%  [kernel.kallsyms]  [k] filemap_map_pages
>                +1.58%  [kernel.kallsyms]  [k] set_pte_range
>                +1.58%  [kernel.kallsyms]  [k] next_uptodate_folio
>                +1.33%  [kernel.kallsyms]  [k] do_anonymous_page
>                +1.01%  [kernel.kallsyms]  [k] get_page_from_freelist
>                +0.98%  [kernel.kallsyms]  [k] __mem_cgroup_charge
>                +0.85%  [kernel.kallsyms]  [k] asm_exc_page_fault
>                +0.82%  [kernel.kallsyms]  [k] native_irq_return_iret
>                +0.82%  [kernel.kallsyms]  [k] do_user_addr_fault
>                +0.77%  [kernel.kallsyms]  [k] clear_page_erms
>                +0.75%  [kernel.kallsyms]  [k] handle_mm_fault
>                +0.73%  [kernel.kallsyms]  [k] set_ptes.isra.0
>                +0.70%  [kernel.kallsyms]  [k] lru_add
>                +0.69%  [kernel.kallsyms]  [k] folio_add_file_rmap_ptes
>                +0.68%  [kernel.kallsyms]  [k] folio_remove_rmap_ptes
>     12.45%     -0.65%  line               [.] mem_benchmark_0
>                +0.64%  [kernel.kallsyms]  [k] __alloc_frozen_pages_noprof
>                +0.63%  [kernel.kallsyms]  [k] vm_normal_page
>                +0.63%  [kernel.kallsyms]  [k] free_pages_and_swap_cache
>                +0.63%  [kernel.kallsyms]  [k] lock_vma_under_rcu
>                +0.60%  [kernel.kallsyms]  [k] __rcu_read_unlock
>                +0.59%  [kernel.kallsyms]  [k] cgroup_rstat_updated
>                +0.57%  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
>                +0.52%  [kernel.kallsyms]  [k] __mod_lruvec_state
>                +0.51%  [kernel.kallsyms]  [k] exc_page_fault
>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>> ---
>> fs/ext4/ext4.h | 1 +
>> fs/ext4/ext4_jbd2.c | 3 ++-
>> fs/ext4/ialloc.c | 3 +++
>> fs/ext4/inode.c | 20 ++++++++++++++++++++
>> 4 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 5a20e9cd7184..2fad90c30493 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -2993,6 +2993,7 @@ int ext4_walk_page_buffers(handle_t *handle,
>> struct buffer_head *bh));
>> int do_journal_get_write_access(handle_t *handle, struct inode *inode,
>> struct buffer_head *bh);
>> +bool ext4_should_enable_large_folio(struct inode *inode);
>> #define FALL_BACK_TO_NONDELALLOC 1
>> #define CONVERT_INLINE_DATA 2
>> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
>> index 135e278c832e..b3e9b7bd7978 100644
>> --- a/fs/ext4/ext4_jbd2.c
>> +++ b/fs/ext4/ext4_jbd2.c
>> @@ -16,7 +16,8 @@ int ext4_inode_journal_mode(struct inode *inode)
>> ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
>> test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
>> (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
>> - !test_opt(inode->i_sb, DELALLOC))) {
>> + !test_opt(inode->i_sb, DELALLOC) &&
>> + !mapping_large_folio_support(inode->i_mapping))) {
>> /* We do not support data journalling for encrypted data */
>> if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))
>> return EXT4_INODE_ORDERED_DATA_MODE; /* ordered */
>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>> index e7ecc7c8a729..4938e78cbadc 100644
>> --- a/fs/ext4/ialloc.c
>> +++ b/fs/ext4/ialloc.c
>> @@ -1336,6 +1336,9 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idmap,
>> }
>> }
>> + if (ext4_should_enable_large_folio(inode))
>> + mapping_set_large_folios(inode->i_mapping);
>> +
>> ext4_update_inode_fsync_trans(handle, inode, 1);
>> err = ext4_mark_inode_dirty(handle, inode);
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 29eccdf8315a..7fd3921cfe46 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4774,6 +4774,23 @@ static int check_igot_inode(struct inode *inode, ext4_iget_flags flags,
>> return -EFSCORRUPTED;
>> }
>> +bool ext4_should_enable_large_folio(struct inode *inode)
>> +{
>> + struct super_block *sb = inode->i_sb;
>> +
>> + if (!S_ISREG(inode->i_mode))
>> + return false;
>> + if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
>> + ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
>> + return false;
>> + if (ext4_has_feature_verity(sb))
>> + return false;
>> + if (ext4_has_feature_encrypt(sb))
>> + return false;
>> +
>> + return true;
>> +}
>> +
>> struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>> ext4_iget_flags flags, const char *function,
>> unsigned int line)
>> @@ -5096,6 +5113,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
>> ret = -EFSCORRUPTED;
>> goto bad_inode;
>> }
>> + if (ext4_should_enable_large_folio(inode))
>> + mapping_set_large_folios(inode->i_mapping);
>> +
>> ret = check_igot_inode(inode, flags, function, line);
>> /*
>> * -ESTALE here means there is nothing inherently wrong with the inode,
>
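
To make it easier to reason about which inodes get large folios with the
patch quoted above, here is a userspace model of the checks in
ext4_should_enable_large_folio(). It is only a sketch: struct inode_model and
model_should_enable_large_folio() are made-up names, and each boolean stands
in for the kernel test named in its comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inode and superblock state the patch tests. */
struct inode_model {
	bool is_regular;         /* S_ISREG(inode->i_mode) */
	bool mount_data_journal; /* test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA */
	bool inode_journal_data; /* ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) */
	bool sb_has_verity;      /* ext4_has_feature_verity(sb) */
	bool sb_has_encrypt;     /* ext4_has_feature_encrypt(sb) */
};

/* Mirrors the order of the checks in the quoted patch. */
bool model_should_enable_large_folio(const struct inode_model *m)
{
	if (!m->is_regular)
		return false;
	if (m->mount_data_journal || m->inode_journal_data)
		return false;
	if (m->sb_has_verity)
		return false;
	if (m->sb_has_encrypt)
		return false;

	return true;
}
```

Note that the verity and encrypt checks in the patch look at superblock
feature flags, so in this model a regular file on a filesystem with either
feature enabled never gets large folios, whether or not that particular file
uses the feature.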
> ---
> Thanks and Regards,
> Suneeth D