Message-ID: <f59ef632-0d11-4ae7-bdad-d552fe1f1d78@amd.com>
Date: Thu, 26 Jun 2025 16:59:36 +0530
From: "D, Suneeth" <Suneeth.D@....com>
To: Zhang Yi <yi.zhang@...weicloud.com>, <linux-ext4@...r.kernel.org>
CC: <linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<willy@...radead.org>, <tytso@....edu>, <adilger.kernel@...ger.ca>,
<jack@...e.cz>, <yi.zhang@...wei.com>, <libaokun1@...wei.com>,
<yukuai3@...wei.com>, <yangerkun@...wei.com>
Subject: Re: [PATCH v2 8/8] ext4: enable large folio for regular file
Hello Zhang Yi,
On 5/12/2025 12:03 PM, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> Besides fsverity, fscrypt, and the data=journal mode, ext4 now supports
> large folios for regular files. Enable this feature by default. However,
> since we cannot change the folio order limitation of mappings on active
> inodes, setting the journal=data mode via ioctl on an active inode will
> not take immediate effect in non-delalloc mode.
>
We run lmbench3 as part of our weekly CI for kernel performance
regression testing, comparing a stable kernel against an rc kernel. We
noticed a regression of 8-12% on kernels from 6.16-rc1 through 6.16-rc3.
Bisecting between 6.15 and 6.16-rc1 pointed to
7ac67301e82f02b77a5c8e7377a1f414ef108b84 as the first bad commit. The
machine configuration and test parameters were as follows:
Model name: AMD EPYC 9754 128-Core Processor [Bergamo]
Thread(s) per core: 2
Core(s) per socket: 128
Socket(s): 1
Total online memory: 258G
micro-benchmark_variant: "lmbench3-development-1-0-MMAP-50%" which has
the following parameters,
-> nr_thread: 1
-> memory_size: 50%
-> mode: development
-> test: MMAP
The results after bisection are as follows
(the KPI is lmbench3.MMAP.read.latency.us, so higher is worse):
v6.15 - 97.3K
v6.16-rc1 - 107.5K
v6.16-rc3 - 107.4K
6.15.0-rc4badcommit - 103.5K
6.15.0-rc4badcommit_m1 (one commit before bad-commit) - 94.2K
I also ran the micro-benchmark under perf record. The following is the
perf diff output between the bad commit and the commit just before it.
# ./perf diff perf.data.old perf.data
No kallsyms or vmlinux with build-id da8042fb274c5e3524318e5e3afbeeef5df2055e was found
# Event 'cycles:P'
#
# Baseline  Delta Abs  Shared Object      Symbol
# ........  .........  .................  ...........................
#
               +4.34%  [kernel.kallsyms]  [k] __lruvec_stat_mod_folio
               +3.41%  [kernel.kallsyms]  [k] unmap_page_range
               +3.33%  [kernel.kallsyms]  [k] __mod_memcg_lruvec_state
               +2.04%  [kernel.kallsyms]  [k] srso_alias_return_thunk
               +2.02%  [kernel.kallsyms]  [k] srso_alias_safe_ret
    22.22%     -1.78%  bw_mmap_rd         [.] bread
               +1.76%  [kernel.kallsyms]  [k] __handle_mm_fault
               +1.70%  [kernel.kallsyms]  [k] filemap_map_pages
               +1.58%  [kernel.kallsyms]  [k] set_pte_range
               +1.58%  [kernel.kallsyms]  [k] next_uptodate_folio
               +1.33%  [kernel.kallsyms]  [k] do_anonymous_page
               +1.01%  [kernel.kallsyms]  [k] get_page_from_freelist
               +0.98%  [kernel.kallsyms]  [k] __mem_cgroup_charge
               +0.85%  [kernel.kallsyms]  [k] asm_exc_page_fault
               +0.82%  [kernel.kallsyms]  [k] native_irq_return_iret
               +0.82%  [kernel.kallsyms]  [k] do_user_addr_fault
               +0.77%  [kernel.kallsyms]  [k] clear_page_erms
               +0.75%  [kernel.kallsyms]  [k] handle_mm_fault
               +0.73%  [kernel.kallsyms]  [k] set_ptes.isra.0
               +0.70%  [kernel.kallsyms]  [k] lru_add
               +0.69%  [kernel.kallsyms]  [k] folio_add_file_rmap_ptes
               +0.68%  [kernel.kallsyms]  [k] folio_remove_rmap_ptes
    12.45%     -0.65%  line               [.] mem_benchmark_0
               +0.64%  [kernel.kallsyms]  [k] __alloc_frozen_pages_noprof
               +0.63%  [kernel.kallsyms]  [k] vm_normal_page
               +0.63%  [kernel.kallsyms]  [k] free_pages_and_swap_cache
               +0.63%  [kernel.kallsyms]  [k] lock_vma_under_rcu
               +0.60%  [kernel.kallsyms]  [k] __rcu_read_unlock
               +0.59%  [kernel.kallsyms]  [k] cgroup_rstat_updated
               +0.57%  [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
               +0.52%  [kernel.kallsyms]  [k] __mod_lruvec_state
               +0.51%  [kernel.kallsyms]  [k] exc_page_fault
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> ---
> fs/ext4/ext4.h | 1 +
> fs/ext4/ext4_jbd2.c | 3 ++-
> fs/ext4/ialloc.c | 3 +++
> fs/ext4/inode.c | 20 ++++++++++++++++++++
> 4 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5a20e9cd7184..2fad90c30493 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2993,6 +2993,7 @@ int ext4_walk_page_buffers(handle_t *handle,
> struct buffer_head *bh));
> int do_journal_get_write_access(handle_t *handle, struct inode *inode,
> struct buffer_head *bh);
> +bool ext4_should_enable_large_folio(struct inode *inode);
> #define FALL_BACK_TO_NONDELALLOC 1
> #define CONVERT_INLINE_DATA 2
>
> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
> index 135e278c832e..b3e9b7bd7978 100644
> --- a/fs/ext4/ext4_jbd2.c
> +++ b/fs/ext4/ext4_jbd2.c
> @@ -16,7 +16,8 @@ int ext4_inode_journal_mode(struct inode *inode)
> ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
> test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
> (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
> - !test_opt(inode->i_sb, DELALLOC))) {
> + !test_opt(inode->i_sb, DELALLOC) &&
> + !mapping_large_folio_support(inode->i_mapping))) {
> /* We do not support data journalling for encrypted data */
> if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))
> return EXT4_INODE_ORDERED_DATA_MODE; /* ordered */
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index e7ecc7c8a729..4938e78cbadc 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -1336,6 +1336,9 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idmap,
> }
> }
>
> + if (ext4_should_enable_large_folio(inode))
> + mapping_set_large_folios(inode->i_mapping);
> +
> ext4_update_inode_fsync_trans(handle, inode, 1);
>
> err = ext4_mark_inode_dirty(handle, inode);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 29eccdf8315a..7fd3921cfe46 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4774,6 +4774,23 @@ static int check_igot_inode(struct inode *inode, ext4_iget_flags flags,
> return -EFSCORRUPTED;
> }
>
> +bool ext4_should_enable_large_folio(struct inode *inode)
> +{
> + struct super_block *sb = inode->i_sb;
> +
> + if (!S_ISREG(inode->i_mode))
> + return false;
> + if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
> + ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
> + return false;
> + if (ext4_has_feature_verity(sb))
> + return false;
> + if (ext4_has_feature_encrypt(sb))
> + return false;
> +
> + return true;
> +}
> +
> struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> ext4_iget_flags flags, const char *function,
> unsigned int line)
> @@ -5096,6 +5113,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> ret = -EFSCORRUPTED;
> goto bad_inode;
> }
> + if (ext4_should_enable_large_folio(inode))
> + mapping_set_large_folios(inode->i_mapping);
> +
> ret = check_igot_inode(inode, flags, function, line);
> /*
> * -ESTALE here means there is nothing inherently wrong with the inode,
---
Thanks and Regards,
Suneeth D
View attachment "lmbench_steps.txt" of type "text/plain" (1078 bytes)