[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c839cf3a-6aef-435e-b915-156ada202b31@linux.alibaba.com>
Date: Mon, 14 Jul 2025 14:24:44 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Chao Yu <chao@...nel.org>, xiang@...nel.org
Cc: linux-erofs@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
Yue Hu <zbestahu@...il.com>, Jeffle Xu <jefflexu@...ux.alibaba.com>,
Sandeep Dhavale <dhavale@...gle.com>, Hongbo Li <lihongbo22@...wei.com>
Subject: Re: [PATCH v2 2/2] erofs: support to readahead dirent blocks in
erofs_readdir()
On 2025/7/14 14:16, Chao Yu wrote:
> On 7/14/25 11:49, Gao Xiang wrote:
>> Hi Chao,
>>
>> On 2025/7/14 11:34, Chao Yu wrote:
>>> This patch supports to readahead more blocks in erofs_readdir(), it can
>>> enhance readdir performance in large direcotry.
>>>
>>> readdir test in a large directory which contains 12000 sub-files.
>>>
>>> files_per_second
>>> Before: 926385.54
>>> After: 2380435.562
>>>
>>> Meanwhile, let's introduces a new sysfs entry to control page count
>>> of readahead to provide more flexible policy for readahead of readdir().
>>> - location: /sys/fs/erofs/<disk>/dir_ra_pages
>>> - default value: 4
>>> - range: [0, 128]
>>> - disable readahead: set the value to 0
>>>
>>> Signed-off-by: Chao Yu <chao@...nel.org>
>>> ---
>>> v2:
>>> - introduce sysfs node to control page count of readahead during
>>> readdir().
>>> Documentation/ABI/testing/sysfs-fs-erofs | 7 +++++++
>>> fs/erofs/dir.c | 9 +++++++++
>>> fs/erofs/internal.h | 5 +++++
>>> fs/erofs/super.c | 2 ++
>>> fs/erofs/sysfs.c | 5 +++++
>>> 5 files changed, 28 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-fs-erofs b/Documentation/ABI/testing/sysfs-fs-erofs
>>> index bf3b6299c15e..500c93484e4c 100644
>>> --- a/Documentation/ABI/testing/sysfs-fs-erofs
>>> +++ b/Documentation/ABI/testing/sysfs-fs-erofs
>>> @@ -35,3 +35,10 @@ Description: Used to set or show hardware accelerators in effect
>>> and multiple accelerators are separated by '\n'.
>>> Supported accelerator(s): qat_deflate.
>>> Disable all accelerators with an empty string (echo > accel).
>>> +
>>> +What: /sys/fs/erofs/<disk>/dir_ra_pages
>>> +Date: July 2025
>>> +Contact: "Chao Yu" <chao@...nel.org>
>>> +Description: Used to set or show page count of readahead during readdir(),
>>> + the range of value is [0, 128], by default it is 4, set it to
>>> + 0 to disable readahead.
>>> diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
>>> index 3e4b38bec0aa..40f828d5b670 100644
>>> --- a/fs/erofs/dir.c
>>> +++ b/fs/erofs/dir.c
>>> @@ -47,8 +47,10 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>>> struct inode *dir = file_inode(f);
>>> struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
>>> struct super_block *sb = dir->i_sb;
>>> + struct file_ra_state *ra = &f->f_ra;
>>> unsigned long bsz = sb->s_blocksize;
>>> unsigned int ofs = erofs_blkoff(sb, ctx->pos);
>>> + unsigned long nr_pages = DIV_ROUND_UP_POW2(dir->i_size, PAGE_SIZE);
>>
>> why using DIV_ROUND_UP_POW2 rather than DIV_ROUND_UP here?
>
> Seems DIV_ROUND_UP_POW2() doesn't use bit shift as expected, let
> me use DIV_ROUND_UP() instead.
Oh, I think it's fine to use DIV_ROUND_UP_POW2 here.
>
>>
>>> int err = 0;
>>> bool initial = true;
>>> @@ -63,6 +65,13 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>>> break;
>>> }
>>> + /* readahead blocks to enhance performance in large directory */
>>> + if (EROFS_I_SB(dir)->dir_ra_pages && nr_pages - dbstart > 1 &&
>>
>> dbstart is a byte-oriented value, so I'm not sure if it
>> works as you expect..
>
> Oh, I checked patch in 6.6, found that I missed to handle it correctly
> when porting code to upstream, let me fix this.
>
>>
>>> + !ra_has_index(ra, dbstart))
>>> + page_cache_sync_readahead(dir->i_mapping, ra, f,
>>> + dbstart, min(nr_pages - dbstart,
>>
>> same here.
>>
>>> + (pgoff_t)EROFS_I_SB(dir)->dir_ra_pages));
>>> +
>>> de = erofs_bread(&buf, dbstart, true);
>>> if (IS_ERR(de)) {
>>> erofs_err(sb, "failed to readdir of logical block %llu of nid %llu",
>>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>>> index 0d19bde8c094..f0e5b4273aa8 100644
>>> --- a/fs/erofs/internal.h
>>> +++ b/fs/erofs/internal.h
>>> @@ -157,6 +157,7 @@ struct erofs_sb_info {
>>> /* sysfs support */
>>> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
>>> struct completion s_kobj_unregister;
>>> + unsigned int dir_ra_pages;
>>> /* fscache support */
>>> struct fscache_volume *volume;
>>> @@ -238,6 +239,10 @@ EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>>> #define EROFS_I_BL_XATTR_BIT (BITS_PER_LONG - 1)
>>> #define EROFS_I_BL_Z_BIT (BITS_PER_LONG - 2)
>>> +/* default/maximum readahead pages of directory */
>>> +#define DEFAULT_DIR_RA_PAGES 4
>>> +#define MAX_DIR_RA_PAGES 128
>>
>> better to add EROFS_ prefix for them.
>>
>> Also could we setup those blocks or bytes instead
>> of pages?
>>
>> If users would like to setup values, they may don't
>> care more the page size since they only care about
>> images.
>
> Let's use bytes, and then roundup to blksize?
Agreed.
>
>>
>> Also why do we limit maximum number even if users
>> would like to readahead more? (such as fadvise
>> allows -1 too)
>
> No test, but I suspect there is border effect even we set it to
> i_size.
>
> Anyway, let me remove the upper boundary limitation, unless there
> is further requirement of limitation.
Ok.
Thanks,
Gao Xiang
>
> Thanks,
>
>>
>> Thanks,
>> Gao Xiang
>>
>
Powered by blists - more mailing lists