[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a501e5f-880e-4dc6-a984-b2406b54daa9@kernel.org>
Date: Mon, 14 Jul 2025 14:16:44 +0800
From: Chao Yu <chao@...nel.org>
To: Gao Xiang <hsiangkao@...ux.alibaba.com>, xiang@...nel.org
Cc: chao@...nel.org, linux-erofs@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, Yue Hu <zbestahu@...il.com>,
Jeffle Xu <jefflexu@...ux.alibaba.com>, Sandeep Dhavale
<dhavale@...gle.com>, Hongbo Li <lihongbo22@...wei.com>
Subject: Re: [PATCH v2 2/2] erofs: support to readahead dirent blocks in
erofs_readdir()
On 7/14/25 11:49, Gao Xiang wrote:
> Hi Chao,
>
> On 2025/7/14 11:34, Chao Yu wrote:
>> This patch supports to readahead more blocks in erofs_readdir(), it can
>> enhance readdir performance in large direcotry.
>>
>> readdir test in a large directory which contains 12000 sub-files.
>>
>> files_per_second
>> Before: 926385.54
>> After: 2380435.562
>>
>> Meanwhile, let's introduces a new sysfs entry to control page count
>> of readahead to provide more flexible policy for readahead of readdir().
>> - location: /sys/fs/erofs/<disk>/dir_ra_pages
>> - default value: 4
>> - range: [0, 128]
>> - disable readahead: set the value to 0
>>
>> Signed-off-by: Chao Yu <chao@...nel.org>
>> ---
>> v2:
>> - introduce sysfs node to control page count of readahead during
>> readdir().
>> Documentation/ABI/testing/sysfs-fs-erofs | 7 +++++++
>> fs/erofs/dir.c | 9 +++++++++
>> fs/erofs/internal.h | 5 +++++
>> fs/erofs/super.c | 2 ++
>> fs/erofs/sysfs.c | 5 +++++
>> 5 files changed, 28 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-fs-erofs b/Documentation/ABI/testing/sysfs-fs-erofs
>> index bf3b6299c15e..500c93484e4c 100644
>> --- a/Documentation/ABI/testing/sysfs-fs-erofs
>> +++ b/Documentation/ABI/testing/sysfs-fs-erofs
>> @@ -35,3 +35,10 @@ Description: Used to set or show hardware accelerators in effect
>> and multiple accelerators are separated by '\n'.
>> Supported accelerator(s): qat_deflate.
>> Disable all accelerators with an empty string (echo > accel).
>> +
>> +What: /sys/fs/erofs/<disk>/dir_ra_pages
>> +Date: July 2025
>> +Contact: "Chao Yu" <chao@...nel.org>
>> +Description: Used to set or show page count of readahead during readdir(),
>> + the range of value is [0, 128], by default it is 4, set it to
>> + 0 to disable readahead.
>> diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
>> index 3e4b38bec0aa..40f828d5b670 100644
>> --- a/fs/erofs/dir.c
>> +++ b/fs/erofs/dir.c
>> @@ -47,8 +47,10 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>> struct inode *dir = file_inode(f);
>> struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
>> struct super_block *sb = dir->i_sb;
>> + struct file_ra_state *ra = &f->f_ra;
>> unsigned long bsz = sb->s_blocksize;
>> unsigned int ofs = erofs_blkoff(sb, ctx->pos);
>> + unsigned long nr_pages = DIV_ROUND_UP_POW2(dir->i_size, PAGE_SIZE);
>
> why using DIV_ROUND_UP_POW2 rather than DIV_ROUND_UP here?
Seems DIV_ROUND_UP_POW2() doesn't use bit shift as expected, let
me use DIV_ROUND_UP() instead.
>
>> int err = 0;
>> bool initial = true;
>> @@ -63,6 +65,13 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>> break;
>> }
>> + /* readahead blocks to enhance performance in large directory */
>> + if (EROFS_I_SB(dir)->dir_ra_pages && nr_pages - dbstart > 1 &&
>
> dbstart is a byte-oriented value, so I'm not sure if it
> works as you expect..
Oh, I checked patch in 6.6, found that I missed to handle it correctly
when porting code to upstream, let me fix this.
>
>> + !ra_has_index(ra, dbstart))
>> + page_cache_sync_readahead(dir->i_mapping, ra, f,
>> + dbstart, min(nr_pages - dbstart,
>
> same here.
>
>> + (pgoff_t)EROFS_I_SB(dir)->dir_ra_pages));
>> +
>> de = erofs_bread(&buf, dbstart, true);
>> if (IS_ERR(de)) {
>> erofs_err(sb, "failed to readdir of logical block %llu of nid %llu",
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index 0d19bde8c094..f0e5b4273aa8 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -157,6 +157,7 @@ struct erofs_sb_info {
>> /* sysfs support */
>> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
>> struct completion s_kobj_unregister;
>> + unsigned int dir_ra_pages;
>> /* fscache support */
>> struct fscache_volume *volume;
>> @@ -238,6 +239,10 @@ EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>> #define EROFS_I_BL_XATTR_BIT (BITS_PER_LONG - 1)
>> #define EROFS_I_BL_Z_BIT (BITS_PER_LONG - 2)
>> +/* default/maximum readahead pages of directory */
>> +#define DEFAULT_DIR_RA_PAGES 4
>> +#define MAX_DIR_RA_PAGES 128
>
> better to add EROFS_ prefix for them.
>
> Also could we setup those blocks or bytes instead
> of pages?
>
> If users would like to setup values, they may don't
> care more the page size since they only care about
> images.
Let's use bytes, and then roundup to blksize?
>
> Also why do we limit maximum number even if users
> would like to readahead more? (such as fadvise
> allows -1 too)
No test, but I suspect there is border effect even we set it to
i_size.
Anyway, let me remove the upper boundary limitation, unless there
is further requirement of limitation.
Thanks,
>
> Thanks,
> Gao Xiang
>
Powered by blists - more mailing lists