[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dcb197e7-8dea-4bd2-9344-b753c10c534d@linux.alibaba.com>
Date: Mon, 14 Jul 2025 11:49:29 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Chao Yu <chao@...nel.org>, xiang@...nel.org
Cc: linux-erofs@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
Yue Hu <zbestahu@...il.com>, Jeffle Xu <jefflexu@...ux.alibaba.com>,
Sandeep Dhavale <dhavale@...gle.com>, Hongbo Li <lihongbo22@...wei.com>
Subject: Re: [PATCH v2 2/2] erofs: support to readahead dirent blocks in
erofs_readdir()
Hi Chao,
On 2025/7/14 11:34, Chao Yu wrote:
> This patch supports to readahead more blocks in erofs_readdir(), it can
> enhance readdir performance in large direcotry.
>
> readdir test in a large directory which contains 12000 sub-files.
>
> files_per_second
> Before: 926385.54
> After: 2380435.562
>
> Meanwhile, let's introduces a new sysfs entry to control page count
> of readahead to provide more flexible policy for readahead of readdir().
> - location: /sys/fs/erofs/<disk>/dir_ra_pages
> - default value: 4
> - range: [0, 128]
> - disable readahead: set the value to 0
>
> Signed-off-by: Chao Yu <chao@...nel.org>
> ---
> v2:
> - introduce sysfs node to control page count of readahead during
> readdir().
> Documentation/ABI/testing/sysfs-fs-erofs | 7 +++++++
> fs/erofs/dir.c | 9 +++++++++
> fs/erofs/internal.h | 5 +++++
> fs/erofs/super.c | 2 ++
> fs/erofs/sysfs.c | 5 +++++
> 5 files changed, 28 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-fs-erofs b/Documentation/ABI/testing/sysfs-fs-erofs
> index bf3b6299c15e..500c93484e4c 100644
> --- a/Documentation/ABI/testing/sysfs-fs-erofs
> +++ b/Documentation/ABI/testing/sysfs-fs-erofs
> @@ -35,3 +35,10 @@ Description: Used to set or show hardware accelerators in effect
> and multiple accelerators are separated by '\n'.
> Supported accelerator(s): qat_deflate.
> Disable all accelerators with an empty string (echo > accel).
> +
> +What: /sys/fs/erofs/<disk>/dir_ra_pages
> +Date: July 2025
> +Contact: "Chao Yu" <chao@...nel.org>
> +Description: Used to set or show page count of readahead during readdir(),
> + the range of value is [0, 128], by default it is 4, set it to
> + 0 to disable readahead.
> diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
> index 3e4b38bec0aa..40f828d5b670 100644
> --- a/fs/erofs/dir.c
> +++ b/fs/erofs/dir.c
> @@ -47,8 +47,10 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
> struct inode *dir = file_inode(f);
> struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
> struct super_block *sb = dir->i_sb;
> + struct file_ra_state *ra = &f->f_ra;
> unsigned long bsz = sb->s_blocksize;
> unsigned int ofs = erofs_blkoff(sb, ctx->pos);
> + unsigned long nr_pages = DIV_ROUND_UP_POW2(dir->i_size, PAGE_SIZE);
why using DIV_ROUND_UP_POW2 rather than DIV_ROUND_UP here?
> int err = 0;
> bool initial = true;
>
> @@ -63,6 +65,13 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
> break;
> }
>
> + /* readahead blocks to enhance performance in large directory */
> + if (EROFS_I_SB(dir)->dir_ra_pages && nr_pages - dbstart > 1 &&
dbstart is a byte-oriented value, so I'm not sure if it
works as you expect..
> + !ra_has_index(ra, dbstart))
> + page_cache_sync_readahead(dir->i_mapping, ra, f,
> + dbstart, min(nr_pages - dbstart,
same here.
> + (pgoff_t)EROFS_I_SB(dir)->dir_ra_pages));
> +
> de = erofs_bread(&buf, dbstart, true);
> if (IS_ERR(de)) {
> erofs_err(sb, "failed to readdir of logical block %llu of nid %llu",
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 0d19bde8c094..f0e5b4273aa8 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -157,6 +157,7 @@ struct erofs_sb_info {
> /* sysfs support */
> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
> struct completion s_kobj_unregister;
> + unsigned int dir_ra_pages;
>
> /* fscache support */
> struct fscache_volume *volume;
> @@ -238,6 +239,10 @@ EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
> #define EROFS_I_BL_XATTR_BIT (BITS_PER_LONG - 1)
> #define EROFS_I_BL_Z_BIT (BITS_PER_LONG - 2)
>
> +/* default/maximum readahead pages of directory */
> +#define DEFAULT_DIR_RA_PAGES 4
> +#define MAX_DIR_RA_PAGES 128
better to add EROFS_ prefix for them.
Also could we setup those blocks or bytes instead
of pages?
If users would like to setup values, they may don't
care more the page size since they only care about
images.
Also why do we limit maximum number even if users
would like to readahead more? (such as fadvise
allows -1 too)
Thanks,
Gao Xiang
Powered by blists - more mailing lists