lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a501e5f-880e-4dc6-a984-b2406b54daa9@kernel.org>
Date: Mon, 14 Jul 2025 14:16:44 +0800
From: Chao Yu <chao@...nel.org>
To: Gao Xiang <hsiangkao@...ux.alibaba.com>, xiang@...nel.org
Cc: chao@...nel.org, linux-erofs@...ts.ozlabs.org,
 linux-kernel@...r.kernel.org, Yue Hu <zbestahu@...il.com>,
 Jeffle Xu <jefflexu@...ux.alibaba.com>, Sandeep Dhavale
 <dhavale@...gle.com>, Hongbo Li <lihongbo22@...wei.com>
Subject: Re: [PATCH v2 2/2] erofs: support to readahead dirent blocks in
 erofs_readdir()

On 7/14/25 11:49, Gao Xiang wrote:
> Hi Chao,
> 
> On 2025/7/14 11:34, Chao Yu wrote:
>> This patch supports to readahead more blocks in erofs_readdir(), it can
>> enhance readdir performance in large direcotry.
>>
>> readdir test in a large directory which contains 12000 sub-files.
>>
>>         files_per_second
>> Before:        926385.54
>> After:        2380435.562
>>
>> Meanwhile, let's introduces a new sysfs entry to control page count
>> of readahead to provide more flexible policy for readahead of readdir().
>> - location: /sys/fs/erofs/<disk>/dir_ra_pages
>> - default value: 4
>> - range: [0, 128]
>> - disable readahead: set the value to 0
>>
>> Signed-off-by: Chao Yu <chao@...nel.org>
>> ---
>> v2:
>> - introduce sysfs node to control page count of readahead during
>> readdir().
>>   Documentation/ABI/testing/sysfs-fs-erofs | 7 +++++++
>>   fs/erofs/dir.c                           | 9 +++++++++
>>   fs/erofs/internal.h                      | 5 +++++
>>   fs/erofs/super.c                         | 2 ++
>>   fs/erofs/sysfs.c                         | 5 +++++
>>   5 files changed, 28 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-fs-erofs b/Documentation/ABI/testing/sysfs-fs-erofs
>> index bf3b6299c15e..500c93484e4c 100644
>> --- a/Documentation/ABI/testing/sysfs-fs-erofs
>> +++ b/Documentation/ABI/testing/sysfs-fs-erofs
>> @@ -35,3 +35,10 @@ Description:    Used to set or show hardware accelerators in effect
>>           and multiple accelerators are separated by '\n'.
>>           Supported accelerator(s): qat_deflate.
>>           Disable all accelerators with an empty string (echo > accel).
>> +
>> +What:        /sys/fs/erofs/<disk>/dir_ra_pages
>> +Date:        July 2025
>> +Contact:    "Chao Yu" <chao@...nel.org>
>> +Description:    Used to set or show page count of readahead during readdir(),
>> +        the range of value is [0, 128], by default it is 4, set it to
>> +        0 to disable readahead.
>> diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
>> index 3e4b38bec0aa..40f828d5b670 100644
>> --- a/fs/erofs/dir.c
>> +++ b/fs/erofs/dir.c
>> @@ -47,8 +47,10 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>>       struct inode *dir = file_inode(f);
>>       struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
>>       struct super_block *sb = dir->i_sb;
>> +    struct file_ra_state *ra = &f->f_ra;
>>       unsigned long bsz = sb->s_blocksize;
>>       unsigned int ofs = erofs_blkoff(sb, ctx->pos);
>> +    unsigned long nr_pages = DIV_ROUND_UP_POW2(dir->i_size, PAGE_SIZE);
> 
> why using DIV_ROUND_UP_POW2 rather than DIV_ROUND_UP here?

Seems DIV_ROUND_UP_POW2() doesn't use bit shift as expected, let
me use DIV_ROUND_UP() instead.

> 
>>       int err = 0;
>>       bool initial = true;
>>   @@ -63,6 +65,13 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>>               break;
>>           }
>>   +        /* readahead blocks to enhance performance in large directory */
>> +        if (EROFS_I_SB(dir)->dir_ra_pages && nr_pages - dbstart > 1 &&
> 
> dbstart is a byte-oriented value, so I'm not sure if it
> works as you expect..

Oh, I checked patch in 6.6, found that I missed to handle it correctly
when porting code to upstream, let me fix this.

> 
>> +            !ra_has_index(ra, dbstart))
>> +            page_cache_sync_readahead(dir->i_mapping, ra, f,
>> +                dbstart, min(nr_pages - dbstart,
> 
> same here.
> 
>> +                (pgoff_t)EROFS_I_SB(dir)->dir_ra_pages));
>> +
>>           de = erofs_bread(&buf, dbstart, true);
>>           if (IS_ERR(de)) {
>>               erofs_err(sb, "failed to readdir of logical block %llu of nid %llu",
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index 0d19bde8c094..f0e5b4273aa8 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -157,6 +157,7 @@ struct erofs_sb_info {
>>       /* sysfs support */
>>       struct kobject s_kobj;        /* /sys/fs/erofs/<devname> */
>>       struct completion s_kobj_unregister;
>> +    unsigned int dir_ra_pages;
>>         /* fscache support */
>>       struct fscache_volume *volume;
>> @@ -238,6 +239,10 @@ EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>>   #define EROFS_I_BL_XATTR_BIT    (BITS_PER_LONG - 1)
>>   #define EROFS_I_BL_Z_BIT    (BITS_PER_LONG - 2)
>>   +/* default/maximum readahead pages of directory */
>> +#define DEFAULT_DIR_RA_PAGES    4
>> +#define MAX_DIR_RA_PAGES    128
> 
> better to add EROFS_ prefix for them.
> 
> Also could we setup those blocks or bytes instead
> of pages?
> 
> If users would like to setup values, they may don't
> care more the page size since they only care about
> images.

Let's use bytes, and then roundup to blksize?

> 
> Also why do we limit maximum number even if users
> would like to readahead more? (such as fadvise
> allows -1 too)

No test, but I suspect there is border effect even we set it to
i_size.

Anyway, let me remove the upper boundary limitation, unless there
is further requirement of limitation.

Thanks,

> 
> Thanks,
> Gao Xiang
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ