lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c839cf3a-6aef-435e-b915-156ada202b31@linux.alibaba.com>
Date: Mon, 14 Jul 2025 14:24:44 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Chao Yu <chao@...nel.org>, xiang@...nel.org
Cc: linux-erofs@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
 Yue Hu <zbestahu@...il.com>, Jeffle Xu <jefflexu@...ux.alibaba.com>,
 Sandeep Dhavale <dhavale@...gle.com>, Hongbo Li <lihongbo22@...wei.com>
Subject: Re: [PATCH v2 2/2] erofs: support to readahead dirent blocks in
 erofs_readdir()



On 2025/7/14 14:16, Chao Yu wrote:
> On 7/14/25 11:49, Gao Xiang wrote:
>> Hi Chao,
>>
>> On 2025/7/14 11:34, Chao Yu wrote:
>>> This patch supports to readahead more blocks in erofs_readdir(), it can
>>> enhance readdir performance in large direcotry.
>>>
>>> readdir test in a large directory which contains 12000 sub-files.
>>>
>>>          files_per_second
>>> Before:        926385.54
>>> After:        2380435.562
>>>
>>> Meanwhile, let's introduces a new sysfs entry to control page count
>>> of readahead to provide more flexible policy for readahead of readdir().
>>> - location: /sys/fs/erofs/<disk>/dir_ra_pages
>>> - default value: 4
>>> - range: [0, 128]
>>> - disable readahead: set the value to 0
>>>
>>> Signed-off-by: Chao Yu <chao@...nel.org>
>>> ---
>>> v2:
>>> - introduce sysfs node to control page count of readahead during
>>> readdir().
>>>    Documentation/ABI/testing/sysfs-fs-erofs | 7 +++++++
>>>    fs/erofs/dir.c                           | 9 +++++++++
>>>    fs/erofs/internal.h                      | 5 +++++
>>>    fs/erofs/super.c                         | 2 ++
>>>    fs/erofs/sysfs.c                         | 5 +++++
>>>    5 files changed, 28 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-fs-erofs b/Documentation/ABI/testing/sysfs-fs-erofs
>>> index bf3b6299c15e..500c93484e4c 100644
>>> --- a/Documentation/ABI/testing/sysfs-fs-erofs
>>> +++ b/Documentation/ABI/testing/sysfs-fs-erofs
>>> @@ -35,3 +35,10 @@ Description:    Used to set or show hardware accelerators in effect
>>>            and multiple accelerators are separated by '\n'.
>>>            Supported accelerator(s): qat_deflate.
>>>            Disable all accelerators with an empty string (echo > accel).
>>> +
>>> +What:        /sys/fs/erofs/<disk>/dir_ra_pages
>>> +Date:        July 2025
>>> +Contact:    "Chao Yu" <chao@...nel.org>
>>> +Description:    Used to set or show page count of readahead during readdir(),
>>> +        the range of value is [0, 128], by default it is 4, set it to
>>> +        0 to disable readahead.
>>> diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
>>> index 3e4b38bec0aa..40f828d5b670 100644
>>> --- a/fs/erofs/dir.c
>>> +++ b/fs/erofs/dir.c
>>> @@ -47,8 +47,10 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>>>        struct inode *dir = file_inode(f);
>>>        struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
>>>        struct super_block *sb = dir->i_sb;
>>> +    struct file_ra_state *ra = &f->f_ra;
>>>        unsigned long bsz = sb->s_blocksize;
>>>        unsigned int ofs = erofs_blkoff(sb, ctx->pos);
>>> +    unsigned long nr_pages = DIV_ROUND_UP_POW2(dir->i_size, PAGE_SIZE);
>>
>> why using DIV_ROUND_UP_POW2 rather than DIV_ROUND_UP here?
> 
> Seems DIV_ROUND_UP_POW2() doesn't use bit shift as expected, let
> me use DIV_ROUND_UP() instead.

Oh, I think it's fine to use DIV_ROUND_UP_POW2 here.

> 
>>
>>>        int err = 0;
>>>        bool initial = true;
>>>    @@ -63,6 +65,13 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
>>>                break;
>>>            }
>>>    +        /* readahead blocks to enhance performance in large directory */
>>> +        if (EROFS_I_SB(dir)->dir_ra_pages && nr_pages - dbstart > 1 &&
>>
>> dbstart is a byte-oriented value, so I'm not sure if it
>> works as you expect..
> 
> Oh, I checked patch in 6.6, found that I missed to handle it correctly
> when porting code to upstream, let me fix this.
> 
>>
>>> +            !ra_has_index(ra, dbstart))
>>> +            page_cache_sync_readahead(dir->i_mapping, ra, f,
>>> +                dbstart, min(nr_pages - dbstart,
>>
>> same here.
>>
>>> +                (pgoff_t)EROFS_I_SB(dir)->dir_ra_pages));
>>> +
>>>            de = erofs_bread(&buf, dbstart, true);
>>>            if (IS_ERR(de)) {
>>>                erofs_err(sb, "failed to readdir of logical block %llu of nid %llu",
>>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>>> index 0d19bde8c094..f0e5b4273aa8 100644
>>> --- a/fs/erofs/internal.h
>>> +++ b/fs/erofs/internal.h
>>> @@ -157,6 +157,7 @@ struct erofs_sb_info {
>>>        /* sysfs support */
>>>        struct kobject s_kobj;        /* /sys/fs/erofs/<devname> */
>>>        struct completion s_kobj_unregister;
>>> +    unsigned int dir_ra_pages;
>>>          /* fscache support */
>>>        struct fscache_volume *volume;
>>> @@ -238,6 +239,10 @@ EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>>>    #define EROFS_I_BL_XATTR_BIT    (BITS_PER_LONG - 1)
>>>    #define EROFS_I_BL_Z_BIT    (BITS_PER_LONG - 2)
>>>    +/* default/maximum readahead pages of directory */
>>> +#define DEFAULT_DIR_RA_PAGES    4
>>> +#define MAX_DIR_RA_PAGES    128
>>
>> better to add EROFS_ prefix for them.
>>
>> Also could we setup those blocks or bytes instead
>> of pages?
>>
>> If users would like to setup values, they may don't
>> care more the page size since they only care about
>> images.
> 
> Let's use bytes, and then roundup to blksize?

Agreed.

> 
>>
>> Also why do we limit maximum number even if users
>> would like to readahead more? (such as fadvise
>> allows -1 too)
> 
> No test, but I suspect there is border effect even we set it to
> i_size.
> 
> Anyway, let me remove the upper boundary limitation, unless there
> is further requirement of limitation.

Ok.

Thanks,
Gao Xiang

> 
> Thanks,
> 
>>
>> Thanks,
>> Gao Xiang
>>
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ