Message-ID: <c2c601dba823$bb2fdf00$318f9d00$@samsung.com>
Date: Tue, 8 Apr 2025 10:15:35 +0900
From: "Sungjong Seo" <sj1557.seo@...sung.com>
To: "'Anthony Iliopoulos'" <ailiop@...e.com>, "'Namjae Jeon'"
	<linkinjeon@...nel.org>, "'Yuezhang Mo'" <yuezhang.mo@...y.com>
Cc: <linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<sjdev.seo@...il.com>, <cpgs@...sung.com>, <sj1557.seo@...sung.com>
Subject: RE: [PATCH] exfat: enable request merging for dir readahead

Hi, Anthony

> Directory listings that need to access the inode metadata (e.g. via
> statx to obtain the file types) of large filesystems with lots of
> metadata that aren't yet in dcache, will take a long time due to the
> directory readahead submitting one io request at a time which although
> targeting sequential disk sectors (up to EXFAT_MAX_RA_SIZE) are not
> merged at the block layer.
> 
> Add plugging around sb_breadahead so that the requests can be batched
> and submitted jointly to the block layer where they can be merged by the
> io schedulers, instead of having each request individually submitted to
> the hardware queues.
> 
> This significantly improves the throughput of directory listings as it
> also minimizes the number of io completions and related handling from
> the device driver side.

Good approach. However, a similar attempt was made in past Samsung code,
and it caused a problem: the latency of directory-related operations
increased when ra_count was large (possibly up to MAX_RA_SIZE).
In the most recent code, blk_flush_plug is called in units of
pages, as follows:

```
/* Flush the plugged requests every page's worth of sectors so
 * that latency does not grow with a large ra_count. */
blk_start_plug(&plug);
for (i = 0; i < ra_count; i++) {
        if (i && !(i & (sects_per_page - 1)))
                blk_flush_plug(&plug, false);
        sb_breadahead(sb, sec + i);
}
blk_finish_plug(&plug);
```

However, since blk_flush_plug is not exported, it can no longer be used in
a module build. It seems that blk_flush_plug either needs to be exported,
or the loop needs to be reworked to repeat blk_start_plug and
blk_finish_plug in units of pages.

After changing to page-unit plugging, could you also compare the throughput?

Thanks

> 
> Signed-off-by: Anthony Iliopoulos <ailiop@...e.com>
> ---
>  fs/exfat/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  {
>  	struct exfat_sb_info *sbi = EXFAT_SB(sb);
>  	struct buffer_head *bh;
> +	struct blk_plug plug;
> 	unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
>  	unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
>  	unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
>  	if (!bh || !buffer_uptodate(bh)) {
>  		unsigned int i;
> 
> +		blk_start_plug(&plug);
>  		for (i = 0; i < ra_count; i++)
>  			sb_breadahead(sb, (sector_t)(sec + i));
> +		blk_finish_plug(&plug);
>  	}
>  	brelse(bh);
>  	return 0;
> --
> 2.49.0


