Date: Thu, 1 Feb 2024 23:43:11 -0500
From: Mike Snitzer <snitzer@...nel.org>
To: Ming Lei <ming.lei@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	David Hildenbrand <david@...hat.com>,
	Matthew Wilcox <willy@...radead.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>,
	Don Dutile <ddutile@...hat.com>, Rafael Aquini <raquini@...hat.com>,
	Dave Chinner <david@...morbit.com>
Subject: Re: mm/madvise: set ra_pages as device max request size during
 ADV_POPULATE_READ

On Thu, Feb 01 2024 at  9:20P -0500,
Ming Lei <ming.lei@...hat.com> wrote:

> madvise(MADV_POPULATE_READ) tries to populate all page tables in the
> specified range, so it usually results in sequential IO if the VMA is
> file-backed.
> 
> Set ra_pages to the device's max request size for the readahead issued
> during MADV_POPULATE_READ. This reduces the latency of
> madvise(MADV_POPULATE_READ) to 1/10 when running it over a 1GB file with
> the usual (default) read_ahead_kb of 128KB.
> 
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Matthew Wilcox <willy@...radead.org>
> Cc: Alexander Viro <viro@...iv.linux.org.uk>
> Cc: Christian Brauner <brauner@...nel.org>
> Cc: Don Dutile <ddutile@...hat.com>
> Cc: Rafael Aquini <raquini@...hat.com>
> Cc: Dave Chinner <david@...morbit.com>
> Cc: Mike Snitzer <snitzer@...nel.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Ming Lei <ming.lei@...hat.com>
> ---
>  mm/madvise.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 912155a94ed5..db5452c8abdd 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -900,6 +900,37 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  		return -EINVAL;
>  }
>  
> +static void madvise_restore_ra_win(struct file **file, unsigned int ra_pages)
> +{
> +	if (*file) {
> +		struct file *f = *file;
> +
> +		f->f_ra.ra_pages = ra_pages;
> +		fput(f);
> +		*file = NULL;
> +	}
> +}
> +
> +static struct file *madvise_override_ra_win(struct file *f,
> +		unsigned long start, unsigned long end,
> +		unsigned int *old_ra_pages)
> +{
> +	unsigned int io_pages;
> +
> +	if (!f || !f->f_mapping || !f->f_mapping->host)
> +		return NULL;
> +
> +	io_pages = inode_to_bdi(f->f_mapping->host)->io_pages;
> +	if (((end - start) >> PAGE_SHIFT) < io_pages)
> +		return NULL;
> +
> +	f = get_file(f);
> +	*old_ra_pages = f->f_ra.ra_pages;
> +	f->f_ra.ra_pages = io_pages;
> +
> +	return f;
> +}
> +

Does this override imply that madvise_populate() ends up calling
filemap_fault(), and that here you're just arming it to use the larger
->io_pages for the duration of all the associated faulting?

Wouldn't it be better to avoid faulting and instead build up larger page
vectors that get sent down to the block layer in one go, letting the
block layer split them using the device's limits? (As happens with
force_page_cache_ra().)
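
For comparison from user space, here is a minimal sketch of the "read
ahead first, fault afterwards" idea. It assumes that
posix_fadvise(POSIX_FADV_WILLNEED) reaches the forced-readahead path in
the kernel, and that the readahead it starts is asynchronous and
best-effort, so the later populate may still wait on IO. The helper name
prefetch_then_populate() is hypothetical, and this is not what the patch
does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* Linux uapi value, available since 5.14 */
#endif

/*
 * Hypothetical helper: ask the kernel to read the whole file ahead, then
 * map it and prefault the mapping with MADV_POPULATE_READ.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int prefetch_then_populate(const char *path)
{
	struct stat st;
	void *addr;
	int fd, ret, saved;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	if (st.st_size == 0) {
		close(fd);
		errno = EINVAL;
		return -1;
	}

	/* posix_fadvise() returns the error number directly, not via errno. */
	ret = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
	if (ret) {
		close(fd);
		errno = ret;
		return -1;
	}

	addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* the mapping keeps its own file reference */
	if (addr == MAP_FAILED) {
		errno = saved;
		return -1;
	}

	/* Fails with EINVAL on kernels that lack MADV_POPULATE_READ. */
	ret = madvise(addr, st.st_size, MADV_POPULATE_READ);
	saved = errno;
	munmap(addr, st.st_size);
	errno = saved;
	return ret;
}
```

Whether this actually issues larger IO than plain MADV_POPULATE_READ
would still need to be confirmed by tracing.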

I'm concerned that madvise_populate() isn't very efficient with filemap
due to excessive faulting. (*BUT* I haven't traced it to know; I'm just
inferring that this is why twiddling f->f_ra.ra_pages helps
madvise_populate(): it makes it issue larger IO. Apologies if I'm way
off base.)
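
For anyone who wants to measure this rather than infer it, a minimal
user-space sketch for timing madvise(MADV_POPULATE_READ) over a file
mapping follows. The helper name populate_read_timed() is hypothetical.
The sketch does not drop the page cache, so a cold-cache run needs that
done separately, and the fallback define assumes the uapi value 22.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* Linux uapi value, available since 5.14 */
#endif

/*
 * Hypothetical helper: map `path` read-only, prefault the whole mapping
 * with MADV_POPULATE_READ, and report how long the madvise() call took.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int populate_read_timed(const char *path, double *elapsed_sec)
{
	struct timespec t0, t1;
	struct stat st;
	void *addr;
	int fd, ret, saved;

	assert(path != NULL && elapsed_sec != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	if (st.st_size == 0) {
		close(fd);
		errno = EINVAL;
		return -1;
	}

	addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* the mapping keeps its own file reference */
	if (addr == MAP_FAILED) {
		errno = saved;
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* Fails with EINVAL on kernels that lack MADV_POPULATE_READ. */
	ret = madvise(addr, st.st_size, MADV_POPULATE_READ);
	saved = errno;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(addr, st.st_size);
	*elapsed_sec = (double)(t1.tv_sec - t0.tv_sec) +
		       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	errno = saved;
	return ret;
}
```

Running it with different values in
/sys/block/<dev>/queue/read_ahead_kb, and watching the issued request
sizes with a block-layer tracer, should show whether the IO size is what
dominates the latency.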

Mike
