Message-ID: <ZcAfF18OM2kqKsBe@dread.disaster.area>
Date: Mon, 5 Feb 2024 10:34:47 +1100
From: Dave Chinner <david@...morbit.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
David Hildenbrand <david@...hat.com>,
Matthew Wilcox <willy@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Don Dutile <ddutile@...hat.com>, Rafael Aquini <raquini@...hat.com>,
Mike Snitzer <snitzer@...nel.org>
Subject: Re: [PATCH] mm/madvise: set ra_pages as device max request size
during ADV_POPULATE_READ
On Fri, Feb 02, 2024 at 10:20:29AM +0800, Ming Lei wrote:
> madvise(MADV_POPULATE_READ) tries to populate all page tables in the
> specific range, so it is usually sequential IO if VMA is backed by
> file.
>
> Set ra_pages as device max request size for the involved readahead in
> the ADV_POPULATE_READ, this way reduces latency of madvise(MADV_POPULATE_READ)
> to 1/10 when running madvise(MADV_POPULATE_READ) over one 1GB file with
> usual(default) 128KB of read_ahead_kb.
>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Matthew Wilcox <willy@...radead.org>
> Cc: Alexander Viro <viro@...iv.linux.org.uk>
> Cc: Christian Brauner <brauner@...nel.org>
> Cc: Don Dutile <ddutile@...hat.com>
> Cc: Rafael Aquini <raquini@...hat.com>
> Cc: Dave Chinner <david@...morbit.com>
> Cc: Mike Snitzer <snitzer@...nel.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Ming Lei <ming.lei@...hat.com>
> ---
> mm/madvise.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 912155a94ed5..db5452c8abdd 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -900,6 +900,37 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> return -EINVAL;
> }
>
> +static void madvise_restore_ra_win(struct file **file, unsigned int ra_pages)
> +{
> + if (*file) {
> + struct file *f = *file;
> +
> + f->f_ra.ra_pages = ra_pages;
> + fput(f);
> + *file = NULL;
> + }
> +}
> +
> +static struct file *madvise_override_ra_win(struct file *f,
> + unsigned long start, unsigned long end,
> + unsigned int *old_ra_pages)
> +{
> + unsigned int io_pages;
> +
> + if (!f || !f->f_mapping || !f->f_mapping->host)
> + return NULL;
> +
> + io_pages = inode_to_bdi(f->f_mapping->host)->io_pages;
> + if (((end - start) >> PAGE_SHIFT) < io_pages)
> + return NULL;
> +
> + f = get_file(f);
> + *old_ra_pages = f->f_ra.ra_pages;
> + f->f_ra.ra_pages = io_pages;
> +
> + return f;
> +}
This won't do what you think if the file has been marked
FMODE_RANDOM before this populate call.
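To illustrate the FMODE_RANDOM point, here is a minimal userspace model, not kernel code. The struct and the model_*() names are hypothetical stand-ins for file->f_ra.ra_pages and the FMODE_RANDOM bit in file->f_mode, and the FADV_SEQUENTIAL behaviour (window set to twice the bdi default, random flag cleared) is an assumption based on my reading of generic_fadvise() around the time of this thread.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-file readahead state. */
struct model_file {
	unsigned int ra_pages;	/* models file->f_ra.ra_pages */
	bool random;		/* models FMODE_RANDOM in file->f_mode */
};

/* Models POSIX_FADV_RANDOM: the file is marked for random access. */
static void model_fadvise_random(struct model_file *f)
{
	f->random = true;
}

/* Models the quoted patch: only the window size is changed. */
static void model_override_ra_win(struct model_file *f, unsigned int io_pages)
{
	f->ra_pages = io_pages;
}

/*
 * Models POSIX_FADV_SEQUENTIAL (assumed behaviour): the window becomes
 * twice the bdi default and the random flag is cleared.
 */
static void model_fadvise_sequential(struct model_file *f,
				     unsigned int bdi_ra_pages)
{
	f->ra_pages = bdi_ra_pages * 2;
	f->random = false;
}
```

In this model, calling model_fadvise_random() and then model_override_ra_win() leaves the random flag set, so the larger window alone does not put the file back into sequential readahead mode. model_fadvise_sequential() resets both fields together.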
IOWs, I don't think madvise should be digging in the struct file
readahead stuff here. It should call vfs_fadvise(FADV_SEQUENTIAL) to
set the readahead mode, rather than try to duplicate
FADV_SEQUENTIAL (badly). We already do this for WILLNEED to make it
do the right thing; we should be doing the same thing here.
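For comparison, the same ordering can be tried from userspace today: set the readahead mode on the fd with posix_fadvise(POSIX_FADV_SEQUENTIAL), then populate the mapping with madvise(MADV_POPULATE_READ). This is only a sketch of the userspace analogue, not the in-kernel vfs_fadvise() change suggested above. populate_file_sequential() is a hypothetical helper name, and the fallback define assumes the uapi value 22 for MADV_POPULATE_READ (Linux 5.14+).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* assumed uapi value, Linux 5.14+ */
#endif

/* Hypothetical helper: returns 0 on success or a positive errno value. */
int populate_file_sequential(const char *path)
{
	struct stat st;
	void *addr;
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	if (fstat(fd, &st) < 0) {
		err = errno;
		goto out_close;
	}
	if (st.st_size == 0) {
		err = EINVAL;	/* nothing to map */
		goto out_close;
	}

	/* Set sequential readahead mode; returns an error number directly. */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
	if (err)
		goto out_close;

	addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		err = errno;
		goto out_close;
	}

	/* Prefault the whole range readably; EINVAL on kernels without it. */
	if (madvise(addr, st.st_size, MADV_POPULATE_READ) < 0)
		err = errno;

	munmap(addr, st.st_size);
out_close:
	close(fd);
	return err;
}
```

A caller would pass the path of the file to prefault and treat EINVAL as "MADV_POPULATE_READ not supported on this kernel".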
Also, AFAICT, there is no need for get_file()/fput() here - the vma
already has a reference to the struct file, and the vma should not
be going away whilst the madvise() operation is in progress.
-Dave.
--
Dave Chinner
david@...morbit.com