Date: Mon, 29 Jan 2024 11:00:36 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Mike Snitzer <snitzer@...nel.org>,
	Don Dutile <ddutile@...hat.com>,
	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
	ming.lei@...hat.com
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops
 in willneed range

On Sun, Jan 28, 2024 at 10:02:49PM +0000, Matthew Wilcox wrote:
> On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED
> > only tries to read ahead 512 pages, and the remaining part of the
> > advised range falls back to normal readahead.
> 
> Does the MAINTAINERS file mean nothing any more?

Sorry, I simply missed Cc'ing you.

> 
> > If bdi->ra_pages is set small, readahead will not be efficient
> > enough. Simply increasing the readahead size may not be an option,
> > since the workload may have mixed random and sequential I/O.
> 
> I think there needs to be a lot more explanation than this about what's
> going on before we jump to "And therefore this patch is the right
> answer".

Both 6d2be915e589 and the commit log provide background on this issue;
let me explain it further:

1) Before commit 6d2be915e589, the madvise/fadvise(WILLNEED)/readahead
syscalls tried to read ahead across the whole specified range if memory
allowed, and for each readahead in this range the ra size was set to
the block device's max sectors; see force_page_cache_ra().

2) Since commit 6d2be915e589, only 2MB (512 pages) is loaded by these
syscalls, and the remaining bytes fall back to future normal readahead
when they are read from the page cache or through an mmap'ed buffer.

3) This patch wires the advise(WILLNEED) range info into normal
readahead for both the mmap fault and buffered read code paths, so each
readahead can use the block device's max sectors for the ra size,
basically taking the same approach as before commit 6d2be915e589; see
the sketch below.
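
To make 3) concrete, here is a minimal sketch of the idea.
willneed_covers() and ra_max_pages() are hypothetical helpers of mine,
not code from the patch; ->need_mt is the field added in the hunk
quoted below:

#include <linux/fs.h>
#include <linux/maple_tree.h>
#include <linux/backing-dev.h>

/*
 * Sketch only: ->need_mt records the advised WILLNEED ranges, and
 * normal readahead consults it to pick a more aggressive ra size.
 */
static bool willneed_covers(struct file_ra_state *ra, pgoff_t index)
{
	return ra->need_mt && mtree_load(ra->need_mt, index);
}

static unsigned long ra_max_pages(struct file_ra_state *ra,
				  struct backing_dev_info *bdi,
				  pgoff_t index)
{
	/*
	 * Inside an advised range, ramp up to the device limit
	 * (bdi->io_pages), as force_page_cache_ra() did for the whole
	 * range before 6d2be915e589; otherwise keep the configured
	 * ra_pages.
	 */
	if (willneed_covers(ra, index))
		return max_t(unsigned long, ra->ra_pages, bdi->io_pages);
	return ra->ra_pages;
}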

> 
> > @@ -972,6 +974,7 @@ struct file_ra_state {
> >  	unsigned int ra_pages;
> >  	unsigned int mmap_miss;
> >  	loff_t prev_pos;
> > +	struct maple_tree *need_mt;
> 
> No.  Embed the struct maple tree.  Don't allocate it.  What made you
> think this was the right approach?

Can you explain why it has to be embedded? core-api/maple_tree.rst
mentions that it is fine to call "mt_init() for dynamically allocated
ones".
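
For reference, the two initialization styles look like this; struct
foo here is just an illustration, not code from the patch:

#include <linux/maple_tree.h>
#include <linux/slab.h>

/* embedded: the tree lives inside the containing struct */
struct foo {
	struct maple_tree mt;
};

static void foo_setup(struct foo *f)
{
	mt_init(&f->mt);
}

/* dynamically allocated: what the docs also describe as fine */
static struct maple_tree *alloc_mt(void)
{
	struct maple_tree *mt = kzalloc(sizeof(*mt), GFP_KERNEL);

	if (mt)
		mt_init(mt);
	return mt;
}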

The maple tree provides an easy way to record the advised WILLNEED
range, so the readahead code path can use this info to speed up
readahead.
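
For example, recording and querying the range could look roughly like
this (my sketch, not the patch; xa_mk_value(1) is just an arbitrary
non-NULL marker entry):

#include <linux/maple_tree.h>
#include <linux/xarray.h>

/* at madvise/fadvise(WILLNEED) time: remember the advised range */
static int record_willneed(struct maple_tree *mt, pgoff_t first,
			   pgoff_t last)
{
	return mtree_store_range(mt, first, last, xa_mk_value(1),
				 GFP_KERNEL);
}

/* later, from the readahead path: was this page index advised? */
static bool was_advised(struct maple_tree *mt, pgoff_t index)
{
	return mtree_load(mt, index) != NULL;
}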


Thanks,
Ming

