lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZbcDvTkeDKttPfJ4@dread.disaster.area>
Date: Mon, 29 Jan 2024 12:47:41 +1100
From: Dave Chinner <david@...morbit.com>
To: Mike Snitzer <snitzer@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>, Ming Lei <ming.lei@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Don Dutile <ddutile@...hat.com>,
	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops
 in willneed range

On Sun, Jan 28, 2024 at 07:39:49PM -0500, Mike Snitzer wrote:
> On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > > On Sun, Jan 28 2024 at  5:02P -0500,
> > > Matthew Wilcox <willy@...radead.org> wrote:
> > Understood.  But ... the application is asking for as much readahead as
> > possible, and the sysadmin has said "Don't readahead more than 64kB at
> > a time".  So why will we not get a bug report in 1-15 years time saying
> > "I put a limit on readahead and the kernel is ignoring it"?  I think
> > typically we allow the sysadmin to override application requests,
> > don't we?
> 
> The application isn't knowingly asking for readahead.  It is asking to
> mmap the file (and reporter wants it done as quickly as possible..
> like occurred before).

.. which we do within the constraints of the given configuration.

> This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
> request size based on read-ahead setting") -- same logic, just applied
> to callchain that ends up using madvise(MADV_WILLNEED).

Not really. There is a difference between performing a synchronous
read IO here that we must complete, compared to optimistic
asynchronous read-ahead which we can fail or toss away without the
user ever seeing the data the IO returned.

We want required IO to be done in as few, larger IOs as possible,
and not be limited by constraints placed on background optimistic
IOs.

madvise(WILLNEED) is optimistic IO - there is no requirement that it
complete the data reads successfully. If the data is actually
required, we'll guarantee completion when the user accesses it, not
when madvise() is called.  IOWs, madvise is async readahead, and so
really should be constrained by readahead bounds and not user IO
bounds.

We could change this behaviour for madvise of large ranges that we
force into the page cache by ignoring device readahead bounds, but
I'm not sure we want to do this in general.

Perhaps fadvise/madvise(willneed) can fiddle the file f_ra.ra_pages
value in this situation to override the device limit for large
ranges (for some definition of large - say 10x bdi->ra_pages) and
restore it once the readahead operation is done. This would make it
behave less like readahead and more like a user read from an IO
perspective...

-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ