linux-kernel - Re: [PATCH] mm/readahead: Skip fully overlapped range

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <mze6nnqy2xwwqaz5xpwkthx3x4n6yd5vgbnyateyjlyjefiwde@qclv7inpacqe>
Date: Thu, 16 Oct 2025 18:21:19 +0200
From: Jan Kara <jack@...e.cz>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Aubrey Li <aubrey.li@...ux.intel.com>, Jan Kara <jack@...e.cz>, 
	Matthew Wilcox <willy@...radead.org>, Nanhai Zou <nanhai.zou@...el.com>, 
	Gang Deng <gang.deng@...el.com>, Tianyou Li <tianyou.li@...el.com>, 
	Vinicius Gomes <vinicius.gomes@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>, 
	Chen Yu <yu.c.chen@...el.com>, linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Roman Gushchin <roman.gushchin@...ux.dev>
Subject: Re: [PATCH] mm/readahead: Skip fully overlapped range

Sorry for not replying earlier. I wanted make up my mind about this and
other stuff was keeping preempting me...

On Sat 11-10-25 15:20:42, Andrew Morton wrote:
> On Tue, 30 Sep 2025 13:35:43 +0800 Aubrey Li <aubrey.li@...ux.intel.com> wrote:
> 
> > file_ra_state is considered a performance hint, not a critical correctness
> > field. The race conditions on file's readahead state don't affect the
> > correctness of file I/O because later the page cache mechanisms ensure data
> > consistency, it won't cause wrong data to be read. I think that's why we do
> > not lock file_ra_state today, to avoid performance penalties on this hot path.
> > 
> > That said, this patch didn't make things worse, and it does take a risk but
> > brings the rewards of RocksDB's readseq benchmark.
> 
> So if I may summarize:
> 
> - you've identifed and addressed an issue with concurrent readahead
>   against an fd

Right but let me also note that the patch modifies only
force_page_cache_ra() which is a pretty peculiar function. It's used at two
places:
1) When page_cache_sync_ra() decides it isn't worth to do a proper
readahead and just wants to read that one one.

2) From POSIX_FADV_WILLNEED - I suppose this is Aubrey's case.

As such it seems to be fixing mostly a "don't do it when it hurts" kind of
load from the benchmark than a widely used practical case since I'm not
sure many programs call POSIX_FADV_WILLNEED from many threads in parallel
for the same range.

> - Jan points out that we don't properly handle concurrent access to a
>   file's ra_state.  This is somewhat offtopic, but we should address
>   this sometime anyway.  Then we can address the RocksDB issue later.

The problem I had with the patch is that it adds more racy updates & checks
for the shared ra state so it's kind of difficult to say whether some
workload will not now more often clobber the ra state resulting in poor
readahead behavior. Also as I looked into the patch now another objection I
have is that force_page_cache_ra() previously didn't touch the ra state at
all, it just read the requested pages. After the patch
force_page_cache_ra() will destroy the readahead state completely. This is
definitely something we don't want to do.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR