linux-kernel - Re: [PATCH] mm/readahead: Skip fully overlapped range

Message-Id: <20251011152042.d0061f174dd934711bc1418b@linux-foundation.org>
Date: Sat, 11 Oct 2025 15:20:42 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Aubrey Li <aubrey.li@...ux.intel.com>
Cc: Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>, Nanhai
 Zou <nanhai.zou@...el.com>, Gang Deng <gang.deng@...el.com>, Tianyou Li
 <tianyou.li@...el.com>, Vinicius Gomes <vinicius.gomes@...el.com>, Tim Chen
 <tim.c.chen@...ux.intel.com>, Chen Yu <yu.c.chen@...el.com>,
 linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, Roman Gushchin <roman.gushchin@...ux.dev>
Subject: Re: [PATCH] mm/readahead: Skip fully overlapped range

On Tue, 30 Sep 2025 13:35:43 +0800 Aubrey Li <aubrey.li@...ux.intel.com> wrote:

> file_ra_state is considered a performance hint, not a critical correctness
> field. The race conditions on file's readahead state don't affect the
> correctness of file I/O because later the page cache mechanisms ensure data
> consistency, it won't cause wrong data to be read. I think that's why we do
> not lock file_ra_state today, to avoid performance penalties on this hot path.
> 
> That said, this patch didn't make things worse, and it does take a risk but
> brings the rewards of RocksDB's readseq benchmark.

So if I may summarize:

- you've identified and addressed an issue with concurrent readahead
  against an fd

- Jan points out that we don't properly handle concurrent access to a
  file's ra_state.  This is somewhat off-topic, but we should address
  this sometime anyway.  Then we can address the RocksDB issue later.

Alternatively, we could fix this issue right now and let the
concurrency fixes come later.  Not as pretty, but it's practical.

Another practicality: improving a benchmark is nice, but do we have any
reason to believe that this change will improve any real-world
workload?  If so, which and by how much?