linux-kernel - Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing

Message-ID: <at4ojyziprhhktjgtfmuyzrqwfmomnly6fubkvmbtxkdnx6hpb@5nldc3vipwny>
Date: Mon, 14 Jul 2025 17:16:51 +0200
From: Jan Kara <jack@...e.cz>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>, 
	Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Liu Shixin <liushixin2@...wei.com>
Subject: Re: [PATCH] mm: consider disabling readahead if there are signs of
 thrashing

On Thu 10-07-25 12:52:32, Roman Gushchin wrote:
> We've noticed in production that under a very heavy memory pressure
> the readahead behavior becomes unstable causing spikes in memory
> pressure and CPU contention on zone locks.
> 
> The current mmap_miss heuristics considers minor pagefaults as a
> good reason to decrease mmap_miss and conditionally start async
> readahead. This creates a vicious cycle: asynchronous readahead
> loads more pages, which in turn causes more minor pagefaults.
> This problem is especially pronounced when multiple threads of
> an application fault on consecutive pages of an evicted executable,
> aggressively lowering the mmap_miss counter and preventing readahead
> from being disabled.

I think you're talking about filemap_map_pages() logic of handling
mmap_miss. It would be nice to mention it in the changelog. There's one
thing that doesn't quite make sense to me: When there's memory pressure,
I'd expect the pages to be reclaimed from memory and not just unmapped.
Also, the fact that your solution checks for !uptodate folios suggests the
pages were actually fully reclaimed, and the real problem is that
filemap_map_pages() treats as a minor page fault (i.e., cache hit) what is
in fact a major page fault (i.e., cache miss)?

Actually, now that I've dug deeper, I remember that based on Liu
Shixin's report
(https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/)
which sounds a lot like what you're reporting, we have eventually merged his
fixes (ended up as commits 0fd44ab213bc ("mm/readahead: break read-ahead
loop if filemap_add_folio return -ENOMEM"), 5c46d5319bde ("mm/filemap:
don't decrease mmap_miss when folio has workingset flag")). Did you test a
kernel with these fixes (6.10 or later)? In particular after these fixes
the !folio_test_workingset() check in filemap_map_folio_range() and
filemap_map_order0_folio() should make sure we don't decrease mmap_miss
when faulting fresh pages. Or was the page in your case evicted so long ago
that the workingset bit is already clear?
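
To make the effect of that check concrete, here is a minimal userspace
sketch, not kernel code. The names folio_model and map_pages_model are
hypothetical, and the model assumes that hits counted in one fault-around
pass are subtracted from mmap_miss with a floor of zero. It only
illustrates that folios carrying the workingset flag are not counted as
hits.

```c
/* Userspace model only; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache folio. */
struct folio_model {
	bool workingset;	/* models the workingset flag discussed above */
};

/*
 * Model of one fault-around pass: count a "hit" only for folios without
 * the workingset flag, then subtract the hits from mmap_miss, clamping
 * the result at zero.
 */
unsigned int map_pages_model(unsigned int mmap_miss,
			     const struct folio_model *folios, size_t n)
{
	unsigned int hits = 0;

	assert(folios != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!folios[i].workingset)
			hits++;
	}
	return hits >= mmap_miss ? 0 : mmap_miss - hits;
}
```

In this model, a pass over four folios that all carry the flag leaves
mmap_miss unchanged, while four folios without the flag lower it by four.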

Once we better understand the situation, let me also mention that I have
two patches which I originally proposed to fix Liu's problems. They didn't
quite fix them so his patches got merged in the end but the problems
described there are still somewhat valid:

    mm/readahead: Improve page readaround miss detection

    filemap_map_pages() decreases ra->mmap_miss for every page it maps. This
    however overestimates the number of real cache hits because we have no
    idea whether the application will use the pages we map or not. This is
    problematic in particular in memory constrained situations where we
    think we have great readahead success rate although in fact we are just
    thrashing page cache & disk. Change filemap_map_pages() to count only
    success of mapping the page we are faulting in. This should be actually
    enough to keep mmap_miss close to 0 for workloads doing sequential reads
    because filemap_map_pages() does not map pages with the readahead flag
    and thus these are going to contribute to decreasing the mmap_miss
    counter.

    Fixes: f1820361f83d ("mm: implement ->map_pages for page cache")

-
    mm/readahead: Fix readahead miss detection with FAULT_FLAG_RETRY_NOWAIT

    When the page fault happens with FAULT_FLAG_RETRY_NOWAIT (which is
    common) we will bail out of the page fault after issuing reads and retry
    the fault. That will then find the created pages in filemap_map_pages()
    and hence will be treated as a cache hit, canceling out the cache miss
    in do_sync_mmap_readahead(). Increment mmap_miss by two in
    do_sync_mmap_readahead() in case FAULT_FLAG_RETRY_NOWAIT is set to
    account for the following expected hit. If the page gets evicted even
    before we manage to retry the fault, we are under such heavy memory
    pressure that increasing mmap_miss by two is fine.

    Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")

In particular the second problem described could still lead to mmap_miss
not growing as fast as it should so maybe it would be worth reviving it.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR