Message-ID: <87plcjxry2.fsf@linux.dev>
Date: Mon, 25 Aug 2025 09:54:29 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, "Matthew Wilcox (Oracle)"
<willy@...radead.org>, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] mm: readahead: improve mmap_miss heuristic for
concurrent faults
Mateusz Guzik <mjguzik@...il.com> writes:
> On Fri, Aug 15, 2025 at 11:32:24AM -0700, Roman Gushchin wrote:
>> If two or more threads of an application fault on the same folio,
>> the mmap_miss counter can be decreased multiple times. This breaks the
>> mmap_miss heuristic and keeps readahead enabled even under extreme
>> levels of memory pressure.
>>
>> This happens often when file folios backing a multi-threaded application
>> are evicted and re-faulted.
>>
>> Fix it by skipping the mmap_miss decrement if the folio is locked.
>>
>> This change was evaluated on several hundred thousand hosts in Google's
>> production over a couple of weeks. The number of containers stuck in a
>> vicious reclaim cycle for a long time was reduced several-fold
>> (~10-20x), and the overall fleet-wide CPU time spent in direct memory
>> reclaim was meaningfully reduced. No regressions were observed.
>>
>> Signed-off-by: Roman Gushchin <roman.gushchin@...ux.dev>
>> Cc: Matthew Wilcox (Oracle) <willy@...radead.org>
>> Cc: Jan Kara <jack@...e.cz>
>> Cc: linux-mm@...ck.org
>> ---
>> mm/filemap.c | 14 +++++++++++---
>> 1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index c21e98657e0b..983ba1019674 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3324,9 +3324,17 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
>>  	if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
>>  		return fpin;
>>
>> -	mmap_miss = READ_ONCE(ra->mmap_miss);
>> -	if (mmap_miss)
>> -		WRITE_ONCE(ra->mmap_miss, --mmap_miss);
>> +	/*
>> +	 * If the folio is locked, we're likely racing against another fault.
>> +	 * Don't touch the mmap_miss counter to avoid decreasing it multiple
>> +	 * times for a single folio and breaking the balance with mmap_miss
>> +	 * increase in do_sync_mmap_readahead().
>> +	 */
>> +	if (likely(!folio_test_locked(folio))) {
>> +		mmap_miss = READ_ONCE(ra->mmap_miss);
>> +		if (mmap_miss)
>> +			WRITE_ONCE(ra->mmap_miss, --mmap_miss);
>> +	}
>
> I'm not an mm person.
>
> The comment implies the change fixes the race, but it is not at all
> clear to me how.
>
> Does it merely make it significantly less likely?

It's not fixing any race; it's fixing the imbalance between the upward and
downward pressure on the mmap_miss variable. This improves readahead
behavior under very specific circumstances: a multi-threaded application
under very heavy memory pressure. There should be no visible difference
in behavior in other cases.

Thanks!