Message-ID: <2659a0bc-b5a7-43e0-b565-fcb93e4ea2b7@redhat.com>
Date: Tue, 6 Aug 2024 16:40:12 +0200
From: David Hildenbrand <david@...hat.com>
To: Qi Zheng <zhengqi.arch@...edance.com>, hughd@...gle.com,
willy@...radead.org, mgorman@...e.de, muchun.song@...ux.dev,
vbabka@...nel.org, akpm@...ux-foundation.org, zokeefe@...gle.com,
rientjes@...gle.com
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 4/7] mm: pgtable: try to reclaim empty PTE pages in
zap_page_range_single()
On 05.08.24 14:55, Qi Zheng wrote:
> Now in order to pursue high performance, applications mostly use some
> high-performance user-mode memory allocators, such as jemalloc or
> tcmalloc. These memory allocators use madvise(MADV_DONTNEED or MADV_FREE)
> to release physical memory, but neither MADV_DONTNEED nor MADV_FREE will
> release page table memory, which may cause huge page table memory usage.
>
> The following is a memory usage snapshot of one process, as actually
> observed on our server:
>
> VIRT: 55t
> RES: 590g
> VmPTE: 110g
>
> In this case, most of the page table entries are empty. For such a PTE
> page where all entries are empty, we can actually free it back to the
> system for others to use.
>
> As a first step, this commit attempts to synchronously free the empty PTE
> pages in zap_page_range_single() (MADV_DONTNEED etc will invoke this). In
> order to reduce overhead, we only handle the cases with a high probability
> of generating empty PTE pages, and other cases will be filtered out, such
> as:
It doesn't make much sense during munmap(), where we will remove the
page tables directly afterwards anyway. We should limit it to the
!munmap case -- in particular MADV_DONTNEED.
To minimize the added overhead, I further suggest only trying to reclaim
asynchronously if we know that likely all PTEs will be none, that is,
when we just zapped *all* PTEs of a PTE page table -- our range spans
the complete PTE page table.
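The "range spans the complete PTE page table" condition can be sketched
as a simple alignment check. This is a userspace illustration only, not
kernel code: the helper name zap_spans_whole_pte_table() is hypothetical,
and the constants assume x86-64 with 4 KiB pages, where one PTE page
table maps 2 MiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 with 4 KiB pages, so one PTE page table maps 2 MiB. */
#define PMD_SHIFT 21
#define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/*
 * Hypothetical helper: returns true if the zapped range [start, end)
 * covers every entry of the PTE page table that maps 'start'.
 * Only in that case is the table likely to be completely empty afterwards.
 */
static bool zap_spans_whole_pte_table(uint64_t start, uint64_t end)
{
    assert(start <= end);
    /* start must sit on a PTE-table boundary and the range must reach the next one */
    return (start & ~PMD_MASK) == 0 && end - start >= PMD_SIZE;
}
```

With such a check, zapping a single 4 KiB page never triggers a reclaim
attempt, while zapping a full, aligned 2 MiB chunk does.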
Just imagine someone zaps a single PTE: we really don't want to start
scanning page tables and involve a (rather expensive) walk_page_range()
just to find out that there is still something mapped.
Last but not least, would there be a way to avoid the walk_page_range()
and simply trigger it from zap_pte_range(), possibly still while holding
the PTE table lock?
We might have to trylock the PMD, but that should be doable.
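The trylock idea can be modeled in userspace as follows. This is a toy
sketch under stated assumptions, not the kernel implementation: all
toy_* names are hypothetical, the PTE table lock is assumed to be held
by the caller and is not modeled, and a C11 atomic_flag stands in for
the PMD spinlock. The point of using a trylock is presumably that the
PMD lock is requested while the PTE table lock is already held, so
blocking on it could invert the usual lock order; on contention the
reclaim attempt is simply skipped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE 512  /* assumption: x86-64 with 4 KiB pages */

/* Toy stand-ins for illustration; these are not kernel types. */
struct toy_pte_table {
    uint64_t pte[PTRS_PER_PTE];   /* 0 models a "none" entry */
};

struct toy_pmd {
    atomic_flag lock;             /* models the PMD spinlock */
    struct toy_pte_table *table;  /* NULL once the table is detached */
};

static bool toy_pmd_trylock(struct toy_pmd *pmd)
{
    /* test_and_set returns the previous value: false means we got the lock */
    return !atomic_flag_test_and_set_explicit(&pmd->lock, memory_order_acquire);
}

static void toy_pmd_unlock(struct toy_pmd *pmd)
{
    atomic_flag_clear_explicit(&pmd->lock, memory_order_release);
}

/*
 * Caller is assumed to hold the PTE table lock (not modeled here).
 * Returns true if the table was empty and has been detached from the PMD.
 */
static bool toy_try_reclaim_pte_table(struct toy_pmd *pmd)
{
    assert(pmd != NULL);
    struct toy_pte_table *table = pmd->table;

    if (table == NULL)
        return false;

    /* Something is still mapped: nothing to reclaim. */
    for (size_t i = 0; i < PTRS_PER_PTE; i++)
        if (table->pte[i] != 0)
            return false;

    /* Never block on the PMD lock while holding the PTE lock. */
    if (!toy_pmd_trylock(pmd))
        return false;

    pmd->table = NULL;  /* detach; the real code would free the page later */
    toy_pmd_unlock(pmd);
    return true;
}
```

A failed trylock costs nothing but a missed opportunity: the empty table
stays in place and can be reclaimed by a later zap or at munmap() time.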
--
Cheers,
David / dhildenb