Message-ID: <5b7e71e8-4e31-4699-b656-c35dce678a80@redhat.com>
Date: Mon, 8 Sep 2025 22:32:22 +0200
From: David Hildenbrand <david@...hat.com>
To: Anthony Yznaga <anthony.yznaga@...cle.com>, linux-mm@...ck.org
Cc: akpm@...ux-foundation.org, andreyknvl@...il.com, arnd@...db.de,
bp@...en8.de, brauner@...nel.org, bsegall@...gle.com, corbet@....net,
dave.hansen@...ux.intel.com, dietmar.eggemann@....com,
ebiederm@...ssion.com, hpa@...or.com, jakub.wartak@...lbox.org,
jannh@...gle.com, juri.lelli@...hat.com, khalid@...nel.org,
liam.howlett@...cle.com, linyongting@...edance.com,
lorenzo.stoakes@...cle.com, luto@...nel.org, markhemm@...glemail.com,
maz@...nel.org, mhiramat@...nel.org, mgorman@...e.de, mhocko@...e.com,
mingo@...hat.com, muchun.song@...ux.dev, neilb@...e.de, osalvador@...e.de,
pcc@...gle.com, peterz@...radead.org, pfalcato@...e.de, rostedt@...dmis.org,
rppt@...nel.org, shakeel.butt@...ux.dev, surenb@...gle.com,
tglx@...utronix.de, vasily.averin@...ux.dev, vbabka@...e.cz,
vincent.guittot@...aro.org, viro@...iv.linux.org.uk, vschneid@...hat.com,
willy@...radead.org, x86@...nel.org, xhao@...ux.alibaba.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org
Subject: Re: [PATCH v3 00/22] Add support for shared PTEs across processes

On 20.08.25 03:03, Anthony Yznaga wrote:
> Memory pages shared between processes require page table entries
> (PTEs) for each process. Each of these PTEs consumes some
> memory, and as long as the number of mappings being maintained
> is small enough, the space consumed by page tables is not
> objectionable. When very few memory pages are shared between
> processes, the number of PTEs to maintain is mostly constrained by
> the number of pages of memory on the system. As the number of shared
> pages and the number of times pages are shared go up, the amount of
> memory consumed by page tables starts to become significant. This
> issue does not apply to threads. Any number of threads can share the
> same pages inside a process while sharing the same PTEs. Extending
> this same model to sharing pages across processes can eliminate this
> issue for sharing across processes as well.
>
> Some field deployments commonly see memory pages shared
> across 1000s of processes. On x86_64, each page requires a PTE that
> is 8 bytes long, which is very small compared to the 4K page
> size. When 2000 processes map the same page in their address space,
> each one of them requires 8 bytes for its PTE, and together that adds
> up to about 16K of memory just to hold the PTEs for one 4K page. On a
> database server with a 300GB SGA, a system crash was seen with an
> out-of-memory condition when 1500+ clients tried to share this SGA
> even though the system had 512GB of memory. On this server, the
> worst-case scenario of all 1500 processes mapping every page of the
> SGA would have required 878GB+ for just the PTEs. If these PTEs
> could be shared, a substantial amount of memory would be saved.
>
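
A quick way to sanity-check numbers like these is a back-of-the-envelope
calculation. The sketch below assumes 4K pages and 8-byte PTEs as stated
in the text, counts only leaf PTEs (upper page table levels are ignored),
and treats "GB" as GiB. The function name is made up for illustration.

```python
PAGE_SIZE = 4096   # 4K base pages, as stated in the cover letter
PTE_SIZE = 8       # bytes per x86_64 PTE, as stated in the cover letter
GIB = 1 << 30      # assumption: "GB" in the text is read as GiB


def leaf_pte_bytes(shared_bytes, nr_processes,
                   page_size=PAGE_SIZE, pte_size=PTE_SIZE):
    """Leaf PTE bytes when every process maps every page with its own PTEs.

    Hypothetical helper: upper-level page tables are not counted.
    """
    nr_pages = shared_bytes // page_size
    return nr_pages * pte_size * nr_processes


if __name__ == "__main__":
    # One 4K page mapped by 2000 processes.
    print(leaf_pte_bytes(PAGE_SIZE, 2000), "bytes of PTEs for one page")

    # 300GB SGA fully mapped by 1500 processes, each with private PTEs.
    private = leaf_pte_bytes(300 * GIB, 1500)
    print(f"{private / GIB:.1f} GiB of PTEs with per-process page tables")

    # The same SGA if a single set of leaf PTEs were shared by everyone.
    shared = leaf_pte_bytes(300 * GIB, 1)
    print(f"{shared / (1 << 20):.0f} MiB of PTEs with one shared copy")
```

Under these assumptions the per-process case comes out at roughly 879 GiB,
against about 600 MiB for a single shared set of leaf PTEs.
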
> This patch series implements a mechanism that allows userspace
> processes to opt into sharing PTEs. It adds a new in-memory
> filesystem - msharefs. A file created on msharefs represents a
> shared region where all processes mapping that region will map
> objects within it with shared PTEs. When the file is created,
> a new host mm struct is created to hold the shared page tables
> and vmas for objects later mapped into the shared region. This
> host mm struct is associated with the file and not with a task.
> When a process mmap's the shared region, a vm flag VM_MSHARE
> is added to the vma. On page fault the vma is checked for the
> presence of the VM_MSHARE flag. If found, the host mm is
> searched for a vma that covers the fault address. Fault handling
> then continues using that host vma which establishes PTEs in the
> host mm. Fault handling in a shared region also links the shared
> page table to the process page table if the shared page table
> already exists.
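
To make sure the fault path described above is read the same way by
everyone, here is a userspace toy model of it. This is not kernel code and
not taken from the series: only the VM_MSHARE name comes from the cover
letter. The classes, fields, and handle_fault() are hypothetical, the flag
value is a placeholder, the fault address is used as-is for the host lookup
(any address translation is left out), and "linking" the shared page table
is reduced to recording the host mm in a list.

```python
from dataclasses import dataclass, field
from typing import Optional

PAGE_SIZE = 4096
VM_MSHARE = 0x1  # placeholder bit; only the flag name comes from the text


@dataclass(eq=False)
class Vma:
    start: int
    end: int                        # exclusive
    flags: int = 0
    host_mm: Optional["Mm"] = None  # hypothetical: set for VM_MSHARE vmas

    def covers(self, addr: int) -> bool:
        return self.start <= addr < self.end


@dataclass(eq=False)
class Mm:
    vmas: list = field(default_factory=list)
    ptes: dict = field(default_factory=dict)    # page address -> frame label
    linked: list = field(default_factory=list)  # host mms linked into this mm

    def find_vma(self, addr: int) -> Optional[Vma]:
        return next((v for v in self.vmas if v.covers(addr)), None)


def handle_fault(mm: Mm, addr: int) -> Mm:
    """Return the mm whose page tables hold the PTE after the fault."""
    vma = mm.find_vma(addr)
    if vma is None:
        raise LookupError(f"no vma covers {addr:#x}")
    target = mm
    if vma.flags & VM_MSHARE:
        # Redirect: look up the covering vma in the host mm instead.
        host = vma.host_mm
        if host.find_vma(addr) is None:
            raise LookupError(f"no host vma covers {addr:#x}")
        target = host
    page = addr & ~(PAGE_SIZE - 1)
    # The PTE is established in the target mm (the host mm for mshare).
    target.ptes.setdefault(page, f"frame@{page:#x}")
    if target is not mm and target not in mm.linked:
        # Simplification: model "linking the shared page table" as a list entry.
        mm.linked.append(target)
    return target


if __name__ == "__main__":
    host = Mm(vmas=[Vma(0x10000, 0x20000)])
    procs = [Mm(vmas=[Vma(0x10000, 0x20000, VM_MSHARE, host)])
             for _ in range(3)]
    for p in procs:
        handle_fault(p, 0x10234)
    print(len(host.ptes), [len(p.ptes) for p in procs])
```

In this model, three processes faulting on the same address leave one entry
in the host mm and none in their own tables, which is the property the
series is after.
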

Regarding the overall design, two important questions:

In the context of this series, how do we handle VMA-modifying functions
like mprotect/some madvise/mlock/mempolicy/...? Are they currently
blocked when applied to an mshare VMA?

And how are we handling other page table walkers that don't modify VMAs,
like MADV_DONTNEED, smaps, migrate_pages, etc.?

--
Cheers
David / dhildenb