Message-ID: <7a26e29c-d889-450a-a5e1-ce671f09e4c8@redhat.com>
Date: Mon, 28 Apr 2025 18:16:21 +0200
From: David Hildenbrand <david@...hat.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-trace-kernel@...r.kernel.org, Dave Hansen
<dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>, Tvrtko Ursulin
<tursulin@...ulin.net>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>, Andrew Morton <akpm@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>
Subject: Re: [PATCH v1 05/11] mm: convert VM_PFNMAP tracking to pfnmap_track()
+ pfnmap_untrack()
On 28.04.25 18:08, Peter Xu wrote:
> On Fri, Apr 25, 2025 at 10:36:55PM +0200, David Hildenbrand wrote:
>> On 25.04.25 22:23, Peter Xu wrote:
>>> On Fri, Apr 25, 2025 at 10:17:09AM +0200, David Hildenbrand wrote:
>>>> Let's use our new interface. In remap_pfn_range(), we'll now decide
>>>> whether we have to track (full VMA covered) or only sanitize the pgprot
>>>> (partial VMA covered).
>>>>
>>>> Remember what we have to untrack by linking it from the VMA. When
>>>> duplicating VMAs (e.g., splitting, mremap, fork), we'll handle it similar
>>>> to anon VMA names, and use a kref to share the tracking.
>>>>
>>>> Once the last VMA un-refs our tracking data, we'll do the untracking,
>>>> which simplifies things a lot and should sort out various issues we saw
>>>> recently, for example, when partially unmapping/zapping a tracked VMA.
>>>>
>>>> This change implies that we'll keep tracking the original PFN range even
>>>> after splitting + partially unmapping it: not too bad, because it was
>>>> not working reliably before. The only thing that kind-of worked before
>>>> was shrinking such a mapping using mremap(): we managed to adjust the
>>>> reservation in a hacky way, now we won't adjust the reservation but
>>>> leave it around until all involved VMAs are gone.
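
To illustrate the lifetime rule described above, here is a minimal
userspace sketch, not the kernel code. Everything prefixed toy_ is
hypothetical: toy_ref stands in for struct kref (a plain counter, no
atomics), and the untrack step is reduced to a call counter. It only
shows that duplicating a VMA shares the tracking context, and that
untracking happens once, when the last VMA drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct kref: plain counter, no atomics. */
struct toy_ref {
	int count;
};

/* Mirrors the fields of struct pfnmap_track_ctx from the patch. */
struct toy_track_ctx {
	struct toy_ref ref;
	unsigned long pfn;
	unsigned long size;
};

/* Only the one field of interest; a real VMA has many more. */
struct toy_vma {
	struct toy_track_ctx *ctx;
};

/* Counts how often the (hypothetical) untrack step ran. */
static int toy_untrack_calls;

static struct toy_track_ctx *toy_ctx_new(unsigned long pfn, unsigned long size)
{
	struct toy_track_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->ref.count = 1;
	ctx->pfn = pfn;
	ctx->size = size;
	return ctx;
}

/* Duplicating a VMA (split, mremap, fork) shares the context. */
static void toy_vma_dup_ctx(const struct toy_vma *orig, struct toy_vma *copy)
{
	copy->ctx = orig->ctx;
	if (copy->ctx)
		copy->ctx->ref.count++;
}

/* Dropping the last reference untracks the whole original range. */
static void toy_vma_put_ctx(struct toy_vma *vma)
{
	struct toy_track_ctx *ctx = vma->ctx;

	vma->ctx = NULL;
	if (ctx && --ctx->ref.count == 0) {
		toy_untrack_calls++;
		free(ctx);
	}
}

/* Split one tracked VMA in two, then unmap both halves. */
static int toy_split_then_unmap(void)
{
	struct toy_vma a = { .ctx = toy_ctx_new(0x1000, 0x4000) };
	struct toy_vma b = { .ctx = NULL };

	if (!a.ctx)
		return -1;
	toy_untrack_calls = 0;
	toy_vma_dup_ctx(&a, &b);	/* split: both halves share the ctx */
	toy_vma_put_ctx(&a);		/* partial unmap: range stays tracked */
	assert(toy_untrack_calls == 0);
	toy_vma_put_ctx(&b);		/* last VMA gone: untrack once */
	return toy_untrack_calls;
}
```

Calling toy_split_then_unmap() from a main() shows the behaviour the
description implies: unmapping one half leaves the original range
tracked, and the range is only released with the second put.
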
>>>>
>>>> Signed-off-by: David Hildenbrand <david@...hat.com>
>>>> ---
>>>> include/linux/mm_inline.h | 2 +
>>>> include/linux/mm_types.h | 11 ++++++
>>>> kernel/fork.c | 54 ++++++++++++++++++++++++--
>>>> mm/memory.c | 81 +++++++++++++++++++++++++++++++--------
>>>> mm/mremap.c | 4 --
>>>> 5 files changed, 128 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
>>>> index f9157a0c42a5c..89b518ff097e6 100644
>>>> --- a/include/linux/mm_inline.h
>>>> +++ b/include/linux/mm_inline.h
>>>> @@ -447,6 +447,8 @@ static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1,
>>>> #endif /* CONFIG_ANON_VMA_NAME */
>>>> +void pfnmap_track_ctx_release(struct kref *ref);
>>>> +
>>>> static inline void init_tlb_flush_pending(struct mm_struct *mm)
>>>> {
>>>> atomic_set(&mm->tlb_flush_pending, 0);
>>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>>> index 56d07edd01f91..91124761cfda8 100644
>>>> --- a/include/linux/mm_types.h
>>>> +++ b/include/linux/mm_types.h
>>>> @@ -764,6 +764,14 @@ struct vma_numab_state {
>>>> int prev_scan_seq;
>>>> };
>>>> +#ifdef __HAVE_PFNMAP_TRACKING
>>>> +struct pfnmap_track_ctx {
>>>> + struct kref kref;
>>>> + unsigned long pfn;
>>>> + unsigned long size;
>>>> +};
>>>> +#endif
>>>> +
>>>> /*
>>>> * This struct describes a virtual memory area. There is one of these
>>>> * per VM-area/task. A VM area is any part of the process virtual memory
>>>> @@ -877,6 +885,9 @@ struct vm_area_struct {
>>>> struct anon_vma_name *anon_name;
>>>> #endif
>>>> struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
>>>> +#ifdef __HAVE_PFNMAP_TRACKING
>>>> + struct pfnmap_track_ctx *pfnmap_track_ctx;
>>>> +#endif
>>>
>>> So this was originally the small concern (or is it small?) that this will
>>> grow every vma on x86, am I right?
>>
>> Yeah, and last time I looked into this, it would have grown it such that it would
>> require a bigger slab. Right now:
>
> Probably due to what config you have. E.g., when I'm looking at mine it's
> much bigger and already consuming 256B, but it's because I enabled more
> things (userfaultfd, lockdep, etc.).
Note that I enabled everything you would expect on a production
system (incl. userfaultfd, mempolicy, per-vma locks), so I did not
enable lockdep.
Thanks for verifying!
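
As a rough illustration of why a single extra pointer can matter, here
is a small sketch. It rests on two assumptions that are not taken from
this thread: that objects are padded to a multiple of a 64-byte cache
line, and that the structure without the new field is 192 bytes. Both
numbers and all toy_ names are hypothetical; the real size depends on
the config, as discussed above.

```c
#include <stddef.h>

/* Assumption: allocations are padded to a multiple of this many bytes. */
#define TOY_CACHE_LINE 64

/* Round an object size up to the next multiple of the assumed cache line. */
static size_t toy_padded_size(size_t size)
{
	return (size + TOY_CACHE_LINE - 1) / TOY_CACHE_LINE * TOY_CACHE_LINE;
}

/* Hypothetical VMA-like struct that exactly fills three cache lines. */
struct toy_vma_before {
	unsigned char other_fields[192];
};

/* Same struct with one extra pointer, like the new pfnmap_track_ctx field. */
struct toy_vma_after {
	unsigned char other_fields[192];
	void *pfnmap_track_ctx;
};
```

Under these assumptions, toy_padded_size(sizeof(struct toy_vma_before))
stays at 192, while the variant with the pointer pads up to 256. A
config that already sits well below a boundary would absorb the pointer
without growing the padded size.
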
--
Cheers,
David / dhildenb