[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <81080764-7c94-463f-80d3-e3b2968ddf5f@redhat.com>
Date: Fri, 16 Aug 2024 11:30:31 +0200
From: David Hildenbrand <david@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Sean Christopherson <seanjc@...gle.com>,
Oscar Salvador <osalvador@...e.de>, Axel Rasmussen
<axelrasmussen@...gle.com>, linux-arm-kernel@...ts.infradead.org,
x86@...nel.org, Will Deacon <will@...nel.org>, Gavin Shan
<gshan@...hat.com>, Paolo Bonzini <pbonzini@...hat.com>,
Zi Yan <ziy@...dia.com>, Andrew Morton <akpm@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>, Ingo Molnar <mingo@...hat.com>,
Alistair Popple <apopple@...dia.com>, Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>, kvm@...r.kernel.org,
Dave Hansen <dave.hansen@...ux.intel.com>,
Alex Williamson <alex.williamson@...hat.com>, Yan Zhao <yan.y.zhao@...el.com>
Subject: Re: [PATCH 06/19] mm/pagewalk: Check pfnmap early for
folio_walk_start()
On 14.08.24 15:05, Jason Gunthorpe wrote:
> On Fri, Aug 09, 2024 at 07:25:36PM +0200, David Hildenbrand wrote:
>
>>>> That is in general not what we want, and we still have some places that
>>>> wrongly hard-code that behavior.
>>>>
>>>> In a MAP_PRIVATE mapping you might have anon pages that we can happily walk.
>>>>
>>>> vm_normal_page() / vm_normal_page_pmd() [and as commented as a TODO,
>>>> vm_normal_page_pud()] should be able to identify PFN maps and reject them,
>>>> no?
>>>
>>> Yep, I think we can also rely on special bit.
>
> It is more than just relying on the special bit..
>
> VM_PFNMAP/VM_MIXEDMAP should really only be used inside
> vm_normal_page() because thay are, effectively, support for a limited
> emulation of the special bit on arches that don't have them. There are
> a bunch of weird rules that are used to try and make that work
> properly that have to be followed.
>
> On arches with the sepcial bit they should possibly never be checked
> since the special bit does everything you need.
>
> Arguably any place reading those flags out side of vm_normal_page/etc
> is suspect.
IIUC, your opinion matches mine: VM_PFNMAP/VM_MIXEDMAP and
pte_special()/... usage should be limited to
vm_normal_page/vm_normal_page_pmd/ ... of course, GUP-fast is special
(one of the reason for "pte_special()" and friends after all).
>
>>> Here I chose to follow gup-slow, and I suppose you meant that's also wrong?
>>
>> I assume just nobody really noticed, just like nobody noticed that
>> walk_page_test() skips VM_PFNMAP (but not VM_IO :) ).
>
> Like here..
>
>>> And, just curious: is there any use case you're aware of that can benefit
>>> from caring PRIVATE pfnmaps yet so far, especially in this path?
>>
>> In general MAP_PRIVATE pfnmaps is not really useful on things like MMIO.
>>
>> There was a discussion (in VM_PAT) some time ago whether we could remove
>> MAP_PRIVATE PFNMAPs completely [1]. At least some users still use COW
>> mappings on /dev/mem, although not many (and they might not actually write
>> to these areas).
>
> I've squashed many bugs where kernel drivers don't demand userspace
> use MAP_SHARED when asking for a PFNMAP, and of course userspace has
> gained the wrong flags too. I don't know if anyone needs this, but it
> has crept wrongly into the API.
>
> Maybe an interesting place to start is a warning printk about using an
> obsolete feature and see where things go from there??
Maybe we should start with some way to pr_warn_ONCE() whenever we get a
COW/unshare-fault in such a MAP_PRIVATE mapping, and essentially
populate the fresh anon folio.
Then we don't only know who mmaps() something like that, but who
actually relies on getting anon folios in there.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists