[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54a55ff7-38c8-42c2-886f-d6d1985072a9@redhat.com>
Date: Thu, 30 Jan 2025 10:47:29 +0100
From: David Hildenbrand <david@...hat.com>
To: Alistair Popple <apopple@...dia.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linux-mm@...ck.org, nouveau@...ts.freedesktop.org,
Andrew Morton <akpm@...ux-foundation.org>, Jérôme Glisse
<jglisse@...hat.com>, Jonathan Corbet <corbet@....net>,
Alex Shi <alexs@...nel.org>, Yanteng Si <si.yanteng@...ux.dev>,
Karol Herbst <kherbst@...hat.com>, Lyude Paul <lyude@...hat.com>,
Danilo Krummrich <dakr@...nel.org>, David Airlie <airlied@...il.com>,
Simona Vetter <simona@...ll.ch>, "Liam R. Howlett"
<Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Pasha Tatashin <pasha.tatashin@...een.com>, Peter Xu <peterx@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using
folio_walk instead of rmap walk
On 30.01.25 10:40, Simona Vetter wrote:
> On Thu, Jan 30, 2025 at 05:11:49PM +1100, Alistair Popple wrote:
>> On Wed, Jan 29, 2025 at 12:54:02PM +0100, David Hildenbrand wrote:
>>> We require a writable PTE and only support anonymous folio: we can only
>>> have exactly one PTE pointing at that page, which we can just lookup
>>> using a folio walk, avoiding the rmap walk and the anon VMA lock.
>>>
>>> So let's stop doing an rmap walk and perform a folio walk instead, so we
>>> can easily just modify a single PTE and avoid relying on rmap/mapcounts.
>>>
>>> We now effectively work on a single PTE instead of multiple PTEs of
>>> a large folio, allowing for conversion of individual PTEs from
>>> non-exclusive to device-exclusive -- note that the other way always
>>> worked on single PTEs.
>>>
>>> We can drop the MMU_NOTIFY_EXCLUSIVE MMU notifier call and document why
>>> that is not required: GUP will already take care of the
>>> MMU_NOTIFY_EXCLUSIVE call if required (there is already a device-exclusive
>>> entry) when not finding a present PTE and having to trigger a fault and
>>> ending up in remove_device_exclusive_entry().
>>
>> I will have to look at this a bit more closely tomorrow but this doesn't seem
>> right to me. We may be transitioning from a present PTE (ie. a writable
>> anonymous mapping) to a non-present PTE (ie. a device-exclusive entry) and
>> therefore any secondary processors (eg. other GPUs, iommus, etc.) will need to
>> update their copies of the PTE. So I think the notifier call is needed.
>
> I guess this is a question of semantics we want, for multiple gpus do we
> require that device-exclusive also excludes other gpus or not. I'm leaning
> towards agreeing with you here.
See my reply, it's also relevant for non-device, such as KVM. So it's
the right thing to do.
>
>>> Note that the PTE is
>>> always writable, and we can always create a writable-device-exclusive
>>> entry.
>>>
>>> With this change, device-exclusive is fully compatible with THPs /
>>> large folios. We still require PMD-sized THPs to get PTE-mapped, and
>>> supporting PMD-mapped THP (without the PTE-remapping) is a different
>>> endeavour that might not be worth it at this point.
>
> I'm not sure we actually want hugepages for device exclusive, since it has
> an impact on what's allowed and what not. If we only ever do 4k entries
> then userspace can assume that as long atomics are separated by a 4k page
> there's no issue when both the gpu and cpu hammer on them. If we try to
> keep thp entries then suddenly a workload that worked before will result
> in endless ping-pong between gpu and cpu because the separate atomic
> counters (or whatever) now all sit in the same 2m page.
Agreed. And the conversion + mapping into the device gets trickier.
>
> So going with thp might result in userspace having to spread out atomics
> even more, which is just wasting memory and not saving any tlb entries
> since often you don't need that many.
>
> tldr; I think not supporting thp entries for device exclusive is a
> feature, not a bug.
So, you agree with my "different endeavour that might not be worth it"
statement?
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists