lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <180e9c2f-51fe-44ba-ac68-5aa7b7918ab0@redhat.com>
Date: Thu, 30 Jan 2025 10:01:56 +0100
From: David Hildenbrand <david@...hat.com>
To: Alistair Popple <apopple@...dia.com>
Cc: linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 dri-devel@...ts.freedesktop.org, linux-mm@...ck.org,
 nouveau@...ts.freedesktop.org, Andrew Morton <akpm@...ux-foundation.org>,
 Jérôme Glisse <jglisse@...hat.com>,
 Jonathan Corbet <corbet@....net>, Alex Shi <alexs@...nel.org>,
 Yanteng Si <si.yanteng@...ux.dev>, Karol Herbst <kherbst@...hat.com>,
 Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
 Pasha Tatashin <pasha.tatashin@...een.com>, Peter Xu <peterx@...hat.com>,
 Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using
 folio_walk instead of rmap walk

On 30.01.25 07:11, Alistair Popple wrote:
> On Wed, Jan 29, 2025 at 12:54:02PM +0100, David Hildenbrand wrote:
>> We require a writable PTE and only support anonymous folio: we can only
>> have exactly one PTE pointing at that page, which we can just lookup
>> using a folio walk, avoiding the rmap walk and the anon VMA lock.
>>
>> So let's stop doing an rmap walk and perform a folio walk instead, so we
>> can easily just modify a single PTE and avoid relying on rmap/mapcounts.
>>
>> We now effectively work on a single PTE instead of multiple PTEs of
>> a large folio, allowing for conversion of individual PTEs from
>> non-exclusive to device-exclusive -- note that the other way always
>> worked on single PTEs.
>>
>> We can drop the MMU_NOTIFY_EXCLUSIVE MMU notifier call and document why
>> that is not required: GUP will already take care of the
>> MMU_NOTIFY_EXCLUSIVE call if required (there is already a device-exclusive
>> entry) when not finding a present PTE and having to trigger a fault and
>> ending up in remove_device_exclusive_entry().
> 
> I will have to look at this a bit more closely tomorrow but this doesn't seem
> right to me. We may be transitioning from a present PTE (ie. a writable
> anonymous mapping) to a non-present PTE (ie. a device-exclusive entry) and
> therefore any secondary processors (eg. other GPUs, iommus, etc.) will need to
> update their copies of the PTE. So I think the notifier call is needed.

Then it is all very confusing:

"MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no 
longer have exclusive access to the page."

That's simply not true in the scenario you describe, because nobody had 
exclusive access.

But what you are saying is, that we need to inform others (e.g., KVM) 
that we are converting it to a device-exclusive entry, such that they 
stop accessing it.

That makes sense to me (and the cleanup patch in the cleanup series 
would have to go as well to prevent the livelock).

So we would have to update the documentation of MMU_NOTIFY_EXCLUSIVE 
that it is also trigger on conversion from non-exclusive to exclusive.

Does that make sense?

Thanks!

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ