Message-ID: <afad3159-9aa8-e052-3bef-d00dee1ba51e@shipmail.org>
Date:   Thu, 25 Mar 2021 12:53:15 +0100
From:   Thomas Hellström (Intel) 
        <thomas_os@...pmail.org>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Christian König <christian.koenig@....com>,
        David Airlie <airlied@...ux.ie>, linux-kernel@...r.kernel.org,
        dri-devel@...ts.freedesktop.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH 1/2] mm,drm/ttm: Block fast GUP to TTM huge pages


On 3/25/21 12:30 PM, Jason Gunthorpe wrote:
> On Thu, Mar 25, 2021 at 10:51:35AM +0100, Thomas Hellström (Intel) wrote:
>
>>> Please explain that further. Why do we need the mmap lock to insert PMDs
>>> but not when inserting PTEs?
>> We don't. But once you've inserted a PMD directory you can't remove it
>> unless you have the mmap lock (and probably also the i_mmap_lock in write
>> mode). That means, for example, that if you have a VRAM region mapped with
>> huge PMDs, it gets evicted, and you happen to read a byte from it while
>> it's evicted (thereby populating the full region with PTEs pointing to
>> system pages), you can't go back to huge PMDs again without a munmap() in
>> between.
> This is all basically magic to me still, but THP does this
> transformation and I think what it does could work here too. We
> probably wouldn't be able to upgrade while handling a fault, but at the
> same time, this should be quite rare as it would require the driver to
> have supplied a small page for this VMA at some point.

IIRC THP handles this using khugepaged, grabbing the lock in write mode 
when coalescing, and yeah, I don't think anything prevents anyone from 
extending khugepaged to do that also for special huge page-table entries.
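
(For concreteness, a minimal sketch of the fault-side pattern being
discussed, assuming a vmf-based fault handler on x86-64. Only the
vmf_insert_* helpers, the pfn_t conversion and the standard macros are
real kernel API; the surrounding function is illustrative, not the
actual TTM code.)

#include <linux/mm.h>
#include <linux/huge_mm.h>
#include <linux/pfn_t.h>

/*
 * Try a huge PMD entry at fault time when both the fault address and
 * the backing device PFN are PMD-aligned; otherwise fall back to a
 * single PTE.  Once the range has been split into PTEs this way,
 * collapsing it back into a PMD would need the mmap lock in write
 * mode, which is the point made above.
 */
static vm_fault_t sketch_huge_fault(struct vm_fault *vmf, unsigned long pfn)
{
        if (IS_ALIGNED(vmf->address, PMD_SIZE) &&
            IS_ALIGNED(pfn, PMD_SIZE >> PAGE_SHIFT)) {
                vm_fault_t ret = vmf_insert_pfn_pmd(vmf,
                                __pfn_to_pfn_t(pfn, PFN_DEV),
                                vmf->flags & FAULT_FLAG_WRITE);
                if (ret != VM_FAULT_FALLBACK)
                        return ret;
        }

        /* PTE fallback: the range stays PTE-mapped until munmap(), or a
         * future khugepaged-style collapse as suggested above. */
        return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}

The IS_ALIGNED checks are where the mmap-time VA alignment discussed
further down comes in.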

>
>>> Apart from that I still don't fully get why we need this in the first
>>> place.
>> Because virtual huge page address boundaries need to be aligned with
>> physical huge page address boundaries, and mmap can happen before bos are
>> populated, so you have no way of knowing how the physical huge page
>> address boundaries will end up being aligned.
> But this is an mmap-time problem; a fault can't fix an mmap that used the wrong VA.

Nope. The point here was that in this case, to make sure mmap uses the 
correct VA to give us a reasonable chance of alignment, the driver 
might need to be aware of, and do trickery with, the huge page-table-entry 
sizes anyway, although I think in most cases a standard helper for this 
can be supplied.
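
(A rough sketch of what such a standard helper could look like;
illustrative, not an existing DRM/TTM function. It assumes the backing
store itself starts on a PMD boundary and uses the mm->get_unmapped_area
hook as it exists in this kernel era.)

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/sched.h>

/*
 * Ask the generic allocator for a slightly larger hole, then slide the
 * start up to a PMD boundary so that huge page-table-entry boundaries
 * in the VA range line up with huge-page boundaries in the backing
 * store.
 */
static unsigned long sketch_get_unmapped_area(struct file *file,
                                              unsigned long addr,
                                              unsigned long len,
                                              unsigned long pgoff,
                                              unsigned long flags)
{
        unsigned long hole;

        if (len < PMD_SIZE || (flags & MAP_FIXED))
                return current->mm->get_unmapped_area(file, addr, len,
                                                      pgoff, flags);

        hole = current->mm->get_unmapped_area(file, addr, len + PMD_SIZE,
                                              pgoff, flags);
        if (IS_ERR_VALUE(hole))
                return hole;

        return ALIGN(hole, PMD_SIZE);
}

mmap() then only maps len bytes starting at the aligned address, so the
over-allocation just leaves room to slide; a real helper would also need
to account for the bo's offset within its huge page rather than assume
it starts PMD-aligned.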

/Thomas


>
>>> I really don't see that either. When a buffer is accessed by the CPU it
>>> is, in > 90% of all cases, accessed completely. Not faulting in full
>>> ranges is just optimizing for a really unlikely case here.
>> It might be that you're right, but do all drivers wanting to use this
>> behave like drm in this respect? Using the interface to fault in a 1G
>> range in the hope it could map it to a huge PUD may unexpectedly consume
>> and populate some 16+ MB of page tables.
> If the underlying device block size is so big then sure, why not? The
> "unexpectedly" should be quite rare/non-existent anyhow.
>
> Jason
>   

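(Back-of-envelope on the 1G example above, my numbers rather than
anything stated in the thread: a 1 GiB range mapped with 4 KiB PTEs on
x86-64 needs 262,144 entries at 8 bytes each, i.e. 512 last-level
page-table pages, or about 2 MiB of page tables per such mapping,
before counting the system pages the fallback path actually populates
and their bookkeeping.)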