Message-ID: <676ee47d-8ca0-94c4-7454-46e9915ea36a@redhat.com>
Date: Tue, 13 Jun 2023 19:51:05 +0200
From: David Hildenbrand <david@...hat.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@...el.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
Hugh Dickins <hughd@...gle.com>
Cc: Gerd Hoffmann <kraxel@...hat.com>,
"Kim, Dongwon" <dongwon.kim@...el.com>,
"Chang, Junxiao" <junxiao.chang@...el.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"Hocko, Michal" <mhocko@...e.com>,
"jmarchan@...hat.com" <jmarchan@...hat.com>,
"muchun.song@...ux.dev" <muchun.song@...ux.dev>,
James Houghton <jthoughton@...gle.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH] udmabuf: revert 'Add support for mapping hugepages (v4)'
On 13.06.23 10:26, Kasireddy, Vivek wrote:
> Hi David,
>
>>
>> On 12.06.23 09:10, Kasireddy, Vivek wrote:
>>> Hi Mike,
>>
>> Hi Vivek,
>>
>>>
>>> Sorry for the late reply; I just got back from vacation.
>>> If it is unsafe to directly use the subpages of a hugetlb page, then reverting
>>> this patch seems like the only option for addressing this issue immediately.
>>> So, this patch is
>>> Acked-by: Vivek Kasireddy <vivek.kasireddy@...el.com>
>>>
>>> As far as the use-case is concerned, there are two main users of the
>>> udmabuf driver: the Qemu and CrosVM VMMs. However, it appears Qemu is
>>> the only one that uses hugetlb pages (when hugetlb=on is set) as the
>>> backing store for Guest (Linux, Android and Windows) system memory.
>>> The main goal is to share the pages associated with the Guest
>>> allocated framebuffer (FB) with the Host GPU driver and other
>>> components in a zero-copy way. To that end, the guest GPU driver
>>> (virtio-gpu) allocates 4k size pages (associated with the FB) and pins
>>> them before sharing the (guest) physical (or dma) addresses (and
>>> lengths) with Qemu. Qemu then translates the addresses into file
>>> offsets and shares these offsets with udmabuf.
>>
>> Is my understanding correct, that we can effectively long-term pin
>> (worse than mlock) 64 MiB per UDMABUF_CREATE, allowing eventually !root
> The 64 MiB limit is the theoretical upper bound that we have not seen hit in
> practice. Typically, for a 1920x1080 resolution (commonly used in Guests),
> the size of the FB is ~8 MB (1920 x 1080 x 4 bytes per pixel). And most
> modern graphics compositors flip between two FBs.
>
Okay, but users with privileges to open that file can just create as
many as they want? I think I'll have to play with it.
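
Something like the following should be enough to try it out (untested
sketch against the uapi in include/uapi/linux/udmabuf.h; the endless
loop and the 64 MiB size are just for illustration):

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/udmabuf.h>

#define BUF_SIZE (64UL << 20) /* 64 MiB, the current per-create limit */

int main(void)
{
        int devfd = open("/dev/udmabuf", O_RDWR);
        struct udmabuf_create create;
        int memfd, buffd;

        for (;;) {
                memfd = memfd_create("test", MFD_ALLOW_SEALING);
                ftruncate(memfd, BUF_SIZE);
                /* udmabuf requires F_SEAL_SHRINK (and no F_SEAL_WRITE) */
                fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

                memset(&create, 0, sizeof(create));
                create.memfd  = memfd;
                create.offset = 0;
                create.size   = BUF_SIZE;

                /* each created buffer long-term pins BUF_SIZE of shmem */
                buffd = ioctl(devfd, UDMABUF_CREATE, &create);
                if (buffd < 0)
                        break;
        }
        return 0;
}
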
>> users
>>
>> ll /dev/udmabuf
>> crw-rw---- 1 root kvm 10, 125 12. Jun 08:12 /dev/udmabuf
>>
>> to bypass their effective MEMLOCK limit, fragmenting physical memory and
>> breaking swap?
> Right, it does not look like the mlock limits are honored.
>
That should be added.
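
Roughly like other long-term pinning users handle it; completely
untested sketch (lock_mm would be a new field in struct udmabuf, and
the charge would have to be undone on release via
account_locked_vm(..., false) + mmdrop()):

static int udmabuf_account_pinned(struct udmabuf *ubuf)
{
        int ret;

        /* remember the mm we charged, for the uncharge on release */
        ubuf->lock_mm = current->mm;
        mmgrab(ubuf->lock_mm);

        /* charge the pinned pages against RLIMIT_MEMLOCK */
        ret = account_locked_vm(ubuf->lock_mm, ubuf->pagecount, true);
        if (ret)
                mmdrop(ubuf->lock_mm);
        return ret;
}
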
>>
>> Regarding the udmabuf_vm_fault(), I assume we're mapping pages we
>> obtained from the memfd ourselves into a special VMA (mmap() of the
> The mmap operation is really needed only if a component on the Host needs
> CPU access to the buffer. But in most scenarios, we try to ensure direct GPU
> access (h/w acceleration via gl) to these pages.
>
>> udmabuf). I'm not sure how well shmem pages are prepared for getting
>> mapped by someone else into an arbitrary VMA (page->index?).
> Most drm/gpu drivers use shmem pages as the backing store for FBs and
> other buffers and also provide mmap capability. What concerns do you see
> with this approach?
Are these mmap'ing the pages the way udmabuf maps these pages (IOW, an
on-demand fault where the core-mm will adjust the mapcount etc.)?
Skimming over the shmem_read_mapping_page() users, I assume most of them
use a VM_PFNMAP mapping (or don't mmap them at all), so we won't be
messing with the struct page at all.
(That might even allow you to mmap hugetlb sub-pages, because the struct
page -- and mapcount -- will be ignored completely and not touched.)
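
Roughly what I have in mind, as an untested sketch on top of
drivers/dma-buf/udmabuf.c (vm_flags_set() on recent kernels; older
kernels would modify vma->vm_flags directly):

static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct udmabuf *ubuf = vma->vm_private_data;
        pgoff_t pgoff = vmf->pgoff;

        if (pgoff >= ubuf->pagecount)
                return VM_FAULT_SIGBUS;
        /*
         * Insert the raw PFN instead of the struct page; mapcount and
         * refcount of the page are left alone.
         */
        return vmf_insert_pfn(vma, vmf->address,
                              page_to_pfn(ubuf->pages[pgoff]));
}

static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
{
        struct udmabuf *ubuf = buf->priv;

        if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
                return -EINVAL;

        vma->vm_ops = &udmabuf_vm_ops;
        vma->vm_private_data = ubuf;
        /* core-mm ignores the struct pages behind a VM_PFNMAP mapping */
        vm_flags_set(vma, VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
        return 0;
}
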
>
>>
>> ... also, just imagine someone doing FALLOC_FL_PUNCH_HOLE / ftruncate()
>> on the memfd. What's mapped into the memfd no longer corresponds to
>> what's pinned / mapped into the VMA.
> IIUC, making use of the DMA_BUF_IOCTL_SYNC ioctl would help with any
> coherency issues:
> https://www.kernel.org/doc/html/v6.2/driver-api/dma-buf.html#c.dma_buf_sync
>
Would it, as of now? udmabuf_create() pulls the shmem pages out of the
memfd once, at creation time; I'm not sure how DMA_BUF_IOCTL_SYNC would
help to update them whenever the pages inside the memfd change (e.g.,
FALLOC_FL_PUNCH_HOLE + realloc).
But that's most probably simply "not supported".
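
For completeness, as documented DMA_BUF_IOCTL_SYNC only brackets CPU
accesses for cache coherency, along the lines of (using the uapi from
<linux/dma-buf.h>; dmabuf_fd being the fd returned by UDMABUF_CREATE):

        struct dma_buf_sync sync;

        sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
        /* ... CPU access through the mmap() of the dma-buf ... */
        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

It flushes/invalidates CPU caches around an access; it does not (and
cannot) refresh the set of pages that udmabuf_create() grabbed.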
>>
>>
>> Was linux-mm (and especially shmem maintainers, ccing Hugh) involved in
>> the upstreaming of udmabuf?
> It does not appear so from the link below, although other key lists were cc'd:
> https://patchwork.freedesktop.org/patch/246100/?series=39879&rev=7
That's unfortunate :(
--
Cheers,
David / dhildenb