Message-ID: <676ee47d-8ca0-94c4-7454-46e9915ea36a@redhat.com>
Date: Tue, 13 Jun 2023 19:51:05 +0200
From: David Hildenbrand <david@...hat.com>
To: "Kasireddy, Vivek" <vivek.kasireddy@...el.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
Hugh Dickins <hughd@...gle.com>
Cc: Gerd Hoffmann <kraxel@...hat.com>,
"Kim, Dongwon" <dongwon.kim@...el.com>,
"Chang, Junxiao" <junxiao.chang@...el.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"Hocko, Michal" <mhocko@...e.com>,
"jmarchan@...hat.com" <jmarchan@...hat.com>,
"muchun.song@...ux.dev" <muchun.song@...ux.dev>,
James Houghton <jthoughton@...gle.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH] udmabuf: revert 'Add support for mapping hugepages (v4)'
On 13.06.23 10:26, Kasireddy, Vivek wrote:
> Hi David,
>
>>
>> On 12.06.23 09:10, Kasireddy, Vivek wrote:
>>> Hi Mike,
>>
>> Hi Vivek,
>>
>>>
>>> Sorry for the late reply; I just got back from vacation.
>>> If it is unsafe to directly use the subpages of a hugetlb page, then reverting
>>> this patch seems like the only option for addressing this issue immediately.
>>> So, this patch is
>>> Acked-by: Vivek Kasireddy <vivek.kasireddy@...el.com>
>>>
>>> As far as the use-case is concerned, there are two main users of the
>>> udmabuf driver: the Qemu and CrosVM VMMs. However, it appears Qemu is
>>> the only one that uses hugetlb pages (when hugetlb=on is set) as the
>>> backing store for Guest (Linux, Android and Windows) system memory.
>>> The main goal is to share the pages associated with the Guest
>>> allocated framebuffer (FB) with the Host GPU driver and other
>>> components in a zero-copy way. To that end, the guest GPU driver
>>> (virtio-gpu) allocates 4k size pages (associated with the FB) and pins
>>> them before sharing the (guest) physical (or dma) addresses (and
>>> lengths) with Qemu. Qemu then translates the addresses into file
>>> offsets and shares these offsets with udmabuf.
>>
>> Is my understanding correct, that we can effectively long-term pin
>> (worse than mlock) 64 MiB per UDMABUF_CREATE, allowing eventually !root
> The 64 MiB limit is the theoretical upper bound that we have not seen hit in
> practice. Typically, for a 1920x1080 resolution (commonly used in Guests),
> the size of the FB is ~8 MB (1920 x 1080 x 4 bytes per pixel). And most
> modern graphics compositors flip between two FBs.
>
Okay, but users with privileges to open that file can just create as
many as they want? I think I'll have to play with it.
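
Something like the following should be enough to try it out (untested
sketch against the uapi in include/uapi/linux/udmabuf.h; the endless
loop and the 64 MiB size are just for illustration):

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/udmabuf.h>

#define BUF_SIZE (64UL << 20) /* 64 MiB, the current per-create limit */

int main(void)
{
        int devfd = open("/dev/udmabuf", O_RDWR);
        struct udmabuf_create create;
        int memfd, buffd;

        for (;;) {
                memfd = memfd_create("test", MFD_ALLOW_SEALING);
                ftruncate(memfd, BUF_SIZE);
                /* udmabuf requires F_SEAL_SHRINK (and no F_SEAL_WRITE) */
                fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

                memset(&create, 0, sizeof(create));
                create.memfd  = memfd;
                create.offset = 0;
                create.size   = BUF_SIZE;

                /* each created buffer long-term pins BUF_SIZE of shmem */
                buffd = ioctl(devfd, UDMABUF_CREATE, &create);
                if (buffd < 0)
                        break;
        }
        return 0;
}
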
>> users
>>
>> ll /dev/udmabuf
>> crw-rw---- 1 root kvm 10, 125 12. Jun 08:12 /dev/udmabuf
>>
>> to bypass their effective MEMLOCK limit, fragmenting physical memory and
>> breaking swap?
> Right, it does not look like the mlock limits are honored.
>
That should be added.
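
Roughly like other long-term pinning users handle it; completely
untested sketch (lock_mm would be a new field in struct udmabuf, and
the charge would have to be undone on release via
account_locked_vm(..., false) + mmdrop()):

static int udmabuf_account_pinned(struct udmabuf *ubuf)
{
        int ret;

        /* remember the mm we charged, for the uncharge on release */
        ubuf->lock_mm = current->mm;
        mmgrab(ubuf->lock_mm);

        /* charge the pinned pages against RLIMIT_MEMLOCK */
        ret = account_locked_vm(ubuf->lock_mm, ubuf->pagecount, true);
        if (ret)
                mmdrop(ubuf->lock_mm);
        return ret;
}
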
>>
>> Regarding the udmabuf_vm_fault(), I assume we're mapping pages we
>> obtained from the memfd ourselves into a special VMA (mmap() of the
> The mmap operation is really needed only if a component on the Host needs
> CPU access to the buffer. But in most scenarios, we try to ensure direct GPU
> access (h/w acceleration via gl) to these pages.
>
>> udmabuf). I'm not sure how well shmem pages are prepared for getting
>> mapped by someone else into an arbitrary VMA (page->index?).
> Most drm/gpu drivers use shmem pages as the backing store for FBs and
> other buffers and also provide mmap capability. What concerns do you see
> with this approach?
Are these mmap'ing the pages the way udmabuf maps these pages (IOW, an
on-demand fault where the core-mm will adjust the mapcount etc.)?
Skimming over the shmem_read_mapping_page() users, I assume most of them
use a VM_PFNMAP mapping (or don't mmap them at all), so we won't be
messing with the struct page at all.
(That might even allow you to mmap hugetlb sub-pages, because the struct
page -- and mapcount -- will be ignored completely and not touched.)
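
Roughly what I have in mind, as an untested sketch on top of
drivers/dma-buf/udmabuf.c (vm_flags_set() on recent kernels; older
kernels would modify vma->vm_flags directly):

static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct udmabuf *ubuf = vma->vm_private_data;
        pgoff_t pgoff = vmf->pgoff;

        if (pgoff >= ubuf->pagecount)
                return VM_FAULT_SIGBUS;
        /*
         * Insert the raw PFN instead of the struct page; mapcount and
         * refcount of the page are left alone.
         */
        return vmf_insert_pfn(vma, vmf->address,
                              page_to_pfn(ubuf->pages[pgoff]));
}

static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
{
        struct udmabuf *ubuf = buf->priv;

        if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
                return -EINVAL;

        vma->vm_ops = &udmabuf_vm_ops;
        vma->vm_private_data = ubuf;
        /* core-mm ignores the struct pages behind a VM_PFNMAP mapping */
        vm_flags_set(vma, VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
        return 0;
}
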
>
>>
>> ... also, just imagine someone doing FALLOC_FL_PUNCH_HOLE / ftruncate()
>> on the memfd. What's mapped into the memfd no longer corresponds to
>> what's pinned / mapped into the VMA.
> IIUC, making use of the DMA_BUF_IOCTL_SYNC ioctl would help with any
> coherency issues:
> https://www.kernel.org/doc/html/v6.2/driver-api/dma-buf.html#c.dma_buf_sync
>
Would it, as of now? udmabuf_create() pulls the shmem pages out of the
memfd once, at creation time; I'm not sure how DMA_BUF_IOCTL_SYNC would
help to update them whenever the pages inside the memfd change (e.g.,
FALLOC_FL_PUNCH_HOLE + realloc).
But that's most probably simply "not supported".
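
For completeness, as documented DMA_BUF_IOCTL_SYNC only brackets CPU
accesses for cache coherency, along the lines of (using the uapi from
<linux/dma-buf.h>; dmabuf_fd being the fd returned by UDMABUF_CREATE):

        struct dma_buf_sync sync;

        sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
        /* ... CPU access through the mmap() of the dma-buf ... */
        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

It flushes/invalidates CPU caches around an access; it does not (and
cannot) refresh the set of pages that udmabuf_create() grabbed.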
>>
>>
>> Was linux-mm (and especially shmem maintainers, ccing Hugh) involved in
>> the upstreaming of udmabuf?
> It does not appear so from the link below, although other key lists were cc'd:
> https://patchwork.freedesktop.org/patch/246100/?series=39879&rev=7
That's unfortunate :(
--
Cheers,
David / dhildenb