Message-ID: <c29bd0cb-0907-4c5b-8244-83bbb256d964@redhat.com>
Date: Thu, 8 May 2025 17:37:04 +0200
From: David Hildenbrand <david@...hat.com>
To: Pantelis Antoniou <p.antoniou@...tner.samsung.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Artem Krupotkin <artem.k@...sung.com>,
Charles Briere <c.briere@...sung.com>,
Wade Farnsworth <wade.farnsworth@...mens.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages
On 08.05.25 17:23, Pantelis Antoniou wrote:
> On Thu, 8 May 2025 17:03:46 +0200
> David Hildenbrand <david@...hat.com> wrote:
>
> Hi there,
>
>> On 07.05.25 17:41, Pantelis Antoniou wrote:
>>
>> Hi,
>>
>>> Recent updates to net filesystems enabled zero copy operations,
>>> which require getting a user space page pinned.
>>>
>>> This does not work for pages that were allocated via
>>> __get_user_pages and then mapped to user-space via remap_pfn_range.
>>
>> Right. Because the struct page of a VM_PFNMAP *must not be touched*.
>> It has to be treated like it doesn't exist.
>>
>
> Well, that's not exactly the case. For pages mapped to user space via
> remap_pfn_range() the VM_PFNMAP bit is set even though the pages do
> have a struct page.
Yes. And VM_PFNMAP is the big flag saying that these pages shall not be
touched. Even if the struct page exists. Even if you say please. :)
See the comment above vm_normal_page():
"Special" mappings do not wish to be associated with a "struct
page" (either it doesn't exist, or it exists but they don't want
to touch it)
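For illustration, a heavily simplified sketch of that rule (the real
vm_normal_page() in mm/memory.c also deals with pte_special(), CoW
mappings of PFNMAP areas and more -- this is just the gist):

#include <linux/mm.h>

/*
 * Heavily simplified illustration -- not the actual mm/memory.c code.
 * The point: in a VM_PFNMAP mapping the PFN behind a PTE is treated as
 * having no usable struct page, even when pfn_valid() would be true.
 */
static struct page *vm_normal_page_sketch(struct vm_area_struct *vma,
					  unsigned long addr, pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);

	if (vma->vm_flags & VM_PFNMAP)
		return NULL;		/* "special": pretend no struct page */
	if (is_zero_pfn(pfn))
		return NULL;		/* shared zeropage is never handed out */
	return pfn_to_page(pfn);	/* the normal, refcounted case */
}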
VM_MIXEDMAP could maybe be used for that purpose; possibly GUP also has
to be updated to make use of that. (I was hoping we could get rid of
VM_MIXEDMAP at some point.)
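Something along these lines would keep the pages "normal" and GUP-able
(a rough, untested sketch; my_bo/my_bo_mmap are made-up names for
illustration, not the existing DRM helpers):

#include <linux/mm.h>

struct my_bo {
	struct page **pages;	/* backing pages, e.g. from the page allocator */
	unsigned long npages;
};

static int my_bo_mmap(struct my_bo *bo, struct vm_area_struct *vma)
{
	unsigned long i, uaddr = vma->vm_start;
	int ret;

	if (vma_pages(vma) > bo->npages)
		return -EINVAL;

	for (i = 0; i < vma_pages(vma); i++, uaddr += PAGE_SIZE) {
		/*
		 * Inserts a refcounted struct page and marks the VMA
		 * VM_MIXEDMAP, so vm_normal_page()/GUP keep working.
		 */
		ret = vm_insert_page(vma, uaddr, bo->pages[i]);
		if (ret)
			return ret;
	}
	return 0;
}

Whether that is acceptable for the DRM side is a different question, of
course.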
>
> The details of how it happens are at the cover page of this patch but
> let me paste the relevant bits here.
>
> "In our emulation environment we have noticed failing writes when
> performing I/O from a userspace mapped DRM GEM buffer object.
> The platform does not use VRAM, all graphics memory is regular DRAM
> memory, allocated via __get_free_pages
>
> The same write was successful from a heap allocated bounce buffer.
>
> The sequence of events is as follows.
>
> 1. A BO (Buffer Object) is created, and its backing memory is allocated via
> __get_user_pages()
__get_user_pages() only allocates memory via page faults. Are you sure
you meant __get_user_pages() here?
>
> 2. Userspace mmaps a BO (Buffer Object) via a mmap call on the opened
> file handle of a DRM driver. The mapping is done via the
> drm_gem_mmap_obj() call.
>
> 3. Userspace issues a write to a file copying the contents of the BO.
>
> 3a. If the file is located on regular filesystem (like ext4), the write
> completes successfully.
>
> 3b. If the file is located on a network filesystem, like 9p the write fails.
>
> The write fails because v9fs_file_write_iter() will call
> netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked() which will
> call netfs_extract_user_iter()
>
> netfs_extract_user_iter() will in turn call iov_iter_extract_pages() which for
> a user backed iterator will call iov_iter_extract_user_pages which will call
> pin_user_pages_fast() which finally will call __gup_longterm_locked().
>
> __gup_longterm_locked() will call __get_user_pages_locked() which will fail
> because the VMA is marked with the VM_IO and VM_PFNMAP flags."
So, drm_gem_mmap_obj() ends up using remap_pfn_range()?
I can spot that drm_gem_mmap_obj() has a path where it explicitly sets
vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
Which is a clear sign to core-MM (incl. GUP) to never mess with the
mapped pages.
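Roughly, the check that fires here looks like this (a simplified sketch,
not the literal mm/gup.c code):

#include <linux/mm.h>

/*
 * Simplified sketch of the VMA-level check GUP's slow path applies: any
 * mapping carrying VM_IO or VM_PFNMAP is refused, so pin_user_pages_fast()
 * on such a mapping fails before the individual PTEs are even considered.
 */
static int gup_check_vma_sketch(struct vm_area_struct *vma)
{
	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
		return -EFAULT;
	return 0;
}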
>
>>>
>>> remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits.
>>> VM_PFNMAP in particular mark the pages as not having struct_page
>>> associated with them, which is not the case for __get_user_pages()
>>>
>>> This in turn makes any attempt to lock a page fail, breaking
>>> I/O from that address range.
>>>
>>> This patch addresses it by special-casing pages in those VMAs and not
>>> calling vm_normal_page() for them.
>>>
>>> Signed-off-by: Pantelis Antoniou <p.antoniou@...tner.samsung.com>
>>> ---
>>> mm/gup.c | 22 ++++++++++++++++++----
>>> 1 file changed, 18 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 84461d384ae2..e185c18c0c81 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
>>> return !userfaultfd_pte_wp(vma, pte);
>>> }
>>>
>>> +static struct page *gup_normal_page(struct vm_area_struct *vma,
>>> + unsigned long address, pte_t pte)
>>> +{
>>> + unsigned long pfn;
>>> +
>>> + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) {
>>> + pfn = pte_pfn(pte);
>>> + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn > highest_memmap_pfn)
>>> + return NULL;
>>> + return pfn_to_page(pfn);
>>> + }
>>> + return vm_normal_page(vma, address, pte);
>>
>> I enjoy seeing vm_normal_page() checks in GUP code.
>>
>> I don't enjoy seeing what you added before that :)
>>
>> If vm_normal_page() tells you "this is not a normal page", then we
>> should not touch it. There is one exception: the shared zeropage.
>>
>>
>> So, unfortunately, this is wrong.
>>
>
> Well, let's talk about a proper fix then for the previously mentioned
> user-space regression.
You really have to find out the responsible commit. GUP has been
behaving like that forever, I'm afraid.
And VM_PFNMAP was already set in drm_gem_mmap_obj() at least as far back
as 2012, if I am not wrong.
--
Cheers,
David / dhildenb