lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250508182356.45dbfd40@sarc.samsung.com>
Date: Thu, 8 May 2025 18:23:56 +0300
From: Pantelis Antoniou <p.antoniou@...tner.samsung.com>
To: David Hildenbrand <david@...hat.com>
CC: Andrew Morton <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, Artem Krupotkin <artem.k@...sung.com>,
	Charles Briere <c.briere@...sung.com>, Wade Farnsworth
	<wade.farnsworth@...mens.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated
 pages

On Thu, 8 May 2025 17:03:46 +0200
David Hildenbrand <david@...hat.com> wrote:

Hi there,

> On 07. 05. 25 17: 41, Pantelis Antoniou wrote: Hi, > Recent updates
> to net filesystems enabled zero copy operations, > which require
> getting a user space page pinned. > > This does not work for pages
> that were allocated via __get_user_pages 
> On 07.05.25 17:41, Pantelis Antoniou wrote:
> 
> Hi,
> 
> > Recent updates to net filesystems enabled zero copy operations,
> > which require getting a user space page pinned.
> > 
> > This does not work for pages that were allocated via
> > __get_user_pages and then mapped to user-space via remap_pfn_rage.
> 
> Right. Because the struct page of a VM_PFNMAP *must not be touched*.
> It has to be treated like it doesn't exist.
> 

Well, that's not exactly the case. For pages mapped to user space via
remap_pfn_range() the VM_PFNMAP bit is set even though the pages do
have a struct page.

The details of how it happens are at the cover page of this patch but
let me paste the relevant bits here.

"In our emulation environment we have noticed failing writes when
performing I/O from a userspace mapped DRM GEM buffer object.
The platform does not use VRAM, all graphics memory is regular DRAM
memory, allocated via __get_free_pages

The same write was successful from a heap allocated bounce buffer.

The sequence of events is as follows.

1. A BO (Buffer Object) is created, and it's backing memory is allocated via
   __get_user_pages()

2. Userspace mmaps a BO (Buffer Object) via a mmap call on the opened
   file handle of a DRM driver. The mapping is done via the
   drm_gem_mmap_obj() call.

3. Userspace issues a write to a file copying the contents of the BO.

3a. If the file is located on regular filesystem (like ext4), the write
    completes successfully.

3b. If the file is located on a network filesystem, like 9p the write fails.

The write fails because v9fs_file_write_iter() will call
netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked() which will 
call netfs_extract_user_iter() 

netfs_extract_user_iter() will in turn call iov_iter_extract_pages() which for
a user backed iterator will call iov_iter_extract_user_pages which will call
pin_user_pages_fast() which finally will call __gup_longterm_locked().

__gup_longterm_locked() will call __get_user_pages_locked() which will fail
because the VMA is marked with the VM_IO and VM_PFNMAP flags."

> > 
> > remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits.
> > VM_PFNMAP in particular mark the pages as not having struct_page
> > associated with them, which is not the case for __get_user_pages()
> > 
> > This in turn makes any attempt to lock a page fail, and breaking
> > I/O from that address range.
> > 
> > This patch address it by special casing pages in those VMAs and not
> > calling vm_normal_page() for them.
> > 
> > Signed-off-by: Pantelis Antoniou <p.antoniou@...tner.samsung.com>
> > ---
> >   mm/gup.c | 22 ++++++++++++++++++----
> >   1 file changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 84461d384ae2..e185c18c0c81 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t
> > pte, struct page *page, return !userfaultfd_pte_wp(vma, pte);
> >   }
> >   
> > +static struct page *gup_normal_page(struct vm_area_struct *vma,
> > +		unsigned long address, pte_t pte)
> > +{
> > +	unsigned long pfn;
> > +
> > +	if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) {
> > +		pfn = pte_pfn(pte);
> > +		if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn >
> > highest_memmap_pfn)
> > +			return NULL;
> > +		return pfn_to_page(pfn);
> > +	}
> > +	return vm_normal_page(vma, address, pte);
> 
> I enjoy seeing vm_normal_page() checks in GUP code.
> 
> I don't enjoy seeing what you added before that :)
> 
> If vm_normal_page() tells you "this is not a normal", then we should
> not touch it. There is one exception: the shared zeropage.
> 
> 
> So, unfortunately, this is wrong.
> 

Well, lets talk about a proper fix then for the previously mentioned
user-space regression.

Regards

-- Pantelis

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ