linux-kernel - Re: Weird code with change "mm/gup: clean up follow_pfn_pte() slightly"

lists.openwall.net
Open Source and information security mailing list archives



Message-ID: <c0f57bfc-d34b-a34b-4f2d-0d66782e4ae7@nvidia.com>
Date:   Thu, 3 Feb 2022 16:59:56 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Lukas Bulwahn <lukas.bulwahn@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Peter Xu <peterx@...hat.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Hildenbrand <david@...hat.com>, Jan Kara <jack@...e.cz>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: Weird code with change "mm/gup: clean up follow_pfn_pte()
 slightly"

On 2/3/22 16:45, Jason Gunthorpe wrote:
> On Thu, Feb 03, 2022 at 12:44:57PM -0800, John Hubbard wrote:
>> On 2/3/22 05:01, Jason Gunthorpe wrote:
>> ...
>>>>> In the new branch if (pages), you set page = ERR_PTR(-EFAULT) and goto
>>>>> out. However, at the label out, the value of page is not used, but the
>>>>> return uses the variables i and ret.
>>>>
>>>> Yes, I think that the complaint is accurate. The intent of this code is
>>>> to return either the number of pages pinned so far (i) or ret (which
>>>> should be zero in this case), because we are just stopping early, rather
>>>> than calling this an actual error.
>>>
>>> IIRC GUP shouldn't return 0 here; it should return an error code instead.
>>>
>>> Jason
>>
>> Errors work for single pages, but GUP is a multi-page API call. If it
>> returned an error partway through the list of pages, then callers would
>> have no way of knowing how many pages to release.
> 
> Yes, but that is returning a positive error code; I said it should not
> return zero.
> 
> When it hits an error with pages already loaded it returns that number
> and the caller will then do gup once more with the VA pointing at the
> problematic page. Then GUP can return the error code because it has 0
> pages on the next iteration.
> 
> It should not return 0 here when it got an error.
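The disagreement comes down to what the exit path returns. The complaint quoted above says the `out:` label ignores `page` and returns based on `i` and `ret`; this sketch assumes that takes the form `i ? i : ret` and models it in userspace. The function names are hypothetical, and `page_err` merely stands in for `page = ERR_PTR(-EFAULT)`. It contrasts the early stop as written (the error never reaches `ret`) with Jason's proposal (record `-EFAULT` in `ret`).

```c
#include <errno.h>

/* Hypothetical model of the exit path: i = pages pinned so far,
 * ret = pending error (0 if none). Assumes "return i ? i : ret;". */
static long model_exit(long i, long ret)
{
	return i ? i : ret;
}

/* Early stop as described in the thread: the -EFAULT is stored only in a
 * local that the exit path never reads, so ret is still 0. */
static long model_stop_zero(long i)
{
	long ret = 0;
	long page_err = -EFAULT;	/* stands in for page = ERR_PTR(-EFAULT) */

	(void)page_err;			/* never consulted after "goto out" */
	return model_exit(i, ret);
}

/* Jason's suggested behaviour: put the error in ret, so a call that pinned
 * nothing reports -EFAULT, while a partial pin still returns its count. */
static long model_stop_errno(long i)
{
	long ret = -EFAULT;

	return model_exit(i, ret);
}
```

With nothing pinned, `model_stop_zero(0)` returns 0 while `model_stop_errno(0)` returns `-EFAULT`; with three pages already pinned, both return 3.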

This is perhaps better API design, but it's not what exists now. Call sites
today already handle a return value of 0 pages correctly, and there are lots
of call sites. Is this worth changing?

Also, to be clear, are you proposing just handling zero as a special case,
or something more extensive? Because after we get N pages into it,
someone has to unpin those pages, and so far that has been up to the caller.
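Combining Jason's retry rule with the point that the caller owns any pages already pinned, a caller might look roughly like this. It is a hypothetical userspace sketch: `fake_pin()`, `fake_unpin()`, `pin_all()`, `bad_index` and `pinned[]` are invented stand-ins, not kernel APIs, the "pin" only sets a flag, and page indices stand in for virtual addresses.

```c
#include <errno.h>

/* Hypothetical backend: page bad_index can never be pinned
 * (-1 means every page can be). pinned[] records which pages are held. */
static long bad_index = -1;
static int pinned[64];

/* GUP-like stand-in: pins [start, start + nr) until the bad page,
 * returning the count pinned, or -EFAULT if the very first page fails. */
static long fake_pin(long start, long nr)
{
	long i;

	for (i = 0; i < nr; i++) {
		if (start + i == bad_index)
			break;
		pinned[start + i] = 1;
	}
	return i ? i : -EFAULT;
}

static void fake_unpin(long start, long nr)
{
	for (long i = 0; i < nr; i++)
		pinned[start + i] = 0;
}

/* Caller pattern under Jason's proposal: on a short count, call again at
 * the first unpinned page so the retry reports the real error. Pages from
 * earlier calls are the caller's to release if the whole request fails. */
static long pin_all(long start, long nr)
{
	long done = 0;

	while (done < nr) {
		long got = fake_pin(start + done, nr - done);

		if (got <= 0) {			/* a 0 return is treated as failure too */
			fake_unpin(start, done);
			return got ? got : -EFAULT;
		}
		done += got;
	}
	return done;
}
```

For example, with `bad_index = 5`, `pin_all(0, 8)` pins pages 0 to 4 on the first call, gets `-EFAULT` on the retry at page 5, releases pages 0 to 4, and returns `-EFAULT`.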

> 
>>   * Returns either number of pages pinned (which may be less than the
>>   * number requested), or an error. Details about the return value:
>>   *
>>   * -- If nr_pages is 0, returns 0.
>>   * -- If nr_pages is >0, but no pages were pinned, returns -errno.
>>   * -- If nr_pages is >0, and some pages were pinned, returns the number of
>>   *    pages pinned. Again, this may be less than nr_pages.
>>   * -- 0 return value is possible when the fault would need to be retried.
> 
> I actually don't know of any place that handles the 0 return code, or
> what 'fault would need to be retried' is supposed to mean for the
> caller ...
> 
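The cases in the quoted comment can be told apart from `nr_pages` and the return value alone. As a hedged illustration, this hypothetical helper (the enum and function names are invented, not part of mm/gup.c) maps `(nr_pages, ret)` onto those cases, keeping the 0 return as its own case because that is exactly the point under dispute.

```c
#include <errno.h>	/* callers pass ret as a count or as -errno, e.g. -EFAULT */

/* Hypothetical labels for the cases in the quoted get_user_pages() comment. */
enum gup_outcome {
	GUP_NOTHING_REQUESTED,	/* nr_pages == 0, so 0 is returned */
	GUP_RETRY_OR_ZERO,	/* nr_pages > 0 but 0 returned ("would need to be retried") */
	GUP_ERROR,		/* nr_pages > 0, nothing pinned, -errno returned */
	GUP_PARTIAL,		/* 0 < ret < nr_pages: caller must release ret pages */
	GUP_COMPLETE,		/* ret == nr_pages */
};

static enum gup_outcome classify_gup_ret(long nr_pages, long ret)
{
	if (nr_pages == 0)
		return GUP_NOTHING_REQUESTED;
	if (ret < 0)
		return GUP_ERROR;
	if (ret == 0)
		return GUP_RETRY_OR_ZERO;
	return ret < nr_pages ? GUP_PARTIAL : GUP_COMPLETE;
}
```

A caller that, like the example that follows, only ever asks for one page sees `GUP_RETRY_OR_ZERO` and `GUP_ERROR` as the same failure, which is why a `<= 0` check is enough there.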

There are quite a few places that handle a 0 return, and they understand
that it is an error for their case. For example:

static int non_atomic_pte_lookup(struct vm_area_struct *vma,
				 unsigned long vaddr, int write,
				 unsigned long *paddr, int *pageshift)
{
	struct page *page;

#ifdef CONFIG_HUGETLB_PAGE
	*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
#else
	*pageshift = PAGE_SHIFT;
#endif
	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
		return -EFAULT;
	*paddr = page_to_phys(page);
	put_page(page);
	return 0;
}


thanks,
-- 
John Hubbard
NVIDIA