Message-ID: <f587647d-83dc-5bde-d244-f522ec5bda60@nvidia.com>
Date:   Mon, 4 Nov 2019 16:18:33 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Jerome Glisse <jglisse@...hat.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Alex Williamson <alex.williamson@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        Christoph Hellwig <hch@...radead.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Dave Chinner <david@...morbit.com>,
        David Airlie <airlied@...ux.ie>,
        "David S . Miller" <davem@...emloft.net>,
        Ira Weiny <ira.weiny@...el.com>, Jan Kara <jack@...e.cz>,
        Jason Gunthorpe <jgg@...pe.ca>, Jens Axboe <axboe@...nel.dk>,
        Jonathan Corbet <corbet@....net>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michal Hocko <mhocko@...e.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Paul Mackerras <paulus@...ba.org>,
        Shuah Khan <shuah@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>, <bpf@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>, <kvm@...r.kernel.org>,
        <linux-block@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <linux-fsdevel@...r.kernel.org>, <linux-kselftest@...r.kernel.org>,
        <linux-media@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
        <linuxppc-dev@...ts.ozlabs.org>, <netdev@...r.kernel.org>,
        <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 12/18] mm/gup: track FOLL_PIN pages

Hi Dan, there is a question for you further down:


On 11/4/19 3:49 PM, Jerome Glisse wrote:
> On Mon, Nov 04, 2019 at 02:49:18PM -0800, John Hubbard wrote:
...
>>> Maybe add a small comment about wrap around :)
>>
>>
>> I don't *think* the count can wrap around, due to the checks in user_page_ref_inc().
>>
>> But it's true that the documentation is a little light here... What did you have
>> in mind?
> 
> Something about the false-positive case (and how unlikely it is) and that
> wraparound is properly handled. Maybe just a pointer to the documentation
> so that people know they can go look there for details. I know my
> brain tends to forget where to look for things, so I like to be constantly
> reminded: hey, the doc is Documentation/foobar :)
> 

I see. OK, here's a version with a thoroughly overhauled comment header:

/**
 * page_dma_pinned() - report if a page is pinned for DMA.
 *
 * This function checks if a page has been pinned via a call to
 * pin_user_pages*() or pin_longterm_pages*().
 *
 * The return value is partially fuzzy: false is not fuzzy, because it means
 * "definitely not pinned for DMA", but true means "probably pinned for DMA, but
 * possibly a false positive due to having at least GUP_PIN_COUNTING_BIAS worth
 * of normal page references".
 *
 * False positives are OK, because: a) it's unlikely for a page to get that many
 * refcounts, and b) all the callers of this routine are expected to be able to
 * deal gracefully with a false positive.
 *
 * For more information, please see Documentation/vm/pin_user_pages.rst.
 *
 * @page:	pointer to page to be queried.
 * Return:	True, if it is likely that the page has been "dma-pinned".
 *		False, if the page is definitely not dma-pinned.
 */
static inline bool page_dma_pinned(struct page *page)
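
For readers following along, the fuzzy return value described in that comment can be modeled in userspace. This is only a sketch under assumptions: the bias value of 1024, struct model_page, and the model_*() helpers are all invented for illustration, and the overflow check merely stands in for the kind of check attributed to user_page_ref_inc() earlier in this thread. The real function body is not shown in this message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed value, for illustration only; this message does not state it. */
#define GUP_PIN_COUNTING_BIAS 1024

/* Hypothetical userspace stand-in for struct page. */
struct model_page {
	int refcount;
};

/* Ordinary reference, in the spirit of get_page(). */
static inline void model_get_page(struct model_page *page)
{
	page->refcount++;
}

/*
 * Pin: add a whole bias to the refcount. Refuse if the page has no
 * references, or if adding the bias would overflow the counter, so the
 * count cannot wrap around.
 */
static inline bool model_pin_page(struct model_page *page)
{
	if (page->refcount <= 0 ||
	    page->refcount > INT_MAX - GUP_PIN_COUNTING_BIAS)
		return false;
	page->refcount += GUP_PIN_COUNTING_BIAS;
	return true;
}

/* Unpin: drop exactly one bias. */
static inline void model_unpin_page(struct model_page *page)
{
	assert(page->refcount >= GUP_PIN_COUNTING_BIAS);
	page->refcount -= GUP_PIN_COUNTING_BIAS;
}

/*
 * false: definitely not pinned.
 * true:  probably pinned, but possibly just a page with at least
 *        GUP_PIN_COUNTING_BIAS ordinary references.
 */
static inline bool model_page_dma_pinned(const struct model_page *page)
{
	return page->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

In this model, a page that starts with one reference and then receives 1023 more ordinary references makes model_page_dma_pinned() return true with no pin held, which is the false positive the comment describes.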


>>> [...]
>>>
>>>> @@ -1930,12 +2028,20 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  
>>>>  		pgmap = get_dev_pagemap(pfn, pgmap);
>>>>  		if (unlikely(!pgmap)) {
>>>> -			undo_dev_pagemap(nr, nr_start, pages);
>>>> +			undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>  			return 0;
>>>>  		}
>>>>  		SetPageReferenced(page);
>>>>  		pages[*nr] = page;
>>>> -		get_page(page);
>>>> +
>>>> +		if (flags & FOLL_PIN) {
>>>> +			if (unlikely(!user_page_ref_inc(page))) {
>>>> +				undo_dev_pagemap(nr, nr_start, flags, pages);
>>>> +				return 0;
>>>> +			}
>>>
>>> Maybe add a comment about a case that should never happen, i.e.
>>> user_page_ref_inc() failing after the second iteration of the
>>> loop, as it would be broken, and a bug, to call undo_dev_pagemap()
>>> after the first iteration of that loop.
>>>
>>> Also, I believe that this should never happen, because if the first
>>> iteration succeeds then __page_cache_add_speculative() will
>>> succeed for all the iterations.
>>>
>>> Note that the pgmap case above follows that too, i.e. the call to
>>> get_dev_pagemap() can only fail on the first iteration of the loop.
>>> Well, I assume you can never have a huge device page that spans
>>> different pgmaps, i.e. different devices (which is a reasonable
>>> assumption). So maybe this code needs fixing, i.e.:
>>>
>>> 		pgmap = get_dev_pagemap(pfn, pgmap);
>>> 		if (unlikely(!pgmap))
>>> 			return 0;
>>>
>>>
>>
>> OK, yes that does make sense. And I think a comment is adequate,
>> no need to check for bugs during every tail page iteration. So how 
>> about this, as a preliminary patch:
> 
> Actually, I thought about it, and I think that there is a pgmap
> per section, and thus maybe one device can have multiple pgmaps,
> which would be an issue for pages bigger than the section size
> (i.e. bigger than 128MB, IIRC). I will go double-check that, but
> maybe Dan can chime in.
> 
> In any case, my comment above is correct for the page ref
> increment: if the first one succeeds then the others will too,
> or otherwise it means someone is doing too many put_page()/
> put_user_page() calls, which is _bad_ :)
> 

I'll wait to hear from Dan before doing anything rash. :)
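
For reference while that is pending, the rollback pattern under discussion (take one reference per page, and on failure drop everything taken since nr_start) can be sketched as a userspace model. Everything here is hypothetical: struct model_page, model_try_ref(), model_undo(), and model_ref_range() are stand-ins invented for illustration, not the kernel's struct page, user_page_ref_inc(), or undo_dev_pagemap(). A single plain refcount increment stands in for both the get_page() and FOLL_PIN cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page. */
struct model_page {
	int refcount;
};

/* Take one reference; fails if the page currently has none. */
static bool model_try_ref(struct model_page *page)
{
	if (page->refcount <= 0)
		return false;
	page->refcount++;
	return true;
}

/* Drop the references recorded in pages[nr_start..nr) and clear the slots. */
static void model_undo(struct model_page **pages, size_t nr_start, size_t nr)
{
	while (nr > nr_start) {
		nr--;
		assert(pages[nr]->refcount > 0);
		pages[nr]->refcount--;
		pages[nr] = NULL;
	}
}

/*
 * Reference every page in range[0..count), recording each one in pages[]
 * starting at *nr. Returns 1 on success. On any failure, rolls back to
 * the state at entry and returns 0.
 */
static int model_ref_range(struct model_page *range, size_t count,
			   struct model_page **pages, size_t *nr)
{
	size_t nr_start = *nr;

	for (size_t i = 0; i < count; i++) {
		if (!model_try_ref(&range[i])) {
			model_undo(pages, nr_start, *nr);
			*nr = nr_start;
			return 0;
		}
		pages[*nr] = &range[i];
		(*nr)++;
	}
	return 1;
}
```

If a failure really can only occur on the first iteration, as argued above, then model_undo() never has anything to drop in that path and the early return alone would suffice. The model keeps the rollback so that both behaviors are visible.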


thanks,

John Hubbard
NVIDIA