linux-kernel - Re: [PATCH 1/2] mm: introduce put_user

Message-ID: <76788484-d5ec-91f2-1f66-141764ba0b1e@nvidia.com>
Date:   Wed, 16 Jan 2019 21:25:05 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Jan Kara <jack@...e.cz>, Jerome Glisse <jglisse@...hat.com>
CC:     Matthew Wilcox <willy@...radead.org>,
        Dave Chinner <david@...morbit.com>,
        Dan Williams <dan.j.williams@...el.com>,
        John Hubbard <john.hubbard@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>, <tom@...pey.com>,
        Al Viro <viro@...iv.linux.org.uk>, <benve@...co.com>,
        Christoph Hellwig <hch@...radead.org>,
        Christopher Lameter <cl@...ux.com>,
        "Dalessandro, Dennis" <dennis.dalessandro@...el.com>,
        Doug Ledford <dledford@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Michal Hocko <mhocko@...nel.org>, <mike.marciniszyn@...el.com>,
        <rcampbell@...dia.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

On 1/15/19 12:07 AM, Jan Kara wrote:
>>>>> [...]
>>> Also there is one more idea I had how to record number of pins in the page:
>>>
>>> #define PAGE_PIN_BIAS	1024
>>>
>>> get_page_pin()
>>> 	atomic_add(&page->_refcount, PAGE_PIN_BIAS);
>>>
>>> put_page_pin();
>>> 	atomic_add(&page->_refcount, -PAGE_PIN_BIAS);
>>>
>>> page_pinned(page)
>>> 	(atomic_read(&page->_refcount) - page_mapcount(page)) > PAGE_PIN_BIAS
>>>
>>> This is pretty trivial scheme. It still gives us 22-bits for page pins
>>> which should be plenty (but we should check for that and bail with error if
>>> it would overflow). Also there will be no false negatives and false
>>> positives only if there are more than 1024 non-page-table references to the
>>> page which I expect to be rare (we might want to also subtract
>>> hpage_nr_pages() for radix tree references to avoid excessive false
>>> positives for huge pages although at this point I don't think they would
>>> matter). Thoughts?

Hi Jan,

A few questions on the details; sorry, I'm not fully grasping your plan without
more explanation:

Do I read it correctly that this uses the lower 10 bits for the original
page->_refcount, and the upper 22 bits for gup-pinned counts? If so, I'm
surprised, because the gup-pinned count is going to be less than or equal to
the normal (get_page()-based) pin count. And 1024 seems like a value that might
be reached on a large system with lots of processes and IPC.

Are you just allowing the lower 10 bits to overflow, and that's why the 
subtraction of mapcount? Wouldn't it be better to allow more than 10 bits, 
instead?
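
To make the question concrete, here is a minimal userspace sketch of the scheme
as I read it. It is only a model under stated assumptions: C11 atomics stand in
for the kernel's atomic_t, and struct fake_page with its plain mapcount field
is a hypothetical stand-in for struct page and page_mapcount(). The three
helpers follow the quoted pseudocode, including the strict ">" comparison.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_PIN_BIAS 1024	/* 2^10: each pin adds this to the refcount */

/* Hypothetical stand-in for struct page; only what the scheme needs. */
struct fake_page {
	atomic_int refcount;	/* models page->_refcount */
	int mapcount;		/* models page_mapcount(page) */
};

/* Take one gup-style pin by adding the bias to the shared counter. */
static void get_page_pin(struct fake_page *p)
{
	atomic_fetch_add(&p->refcount, PAGE_PIN_BIAS);
}

/* Drop one gup-style pin by subtracting the bias again. */
static void put_page_pin(struct fake_page *p)
{
	atomic_fetch_sub(&p->refcount, PAGE_PIN_BIAS);
}

/*
 * Heuristic from the quoted proposal: after discounting page-table
 * references, anything above the bias is treated as "pinned".
 */
static bool page_pinned(struct fake_page *p)
{
	return atomic_load(&p->refcount) - p->mapcount > PAGE_PIN_BIAS;
}
```

In this model, a page with refcount 3 and mapcount 2 reports unpinned, reports
pinned after one get_page_pin(), and unpinned again after put_page_pin(). A
page with 1025 ordinary references and mapcount 0 also reports pinned without
any pin being taken, which is the false-positive case the proposal mentions.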

Another question: do we just allow other kernel code to observe this biased
_refcount, or do we attempt to filter it out?  In other words, do you expect 
problems due to some kernel code checking the _refcount and finding a large 
number there, when it expected, say, 3? I recall some code tries to do 
that...in fact, ZONE_DEVICE is 1-based, instead of zero-based, with respect 
to _refcount, right?
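
As an illustration of what "filtering it out" could mean, here is a small
userspace sketch. Both helpers are hypothetical names, not existing kernel API,
and they assume the reading above: ordinary references live in the low 10 bits
and pins in the bits above them. That assumption holds only while the ordinary
reference count stays below PAGE_PIN_BIAS.

```c
#include <stdatomic.h>

#define PAGE_PIN_BIAS 1024	/* 2^10, as in the quoted proposal */

/*
 * Hypothetical helper: the refcount with the pin bias filtered out.
 * Only meaningful while there are fewer than PAGE_PIN_BIAS ordinary
 * references; beyond that, the low bits carry into the pin bits.
 */
static int ordinary_refs(atomic_int *refcount)
{
	return atomic_load(refcount) % PAGE_PIN_BIAS;
}

/* Hypothetical helper: number of pins implied by the upper bits. */
static int pin_count(atomic_int *refcount)
{
	return atomic_load(refcount) / PAGE_PIN_BIAS;
}
```

With 3 ordinary references and 2 pins the counter holds 2051, and the helpers
return 3 and 2. With 1025 ordinary references and 1 pin the counter holds 2049,
and the helpers return 1 and 2, so a single shared counter cannot tell these
two situations apart once the low bits overflow.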

thanks,
-- 
John Hubbard
NVIDIA