Message-ID: <151edbf3-66ff-df0c-c1cc-5998de50111e@nvidia.com>
Date: Wed, 20 Jun 2018 15:55:41 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Jan Kara <jack@...e.cz>
CC: Matthew Wilcox <willy@...radead.org>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>, Jason Gunthorpe <jgg@...pe.ca>,
John Hubbard <john.hubbard@...il.com>,
Michal Hocko <mhocko@...nel.org>,
Christopher Lameter <cl@...ux.com>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-rdma <linux-rdma@...r.kernel.org>
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
On 06/20/2018 05:08 AM, Jan Kara wrote:
> On Tue 19-06-18 11:11:48, John Hubbard wrote:
>> On 06/19/2018 03:41 AM, Jan Kara wrote:
>>> On Tue 19-06-18 02:02:55, Matthew Wilcox wrote:
>>>> On Tue, Jun 19, 2018 at 10:29:49AM +0200, Jan Kara wrote:
[...]
>>> I'm also still pondering the idea of inserting a "virtual" VMA into vma
>>> interval tree in the inode - as the GUP references are IMHO closest to an
>>> mlocked mapping - and that would achieve all the functionality we need as
>>> well. I just didn't have time to experiment with it.
>>
>> How would this work? Would it have the same virtual address range? And how
>> does it avoid the problems we've been discussing? Sorry to be a bit slow
>> here. :)
>
> The range covered by the virtual mapping would be the one sent to
> get_user_pages() to get page references. And then we would need to teach
> page_mkclean() to check for these virtual VMAs and block / skip / report
> (different situations would need different behavior) such pages. But this
> second part is the same regardless of how we identify a page that is pinned
> by get_user_pages().
OK. That neatly avoids the need for a new page flag, I think. But of course it is
somewhat more involved to implement. Sounds like something to keep in mind,
in case it has better tradeoffs than the direction I'm heading so far.
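Just to check that I'm picturing the mechanics correctly, here is a very rough
sketch of the pin/unpin side (not compile-tested; everything except the
standard vma_interval_tree_*() and i_mmap_lock_write() helpers is made up):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* Hypothetical flag; standing in with VM_LOCKED only so the sketch has a value. */
#define VM_GUP_PINNED	VM_LOCKED

/*
 * Record a GUP pin on a range of a file mapping as a "virtual" VMA in the
 * inode's i_mmap interval tree, so that rmap walks (and therefore
 * page_mkclean()) can find it, much like an mlocked mapping.
 */
static struct vm_area_struct *
gup_pin_mapping_range(struct address_space *mapping, unsigned long start,
		      unsigned long end, pgoff_t pgoff)
{
	struct vm_area_struct *vvma;

	vvma = kzalloc(sizeof(*vvma), GFP_KERNEL);
	if (!vvma)
		return ERR_PTR(-ENOMEM);

	/* No mm behind it; it exists only to be found by the rmap walk. */
	vvma->vm_start = start;
	vvma->vm_end   = end;
	vvma->vm_pgoff = pgoff;
	vvma->vm_flags = VM_GUP_PINNED;

	i_mmap_lock_write(mapping);
	vma_interval_tree_insert(vvma, &mapping->i_mmap);
	i_mmap_unlock_write(mapping);

	return vvma;
}

static void gup_unpin_mapping_range(struct address_space *mapping,
				    struct vm_area_struct *vvma)
{
	i_mmap_lock_write(mapping);
	vma_interval_tree_remove(vvma, &mapping->i_mmap);
	i_mmap_unlock_write(mapping);
	kfree(vvma);
}

...and then the rmap walk in page_mkclean() would test vma->vm_flags for the
pin flag before doing the usual page table walk, and block / skip / report as
the situation requires. Is that roughly the idea?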
>>> And then there's the aspect that both these approaches are a bit too
>>> heavyweight for some get_user_pages_fast() users (e.g. direct IO) - Al Viro
>>> had an idea to use page lock for that path but e.g. fs/direct-io.c would have
>>> problems due to lock ordering constraints (filesystem ->get_block would
>>> suddenly get called with the page lock held). But we can probably leave
>>> performance optimizations for phase two.
>>
>>
>> So I assume that phase one would be to apply this approach only to
>> get_user_pages_longterm. (Please let me know if that's wrong.)
>
> No, I meant phase 1 would be to apply this to all get_user_pages() flavors.
> Then phase 2 is to try to find a way to make get_user_pages_fast() fast
> again. And then in parallel to that, we also need to find a way for
> get_user_pages_longterm() to signal to the user that pinned pages must be
> released soon. Because after phase 1 pinned pages will block page
> writeback, and such a system won't oops but will become unusable
> sooner rather than later. And again this problem needs to be solved
> regardless of the mechanism for identifying pinned pages.
>
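To double-check my understanding of the phase 1 behavior, here is a rough
sketch of what the writeback side might look like (assuming the PG_dma_pinned
flag from this series; the PageDmaPinned() test and wait_for_dma_pin() helper
are hypothetical):

#include <linux/mm.h>
#include <linux/page-flags.h>

/*
 * A page pinned for DMA can be written to by the device at any time, so
 * write-protecting it in page_mkclean() and starting writeback would race
 * with the DMA. Until the pin is dropped, writeback has to skip the page
 * (and redirty it) or wait -- which is why longterm pins need a way to
 * tell userspace to release them.
 */
static int try_to_clean_page(struct page *page)
{
	if (PageDmaPinned(page))	/* from PG_dma_pinned in this series */
		wait_for_dma_pin(page);	/* hypothetical: block until unpinned */

	return clear_page_dirty_for_io(page);
}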
OK, thanks, that does help. I had the priorities of these get_user_pages*()
changes all scrambled, but between your and Dan's explanations, I finally
understand the preferred ordering of this work.