Message-ID: <151edbf3-66ff-df0c-c1cc-5998de50111e@nvidia.com>
Date: Wed, 20 Jun 2018 15:55:41 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Jan Kara <jack@...e.cz>
CC: Matthew Wilcox <willy@...radead.org>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>, Jason Gunthorpe <jgg@...pe.ca>,
John Hubbard <john.hubbard@...il.com>,
Michal Hocko <mhocko@...nel.org>,
Christopher Lameter <cl@...ux.com>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-rdma <linux-rdma@...r.kernel.org>
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
On 06/20/2018 05:08 AM, Jan Kara wrote:
> On Tue 19-06-18 11:11:48, John Hubbard wrote:
>> On 06/19/2018 03:41 AM, Jan Kara wrote:
>>> On Tue 19-06-18 02:02:55, Matthew Wilcox wrote:
>>>> On Tue, Jun 19, 2018 at 10:29:49AM +0200, Jan Kara wrote:
[...]
>>> I'm also still pondering the idea of inserting a "virtual" VMA into vma
>>> interval tree in the inode - as the GUP references are IMHO closest to an
>>> mlocked mapping - and that would achieve all the functionality we need as
>>> well. I just didn't have time to experiment with it.
>>
>> How would this work? Would it have the same virtual address range? And how
>> does it avoid the problems we've been discussing? Sorry to be a bit slow
>> here. :)
>
> The range covered by the virtual mapping would be the one sent to
> get_user_pages() to get page references. And then we would need to teach
> page_mkclean() to check for these virtual VMAs and block / skip / report
> (different situations would need different behavior) such pages. But this
> second part is the same regardless of how we identify a page that is pinned
> by get_user_pages().
OK. That neatly avoids the need for a new page flag, I think. But of course it is
somewhat more involved to implement. Sounds like something to keep in mind,
in case it has better tradeoffs than the direction I'm heading so far.
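Just to check that I'm picturing the mechanics correctly, here is a very rough
sketch of the pin/unpin side (not compile-tested; everything except the
standard vma_interval_tree_*() and i_mmap_lock_write() helpers is made up):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* Hypothetical flag; standing in with VM_LOCKED only so the sketch has a value. */
#define VM_GUP_PINNED	VM_LOCKED

/*
 * Record a GUP pin on a range of a file mapping as a "virtual" VMA in the
 * inode's i_mmap interval tree, so that rmap walks (and therefore
 * page_mkclean()) can find it, much like an mlocked mapping.
 */
static struct vm_area_struct *
gup_pin_mapping_range(struct address_space *mapping, unsigned long start,
		      unsigned long end, pgoff_t pgoff)
{
	struct vm_area_struct *vvma;

	vvma = kzalloc(sizeof(*vvma), GFP_KERNEL);
	if (!vvma)
		return ERR_PTR(-ENOMEM);

	/* No mm behind it; it exists only to be found by the rmap walk. */
	vvma->vm_start = start;
	vvma->vm_end   = end;
	vvma->vm_pgoff = pgoff;
	vvma->vm_flags = VM_GUP_PINNED;

	i_mmap_lock_write(mapping);
	vma_interval_tree_insert(vvma, &mapping->i_mmap);
	i_mmap_unlock_write(mapping);

	return vvma;
}

static void gup_unpin_mapping_range(struct address_space *mapping,
				    struct vm_area_struct *vvma)
{
	i_mmap_lock_write(mapping);
	vma_interval_tree_remove(vvma, &mapping->i_mmap);
	i_mmap_unlock_write(mapping);
	kfree(vvma);
}

...and then the rmap walk in page_mkclean() would test vma->vm_flags for the
pin flag before doing the usual page table walk, and block / skip / report as
the situation requires. Is that roughly the idea?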
>>> And then there's the aspect that both these approaches are a bit too
>>> heavyweight for some get_user_pages_fast() users (e.g. direct IO) - Al Viro
>>> had an idea to use page lock for that path but e.g. fs/direct-io.c would have
>>> problems due to lock ordering constraints (filesystem ->get_block would
>>> suddenly get called with the page lock held). But we can probably leave
>>> performance optimizations for phase two.
>>
>>
>> So I assume that phase one would be to apply this approach only to
>> get_user_pages_longterm. (Please let me know if that's wrong.)
>
> No, I meant phase 1 would be to apply this to all get_user_pages() flavors.
> Then phase 2 is to try to find a way to make get_user_pages_fast() fast
> again. And then in parallel to that, we also need to find a way for
> get_user_pages_longterm() to signal to the user that pinned pages must be
> released soon. Because after phase 1 pinned pages will block page
> writeback, and such a system won't oops but will become unusable
> sooner rather than later. And again this problem needs to be solved
> regardless of the mechanism for identifying pinned pages.
>
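To double-check my understanding of the phase 1 behavior, here is a rough
sketch of what the writeback side might look like (assuming the PG_dma_pinned
flag from this series; the PageDmaPinned() test and wait_for_dma_pin() helper
are hypothetical):

#include <linux/mm.h>
#include <linux/page-flags.h>

/*
 * A page pinned for DMA can be written to by the device at any time, so
 * write-protecting it in page_mkclean() and starting writeback would race
 * with the DMA. Until the pin is dropped, writeback has to skip the page
 * (and redirty it) or wait -- which is why longterm pins need a way to
 * tell userspace to release them.
 */
static int try_to_clean_page(struct page *page)
{
	if (PageDmaPinned(page))	/* from PG_dma_pinned in this series */
		wait_for_dma_pin(page);	/* hypothetical: block until unpinned */

	return clear_page_dirty_for_io(page);
}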
OK, thanks, that does help. I had the priorities of these get_user_pages*()
changes all scrambled, but between your and Dan's explanations, I finally
understand the preferred ordering of this work.