Message-ID: <CAPcyv4jbwtOqG_473SeK12LKghMo6mCDWRuTxqYVP6R-sLhpoA@mail.gmail.com>
Date: Mon, 18 Jun 2018 13:04:36 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: John Hubbard <jhubbard@...dia.com>, Christoph Hellwig <hch@....de>,
John Hubbard <john.hubbard@...il.com>,
Matthew Wilcox <willy@...radead.org>,
Michal Hocko <mhocko@...nel.org>,
Christopher Lameter <cl@...ux.com>, Jan Kara <jack@...e.cz>,
Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-rdma <linux-rdma@...r.kernel.org>
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
On Mon, Jun 18, 2018 at 12:31 PM, Jason Gunthorpe <jgg@...pe.ca> wrote:
> On Mon, Jun 18, 2018 at 12:21:46PM -0700, Dan Williams wrote:
>> On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard <jhubbard@...dia.com> wrote:
>> > On 06/18/2018 10:56 AM, Dan Williams wrote:
>> >> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard <jhubbard@...dia.com> wrote:
>> >>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>> >>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>> >>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>> >>>>> these pages are going to be treated specially. In particular, the caller
>> >>>>> does not really want or need to support certain file operations, while the
>> >>>>> page is flagged this way.
>> >>>>>
>> >>>>> If necessary, we could add a new API call.
>> >>>>
>> >>>> That API call is called get_user_pages_longterm.
>> >>>
>> >>> OK...I had the impression that this was just a semi-temporary API for dax, but
>> >>> given that it's an exported symbol, I guess it really is here to stay.
>> >>
>> >> The plan is to go back and provide api changes that bypass
>> >> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>> >> not clear what we could do. In the VFIO case the guest would need to
>> >> be prepared to handle the revocation.
>> >
>> > OK, let's see if I understand that plan correctly:
>> >
>> > 1. Change RDMA users (this could be done entirely in the various device drivers'
>> > code, unless I'm overlooking something) to use mmu notifiers, and to do their
>> > DMA to/from non-pinned pages.
>>
>> The problem with this approach is surprising the RDMA drivers with
>> notifications of teardowns. It's the RDMA userspace applications that
>> need the notification, and it likely needs to be explicit opt-in, at
>> least for the non-ODP drivers.
>
> Well, more than that, we have no real plan on how to accomplish this,
> or any idea if it can even really work.. Most userspace give up
> control of the memory lifetime to the remote side of the connection
> and have no way to recover it other than a full teardown.
>
> Given that John is trying to fix a kernel oops, I don't think we
> should tie progress on it to the RDMA notification idea.
>
> .. and given that John is trying to fix a kernel oops, maybe the
> weird/bad/ugly behavior of ftruncate is a better bug to have than for
> unprivileged users to be able to oops the kernel???

Trading one bug for another is not a fix. We did not fix the
DAX-dma-vs-truncate bug by breaking truncate() guarantees.