Message-ID: <0e6053b3-b78c-c8be-4fab-e8555810c732@nvidia.com>
Date:   Mon, 18 Jun 2018 14:36:44 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Dan Williams <dan.j.williams@...el.com>
CC:     Christoph Hellwig <hch@....de>, Jason Gunthorpe <jgg@...pe.ca>,
        John Hubbard <john.hubbard@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...nel.org>,
        Christopher Lameter <cl@...ux.com>, Jan Kara <jack@...e.cz>,
        Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rdma <linux-rdma@...r.kernel.org>
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

On 06/18/2018 12:21 PM, Dan Williams wrote:
> On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard <jhubbard@...dia.com> wrote:
>> On 06/18/2018 10:56 AM, Dan Williams wrote:
>>> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard <jhubbard@...dia.com> wrote:
>>>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>>>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>>>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>>>>>> these pages are going to be treated specially. In particular, the caller
>>>>>> does not really want or need to support certain file operations, while the
>>>>>> page is flagged this way.
>>>>>>
>>>>>> If necessary, we could add a new API call.
>>>>>
>>>>> That API call is called get_user_pages_longterm.
>>>>
>>>> OK...I had the impression that this was just a semi-temporary API for dax, but
>>>> given that it's an exported symbol, I guess it really is here to stay.
>>>
>>> The plan is to go back and provide API changes that bypass
>>> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>>> not clear what we could do. In the VFIO case the guest would need to
>>> be prepared to handle the revocation.
>>
>> OK, let's see if I understand that plan correctly:
>>
>> 1. Change RDMA users (this could be done entirely in the various device drivers'
>> code, unless I'm overlooking something) to use mmu notifiers, and to do their
>> DMA to/from non-pinned pages.
> 
> The problem with this approach is surprising the RDMA drivers with
> notifications of teardowns. It's the RDMA userspace applications that
> need the notification, and it likely needs to be explicit opt-in, at
> least for the non-ODP drivers.
> 
>> 2. Return early from get_user_pages_longterm, if the memory is...marked for
>> RDMA? (How? Same sort of page flag that I'm floating here, or something else?)
>> That would avoid the problem with pinned pages getting their buffer heads
>> removed--by disallowing the pinning. Makes sense.
> 
> Well, right now the RDMA workaround is DAX specific and it seems we
> need to generalize it for the page-cache case. One thought is to have
> try_to_unmap() take its own reference and wait for the page reference
> count to drop to one so that the truncate path knows the page is
> dma-idle and disconnected from the page cache, but I have not looked
> at the details.
> 
>> Also, is there anything I can help with here, so that things can happen sooner?
> 
> I do think we should explore a page flag for pages that are "long
> term" pinned. Michal asked for something along these lines at LSF / MM
> so that the core-mm can give up on pages over which the kernel has lost
> lifetime control. Michal, did I capture your ask correctly?


OK, that "refcount == 1" approach sounds promising:

   -- still use a page flag, but narrow the scope to get_user_pages_longterm() pages
   -- just wait in try_to_unmap, instead of giving up

I'll look into it, while waiting for Michal's thoughts on this.
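
For illustration only, here is a minimal userspace sketch of the "refcount == 1" idea discussed above: the unmap side takes its own reference and then waits until that reference is the only one left. Everything in it is an assumption made for the sketch, not kernel code: struct page_model, model_pin_longterm(), model_unpin() and model_wait_dma_idle() are hypothetical names, the flag only models a single long-term pinner, and the wait is a bounded poll where real code would presumably sleep.

```c
/*
 * Userspace model only. None of these names exist in the kernel;
 * they are hypothetical stand-ins for struct page, a long-term pin,
 * and the proposed wait in the unmap path.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct page_model {
    atomic_int refcount;    /* models the page reference count */
    atomic_bool dma_pinned; /* models the proposed long-term-pin flag */
};

static void page_model_init(struct page_model *p)
{
    atomic_init(&p->refcount, 0);
    atomic_init(&p->dma_pinned, false);
}

/* Models a long-term pin: take a reference and set the flag. */
static void model_pin_longterm(struct page_model *p)
{
    atomic_fetch_add(&p->refcount, 1);
    atomic_store(&p->dma_pinned, true);
}

/* Models the pinner dropping its reference once DMA is finished. */
static void model_unpin(struct page_model *p)
{
    atomic_store(&p->dma_pinned, false);
    int old = atomic_fetch_sub(&p->refcount, 1);
    assert(old > 0); /* an unpin without a matching pin is a bug */
}

/*
 * Models the unmap side: take our own reference, then poll until ours
 * is the only one left (refcount == 1), which is read as "dma-idle".
 * Returns true if the page became idle within max_polls checks.
 * The temporary reference is dropped before returning either way.
 */
static bool model_wait_dma_idle(struct page_model *p, int max_polls)
{
    bool idle = false;

    atomic_fetch_add(&p->refcount, 1);
    for (int i = 0; i < max_polls; i++) {
        if (atomic_load(&p->refcount) == 1) {
            idle = true;
            break;
        }
    }
    atomic_fetch_sub(&p->refcount, 1);
    return idle;
}
```

In this model, a page with an outstanding long-term pin never reaches refcount == 1 from the waiter's point of view, so model_wait_dma_idle() reports false until model_unpin() has run. Whether the real path should wait indefinitely, time out, or give up is exactly the open question in the thread.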