Message-ID: <c854b2d6-5ec1-a8b5-e366-fbefdd9fdd10@nvidia.com>
Date:   Tue, 19 Mar 2019 18:43:45 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Jerome Glisse <jglisse@...hat.com>,
        Dave Chinner <david@...morbit.com>
CC:     "Kirill A. Shutemov" <kirill@...temov.name>,
        <john.hubbard@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        <linux-mm@...ck.org>, Al Viro <viro@...iv.linux.org.uk>,
        Christian Benvenuti <benve@...co.com>,
        Christoph Hellwig <hch@...radead.org>,
        Christopher Lameter <cl@...ux.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dennis Dalessandro <dennis.dalessandro@...el.com>,
        Doug Ledford <dledford@...hat.com>,
        Ira Weiny <ira.weiny@...el.com>, Jan Kara <jack@...e.cz>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Mike Marciniszyn <mike.marciniszyn@...el.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        Tom Talpey <tom@...pey.com>,
        LKML <linux-kernel@...r.kernel.org>,
        <linux-fsdevel@...r.kernel.org>,
        Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH v4 1/1] mm: introduce put_user_page*(), placeholder
 versions

On 3/19/19 5:08 PM, Jerome Glisse wrote:
> On Wed, Mar 20, 2019 at 10:57:52AM +1100, Dave Chinner wrote:
>> On Tue, Mar 19, 2019 at 06:06:55PM -0400, Jerome Glisse wrote:
>>> On Wed, Mar 20, 2019 at 08:23:46AM +1100, Dave Chinner wrote:
>>>> On Tue, Mar 19, 2019 at 10:14:16AM -0400, Jerome Glisse wrote:
>>>>> On Tue, Mar 19, 2019 at 09:47:24AM -0400, Jerome Glisse wrote:
>>>>>> On Tue, Mar 19, 2019 at 03:04:17PM +0300, Kirill A. Shutemov wrote:
>>>>>>> On Fri, Mar 08, 2019 at 01:36:33PM -0800, john.hubbard@...il.com wrote:
>>>>>>>> From: John Hubbard <jhubbard@...dia.com>
>>>>>> [...]
>>>>> Forgot to mention one thing, we had a discussion with Andrea and Jan
>>>>> about set_page_dirty() and Andrea had the good idea of maybe doing
>>>>> the set_page_dirty() at GUP time (when GUP with write) not when the
>>>>> GUP user calls put_page(). We can do that by setting the dirty bit
>>>>> in the pte for instance. There are a few benefits to doing it that way:
>>>>>     - amortize the cost of calling set_page_dirty() (i.e. one call for
>>>>>       both GUP and page_mkclean())
>>>>>     - it is always safe to do so at GUP time (i.e. the pte has write
>>>>>       permission and thus the page is in the correct state)
>>>>>     - safe from truncate race
>>>>>     - no need to ever lock the page
>>>>
>>>> I seem to have missed this conversation, so please excuse me for
>>>
>>> The set_page_dirty() at GUP was in a private discussion (it started
>>> on another topic and drifted away to set_page_dirty()).
>>>
>>>> asking a stupid question: if it's a file backed page, what prevents
>>>> background writeback from cleaning the dirty page ~30s into a long
>>>> term pin? i.e. I don't see anything in this proposal that prevents
>>>> the page from being cleaned by writeback and putting us straight
>>>> back into the situation where a long term RDMA is writing to a clean
>>>> page....
>>>
>>> So this patchset does not solve this issue.
>>
>> OK, so it just kicks the can further down the road.
>>
>>>     [3..N] decide what to do for GUPed pages. So far the plan seems
>>>          to be to keep the page always dirty and never allow page
>>>          writeback to restore the page to a clean state. This does
>>>          disable things like COW and other fs features, but at least
>>>          it seems to be the best thing we can do.
>>
>> So the plan for GUP vs writeback so far is "break fsync()"? :)
>>
>> We might need to work on that a bit more...
> 
> Sorry, I forgot to say that we still do writeback using a bounce page,
> so that at least we write something to disk: a snapshot of the GUPed
> page taken every time writeback kicks in (whether through radix tree
> dirty page writeback, fsync, or any other sync event).
> So many little details that I forgot the big chunk :)
> 
> Cheers,
> Jérôme
> 

Dave, Jan, Jerome,

Bounce pages for periodic data integrity still seem viable. But for the
question of things like fsync or truncate, I think we were zeroing in
on file leases as a nice building block.

Can we revive the file lease discussion? By going all the way out to user
space and requiring file leases to be coordinated at a high level in the
software call chain, it seems like we could routinely avoid some of the
worst conflicts that the kernel code has to resolve.

For example:

Process A
=========
    gets a lease on file_a that allows gup 
        usage on a range within file_a

    sets up writable DMA:
        get_user_pages() on the file_a range
        start DMA (independent hardware ops)
            hw is reading and writing to range

                                                    Process B
                                                    =========
                                                    truncate(file_a)
                                                       ...
                                                       __break_lease()
    
    handle SIGIO from __break_lease
         if unhandled, process gets killed
         and put_user_pages should get called
         at some point here

...and so this way, user space gets to decide the proper behavior,
instead of leaving the kernel in the dark with an impossible decision
(kill process A? Block process B? User space knows the preference,
per app, but the kernel does not.)
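For illustration, the lease flow above can be roughly approximated today with ordinary Linux file leases (fcntl(2) F_SETLEASE). This is a user-space sketch only, assuming a plain read lease as a stand-in for the proposed "gup on a range" lease, which does not exist. Process A takes the lease. Process B calls truncate(2) and blocks in the kernel's lease-break path. A receives SIGIO, releases its pinned range, and drops the lease with F_UNLCK, and only then does B's truncate complete. The names release_pinned_range() and demo_lease_break() are hypothetical, and no DMA or put_user_pages() call actually happens here.

```c
#define _GNU_SOURCE /* F_SETLEASE and lease-break signals are Linux-specific */
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGIO handler when the kernel starts breaking our lease. */
static volatile sig_atomic_t lease_broken;

static void on_lease_break(int sig) { (void)sig; lease_broken = 1; }

/* HYPOTHETICAL stand-in for "stop the DMA, then put_user_pages()". */
static void release_pinned_range(void) { /* nothing is pinned in this sketch */ }

/* Process B: wait until A holds the lease, then truncate the file.
 * truncate() blocks in the kernel until A gives the lease up. */
static void process_b(int ready_fd, const char *path)
{
    char c;
    if (read(ready_fd, &c, 1) != 1)
        _exit(2); /* EOF: A never obtained a lease */
    _exit(truncate(path, 0) == 0 ? 0 : 1);
}

/* Process A. Returns 0 if B's truncate completed after A released. */
int demo_lease_break(void)
{
    char path[] = "/tmp/lease-demoXXXXXX";
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    int wrote = write(tmp, "data", 4) == 4;
    close(tmp); /* a read lease requires that no writers remain open */

    int pipefd[2];
    if (!wrote || pipe(pipefd) < 0) {
        unlink(path);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        close(pipefd[1]);
        process_b(pipefd[0], path); /* never returns */
    }
    close(pipefd[0]);

    /* Block SIGIO so the lease break cannot arrive before we wait for it. */
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGIO);
    sigprocmask(SIG_BLOCK, &block, &old);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_lease_break;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGIO, &sa, NULL);

    int rc = -1;
    lease_broken = 0;
    int fd = open(path, O_RDONLY); /* read leases need a read-only fd */
    if (fd >= 0 && fcntl(fd, F_SETLEASE, F_RDLCK) == 0 &&
        write(pipefd[1], "g", 1) == 1) {
        while (!lease_broken)
            sigsuspend(&old); /* SIGIO is deliverable only while waiting */
        release_pinned_range();
        fcntl(fd, F_SETLEASE, F_UNLCK); /* unblocks B's truncate() */
        rc = 0;
    }
    if (fd >= 0)
        close(fd);
    close(pipefd[1]); /* B sees EOF if we bailed out early */

    int status = 0;
    struct stat st;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0 || stat(path, &st) != 0 || st.st_size != 0)
        rc = -1;
    sigprocmask(SIG_SETMASK, &old, NULL);
    unlink(path);
    return rc;
}
```

Calling demo_lease_break() should return 0 once B's truncate has gone through. SIGIO's default action is to terminate the process, which loosely matches the "if unhandled, process gets killed" step. If A handles the signal but never drops the lease, the kernel forcibly removes it after /proc/sys/fs/lease-break-time seconds (45 by default) and B proceeds anyway.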
        

thanks,
-- 
John Hubbard
NVIDIA
