lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 22 Dec 2021 10:58:36 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Nadav Amit <namit@...are.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)

On 22.12.21 09:51, David Hildenbrand wrote:
> On 21.12.21 20:07, Jason Gunthorpe wrote:
>> On Tue, Dec 21, 2021 at 06:40:30PM +0100, David Hildenbrand wrote:
>>
>>> 2) is certainly the cherry on top. But it just means that R/O pins don't
>>> have to be the weird kid. And yes, achieving 2) would require
>>> FAULT_FLAG_EXCLUSIVE / FAULT_FLAG_UNSHARED, but it would really 99% do
>>> what existing COW logic does, just bypass the "map writable" and
>>> "trigger write fault" semantics.
>>
>> I still don't agree with this - when you come to patches can you have
>> this work at the end and under a good cover letter? Maybe it will make
>> more sense then.
> 
> Yes. But really, I think it's the logical consequence of what Linus said
> [1]:
> 
>   "And then all GUP-fast would need to do is to refuse to look up a page
>    that isn't exclusive to that VM. We already have the situation that
>    GUP-fast can fail for non-writable pages etc, so it's just another
>    test."
> 
> We must not FOLL_PIN a page that is not exclusive (not only on gup-fast,
> but really, on any gup). If we special case R/O FOLL_PIN, we cannot
> enable the sanity check on unpin as suggested by Linus [2]:
> 
>   "If we only set the exclusive VM bit on pages that get mapped into
>    user space, and we guarantee that GUP only looks up such pages, then
>    we can also add a debug test to the "unpin" case that the bit is
>    still set."
> 
> There are really only two feasible options I see when we want to take a
> R/O FOLL_PIN on a !PageAnonExclusive() anon page
> 
> (1) Fail the pinning completely. This implies that we'll have to fail
>     O_DIRECT once converted to FOLL_PIN.
> (2) Request to mark the page PageAnonExclusive() via a
>     FAULT_FLAG_UNSHARE and let it succeed.
> 
> 
> Anything else would require additional accounting that we already
> discussed in the past is hard -- for example, to differentiate R/O from
> R/W pins requiring two pin counters.
> 
> The only impact would be that FOLL_PIN after fork() has to go via a
> FAULT_FLAG_UNSHARE once, to turn the page PageAnonExclusive. IMHO this
> is the right thing to do for FOLL_LONGTERM. For !FOLL_LONGTERM it would
> be nice to optimize this, to *not* do that, but again ... this would
> require even more counters I think, for example, to differentiate
> between "R/W short/long-term or R/O long-term pin" and "R/O short-term pin".
> 
> So unless we discover a way to do additional accounting for ordinary 4k
> pages, I think we really can only do (1) or (2) to make sure we never
> ever pin a !PageAnonExclusive() page.

BTW, I just wondered if the optimization should actually be that R/O
short-term FOLL_PIN users should actually be using FOLL_GET instead. So
O_DIRECT with R/O would already be doing the right thing.

And it somewhat aligns with what we found: only R/W short-term FOLL_GET
is problematic, where we can lose writes to the page from the device via
O_DIRECT.

IIUC, our COW logic makes sure that a shared anonymous page that might
still be used by a R/O FOLL_GET cannot be modified, because any attempt
to modify it would result in a copy.

But I might be missing something, just an idea.


-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ