lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fca16906-8e7d-5d04-6990-dfa8392bad8b@redhat.com>
Date:   Sun, 19 Dec 2021 18:59:51 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Nadav Amit <namit@...are.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)

On 19.12.21 18:44, Linus Torvalds wrote:
> David, you said that you were working on some alternative model. Is it
> perhaps along these same lines below?
> 
> I was thinking that a bit in the page tables to say "this page is
> exclusive to this VM" would be a really simple thing to deal with for
> fork() and swapout and friends.
> 
> But we don't have such a bit in general, since many architectures have
> very limited sets of SW bits, and even when they exist we've spent
> them on things like UDDF_WP.,
> 
> But the more I think about the "bit doesn't even have to be in the
> page tables", the more I think maybe that's the solution.
> 
> A bit in the 'struct page' itself.
> 

Exactly what I am prototyping right now.

> For hugepages, you'd have to distribute said bit when  you split the hugepage.

Yes, that's one tricky part ...

> 
> But other than that it looks quite simple: anybody who does a virtual
> copy will inevitably be messing with the page refcount, so clearing
> the "exclusive ownership" bit wouldn't be costly: the 'struct page'
> cacheline is already getting dirtied.
> 
> Or what was your model you were implying you were thinking about in
> your other email? You said

I'm playing with the idea of not setting the bit always during COW but
only on GUP request to set the bit (either manually if possible or via
FOLL_UNSHARE). That's a bit more tricky but allows for decoupling that
approach completely from the page_pin() counter.

fork() is allowed to clear the bit if page_count() == 1 and share the
page. So no GUP->no fork() performance changes (!) . Otherwise the bit
can only vanish if we swapout/migrate the page: in which case there are
no additional GUP/references on the page that rely on it!

The bit can be set directly if we have to copy the page in the fault
handler (COW or unshare). Outside of COW/Unshare code, the bit can only
be set if page_count() == 1 and we sync against fork(). (and that's the
problem for gup-fast-only that I'm investigating right now, because it
would then always have to fallback to the slow variant if the bit isn't
already set)

So the bit can "vanish" whenever there is no additional reference on the
page. GUP syncs against fork() and can thereby set the bit/request to
set the bit.

I'm trying to decouple it completely from the page_pin() counter to also
be able to handle FOLL_GET (O_DIRECT reproducers unfortunately) correctly.

Not set it stone, just an idea what I'm playing with right now ... and I
have to tripple-check if
* page is PTE mapped in the page table I'm walking
* page_count() == 1
Really means that "this is the only reference.". I do strongly believe
so .. :)

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ