lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 30 Oct 2022 11:51:48 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nadav Amit <nadav.amit@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jann Horn <jannh@...gle.com>,
        John Hubbard <jhubbard@...dia.com>, X86 ML <x86@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        jroedel@...e.de, ubizjak@...il.com,
        Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

On Sun, Oct 30, 2022 at 11:19 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> And we'd _like_ to do the TLB flush before the remove_rmap(), but we
> *really* don't want to do that for every page.

Hmm. I have yet another crazy idea.

We could keep the current placement of the TLB flush, to just before
we drop the page table lock.

And we could do all the things we do in 'page_remove_rmap()' right now
*except* for the mapcount stuff.

And only move the mapcount code to the page freeing stage.

Because all the rmap() walk synchronization really needs is that
'page->_mapcount' is still elevated, and if it is it will serialize
with the page table lock.

And it turns out that 'page_remove_rmap()' already treats the case we
care about differently, and all it does is

        lock_page_memcg(page);

        if (!PageAnon(page)) {
                page_remove_file_rmap(page, compound);
                goto out;
        }
       ...
out:
        unlock_page_memcg(page);

        munlock_vma_page(page, vma, compound);

for that case.

And that 'page_remove_file_rmap()' is literally the code that modifies
the _mapcount.

Annoyingly, this is all complicated by that 'compound' argument, but
that's always false in that zap_page_range() case.

So what we *could* do, is make a new version of page_remove_rmap(),
which is specialized for this case: no 'compound' argument (always
false), and doesn't call 'page_remove_file_rmap()', because we'll do
that for the !PageAnon(page) case later after the TLB flush.

That would keep the existing TLB flush logic, keep the existing 'mark
page dirty' and would just make sure that 'folio_mkclean()' ends up
being serialized with the TLB flush simply because it will take the
page table lock because we delay the '._mapcount' update until
afterwards.

Annoyingly, the organization of 'page_remove_rmap()' is a bit ugly,
and we have several other callers that want the existing logic, so
while the above sounds conceptually simple, I think the patch would be
a bit messy.

                  Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ