lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 7 Nov 2022 20:28:07 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Hugh Dickins <hughd@...gle.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will@...nel.org>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Nick Piggin <npiggin@...il.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Jann Horn <jannh@...gle.com>,
        John Hubbard <jhubbard@...dia.com>, X86 ML <x86@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Joerg Roedel <jroedel@...e.de>,
        Uros Bizjak <ubizjak@...il.com>,
        Alistair Popple <apopple@...dia.com>,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: mm: delay rmap removal until after TLB flush

On Mon, Nov 7, 2022 at 3:47 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Ok, so I think I have a fairly clean way to do this.
>
> Let me try to make that series look reasonable, although it might be
> until tomorrow. I'll need to massage my mess into not just prettier
> code, but a sane history.

Ugh. Ok, so massaging it into a saner form and splitting it out into a
pretty history took a lot longer than writing the initial ugly
"something like this".

Anyway, the end result looks very different from the previous series,
since I very consciously tried to keep away from rmap changes to not
clash with Hugh's work, and also made sure we still do the
page_remove_rmap() under the page table lock, just *after* the TLB
flush.

It does share some basic stuff - I'm still using that "struct
encoded_page" thing, I just moved it into a separate commit, and moved
it to where it conceptually belongs (I'd love to say that I also made
the mlock.c code use it, but I didn't - that 'pagevec' thing is a
pretty pointless abstraction and I didn't want to fight it).

So you'll see some familiar support structures and scaffolding, but
the actual approach to zap_pte_range() is very different.

The approach taken to the "s390 is very different" issue is also
completely different. It should actually be better for s390: we used
to cause that "force_flush" whenever we hit a dirty shared page, and
s390 doesn't care. The new model makes that "s390 doesn't care" be
part of the whole design, so now s390 treats dirty shared pages
basically as if they weren't anything special at all. Which they
aren't on s390.

But, as with the previous case, I don't even try to cross-compile for
s390, so my attempts at handling s390 well are just that: attempts.
The code may be broken.

Of course, the code may be broken on x86-64 too, but at least there
I've compiled it and am running it right now.

Oh, and because I copied large parts of the commit message from the
previous approach (because the problem description is the same), I
noticed that I also kept the "Acked-by:". Those are bogus, because the
code is sufficiently different that any previous acks are just not
valid any more, but I just hadn't fixed that yet.

The meat of it all is in that last patch, the rest is just type system
cleanups to prep for it. But it was also that last patch that I spent
hours on just tweaking it to look sensible. The *code* was pretty
easy. Making it have sensible naming and a sensible abstraction
interface that worked well for s390 too, that was 90% of the effort.

But also I hope the end result is reasonably easy to review as a result.

I'm sending this out because I'm stepping away from the keyboard,
because that whole "let's massage it into something lagible" was
really somewhat exhausting. You don't see all the small side turns it
took only to go "that's ugly, let's try again" ;)

Anybody interested in giving this a peek?

(Patch 2/4 might make some people pause. It's fairly small and simple.
It's effective and makes it easy to do some of the later changes. And
it's also quite different from our usual model. It was "inspired" by
the signed-vs-unsigned char thread from a few weeks ago. But patch 4/4
is the one that matters).

                  Linus

View attachment "0001-mm-introduce-encoded-page-pointers-with-embedded-ext.patch" of type "text/x-patch" (2749 bytes)

View attachment "0002-mm-teach-release_pages-to-take-an-array-of-encoded-p.patch" of type "text/x-patch" (4058 bytes)

View attachment "0003-mm-mmu_gather-prepare-to-gather-encoded-page-pointer.patch" of type "text/x-patch" (3376 bytes)

View attachment "0004-mm-delay-page_remove_rmap-until-after-the-TLB-has-be.patch" of type "text/x-patch" (11173 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ