lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgvx5sDaOfCTMkVpZ9+-iFCeh5AOML5aJG1EiR0+e1aBQ@mail.gmail.com>
Date:   Sun, 6 Nov 2022 14:34:51 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will@...nel.org>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Nick Piggin <npiggin@...il.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Jann Horn <jannh@...gle.com>,
        John Hubbard <jhubbard@...dia.com>, X86 ML <x86@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Joerg Roedel <jroedel@...e.de>,
        Uros Bizjak <ubizjak@...il.com>,
        Alistair Popple <apopple@...dia.com>,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: mm: delay rmap removal until after TLB flush

[ Editing down to just the bare-bones problem cases ]

On Sun, Nov 6, 2022 at 1:06 PM Hugh Dickins <hughd@...gle.com> wrote:
>
> anon_vma (bad)
> --------------
>
> See folio_lock_anon_vma_read(): folio_mapped() plays a key role in
> establishing the continued validity of an anon_vma.  See comments
> above folio_get_anon_vma(), some by me but most by PeterZ IIRC.
>
> I believe what has happened is that your patchset has, very intentionally,
> kept the page as "folio_mapped" until after free_pgtables() does its
> unlink_anon_vmas(); but that is telling folio_lock_anon_vma_read()
> that the anon_vma is safe to use when actually it has been freed.
> (It looked like a page table when I peeped at it.)
>
> I'm not certain, but I think that you made page_zap_pte_rmap() handle
> anon as well as file, just for the righteous additional simplification;
> but I'm afraid that (without opening a huge anon_vma refcounting can of
> worms) that unification has to be reverted, and anon left to go the
> same old way it did before.

Indeed. I made them separate initially, just because the only case
that mattered for the dirty bit was the file-mapped case.

But then the two functions ended up being basically the identical
function, so I unified them again.

But the anonvma lifetime issue looks very real, and so doing the
"delay rmap only for file mappings" seems sane.

In fact, I wonder if we should delay it only for *dirty* file
mappings, since it doesn't matter for the clean case.

Hmm.

I already threw away my branch (since Andrew picked the patches up),
so a question for Andrew: do you want me to re-do the branch entirely,
or do you want me to just send you an incremental patch?

To make for minimal changes, I'd drop the 're-unification' patch, and
then small updates to the zap_pte_range() code to keep the anon (and
possibly non-dirty) case synchronous.

And btw, this one is interesting: for anonymous (and non-dirty
file-mapped) patches, we actually can end up delaying the final page
free (and the rmap zapping) all the way to "tlb_finish_mmu()".

Normally we still have the vma's all available, but yes,
free_pgtables() can and does happen before the final TLB flush.

The file-mapped dirty case doesn't have that issue - not just because
it doesn't have an anonvma at all, but because it also does that
"force_flush" thing that just measn that the page freeign never gets
delayed that far in the first place.

> mm-unstable (bad)
> -----------------
> Aside from that PageAnon issue, mm-unstable is in an understandably bad
> state because you could not have foreseen my subpages_mapcount addition
> to page_remove_rmap().  page_zap_pte_rmap() now needs to handle the
> PageCompound (but not the "compound") case too.  I rushed you and akpm
> an emergency patch for that on Friday night, but you, let's say, had
> reservations about it.  So I haven't posted it, and while the PageAnon
> issue remains, I think your patchset has to be removed from mm-unstable
> and linux-next anyway.

So I think I'm fine with your patch, I just want to move the memcg
accounting to outside of it.

I can re-do my series on top of mm-unstable, I guess. That's probably
the easiest way to handle this all.

Andrew - can you remove those patches again, and I'll create a new
series for you?

                 Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ