[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgvx5sDaOfCTMkVpZ9+-iFCeh5AOML5aJG1EiR0+e1aBQ@mail.gmail.com>
Date: Sun, 6 Nov 2022 14:34:51 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>,
Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
Nick Piggin <npiggin@...il.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Nadav Amit <nadav.amit@...il.com>,
Jann Horn <jannh@...gle.com>,
John Hubbard <jhubbard@...dia.com>, X86 ML <x86@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
kernel list <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Joerg Roedel <jroedel@...e.de>,
Uros Bizjak <ubizjak@...il.com>,
Alistair Popple <apopple@...dia.com>,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: mm: delay rmap removal until after TLB flush
[ Editing down to just the bare-bones problem cases ]
On Sun, Nov 6, 2022 at 1:06 PM Hugh Dickins <hughd@...gle.com> wrote:
>
> anon_vma (bad)
> --------------
>
> See folio_lock_anon_vma_read(): folio_mapped() plays a key role in
> establishing the continued validity of an anon_vma. See comments
> above folio_get_anon_vma(), some by me but most by PeterZ IIRC.
>
> I believe what has happened is that your patchset has, very intentionally,
> kept the page as "folio_mapped" until after free_pgtables() does its
> unlink_anon_vmas(); but that is telling folio_lock_anon_vma_read()
> that the anon_vma is safe to use when actually it has been freed.
> (It looked like a page table when I peeped at it.)
>
> I'm not certain, but I think that you made page_zap_pte_rmap() handle
> anon as well as file, just for the righteous additional simplification;
> but I'm afraid that (without opening a huge anon_vma refcounting can of
> worms) that unification has to be reverted, and anon left to go the
> same old way it did before.
Indeed. I made them separate initially, just because the only case
that mattered for the dirty bit was the file-mapped case.
But then the two functions ended up being basically the identical
function, so I unified them again.
But the anonvma lifetime issue looks very real, and so doing the
"delay rmap only for file mappings" seems sane.
In fact, I wonder if we should delay it only for *dirty* file
mappings, since it doesn't matter for the clean case.
Hmm.
I already threw away my branch (since Andrew picked the patches up),
so a question for Andrew: do you want me to re-do the branch entirely,
or do you want me to just send you an incremental patch?
To make for minimal changes, I'd drop the 're-unification' patch, and
then small updates to the zap_pte_range() code to keep the anon (and
possibly non-dirty) case synchronous.
And btw, this one is interesting: for anonymous (and non-dirty
file-mapped) patches, we actually can end up delaying the final page
free (and the rmap zapping) all the way to "tlb_finish_mmu()".
Normally we still have the vma's all available, but yes,
free_pgtables() can and does happen before the final TLB flush.
The file-mapped dirty case doesn't have that issue - not just because
it doesn't have an anonvma at all, but because it also does that
"force_flush" thing that just measn that the page freeign never gets
delayed that far in the first place.
> mm-unstable (bad)
> -----------------
> Aside from that PageAnon issue, mm-unstable is in an understandably bad
> state because you could not have foreseen my subpages_mapcount addition
> to page_remove_rmap(). page_zap_pte_rmap() now needs to handle the
> PageCompound (but not the "compound") case too. I rushed you and akpm
> an emergency patch for that on Friday night, but you, let's say, had
> reservations about it. So I haven't posted it, and while the PageAnon
> issue remains, I think your patchset has to be removed from mm-unstable
> and linux-next anyway.
So I think I'm fine with your patch, I just want to move the memcg
accounting to outside of it.
I can re-do my series on top of mm-unstable, I guess. That's probably
the easiest way to handle this all.
Andrew - can you remove those patches again, and I'll create a new
series for you?
Linus
Powered by blists - more mailing lists