[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <47678198-C502-47E1-B7C8-8A12352CDA95@gmail.com>
Date: Sat, 29 Oct 2022 11:05:12 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Jann Horn <jannh@...gle.com>,
John Hubbard <jhubbard@...dia.com>, X86 ML <x86@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
kernel list <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
jroedel@...e.de, ubizjak@...il.com,
Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
On Oct 28, 2022, at 5:42 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> I think the proper fix (or at least _a_ proper fix) would be to
> actually carry the dirty bit along to the __tlb_remove_page() point,
> and actually treat it exactly the same way as the page pointer itself
> - set the page dirty after the TLB flush, the same way we can free the
> page after the TLB flush.
>
> We could easiy hide said dirty bit in the low bits of the
> "batch->pages[]" array or something like that. We'd just have to add
> the 'dirty' argument to __tlb_remove_page_size() and friends.
Thank you for your quick response. I was slow to respond due to a jet lag.
Anyhow, I am not sure whether the solution that you propose would work.
Please let me know if my understanding makes sense.
Let’s assume that we do not call set_page_dirty() before we remove the rmap
but only after we invalidate the page [*]. Let’s assume that
shrink_page_list() is called after the page’s rmap is removed and the page
is no longer mapped, but before set_page_dirty() was actually called.
In such a case, shrink_page_list() would consider the page clean, and would
indeed keep the page (since __remove_mapping() would find elevated page
refcount), which appears to give us a chance to mark the page as dirty
later.
However, IIUC, in this case shrink_page_list() might still call
filemap_release_folio() and release the buffers, so calling set_page_dirty()
afterwards - after the actual TLB invalidation took place - would fail.
> Your idea of "do the page_remove_rmap() late instead" would also work,
> but the reason I think just squirrelling away the dirty bit is the
> "proper" fix is that it would get rid of the whole need for
> 'force_flush' in this area entirely. So we'd not only fix that race
> you noticed, we'd actually do so and reduce the number of TLB flushes
> too.
I’m all for reducing the number of TLB flushes, and your solution does sound
better in general. I proposed something that I considered having the path of
least resistance (i.e., least chance of breaking something). I can do what
you propsosed, but I am not sure how to deal with the buffers being removed.
One more note: This issue, I think, also affects migrate_vma_collect_pmd().
Alistair recently addressed an issue there, but in my prior feedback to him
I missed this issue.
[*] Note that this would be for our scenario pretty much the same if we also
called set_page_dirty() before removing the rmap, but the page was cleaned
while the TLB invalidation has still not been performed.
Powered by blists - more mailing lists