[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjzngbbwHw4nAsqo_RpyOtUDk5G+Wus=O0w0A6goHvBWA@mail.gmail.com>
Date: Sat, 29 Oct 2022 11:36:53 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Jann Horn <jannh@...gle.com>,
John Hubbard <jhubbard@...dia.com>, X86 ML <x86@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
kernel list <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
jroedel@...e.de, ubizjak@...il.com,
Alistair Popple <apopple@...dia.com>
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
On Sat, Oct 29, 2022 at 11:05 AM Nadav Amit <nadav.amit@...il.com> wrote:
>
> Anyhow, I am not sure whether the solution that you propose would work.
> Please let me know if my understanding makes sense.
Let me include my (UNTESTED! PROBABLY GARBAGE!) patch here just as a
"I meant something like this".
Note that it's untested, but I tried to make it intentionally use
different names and an opaque type in the 'mmu_gather' data structure
so that at least these bit games end up being type-safe and more
likely to be correct if it compiles.
And I will say that it does compile for me - but only in my normal
x86-64 config. I haven't tested anything else, and I guarantee it
won't build on s390, for example, because s390 has its own mmu_gather
functions.
> Let’s assume that we do not call set_page_dirty() before we remove the rmap
> but only after we invalidate the page [*]. Let’s assume that
> shrink_page_list() is called after the page’s rmap is removed and the page
> is no longer mapped, but before set_page_dirty() was actually called.
>
> In such a case, shrink_page_list() would consider the page clean, and would
> indeed keep the page (since __remove_mapping() would find elevated page
> refcount), which appears to give us a chance to mark the page as dirty
> later.
Right. That is not different to any other function (like "write()"
having looked up the page.
> However, IIUC, in this case shrink_page_list() might still call
> filemap_release_folio() and release the buffers, so calling set_page_dirty()
> afterwards - after the actual TLB invalidation took place - would fail.
I'm not seeing why.
That would imply that any "look up page, do set_page_dirty()" is
broken. They don't have rmap either. And we have a number of them all
over (eg think "GUP users" etc).
But hey, you were right about the stale TLB case, so I may just be
missing something.
I *think* the important thing is just that we need to mark the page
dirtty from the page tables _after_ we've flushed the TLB, just to
make sure that there can be no subsequent dirtiers that then get lost.
Anyway, I think the best documentation for "this is what I meant" is
simply the patch. Does this affect your PoC on your setup?
I tried to run your program on my machine (WITHOUT this patch - I have
compiled this patch, but I haven't even booted it - it really could be
horrible broken).
But it doesn't fail for me even without the patch and I just get
"Finished without an error" over and over again - but you said it had
to be run on a RAM block device, which I didn't do, so my testing is
very suspect,.
Again: THIS PATCH IS UNTESTED. I feel like it might actually work, if
only because I tried to be so careful with the type system.
But that "might actually work" is probably me being wildly optimistic,
and also depends on the whole concept being ok in the first place.
So realistically, think of this patch more as a "document in code what
Linus meant with his incoherent ramblings" rather than anything else.
Hmm?
Linus
View attachment "patch.diff" of type "text/x-patch" (6722 bytes)
Powered by blists - more mailing lists