[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z76APkysrjgHjgR2@casper.infradead.org>
Date: Wed, 26 Feb 2025 02:45:18 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
kernel test robot <oliver.sang@...el.com>,
Ingo Molnar <mingo@...nel.org>, x86@...nel.org
Subject: Re: [tip: x86/mm] x86/mm: Clear _PAGE_DIRTY when we clear _PAGE_RW
On Tue, Feb 25, 2025 at 01:07:08PM -0800, Linus Torvalds wrote:
> On Tue, 25 Feb 2025 at 12:10, tip-bot2 for Matthew Wilcox (Oracle)
> <tip-bot2@...utronix.de> wrote:
> >
> > We should, therefore, clear the _PAGE_DIRTY bit whenever we clear
> > _PAGE_RW. I opted to do it in the callers in case we want to use
> > __change_page_attr() to create shadow stacks inside the kernel at some
> > point in the future. Arguably, we might also want to clear _PAGE_ACCESSED
> > here.
>
> This explanation makes ZERO sense, and screams "this is a major bug" to me.
>
> If a page is dirty, it doesn't magically turn clean just because it
> becomes read-only. The dirty data remains and may need to be written
> back to memory.
Are you saying that the PTE dirty bit controls whether the CPU flushes
cache back to memory? That isn't how I understand CPUs to work.
> Imagine writing to a shared memory area, and then marking it all
> read-only after you're done. It's still dirty, even if it's read-only.
>
> Now, I don't actually expect this patch to be wrong, I'm literally
> just complaining about the explanation. Because the explanation is
> very lacking. That's particularly true for the __set_pages_np() case
> which also clears _PAGE_PRESENT, because then the whole shadow stacks
> explanation flies right out the window: the shadow stack rules simply
> do NOT APPLY to non-present pte's in the first place.
Dave and I talked about that case. We were concerned not that _this_
manipulation would lead to a shadow stack entry appearing (since the
present bit is being cleared), but that the next manipulation would just
set the present bit without setting the RW bit and we'd accidentally
end up with one.
> So honestly, I think this wants an explanation for why it's actually a
> safe change, and how the dirty bit has been saved before the
> operation.
I don't understand why the dirty bit needs to be saved. At least in
the vfree() case, we're freeing the memory so any unflushed writes to
it may as well disappear. But I don't know all the circumstances in
which these functions are called.
Powered by blists - more mailing lists