[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whPCKbhw7+XGBgJ0bM-5QShV90T_yqy2DWp-uPPReTOrw@mail.gmail.com>
Date: Wed, 23 Dec 2020 15:39:53 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Yu Zhao <yuzhao@...gle.com>, Andy Lutomirski <luto@...nel.org>,
Peter Xu <peterx@...hat.com>,
Nadav Amit <nadav.amit@...il.com>,
linux-mm <linux-mm@...ck.org>,
lkml <linux-kernel@...r.kernel.org>,
Pavel Emelyanov <xemul@...nvz.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
stable <stable@...r.kernel.org>,
Minchan Kim <minchan@...nel.org>,
Will Deacon <will@...nel.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
On Wed, Dec 23, 2020 at 1:39 PM Andrea Arcangeli <aarcange@...hat.com> wrote:
>
> On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> > Thanks for the details.
>
> I hope we can find a way put the page_mapcount back where there's a
> page_count right now.
I really don't think that's ever going to happen - at least if you're
talking about do_wp_page().
I refuse to make the *simple* core operations VM have to jump through
hoops, and the old COW mapcount logic was that. I *much* prefer the
newer COW code, because the rules are more straightforward.
> page_count is far from optimal, but it is a feature it finally allowed
> us to notice that various code (clear_refs_write included apparently
> even after the fix) leaves stale too permissive TLB entries when it
> shouldn't.
I absolutely agree that page_count isn't exactly optimal, but
"mapcount" is just so much worse.
page_count() is at least _logical_, and has a very clear meaning:
"this page has other users". mapcount() means something else, and is
simply not sufficient or relevant wrt COW.
That doesn't mean that page_mapcount() is wrong - it's just that it's
wrong for COW. page_mapcount() is great for rmap, so that we can see
when we need to shoot down a memory mapping of a page that gets
released (truncate being the classic example).
I think that the mapcount games we used to have were horrible. I
absolutely much prefer where we are now wrt COW.
The modern rules for COW handling are:
- if we got a COW fault there might be another user, we copy (and
this is where the page count makes so much logical sense).
- if somebody needs to pin the page in the VM, we either make sure
that it is pre-COWed and we
(a) either never turn it back into a COW page (ie the fork()-time
stuff we have for pinned pages)
(b) or there is some explicit marker on the page or in the page
table (ie the userfaultfd_pte_wp thing).
those are _so_ much more straightforward than the very complex rules
we used to have that were actively buggy, in addition to requiring the
page lock. So they were buggy and slow.
And yes, I had forgotten about that userfaultfd_pte_wp() because I was
being myopic and only looking at wp_page_copy(). So using that as a
way to make sure that a page doesn't go through COW is a good way to
avoid the COW race, but I think that thing requires a bit in the page
table which might be a problem on some architectures?
Linus
Powered by blists - more mailing lists