Message-ID: <CA+55aFzisEwtcBA93Xo74RM-6X9V=_go=YhN7eFJOHLeHs3HEQ@mail.gmail.com>
Date:	Wed, 1 Oct 2014 13:20:53 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Dave Jones <davej@...hat.com>, Al Viro <viro@...iv.linux.org.uk>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Rik van Riel <riel@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	Michel Lespinasse <walken@...gle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Mel Gorman <mgorman@...e.de>,
	Sasha Levin <sasha.levin@...cle.com>
Subject: Re: pipe/page fault oddness.
On Wed, Oct 1, 2014 at 9:18 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> So I'd really suggest we do exactly that. Get rid of "pte_numa()"
> entirely, get rid of "_PAGE_[BIT_]NUMA" entirely, and instead add a
> "pte_protnone()" helper to check for the "protnone" case (which on x86
> is testing the _PAGE_PROTNONE bit, and on most other architectures is
> just testing that the page has no access rights).
>
> Then we throw away "pte_mknuma()" and "pte_mknonnuma()" entirely,
> because they are brainless sh*t, and we just use
>
>     ptent = ptep_modify_prot_start(mm, addr, pte);
>     ptent = pte_modify(ptent, newprot);
>     ptep_modify_prot_commit(mm, addr, pte, ptent);
>
> reliably instead (where for the mknuma case "newprot" is PROT_NONE,
> and for mknonnuma() it is vma->vm_page_prot). Yes, that means that you
> have to pass in the vma to those functions, but that just makes sense
> anyway.
>
> And if that means that we lose the numa flag on mprotect etc, nobody sane cares.

So here is a *COMPLETELY UNTESTED* and probably seriously buggy first
version of such a patch. It doesn't do the powerpc conversion, so
somebody would need to check that eventually, but aside from that
obvious issue, can people fix this up? Or comment on why it doesn't
work.

Now, I like this because it gets rid of the horrible PAGE_NUMA special
cases, but it really seems to simplify things in general. Lookie here:

     13 files changed, 74 insertions(+), 268 deletions(-)

that's really mainly just the removal of odd and broken numa pte/pmd
helper functions from <asm-generic/pgtable.h> that aren't needed any
more because the normal "change protections" functions just DTRT
automatically. Although there are actually a few other cases that got
simpler too, so it's not *just* removal of those _PAGE_NUMA-specific
helpers.
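
To make the idea concrete, here is a toy userspace sketch of the
"protnone instead of a NUMA bit" scheme. It is NOT kernel code and NOT
the attached patch: every toy_* name and every bit value below is made
up for illustration, and the start/commit steps are reduced to a plain
read, clear and write with no locking or TLB handling. It only shows
that one generic "change protections" path can mark an entry as
protnone and restore it again while leaving the non-protection bits
alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names and bit positions, not the x86 layout. */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT   (1ull << 0)
#define TOY_RW        (1ull << 1)
#define TOY_PROTNONE  (1ull << 8)

/* The bits that a protection change is allowed to rewrite. */
#define TOY_PROT_MASK (TOY_PRESENT | TOY_RW | TOY_PROTNONE)

/* Two example protection values: "no access" and "normal read/write". */
#define TOY_PROT_NONE TOY_PROTNONE
#define TOY_PROT_RW   (TOY_PRESENT | TOY_RW)

/* Stand-in for the proposed pte_protnone(): test the protnone bit. */
static bool toy_pte_protnone(toy_pte_t pte)
{
	return (pte & TOY_PROTNONE) != 0;
}

/* Stand-in for pte_modify(): replace protection bits, keep everything else. */
static toy_pte_t toy_pte_modify(toy_pte_t pte, toy_pte_t newprot)
{
	return (pte & ~TOY_PROT_MASK) | (newprot & TOY_PROT_MASK);
}

/*
 * Stand-in for the start/modify/commit sequence. The "start" step reads
 * and clears the entry, the "commit" step writes the new value back.
 */
static void toy_change_prot(toy_pte_t *ptep, toy_pte_t newprot)
{
	toy_pte_t ptent = *ptep;	/* "start": fetch old entry ... */

	*ptep = 0;			/* ... and clear it */
	ptent = toy_pte_modify(ptent, newprot);
	*ptep = ptent;			/* "commit": install new entry */
}
```

In this model the "mknuma" case is toy_change_prot(&pte, TOY_PROT_NONE)
and the "mknonnuma" case is toy_change_prot(&pte, TOY_PROT_RW), with
TOY_PROT_RW playing the role of vma->vm_page_prot.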

One thing this does *not* remove is the special pte locking rule in
the "change_*_range()" functions: they still take that broken
"prot_numa" argument. HOWEVER, it isn't actually used for any page
table modifications; the only reason for it existing is the hacky
locking issue (see lock_pte_protection(), and the comment about races
with the transhuge accesses).

Now, I'll be honest: this patch *might* just work, but I expect it to
have some stupid problem. It compiles. I haven't even dared boot it,
much less try any numa benchmarks that wouldn't show anything sane on
my machine anyway.

So I'm really sending this patch out in the hope that it will get
comments, fixup and possibly even testing by people who actually know
the NUMA balancing code. Rik?  Anybody?

                Linus

View attachment "patch.diff" of type "text/plain" (19594 bytes)