[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.0.9999.0711160921510.4260@woody.linux-foundation.org>
Date: Fri, 16 Nov 2007 09:35:03 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Jeremy Fitzhardinge <jeremy@...p.org>
cc: William Lee Irwin III <wli@...omorphy.com>,
Andi Kleen <ak@...e.de>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Nick Piggin <nickpiggin@...oo.com.au>,
"H. Peter Anvin" <hpa@...or.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Why preallocate pmd in x86 32-bit PAE?
On Fri, 16 Nov 2007, Jeremy Fitzhardinge wrote:
> >
> > IIRC, the present bit is ignored in the magic 4-entry PGD. All entries
> > have to be present.
>
> Hm, do you recall what processors that might affect? As far as I know,
> current processors will ignore non-present top-level entries.
Are you sure?
Anyway, this is not worth making a distinction for. Just pre-allocate all
of them. There really is just 4 PGD entries, and it really *is* different
from having a full three-level page table, and of the four PGD entries:
- one is used for the kernel mapping (assuming the regular 1:3 layout)
- AT LEAST two are required by user space anyway
so pre-allocating is never going to waste more than one page.
And you may feel that pre-allocating is a special case, but it's an
*easier* special case than the one that you are apparently thinking about
(which is to special-case according to CPU version).
So don't do it. Just preallocate for the magic 4-entry PGD. You can make
the special case just be something like
/* Preallocate for small PGD's */
#if PTRS_PER_PGD == 4
for (i = 0; i < USER_PTRS_PER_PGD; i++) {
pmd_t *pmd = pmd_alloc();
set_pgd(pgd+i, __pgd(PAGE_PRESENT | __pa(pmd));
}
#endif
or similar.
There is absolutely *zero* reason not to do this, and there is also zero
reason to make this be a "32-bit vs 64-bit" issue. The code can be there
in both, and the #if could even be all in C code (ie there may be reasons
to prefer writing it as
/* The old-style PAE PGD needs to be preallocated */
if (USER_PTRS_PER_PGD <= 4) {
...
}
and the compiler should even compile it away entirely for all practical
cases even without using the preprocessor.
> Anyway, we can point them not present to empty_zero_page, so testing the
> present bit will still be sufficient to tell if we need to allocate a
> new pmd, but if the hardware decides to follow the page reference
> there's no harm done. (Hm, unless the hardware decides it wants to set
> A or D bits in empty_zero_page for some reason...)
x86 page table walking never sets A/D bits on non-present entries.
That said, there's still a huge difference.
For "real" page table walking, you can always just insert entries without
flushing the cache if those entries weren't there before (because the TLB
is supposed to not cache negative entries).
Again, because of the way the mahic 4-entry PGD works, that isn't true for
it. It caches the entries regardless, so if you change it from non-present
to present, you have to flush the TLB (well, "reload %cr3", which is the
same thing in practice, although it's for a different *reason*).
> That just means we need to reload cr3 after populating the pgd with a
> new pmd, right?
BUT ONLY FOR THIS CASE!
And if you preallocate it, you make *that* special case go away.
So you're going to have special cases regardless. Do the simple and
really straightforward one, please! Nothing subtle.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists