Message-ID: <alpine.LFD.0.9999.0711160921510.4260@woody.linux-foundation.org>
Date:	Fri, 16 Nov 2007 09:35:03 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
cc:	William Lee Irwin III <wli@...omorphy.com>,
	Andi Kleen <ak@...e.de>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Why preallocate pmd in x86 32-bit PAE?



On Fri, 16 Nov 2007, Jeremy Fitzhardinge wrote:
> >
> > IIRC, the present bit is ignored in the magic 4-entry PGD.  All entries 
> > have to be present.
> 
> Hm, do you recall what processors that might affect?  As far as I know,
> current processors will ignore non-present top-level entries.

Are you sure?

Anyway, this is not worth making a distinction for. Just pre-allocate all 
of them. There really are just 4 PGD entries, and it really *is* different 
from having a full three-level page table, and of the four PGD entries:

 - one is used for the kernel mapping (assuming the regular 1:3 layout)
 - AT LEAST two are required by user space anyway

so pre-allocating is never going to waste more than one page.
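
The arithmetic behind that bound, as a minimal sketch. The constant and function names here are made up for illustration and are not kernel identifiers; the 4 KiB pmd page size is the usual x86 value and is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative numbers for 32-bit PAE; all names are hypothetical. */
enum {
	PAE_PGD_ENTRIES    = 4,    /* the magic 4-entry top level */
	KERNEL_PGD_SLOTS   = 1,    /* regular layout: one entry maps the kernel */
	MIN_USER_PGD_SLOTS = 2,    /* user space needs at least two entries */
	PMD_PAGE_BYTES     = 4096  /* assumed: one pmd occupies one 4 KiB page */
};

/* Preallocated pmd pages that a process might never actually use. */
size_t max_wasted_pmd_pages(void)
{
	size_t wasted = PAE_PGD_ENTRIES - KERNEL_PGD_SLOTS - MIN_USER_PGD_SLOTS;

	assert(wasted <= PAE_PGD_ENTRIES);
	return wasted;
}

/* The same bound expressed in bytes. */
size_t max_wasted_bytes(void)
{
	return max_wasted_pmd_pages() * PMD_PAGE_BYTES;
}
```

With these numbers the worst case is a single page per address space.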

And you may feel that pre-allocating is a special case, but it's an 
*easier* special case than the one that you are apparently thinking about 
(which is to special-case according to CPU version).

So don't do it. Just preallocate for the magic 4-entry PGD. You can make 
the special case just be something like

	/* Preallocate for small PGD's */
	#if PTRS_PER_PGD == 4
		for (i = 0; i < USER_PTRS_PER_PGD; i++) {
			pmd_t *pmd = pmd_alloc();
			set_pgd(pgd+i, __pgd(_PAGE_PRESENT | __pa(pmd)));
		}
	#endif

or similar. 

There is absolutely *zero* reason not to do this, and there is also zero 
reason to make this be a "32-bit vs 64-bit" issue. The code can be there 
in both, and the #if could even be all in C code, i.e. there may be reasons 
to prefer writing it as

	/* The old-style PAE PGD needs to be preallocated */
	if (USER_PTRS_PER_PGD <= 4) {
		...
	}

and the compiler should even compile it away entirely for all practical 
cases even without using the preprocessor.
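
A user-space sketch of why the plain C form costs nothing: the condition compares a compile-time constant, so an optimizing compiler can drop the whole branch when it is false. The value 3 for USER_PTRS_PER_PGD is an assumed PAE-like stand-in, and prealloc_small_pgd() with its byte-array "pgd" is hypothetical, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in; a real kernel takes this from its pgtable headers. */
#ifndef USER_PTRS_PER_PGD
#define USER_PTRS_PER_PGD 3
#endif

/*
 * Mark the user-space top-level slots as populated and return how many
 * were marked. "populated" stands in for the pgd; nothing is allocated.
 */
size_t prealloc_small_pgd(unsigned char populated[], size_t len)
{
	size_t count = 0;

	assert(populated != NULL);

	/* Constant condition: no preprocessor needed to make this free. */
	if (USER_PTRS_PER_PGD <= 4) {
		for (size_t i = 0; i < USER_PTRS_PER_PGD && i < len; i++) {
			populated[i] = 1;
			count++;
		}
	}
	return count;
}
```

Building the same file with -DUSER_PTRS_PER_PGD=768 (a non-PAE-like value, also assumed) makes the condition false, and the function reduces to returning 0.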

> Anyway, we can point them not present to empty_zero_page, so testing the 
> present bit will still be sufficient to tell if we need to allocate a 
> new pmd, but if the hardware decides to follow the page reference 
> there's no harm done.  (Hm, unless the hardware decides it wants to set 
> A or D bits in empty_zero_page for some reason...)

x86 page table walking never sets A/D bits on non-present entries.

That said, there's still a huge difference. 

For "real" page table walking, you can always just insert entries without 
flushing the TLB if those entries weren't there before (because the TLB 
is supposed to not cache negative entries). 

Again, because of the way the magic 4-entry PGD works, that isn't true for 
it. It caches the entries regardless, so if you change it from non-present 
to present, you have to flush the TLB (well, "reload %cr3", which is the 
same thing in practice, although it's for a different *reason*).
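
A toy model of that behaviour, as a sketch only: it assumes the CPU copies all four top-level entries when %cr3 is loaded and consults only that copy afterwards. struct toy_cpu and both functions are hypothetical and model nothing beyond this one point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAE_PGD_ENTRIES 4
#define ENTRY_PRESENT   0x1ull

/* Toy CPU: holds a private copy of the four top-level entries. */
struct toy_cpu {
	uint64_t cached_pgd[PAE_PGD_ENTRIES];
};

/* Models "reload %cr3": all four entries are re-read from memory. */
void toy_load_cr3(struct toy_cpu *cpu, const uint64_t pgd[PAE_PGD_ENTRIES])
{
	assert(cpu != NULL && pgd != NULL);
	memcpy(cpu->cached_pgd, pgd, sizeof(cpu->cached_pgd));
}

/* Models a walk: only the cached copy is consulted, never memory. */
bool toy_top_level_present(const struct toy_cpu *cpu, unsigned int idx)
{
	assert(cpu != NULL && idx < PAE_PGD_ENTRIES);
	return (cpu->cached_pgd[idx] & ENTRY_PRESENT) != 0;
}
```

In this model, setting the present bit in the in-memory pgd after toy_load_cr3() changes nothing the toy CPU can see; the entry only becomes visible after a second toy_load_cr3().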

> That just means we need to reload cr3 after populating the pgd with a
> new pmd, right?

BUT ONLY FOR THIS CASE!

And if you preallocate it, you make *that* special case go away. 

So you're going to have special cases regardless. Do the simple and 
really straightforward one, please! Nothing subtle.

		Linus