linux-kernel - Re: [PATCH] x86/mm/ident_map: Use gbpages only where full GB page should be mapped.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <eae782dc-17da-4d2b-9840-f2b027d5b192@intel.com>
Date:   Thu, 2 Nov 2023 09:02:44 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Steve Wahl <steve.wahl@....com>, rja_direct@...ups.int.hpe.com,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mm/ident_map: Use gbpages only where full GB page
 should be mapped.

On 10/31/23 12:50, Steve Wahl wrote:
> Instead of using gbpages for all memory regions, use them only when
> map creation requests include the full GB page of space; descend to
> using smaller 2M pages when only portions of a GB page are requested.
...

The logic here is sound: we obviously don't want systems rebooting
because speculation went wonky.

> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 968d7005f4a7..b63a1ffcfe9f 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -31,18 +31,26 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  		if (next > end)
>  			next = end;
>  
> -		if (info->direct_gbpages) {
> +		/*
> +		 * if gbpages allowed, this entry not yet present, and
> +		 * the full gbpage range is requested (both ends are
> +		 * correctly aligned), create a gbpage.
> +		 */
> +		if (info->direct_gbpages
> +		    && !pud_present(*pud)
> +		    && !(addr & ~PUD_MASK)
> +		    && !(next & ~PUD_MASK)) {

This is a _bit_ too subtle for my taste.

Let's say someone asks for mapping of 2MB@5G, then later asks for 1G@5G.
 The first call in here will do the 2MB mapping with small (pud)
entries.  The second call will see the new pud_present() check and
*also* skip large pud creation.

That's a regression.  It might not matter _much_, but it's a place where
the old code creates large PUDs and the new code creates small ones.
It's the kind of behavior change that at least needs to be noted in the
changelog.

Off the top of my head, I can't think of why we'd get overlapping
requests in here, though.  Did you think through that case?  Is it common?

>  			pud_t pudval;
>  
> -			if (pud_present(*pud))
> -				continue;
> -
> -			addr &= PUD_MASK;
>  			pudval = __pud((addr - info->offset) | info->page_flag);
>  			set_pud(pud, pudval);
>  			continue;
>  		}
>  
> +		/* if this is already a gbpage, this portion is already mapped */
> +		if (pud_large(*pud))
> +			continue;

I'd probably move this check up to above the large PUD creation code.
It would make it more obvious that any PUD that's encountered down below
is a small PUD.

>  		if (pud_present(*pud)) {
>  			pmd = pmd_offset(pud, 0);
>  			ident_pmd_init(info, pmd, addr, next);

That would end up looking something like this:

	bool do_gbpages = true;
	...

	// Is the whole range already mapped?
	if (pud_large(*pud))
		continue;

	/* PUD is either empty or small */

	// GB pages allowed?
	do_gbpages &= info->direct_gbpages;
	// Addresses aligned for GB pages?
	do_gbpages &= ~(addr & ~PUD_MASK);
	do_gbpages &= ~(next & ~PUD_MASK);
	// Check for existing mapping using a small PUD
	do_gbpages &= !pud_present(*pud);

	if (do_gbpages) {
		set_pud(pud, __pud((addr - info->offset) |
					info->page_flag));
		continue
	}