Message-ID: <ZgWX_x-CB7OjKAGD@swahl-home.5wahls.com>
Date: Thu, 28 Mar 2024 11:17:03 -0500
From: Steve Wahl <steve.wahl@....com>
To: Steve Wahl <steve.wahl@....com>, Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
        Linux regressions mailing list <regressions@...ts.linux.dev>,
        Pavin Joseph <me@...injoseph.com>, stable@...r.kernel.org,
        Eric Hagberg <ehagberg@...il.com>
Cc: Simon Horman <horms@...ge.net.au>, Eric Biederman <ebiederm@...ssion.com>,
        Dave Young <dyoung@...hat.com>, Sarah Brofeldt <srhb@....dk>,
        Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>,
        Hou Wenlong <houwenlong.hwl@...group.com>,
        Andrew Morton <akpm@...ux-foundation.org>, Baoquan He <bhe@...hat.com>,
        Yuntao Wang <ytcoode@...il.com>, Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only
 where full GB page should be mapped.

Note: I Cc'd stable in the email headers by mistake.  There is no
"Cc: stable" tag; I don't want this to go into stable.

Thanks,

--> Steve

On Thu, Mar 28, 2024 at 11:06:14AM -0500, Steve Wahl wrote:
> When ident_pud_init() uses only gbpages to create identity maps, large
> ranges of addresses not actually requested can be included in the
> resulting table; a 4K request will map a full GB.  On UV systems, this
> ends up including regions that will cause hardware to halt the system
> if accessed (these are marked "reserved" by BIOS).  Even processor
> speculation into these regions is enough to trigger the system halt.
> And MTRRs cannot be used to restrict this speculation: there are not
> enough MTRRs to cover all the reserved regions.
> 
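
The over-mapping described above is easy to quantify. The user-space sketch below (a model, not kernel code) rounds a request outward to page boundaries. `mapped_span()` is a hypothetical helper, and the 2 MiB and 1 GiB sizes assume x86-64 4-level paging.

```c
#include <assert.h>

/* Leaf page sizes assumed for x86-64 4-level paging. */
#define PMD_SIZE (1UL << 21) /* 2 MiB */
#define PUD_SIZE (1UL << 30) /* 1 GiB */

/*
 * Hypothetical helper: bytes actually mapped when the request
 * [start, end) is covered with pages of `size` bytes.  The start is
 * rounded down and the end rounded up to `size` boundaries.
 */
static unsigned long mapped_span(unsigned long start, unsigned long end,
				 unsigned long size)
{
	unsigned long lo, hi;

	assert(size != 0 && (size & (size - 1)) == 0); /* power of two */
	assert(start < end);

	lo = start & ~(size - 1);
	hi = (end + size - 1) & ~(size - 1);
	return hi - lo;
}
```

With these definitions, `mapped_span(0x7ff00000UL, 0x7ff01000UL, PUD_SIZE)` returns 1 GiB for a 4 KiB request, while passing `PMD_SIZE` instead returns 2 MiB.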
> The fix for that would be to use gbpages only when map creation
> requests include the full GB page of space, and to fall back to
> smaller 2M pages when only portions of a GB page are included in the
> request.
> 
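
The "full GB or fall back to 2M" rule can be modelled the same way. This hypothetical user-space sketch walks a request one 1 GiB slot at a time, computing `next` the way `ident_pud_init()` does, and counts the entries that the restrained policy would create in an empty table. `count_entries()` and `struct map_counts` are illustrative names only.

```c
#include <assert.h>

#define PMD_SIZE (1UL << 21)          /* 2 MiB */
#define PUD_SIZE (1UL << 30)          /* 1 GiB */
#define PMD_MASK (~(PMD_SIZE - 1))
#define PUD_MASK (~(PUD_SIZE - 1))

struct map_counts {
	unsigned long gbpages; /* 1 GiB leaf entries */
	unsigned long pmds;    /* 2 MiB leaf entries */
};

/* A gbpage is used only when [addr, next) is a whole, aligned GB. */
static int gbpage_covers_request(unsigned long addr, unsigned long next)
{
	return (addr & ~PUD_MASK) == 0 && (next & ~PUD_MASK) == 0;
}

/* Count entries for [addr, end), assuming an initially empty table. */
static struct map_counts count_entries(unsigned long addr, unsigned long end)
{
	struct map_counts c = { 0, 0 };
	unsigned long next;

	assert(addr < end);
	for (; addr < end; addr = next) {
		/* End of this 1 GiB slot, clamped to the request. */
		next = (addr & PUD_MASK) + PUD_SIZE;
		if (next > end)
			next = end;

		if (gbpage_covers_request(addr, next)) {
			c.gbpages++;
		} else {
			/* 2 MiB pages, rounded outward within this slot. */
			unsigned long lo = addr & PMD_MASK;
			unsigned long hi = (next + PMD_SIZE - 1) & PMD_MASK;

			c.pmds += (hi - lo) / PMD_SIZE;
		}
	}
	return c;
}
```

For example, a request for [0x3ff00000, 0x80000000) yields one gbpage for the aligned second GB and a single 2 MiB entry for the unaligned tail of the first.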
> But on some other systems, possibly due to a buggy BIOS, that solution
> leaves some areas out of the identity map that are needed for kexec to
> succeed.  It is believed that these areas are not marked properly for
> map_acpi_tables() in arch/x86/kernel/machine_kexec_64.c to catch and
> map them.  The nogbpages kernel command line option also causes these
> systems to fail even without these changes.
> 
> So, create kexec identity maps using full GB pages on all platforms
> but UV; on UV, use narrower 2MB pages in the identity map where a full
> GB page would include areas outside the region requested.
> 
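
The trade-off between the two platform policies shows up in how many bytes end up mapped. Under the same assumptions as a plain model (empty table, gbpage support present), this hypothetical sketch follows the patch's flag wiring, with `gbpages_only = !is_uv`; `bytes_mapped()` is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

#define PMD_SIZE (1UL << 21)          /* 2 MiB */
#define PUD_SIZE (1UL << 30)          /* 1 GiB */
#define PMD_MASK (~(PMD_SIZE - 1))
#define PUD_MASK (~(PUD_SIZE - 1))

/*
 * Bytes mapped for [addr, end).  On non-UV systems (gbpages_only)
 * every touched 1 GiB slot gets a gbpage; on UV a gbpage is used only
 * for a full, aligned GB and partial slots fall back to 2 MiB pages.
 */
static unsigned long bytes_mapped(unsigned long addr, unsigned long end,
				  bool is_uv)
{
	bool gbpages_only = !is_uv;
	unsigned long total = 0, next;

	assert(addr < end);
	for (; addr < end; addr = next) {
		bool use_gbpage;

		next = (addr & PUD_MASK) + PUD_SIZE;
		if (next > end)
			next = end;

		use_gbpage = gbpages_only ||
			     ((addr & ~PUD_MASK) == 0 && (next & ~PUD_MASK) == 0);

		if (use_gbpage)
			total += PUD_SIZE;
		else
			total += ((next + PMD_SIZE - 1) & PMD_MASK) -
				 (addr & PMD_MASK);
	}
	return total;
}
```

For [0x3ff00000, 0x80000000) it reports 1 GiB + 2 MiB on UV and 2 GiB elsewhere; the extra space mapped on non-UV systems is what is believed to pick up the regions kexec needs there.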
> No attempt is made to coalesce mapping requests. If a request requires
> a map entry at the 2M (pmd) level, subsequent mapping requests within
> the same 1G region will also be at the pmd level, even if such
> adjacent or overlapping requests could have been combined to map a full
> gbpage.  Existing usage starts with larger regions and then adds
> smaller regions, so this should not have any great consequence.
> 
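
The lack of coalescing can be seen in a three-state model of one PUD slot. This hypothetical sketch applies requests to a slot in the order of checks used by the patched `ident_pud_init()` with the UV policy (direct_gbpages set, direct_gbpages_only clear): an existing gbpage is left alone, a gbpage is installed only into an empty slot for a full aligned GB, and anything else becomes or stays a table of 2 MiB entries. `enum pud_state` and `map_in_slot()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

#define PUD_SIZE (1UL << 30)          /* 1 GiB */
#define PUD_MASK (~(PUD_SIZE - 1))

/* Simplified state of one PUD slot (a single 1 GiB region). */
enum pud_state { PUD_EMPTY, PUD_LEAF, PUD_TABLE };

/* Apply one request [addr, next) that lies within a single slot. */
static enum pud_state map_in_slot(enum pud_state s,
				  unsigned long addr, unsigned long next)
{
	bool use_gbpage;

	assert(addr < next);
	if (s == PUD_LEAF)
		return s;          /* already covered by a gbpage */

	/* Only a whole, aligned GB may use a gbpage... */
	use_gbpage = (addr & ~PUD_MASK) == 0 && (next & ~PUD_MASK) == 0;
	/* ...and an existing table is never replaced. */
	use_gbpage = use_gbpage && s == PUD_EMPTY;

	return use_gbpage ? PUD_LEAF : PUD_TABLE;
}
```

Two half-GB requests, [0x40000000, 0x60000000) then [0x60000000, 0x80000000), leave the slot as `PUD_TABLE`, whereas one request for the whole GB produces `PUD_LEAF`.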
> Signed-off-by: Steve Wahl <steve.wahl@....com>
> 
> Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
> Reported-by: Pavin Joseph <me@...injoseph.com>
> Closes: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
> Link: https://lore.kernel.org/all/20240322162135.3984233-1-steve.wahl@hpe.com/
> Tested-by: Pavin Joseph <me@...injoseph.com>
> Tested-by: Eric Hagberg <ehagberg@...il.com>
> Tested-by: Sarah Brofeldt <srhb@....dk>
> ---
> 
> v4: Incorporate fix for regression on systems relying on gbpages
>     mapping more than the ranges actually requested for successful
>     kexec, by limiting the effects of the change to UV systems.
>     This patch is based on tip/x86/urgent.
> 
> v3: per Dave Hansen's review, re-arrange changelog info,
>     refactor code to use bool variable and split out conditions.
> 
> v2: per Dave Hansen's review: additional changelog info,
>     moved pud_large() check earlier in the code, and
>     improved the comment describing the conditions
>     that restrict gbpage usage.
>    
> 
>  arch/x86/include/asm/init.h        |  1 +
>  arch/x86/kernel/machine_kexec_64.c | 10 ++++++++++
>  arch/x86/mm/ident_map.c            | 24 +++++++++++++++++++-----
>  3 files changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index cc9ccf61b6bd..371d9faea8bc 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -10,6 +10,7 @@ struct x86_mapping_info {
>  	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
>  	unsigned long offset;		 /* ident mapping offset */
>  	bool direct_gbpages;		 /* PUD level 1GB page support */
> +	bool direct_gbpages_only;	 /* use 1GB pages exclusively */
>  	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>  };
>  
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index b180d8e497c3..3a2f5d291a88 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -28,6 +28,7 @@
>  #include <asm/setup.h>
>  #include <asm/set_memory.h>
>  #include <asm/cpu.h>
> +#include <asm/uv/uv.h>
>  
>  #ifdef CONFIG_ACPI
>  /*
> @@ -212,6 +213,15 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  
>  	if (direct_gbpages)
>  		info.direct_gbpages = true;
> +	/*
> +	 * UV systems need restrained use of gbpages in the identity
> +	 * maps to avoid system halts.  But some other systems rely on
> +	 * using gbpages to expand mappings outside the regions
> +	 * actually listed, to include areas required for kexec but
> +	 * not explicitly named by the bios.
> +	 */
> +	if (!is_uv_system())
> +		info.direct_gbpages_only = true;
>  
>  	for (i = 0; i < nr_pfn_mapped; i++) {
>  		mstart = pfn_mapped[i].start << PAGE_SHIFT;
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 968d7005f4a7..a538a54aba5d 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -26,18 +26,32 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  	for (; addr < end; addr = next) {
>  		pud_t *pud = pud_page + pud_index(addr);
>  		pmd_t *pmd;
> +		bool use_gbpage;
>  
>  		next = (addr & PUD_MASK) + PUD_SIZE;
>  		if (next > end)
>  			next = end;
>  
> -		if (info->direct_gbpages) {
> -			pud_t pudval;
> +		/* if this is already a gbpage, this portion is already mapped */
> +		if (pud_leaf(*pud))
> +			continue;
> +
> +		/* Is using a gbpage allowed? */
> +		use_gbpage = info->direct_gbpages;
>  
> -			if (pud_present(*pud))
> -				continue;
> +		if (!info->direct_gbpages_only) {
> +			/* Don't use gbpage if it maps more than the requested region. */
> +			/* at the beginning: */
> +			use_gbpage &= ((addr & ~PUD_MASK) == 0);
> +			/* ... or at the end: */
> +			use_gbpage &= ((next & ~PUD_MASK) == 0);
> +		}
> +		/* Never overwrite existing mappings */
> +		use_gbpage &= !pud_present(*pud);
> +
> +		if (use_gbpage) {
> +			pud_t pudval;
>  
> -			addr &= PUD_MASK;
>  			pudval = __pud((addr - info->offset) | info->page_flag);
>  			set_pud(pud, pudval);
>  			continue;
> 
> base-commit: b6540de9b5c867b4c8bc31225db181cc017d8cc7
> -- 
> 2.26.2
> 

-- 
Steve Wahl, Hewlett Packard Enterprise
