lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1wsygryw3.fsf@ebiederm.dsl.xmission.com>
Date:	Thu, 07 Jun 2007 01:45:32 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Jesse Barnes <jesse.barnes@...el.com>
Cc:	Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org,
	Justin Piszcz <jpiszcz@...idpixels.com>
Subject: Re: [PATCH] trim memory not covered by WB MTRRs

Jesse Barnes <jesse.barnes@...el.com> writes:

> On some machines, buggy BIOSes don't properly setup WB MTRRs to
> cover all available RAM, meaning the last few megs (or even gigs)
> of memory will be marked uncached.  Since Linux tends to allocate
> from high memory addresses first, this causes the machine to be
> unusably slow as soon as the kernel starts really using memory
> (i.e. right around init time).
>
> This patch works around the problem by scanning the MTRRs at
> boot and figuring out whether the current end_pfn value (setup
> by early e820 code) goes beyond the highest WB MTRR range, and
> if so, trimming it to match.  A fairly obnoxious KERN_WARNING
> is printed too, letting the user know that not all of their
> memory is available due to a likely BIOS bug.
>
> Something similar could be done on i386 if needed, but the boot
> ordering would be slightly different, since the MTRR code on i386
> depends on the boot_cpu_data structure being setup.
>
> Justin, can you please test and make sure this patch works for
> you too?  It'll only work around the problem, but it's better
> than having to do mem= by hand or waiting for a fix from your
> BIOS vendor.

Ok.  Overall this feels good but a few nits below.
Would it make sense to split this into two patches.
The first to just do the cleanup that removes the allocations
for holding the mttr ranges?

> Thanks,
> Jesse
>
> Signed-off-by:  Jesse Barnes <jesse.barnes@...el.com>
>
> diff --git a/arch/i386/kernel/cpu/mtrr/generic.c
> b/arch/i386/kernel/cpu/mtrr/generic.c
> index c4ebb51..71fc768 100644
> --- a/arch/i386/kernel/cpu/mtrr/generic.c
> +++ b/arch/i386/kernel/cpu/mtrr/generic.c
> @@ -13,7 +13,7 @@
>  #include "mtrr.h"
>
>  
>  struct mtrr_state {
> -	struct mtrr_var_range *var_ranges;
> +	struct mtrr_var_range var_ranges[NUM_VAR_RANGES];

Could we name it MAX_VAR_RANGES and not NUM_VAR_RANGES.
In practices this is going to be 8 for every cpu I know of,
so calling this NUM_VAR_RANGES may be a little confusing.

>  	mtrr_type fixed_ranges[NUM_FIXED_RANGES];
>  	unsigned char enabled;
>  	unsigned char have_fixed;
> @@ -84,12 +84,6 @@ void get_mtrr_state(void)
>  	struct mtrr_var_range *vrs;
>  	unsigned lo, dummy;
>  
> -	if (!mtrr_state.var_ranges) {
> - mtrr_state.var_ranges = kmalloc(num_var_ranges * sizeof (struct
> mtrr_var_range),
> -						GFP_KERNEL);
> -		if (!mtrr_state.var_ranges)
> -			return;
> -	} 
>  	vrs = mtrr_state.var_ranges;
>  
>  	rdmsr(MTRRcap_MSR, lo, dummy);
> diff --git a/arch/i386/kernel/cpu/mtrr/if.c b/arch/i386/kernel/cpu/mtrr/if.c
> index c7d8f17..d7922ce 100644
> --- a/arch/i386/kernel/cpu/mtrr/if.c
> +++ b/arch/i386/kernel/cpu/mtrr/if.c
> @@ -12,7 +12,7 @@
>  #include "mtrr.h"
>  
>  /* RED-PEN: this is accessed without any locking */
> -extern unsigned int *usage_table;
> +extern unsigned int usage_table[];
I think that should be:
> +extern unsigned int usage_table[NUM_VAR_RANGES];
Or even better yet the declaration moved to a header file.


>  
> +/**
> + * mtrr_trim_uncached_memory - trim RAM not covered by MTRRs
> + *
> + * Some buggy BIOSes don't setup the MTRRs properly for systems with certain
> + * memory configurations.  This routine checks to make sure the MTRRs having
> + * a write back type cover all of the memory the kernel is intending to use.
> + * If not, it'll trim any memory off the end by adjusting end_pfn, removing
> + * it from the kernel's allocation pools, warning the user with an obnoxious
> + * message.
> + */
> +void __init mtrr_trim_uncached_memory(void)
> +{
> +	unsigned long i, base, size, highest_addr = 0;
> +	mtrr_type type;
> +
> +	/* Find highest cached pfn */
> +	for (i = 0; i < num_var_ranges; i++) {
> +		mtrr_if->get(i, &base, &size, &type);
> +		if (type != MTRR_TYPE_WRBACK)
> +			continue;
> +		base <<= PAGE_SHIFT;
> +		size <<= PAGE_SHIFT;
> +		if (highest_addr < base + size)
> +			highest_addr = base + size;
> +	}

This looks like it will handle the common case, so I have no major objections
to this code.

At least in theory and possibly in practice there are a couple of corner
cases we have missed her.

- Overlapping MTRRs.
- What happens if we have uncached memory lower down?
  Except for performance problems I guess that case is relatively harmless.
- Is it possible and worth it to amend the e820 map, so it shows the
  problem area as Reserved or otherwise not usable RAM?


> +
> +	if ((highest_addr >> PAGE_SHIFT) != end_pfn) {
> +		printk(KERN_WARNING "***************\n");
> +		printk(KERN_WARNING "**** WARNING: likely BIOS bug\n");
> +		printk(KERN_WARNING "**** MTRRs don't cover all of "
> +		       "memory, trimmed %ld pages\n", end_pfn -
> +		       (highest_addr >> PAGE_SHIFT));
> +		printk(KERN_WARNING "***************\n");
> +		end_pfn = highest_addr >> PAGE_SHIFT;
> +	}
> +}
>  
>  /**
>   * mtrr_bp_init - initialize mtrrs on the boot CPU
> diff --git a/arch/i386/kernel/cpu/mtrr/mtrr.h b/arch/i386/kernel/cpu/mtrr/mtrr.h
> index 289dfe6..a29dcba 100644
> --- a/arch/i386/kernel/cpu/mtrr/mtrr.h
> +++ b/arch/i386/kernel/cpu/mtrr/mtrr.h
> @@ -14,6 +14,7 @@
>  #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
>  
>  #define NUM_FIXED_RANGES 88
> +#define NUM_VAR_RANGES 256
MAX_VAR_RANGES?

>  #define MTRRfix64K_00000_MSR 0x250
>  #define MTRRfix16K_80000_MSR 0x258
>  #define MTRRfix16K_A0000_MSR 0x259

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ