lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160721081438.GA26531@gmail.com>
Date:	Thu, 21 Jul 2016 10:14:38 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Andy Lutomirski <luto@...nel.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Mario Limonciello <mario_limonciello@...l.com>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	linux-kernel@...r.kernel.org, Kees Cook <keescook@...omium.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: [PATCH] x86/boot: Reorganize and clean up the BIOS area reservation
 code


* Andy Lutomirski <luto@...nel.org> wrote:

> Under some conditions, my Dell XPS 13 9350 puts the EBDA at 0x2c000
> but reports the lowmem cutoff as 0.  The old code reserves
> everything above 0x2c000 and I can't boot [1].

> [1] This only breaks boot in practice when some other firmware or
>     GRUB oddity that I don't fully understand kicks in causing the
>     memory below 0x2c000 to be unusable.

Exactly why can't Linux boot if *more* memory is reserved? Is it perhaps the SMP 
trampoline that cannot be allocated? Or does this shift some firmware memory 
corruption pattern that makes it more lethal than it already is?

Without more information we don't really know the root cause here and I'm hesitant 
to touch such really old quirks without first understanding what's going on on 
your machine.

Is the boot failure deterministic - if yes, could you try to dig a bit more into 
this?

My guess it's the SMP trampoline, and I think we should robustify that in a 
different way: lets put it aside very early as a reservation (possibly in this 
very function), to guarantee that we have a below 1MB buffer for the SMP 
trampoline. This would be a lot more robust ...

> --- a/arch/x86/kernel/ebda.c
> +++ b/arch/x86/kernel/ebda.c
> @@ -62,10 +62,12 @@ void __init reserve_ebda_region(void)
>  	if (lowmem < INSANE_CUTOFF)
>  		lowmem = LOWMEM_CAP;
>  
> -	/* Use the lower of the lowmem and EBDA markers as the cutoff */
> -	lowmem = min(lowmem, ebda_addr);
>  	lowmem = min(lowmem, LOWMEM_CAP); /* Absolute cap */
>  
>  	/* reserve all memory between lowmem and the 1MB mark */
>  	memblock_reserve(lowmem, 0x100000 - lowmem);
> +
> +	/* if the EBDA is in lowmem, reserve it separately. */
> +	if (ebda_addr < lowmem)
> +		memblock_reserve(ebda_addr, 4096);

Ugh, this code is a mess, and your fix does not help make it better:

- Firstly, could we please organize this code so that we first calculate ebda_addr
  and then lowmem - not this interleaved mess of first lowmem, then ebda_addr, 
  then lowmem tweaks...

- Also, could we please rename things to make it more obvious what's going on, and 
  then apply changes on top? Here's a list of problems:

- 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other 
  parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it 
  super confusing to read.

- It does not help at all that we have various memory range markers, half of which 
  are 'start of range', half of which are 'end of range' - but this crucial 
  property is not obvious in the naming at all ... gave me a headache trying to 
  understand all this.

- Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is 
  obvious, all values here are addresses!), while it does not highlight that it's 
  the start of the EBDA region ...

- 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
  that is a pointer to a value, not a memory range address.

- The function name itself is a misnomer: it says 'reserve_ebda_region()' while
  its main purpose is to reserve all the firmware ROM typically between 640K and
  1MB, while the 'EBDA' part is only a small part of that ... It should be renamed 
  to something like reserve_bios_regions(). (The plural is intentional, to signal
  that this is potentially about multiple memory regions.)

- Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this 
  too should be about whether to reserve firmware areas in the paravirt case. 
  Something like ::reserve_bios_regions would work for me.

- In fact thinking about this as 'end of RAM' is confusing: what this function 
  *really* wants to reserve is firmware data and code areas! Once the thinking is 
  inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure 
  'reserved area' notion everything becomes a lot clearer.

So something like this, names ordered in (probable) address order:

	BIOS_RAM_SIZE_KB_PTR		// was: BIOS_LOWMEM_KILOBYTES

	BIOS_RAM_END_MIN		// was: INSANE_CUTOFF

	ebda_start			// was: ebda_addr

	bios_ram_end			// was: lowmem

	BIOS_RAM_END_MAX		// was: LOWMEM_CAP

would work for me. Also the comments could need some fixing, refreshing and 
general harmonization.

Blah - the patch below does this all (untested). Just look how much more readable 
it all became!

In fact I believe we should try another patch on top of this and massively 
simplify the whole EBDA mess:

 - Find a suitable address for the SMP trampoline and remember it for the
   SMP boot code.

 - Reserve *ALL* of the first 2MB of RAM.

 - (are there any other pieces of kernel code that require memory below 1MB/2MB?)

The thing is, x86 CPUs are already treating the first 1MB of RAM in a special way: 
for example 2MB TLBs get split up into 4K TLBs regardless of what the pagetable or 
the MTRRs say. So it's a substandard piece of RAM already that we want to skip for 
performance reasons.

Furthermore we had various problems in the past where firmware corrupted the first 
1MB of RAM.

So could we please just get rid of this class of problems and reserve it all?

The only thing we need from there is a suitable place for the SMP trampoline. We 
should try to allocate that somewhere in the middle of the (suspected...) 
available RAM range based on BIOS_RAM_SIZE_KB_PTR (no EBDA quirking at this 
point), where the probability of firmware interference is the lowest.

If the EBDA pointer happens to point to the exact same location we should bump the 
trampoline address down so that it's below the EBDA. (and should generate a 
low-key KERN_INFO printk that we've done so.)

This strategy should solve your boot problem too I believe.

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ