Message-ID: <20230914170726.4am7xi36m4hdgiyk@box>
Date:   Thu, 14 Sep 2023 20:07:26 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Ard Biesheuvel <ardb@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Aaron Lu <aaron.lu@...el.com>,
        Bagas Sanjaya <bagasdotme@...il.com>,
        Tom Lendacky <thomas.lendacky@....com>, x86@...nel.org,
        kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
        regressions@...ts.linux.de
Subject: Re: [PATCH] x86/boot/compressed: Reserve more memory for page tables

On Thu, Sep 14, 2023 at 08:51:50AM -0700, Dave Hansen wrote:
> On 9/14/23 05:30, Kirill A. Shutemov wrote:
> > +/*
> > + * Total number of page table kernel_add_identity_map() can allocate,
> > + * including page tables consumed by startup_32().
> > + */
> > +# define BOOT_PGT_SIZE		(32*4096)
> 
> I agree that needing to know this in advance *exactly* is troublesome.
> 
> But I do think that we should preserve the comment about the worst-case
> scenario.

Want me to send v2 for that?

> Also, I thought this was triggered by unaccepted memory.  Am
> I remembering it wrong?  How was it in play?

The unaccepted memory code touched the EFI system table. I was able to
reproduce the problem without unaccepted memory enabled, when
get_rsdp_addr() takes the efi_get_rsdp_addr() path. So unaccepted memory
is not the root cause, just a trigger.

So we need several things to run into the problem:

- System supports 5-level paging and it is enabled;

- Decompressor takes control in 64-bit mode, so it uses page tables
  inherited from the bootloader until initialize_identity_maps().

  In initialize_identity_maps() the kernel resets the page tables,
  rebuilding them from scratch. Here we only map what is definitely
  required: kernel, cmdline, boot_params, setup_data.

  Entering in 32-bit mode would make startup_32() map the first 4G
  unconditionally, but in this setup we rely more on #PF to fill the page
  tables. That masks the problem, as we rarely need all four PMD tables.

- The kernel touches at least one page per gigabyte in the first 4G.

  In our case, the unaccepted memory path was the last straw: it triggered
  allocation of the fourth PMD table, which failed.

We can increase the constant by one and it will work as long as nobody
needs anything beyond the first 4G (or any 1G-aligned 4G region where we
got loaded, I guess). I am not sure we can guarantee this with
(potentially buggy) ACPI and EFI in the picture.
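
To illustrate the arithmetic, here is a rough standalone sketch (not kernel
code; tables_needed() is a made-up helper). It assumes 2M mappings, so one
PMD table covers 1G, and assumes the whole range sits under a single PUD
table. It does not count anything else the decompressor might need to map.

```c
#include <assert.h>

/*
 * Hypothetical helper for illustration only: count the page tables needed
 * to identity-map `gigabytes` GiB with 2M pages, assuming the range fits
 * under a single PUD table (512G).
 */
static unsigned int tables_needed(unsigned int gigabytes, int five_level)
{
	unsigned int top = 1;			/* top-level table */
	unsigned int p4d = five_level ? 1 : 0;	/* extra level with 5-level paging */
	unsigned int pud = 1;			/* one PUD table covers 512G */
	unsigned int pmd = gigabytes;		/* one PMD table per 1G */

	assert(gigabytes <= 512);		/* single-PUD assumption */

	return top + p4d + pud + pmd;
}
```

Under these assumptions, tables_needed(4, 1) is one more than
tables_needed(4, 0): 5-level paging costs one extra table for the same 4G,
and each additional gigabyte touched costs one more PMD table.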

> Either way, I think your general approach here is sound.  But let's add
> one little tweak to at least warn when we're getting close to the limit.

Yeah, makes sense.
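
Something along these lines, as a rough userspace sketch only. The struct,
the function name and the threshold of 4 pages are all made up for
illustration; the buffer size just mirrors the 32*4096 from the patch. The
real allocator in the decompressor would need its own output path instead
of fprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PGT_PAGE_SIZE	4096UL
#define PGT_BUF_PAGES	32UL	/* mirrors BOOT_PGT_SIZE (32*4096) from the patch */
#define PGT_WARN_PAGES	4UL	/* assumed threshold, picked arbitrarily */

/* Hypothetical bookkeeping for a fixed-size page table buffer. */
struct pgt_buf {
	unsigned char *base;
	unsigned long size;
	unsigned long offset;
	int warned;
};

/* Hand out one page from the buffer; warn once when space is running low. */
static void *pgt_alloc_page(struct pgt_buf *buf)
{
	void *page;

	assert(buf && buf->base);

	/* Out of space: the caller has to treat this as fatal. */
	if (buf->offset + PGT_PAGE_SIZE > buf->size)
		return NULL;

	page = buf->base + buf->offset;
	buf->offset += PGT_PAGE_SIZE;

	/* Warn only once, when PGT_WARN_PAGES or fewer pages remain. */
	if (!buf->warned &&
	    buf->size - buf->offset <= PGT_WARN_PAGES * PGT_PAGE_SIZE) {
		buf->warned = 1;
		fprintf(stderr, "page table buffer nearly exhausted: %lu pages left\n",
			(buf->size - buf->offset) / PGT_PAGE_SIZE);
	}

	return page;
}
```

With these numbers the warning fires on the 28th allocation, and the 33rd
returns NULL.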


-- 
  Kiryl Shutsemau / Kirill A. Shutemov
