lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200630121300.GO5180@lahna.fi.intel.com>
Date:   Tue, 30 Jun 2020 15:13:00 +0300
From:   Mika Westerberg <mika.westerberg@...ux.intel.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Borislav Petkov <bp@...en8.de>, x86@...nel.org,
        Benoit Grégoire <benoitg@...us.ca>,
        juhapekka.heikkila@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/resource: Do not exclude regions that are marked as
 MMIO in EFI memmap

On Fri, Jun 26, 2020 at 05:43:44PM -0500, Bjorn Helgaas wrote:
> On Mon, Mar 02, 2020 at 05:14:51PM +0300, Mika Westerberg wrote:
> > Commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> > space") made the resource allocation code to avoid all regions that are
> > in E820 table. This prevents the kernel to assign MMIO resources to
> > regions that may be real RAM for example.
> > 
> > However, at least with Lenovo Yoca C940 and S740 this causes problems
> > when allocating resources for PCIe devices behind Thunderbolt port(s).
> > 
> > On Yoga S740 the E820 table contains an entry like this:
> > 
> >   BIOS-e820: [mem 0x000000002bc50000-0x00000000cfffffff] reserved
> > 
> > and ACPI _CRS method for the host bridge returns these windows:
> > 
> >   pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> >   pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> >   pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> >   pci_bus 0000:00: root bus resource [mem 0x45400000-0xbfffffff window]
> >   pci_bus 0000:00: root bus resource [mem 0x4000000000-0x7fffffffff window]
> > 
> > Note that the 0x45400000-0xbfffffff entry is also included in the E820
> > table and marked as "reserved".
> > 
> > When Thunderbolt device is connected and the PCIe gets tunneled PCI core
> > tries to allocate memory for the new devices but it fails because all
> > the resources are inside this reserved region so arch_remove_reservations()
> > clips them which makes the resource assignment fail as in below log:
> > 
> >   pci 0000:00:07.0: PCI bridge to [bus 01-2a]
> >   pci 0000:00:07.0:   bridge window [mem 0x46000000-0x521fffff]
> >   pci 0000:00:07.0:   bridge window [mem 0x6000000000-0x601bffffff 64bit pref]
> >   ...
> >   pci 0000:02:04.0: bridge window [mem 0x00100000-0x001fffff 64bit pref] to [bus 07-2a] add_size 100000 add_align 100000
> >   pci 0000:02:04.0: bridge window [mem 0x00100000-0x001fffff] to [bus 07-2a] add_size 100000 add_align 100000
> >   pci 0000:01:00.0: bridge window [mem 0x00100000-0x005fffff 64bit pref] to [bus 02-2a] add_size 100000 add_align 100000
> >   pci 0000:01:00.0: bridge window [mem 0x00100000-0x005fffff] to [bus 02-2a] add_size 100000 add_align 100000
> >   pci 0000:01:00.0: bridge window [io  0x1000-0x5fff] shrunken by 0x0000000000004000
> >   pci 0000:01:00.0: bridge window [mem 0x00100000-0x005fffff] extended by 0x000000000bd00000
> >   pci 0000:01:00.0: bridge window [mem 0x00100000-0x005fffff 64bit pref] extended by 0x000000001bb00000
> >   pci 0000:02:04.0: bridge window [mem 0x00100000-0x001fffff] extended by 0x000000000bd00000
> >   pci 0000:02:04.0: bridge window [mem 0x00100000-0x001fffff 64bit pref] extended by 0x000000001bb00000
> >   pci 0000:01:00.0: BAR 8: no space for [mem size 0x0c200000]
> >   pci 0000:01:00.0: BAR 8: failed to assign [mem size 0x0c200000]
> >   pci 0000:01:00.0: BAR 9: assigned [mem 0x6000000000-0x601bffffff 64bit pref]
> >   pci 0000:01:00.0: BAR 7: assigned [io  0x4000-0x4fff]
> > 
> > The 01:00.0 is the upstream port of the PCIe switch that is connected to
> > the PCIe root port (00:07.1) over Thunderbolt link.
> > 
> > If I add "efi=debug" to the command line I can see that the EFI memory
> > map actually contains several entries:
> > 
> >   [Reserved           |   |  |  |  |  |  |  | |   |WB|WT|WC|UC] range=[0x000000002bc50000-0x000000003fffffff] (323MB)
> >   [Reserved           |   |  |  |  |  |  |  | |   |WB|  |  |UC] range=[0x0000000040000000-0x0000000040ffffff] (16MB)
> >   [Reserved           |   |  |  |  |  |  |  | |   |  |  |  |  ] range=[0x0000000041000000-0x00000000453fffff] (68MB)
> >   [Memory Mapped I/O  |RUN|  |  |  |  |  |  | |   |  |  |  |UC] range=[0x0000000045400000-0x00000000cfffffff] (2220MB)
> > 
> > I think the EFI stub merges these consecutive entries into that single
> > E820 entry showed above. The last region marked as EFI_MEMORY_MAPPED_IO
> > actually covers the PCI host bridge window entirely. However, since
> > there is corresponding E820 type for this it is simply marked as
> > E820_TYPE_RESERVED.
> 
> Is this supposed to say "since there is *no* corresponding E820 type"?

Yes, I think it makes more sense that way :)

> I guess this refers to setup_e820()?  I see the switch case there that
> sets e820_type to E820_TYPE_RESERVED for EFI_MEMORY_MAPPED_IO.
> 
> I'm really not familiar with the EFI stub, but I guess that when
> booting with EFI, we basically build our own E820 table based on the
> EFI memory map, then use that constructed E820 table to initialize
> iomem_resource via e820__reserve_resources()?
> 
> I wonder why we don't just use the EFI memory map to initialize
> iomem_resource directly.  It seems like the EFI map -> E820 table ->
> iomem_resource path adds complication that makes issues like this more
> likely.
>
> IIUC, this patch allows Linux to use EFI_MEMORY_MAPPED_IO for PCI MMIO
> space, i.e., if "avail" is in a EFI_MEMORY_MAPPED_IO region, we don't
> need to consider it "reserved" and we can put PCI resources there.
> 
> UEFI v2.8, Table 30 (doc p 160, PDF p 233) says about
> EfiMemoryMappedIO that
> 
>   before ExitBootServices():
> 
>     Used by system firmware to request that a memory-mapped IO region
>     be mapped by the OS to a virtual address so it can be accessed by
>     EFI runtime services.
> 
>   after ExitBootServices():
> 
>     This memory is not used by the OS. All system memory-mapped IO
>     information should come from ACPI tables.
> 
> I would read that as basically saying that the OS must virtually map
> EfiMemoryMappedIO for use by EFI runtime services (I don't know if
> Linux uses any of those).
> 
> The "after ExitBootServices()" part doesn't seem very clear to me.
> "This memory is not used by the OS" isn't quite stated as a
> requirement for the OS.  And "All .. MMIO information should come from
> ACPI tables" sounds like the OS should *ignore* this EFI map in favor
> of ACPI.

It is not clear to me either and I'm not 100% sure the patch here is the
correct thing to do. I think we may need some help from x86 maintainers
for a proper fix.

> So I don't know how to fix this.  I just feel like this patch adds
> complication to an already very messy flow.  We may have to do that,
> but it always makes me sad ;)
> 
> > All in all, I think we can fix this by modifying arch_remove_reservations()
> > to check the EFI type as well and if it is EFI_MEMORY_MAPPED_IO skip the
> > clipping in that case.
> > 
> > Reported-by: Benoit Grégoire <benoitg@...us.ca>
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> > Signed-off-by: Mika Westerberg <mika.westerberg@...ux.intel.com>
> > ---
> >  arch/x86/kernel/resource.c | 34 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 33 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> > index 9b9fb7882c20..c0bc9117dd7d 100644
> > --- a/arch/x86/kernel/resource.c
> > +++ b/arch/x86/kernel/resource.c
> > @@ -1,4 +1,5 @@
> >  // SPDX-License-Identifier: GPL-2.0
> > +#include <linux/efi.h>
> >  #include <linux/ioport.h>
> >  #include <asm/e820/api.h>
> >  
> > @@ -36,6 +37,36 @@ static void remove_e820_regions(struct resource *avail)
> >  	}
> >  }
> >  
> > +#ifdef CONFIG_EFI
> > +static bool efi_mmio_mem(const struct resource *avail)
> > +{
> > +	resource_size_t start, end;
> > +	efi_memory_desc_t desc;
> > +
> > +	if (!efi_enabled(EFI_MEMMAP) ||
> > +	    efi_mem_desc_lookup(avail->start, &desc))
> > +		return false;
> > +
> > +	start = desc.phys_addr;
> > +	end = desc.phys_addr + (desc.num_pages << EFI_PAGE_SHIFT) - 1;
> > +
> > +	/*
> > +	 * No need to clip the resource if it is fully contained in an
> > +	 * EFI memory mapped region.
> > +	 */
> > +	if (avail->start >= start && avail->end <= end &&
> > +	    desc.type == EFI_MEMORY_MAPPED_IO)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +#else
> > +static inline bool efi_mmio_mem(const struct resource *avail)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> >  void arch_remove_reservations(struct resource *avail)
> >  {
> >  	/*
> > @@ -46,6 +77,7 @@ void arch_remove_reservations(struct resource *avail)
> >  	if (avail->flags & IORESOURCE_MEM) {
> >  		resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END);
> >  
> > -		remove_e820_regions(avail);
> > +		if (!efi_mmio_mem(avail))
> > +			remove_e820_regions(avail);
> >  	}
> >  }
> > -- 
> > 2.25.0
> > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ