Message-ID: <20231120162933.GA197390@bhelgaas>
Date: Mon, 20 Nov 2023 10:29:33 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Tomasz Pala <gotar@...anet.pl>
Cc: linux-pci@...r.kernel.org,
Dan J Williams <dan.j.williams@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Tony Luck <tony.luck@...el.com>,
David E Box <david.e.box@...el.com>,
Yunying Sun <yunying.sun@...el.com>,
Dave Jiang <dave.jiang@...el.com>,
Mika Westerberg <mika.westerberg@...ux.intel.com>,
Giovanni Cabiddu <giovanni.cabiddu@...el.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Hans de Goede <hdegoede@...hat.com>,
Florent DELAHAYE <linuxkernelml@...ead.fr>,
Konrad J Hambrick <kjhambrick@...il.com>,
Matt Hansen <2lprbe78@...k.com>,
Nicholas Johnson <nicholas.johnson-opensource@...look.com.au>,
Benoit Grégoire <benoitg@...us.ca>,
Werner Sembach <wse@...edocomputers.com>,
mumblingdrunkard@...tonmail.com, linux-kernel@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>,
Sebastian Manciulea <manciuleas@...tonmail.com>
Subject: Re: [PATCH 2/2] x86/pci: Treat EfiMemoryMappedIO as reservation of
ECAM space

On Sat, Nov 18, 2023 at 03:21:43PM +0100, Tomasz Pala wrote:
> On Thu, Nov 09, 2023 at 12:44:05 -0600, Bjorn Helgaas wrote:
>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=218050
> >>
> >> I think the problem is that the MMCONFIG region is at
> >> [mem 0x80000000-0x8fffffff], and that is *also* included in one of the
> >> host bridge windows reported via _CRS:
> >>
> >> PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> >> pci_bus 0000:00: root bus resource [mem 0x80000000-0xfbffffff window]
> >>
> >> I'll try to figure out how to deal with that. In the meantime, would
> >> you mind attaching the contents of /proc/iomem to the bugzilla? I
> >
> > I attached a debug patch to both bugzilla entries. If you could
> > attach the "acpidump" output and (if practical) boot a kernel with the
> > debug patch and attach the dmesg logs, that would be great.
>
> I've posted the files. There are signs of buggy BIOS, but I don't expect
> any firmware update to be released for this hw anymore.

Thank you!  A BIOS update is almost never the answer because even if
an update exists, we have to assume that most users in the field will
never install the update.

I want to look at the BIOS info in case we can learn about something
*Linux* is doing wrong.  This most likely works fine with Windows, so
I assume Linux is doing something wrong or at least differently than
Windows.

> DMI: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.4 11/20/2019
>
> .text .data .bss are not marked as E820_TYPE_RAM!

Added by 4eea6aa581ab ("x86, mm: if kernel .text .data .bss are not
marked as E820_RAM, complain and fix").  No idea.  A shame we didn't
include the .text/.data values in the message.

> tboot: non-0 tboot_addr but it is not of type E820_TYPE_RESERVED

Added by 316253406959 ("x86, intel_txt: Intel TXT boot support").  No
idea about this either.

> DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000df243000-0x00000000df251fff], contact BIOS vendor for fixes
> DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000df243000-0x00000000df251fff]

Both related to arch_rmrr_sanity_check(), added by f036c7fa0ab6
("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
and f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails
sanity check").

No idea about this one either.  The VT-d spec (r1.3, sec 8.4) says
"BIOS must report the RMRR reported memory addresses as reserved in
the system memory map returned through methods such as INT15, EFI
GetMemoryMap etc."

arch_rmrr_sanity_check() only looks at your e820 map, which only has
this:

  BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
  BIOS-e820: [mem 0x0000000000100000-0x00000000d1f36fff] usable

I think Linux basically converts the info from EFI GetMemoryMap
to an e820 format; I think booting with "efi=debug" would show more
details of this.
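
To make the RMRR complaint concrete, here is a rough userspace sketch of the kind of check involved: is an address range fully covered by memory-map entries of a given type?  This is only an illustration, not the kernel's arch_rmrr_sanity_check(); the names struct region, enum region_type, and range_is_covered() are hypothetical, and the sketch assumes the entries are sorted by start address and do not overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memory-map entry (not the kernel's struct). */
enum region_type { REGION_USABLE, REGION_RESERVED };

struct region {
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* last byte of the region, inclusive */
	enum region_type type;
};

/*
 * Return true if every byte of [start, end] lies inside entries of the
 * requested type.  Assumes map[] is sorted by start and non-overlapping.
 */
static bool range_is_covered(const struct region *map, size_t n,
			     uint64_t start, uint64_t end,
			     enum region_type type)
{
	uint64_t next = start;	/* lowest byte not yet known to be covered */

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != type)
			continue;	/* wrong type: cannot contribute */
		if (map[i].end < next)
			continue;	/* entirely below the uncovered part */
		if (map[i].start > next)
			return false;	/* gap before this entry */
		if (map[i].end >= end)
			return true;	/* remainder fully covered */
		next = map[i].end + 1;
	}
	return false;		/* ran out of entries before reaching end */
}
```

With only the two "usable" entries quoted above, asking whether [0xdf243000-0xdf251fff] is covered by a REGION_RESERVED entry returns false, which matches the situation the "[Firmware Bug]" messages describe.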

Anyway, this is all a tangent.

> BTW is there a reason for this logging discrepancy?
>
> efi: Remove mem173: MMIO range=[0xe0000000-0xefffffff] (256MB) from e820 map
> efi: Not removing mem71: MMIO range=[0xe0000000-0xefffffff] (262144KB) from e820 map
>
> efi: Not removing mem74: MMIO range=[0xff000000-0xffffffff] (16384KB) from e820 map
> efi: Remove mem176: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
>
> This is arch/x86/platform/efi/efi.c:
> static void __init efi_remove_e820_mmio(void)
>
> Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) ... size >> 20
> Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) ... size >> 10

You mean the MB vs KB difference?  That's my fault.  I guess I used KB
for the "Not removing" message because those are smaller (< 256KB) so
the size in MB wouldn't be useful there.  We could use KB for both,
but I guess I used MB for the "Remove" case because it's a little
easier to read and I expected "Not removing" to be a relatively
unusual case.
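
For reference, the two numbers printed for the same range come from the two shifts quoted above (size >> 20 for MB, size >> 10 for KB).  A small standalone sketch of that arithmetic follows; the helper names are hypothetical and this is not the code in efi_remove_e820_mmio().

```c
#include <stdint.h>

/* Size in bytes of the inclusive range [start, end]. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	return end - start + 1;
}

/* Whole mebibytes, as printed with the "%lluMB" format (size >> 20). */
static uint64_t size_in_mb(uint64_t size)
{
	return size >> 20;
}

/* Whole kibibytes, as printed with the "%lluKB" format (size >> 10). */
static uint64_t size_in_kb(uint64_t size)
{
	return size >> 10;
}
```

For [0xe0000000-0xefffffff] the size is 0x10000000 bytes, which gives 256 (MB) or 262144 (KB); for [0xff000000-0xffffffff] it is 0x1000000 bytes, which gives 16 (MB) or 16384 (KB).  A region below 1MB, say 64KB, would print as 0MB, which is why MB is not useful for small regions.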

Bjorn