Message-ID: <da212948-61ea-4eb3-b7be-4023303d850b@amd.com>
Date: Tue, 5 Dec 2023 16:17:21 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J . Wysocki" <rjw@...ysocki.net>,
linux-pci@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/pci: Stop requiring MMCONFIG to be declared in E820,
ACPI or EFI for newer systems
On 12/5/2023 12:28, Mario Limonciello wrote:
> On 12/5/2023 11:31, Bjorn Helgaas wrote:
>> On Tue, Dec 05, 2023 at 11:00:31AM -0600, Mario Limonciello wrote:
>>> On 12/5/2023 10:17, Bjorn Helgaas wrote:
>>>> On Tue, Dec 05, 2023 at 09:48:45AM -0600, Mario Limonciello wrote:
>>>>> commit 7752d5cfe3d1 ("x86: validate against acpi motherboard resources")
>>>>> introduced checks for ensuring that MCFG table also has memory region
>>>>> reservations to ensure no conflicts were introduced from a buggy BIOS.
>>>>>
>>>>> This has proceeded over time to add other types of reservation checks
>>>>> for ACPI PNP resources and EFI MMIO memory type. The PCI firmware spec
>>>>> however says that these checks are only required when the operating
>>>>> system doesn't comprehend the firmware region:
>>>>>
>>>>> ```
>>>>> If the operating system does not natively comprehend reserving the MMCFG
>>>>> region, the MMCFG region must be reserved by firmware. The address range
>>>>> reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
>>>>> reserved by declaring a motherboard resource. For most systems, the
>>>>> motherboard resource would appear at the root of the ACPI namespace
>>>>> (under \_SB) in a node with a _HID of EISAID (PNP0C02), and the resources
>>>>> in this case should not be claimed in the root PCI bus’s _CRS. The
>>>>> resources can optionally be returned in Int15 E820h or EFIGetMemoryMap
>>>>> as reserved memory but must always be reported through ACPI as a
>>>>> motherboard resource.
>>>>> ```
>>>>
>>>> My understanding is that native comprehension would mean Linux knows
>>>> how to discover and/or configure the MMCFG base address and size in
>>>> the hardware and that Linux would then reserve that region so it's not
>>>> used for anything else.
>>>>
>>>> Linux doesn't have that, at least for x86. It relies on the MCFG
>>>> table to discover the MMCFG region, and it relies on PNP0C02 _CRS to
>>>> reserve it.
>>>
>>> MCFG to discover it matches the PCI firmware spec, but as I point
>>> out above the decision to reserve this region doesn't require
>>> PNP0C01/PNP0C02 _CRS.
>>
>> Can you explain this reasoning a little more? I claim Linux does not
>> natively comprehend reserving the MMCFG region, but it sounds like you
>> don't agree? I think "native" comprehension would mean Linux would
>> not need the MCFG table.
>
> After rereading our thread and the spec, I think you're right that Linux
> doesn't natively comprehend reserving this region, particularly because of
> the stance you have on "static table" vs _CRS.
>
>>
>>> This is a decision made by Linux historically.
>>>
>>>>> Running this check causes problems with accessing extended PCI
>>>>> configuration space on OEM laptops that don't specify the region in PNP
>>>>> resources or in the EFI memory map. That later manifests as problems
>>>>> with dGPU and accessing resizable BAR.
>>>>
>>>> Is there a problem report we can reference here?
>>>
>>> Nothing public to share. AMD BIOS team is in discussion with the OEM to
>>> add the reservation in a BIOS upgrade so it works with things like the
>>> LTS kernels.
>>
>> Is there some reason this can't be made public (it's obviously fine to
>> redact proprietary details)? It's really hard to make this code work
>> for all the cases even when we know all the details, and practically
>> impossible if we don't.
>
> I just don't want to throw the vendor under the bus as it could have
> been caught "sooner" and fixed by BIOS adding _CRS.
>
> I'll share the full dmesg below just redacting the DMI information.
>
>>
>>> Knowing Windows works without it I feel this is still something that we
>>> should be looking at fixing from an upstream perspective though which is
>>> what prompted my patch and discussion.
>>
>> We definitely need to change Linux so it works correctly with firmware
>> in the field, whether that means fixing a Linux defect or working
>> around a firmware defect.
>>
>>>> Does the problem still occur with this series?
>>>> https://lore.kernel.org/r/20231121183643.249006-1-helgaas@kernel.org
>>>>
>>>> This appeared in linux-next 20231130.
>>>
>>> Thanks for sharing that. If I do respin a variation of this patch I'll
>>> rebase on top of that.
>>>
>>> I had a try with that series on top of 6.7-rc4, but it doesn't fix the
>>> issue (but obviously the patch I sent does).
>>>
>>> # journalctl -k | grep ECAM
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: ECAM [mem 0xe0000000-0xefffffff] (base 0xe0000000) for domain 0000 [bus 00-ff]
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: not using ECAM ([mem 0xe0000000-0xefffffff] not reserved)
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: ECAM [mem 0xe0000000-0xefffffff] (base 0xe0000000) for domain 0000 [bus 00-ff]
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: [Firmware Info]: ECAM [mem 0xe0000000-0xefffffff] not reserved in ACPI motherboard resources
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: not using ECAM ([mem 0xe0000000-0xefffffff] not reserved)
>>
>> Can you boot with 'efi=debug dyndbg="file arch/x86/pci +p"' and share
>> the complete dmesg log (redacted if necessary) somewhere? It's
>> important to know more about why and how this doesn't work. I added
>> more debug logging, but possibly it's still not enough.
>
> Here you go (6.7-rc4 + that series you linked):
> https://gist.github.com/superm1/eca87ae661793b9ab969829946adb084
>
>>
>>>>> Similar problems don't exist in Windows 11 with exact same
>>>>> laptop/firmware stack, and in discussion with AMD's BIOS team
>>>>> Windows doesn't have similar checks.
>>>>
>>>> I would love to know AMD BIOS team's take on this. Does the BIOS
>>>> reserve the MMCFG space in any way?
>>>
>>> On the AMD reference platform this OEM system is based on it is reserved
>>> in the EFI memory map. So on a 6.7 based kernel the reference system you
>>> can see this emitted:
>>>
>>> PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved as EfiMemoryMappedIO
>>
>> The EfiMemoryMappedIO entry is not a *reservation* (this was a poor
>> choice of words in the logging, and my series changes it). This entry
>> only means the firmware requests that the OS map this region to a
>> virtual address so it can be used by EFI runtime services (UEFI v2.9,
>> sec 7.2).
>
> In that sense the only reason this works on the AMD reference platform
> is because that region happens to have been reserved from a subset of
> another region.
>
> Per the stance on "static table", we should advocate for _CRS to be
> populated with MCFG on AMD reference platform too, right?
>
>>
>>> But on the OEM system this is not reserved by EFI memory map or _CRS.
>>>
>>> That's why my assumption after reading the firmware spec and seeing the
>>> behavior is that Windows makes the reservation *based on* what's in
>>> MCFG.
>>
>> Is there some spec language that says MCFG reserves space? I'm not
>> aware of anything about ACPI static tables reserving MMIO space.
>> Here's my reasoning around static tables vs _CRS for reservations:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.6#n32
>
> Having read your stance, it makes more sense why we're where we are now.
>
> Let me ask though - why does the distinction of old OS vs new OS matter?
> If a vendor wants it to work with a kernel that didn't use MCFG to make
> a reservation, then _CRS or some other overlapping reservation is their
> only option.
>
> But if we changed this behavior in a newer kernel then the stance can be
> something like:
> "upstream kernel 6.8 or newer will reserve MCFG if not specified by _CRS
> or any other overlapping reservation"
> and
> "upstream kernel 6.7 or older requires explicit reservations".
>
> It seems to me that this type of issue would entirely go away in most
> cases and it would satisfy the spec note about
> 'natively comprehend' reserving the MMCFG region.
>
>
I don't think this should be any surprise, but this patch on top of your
series fixes the issue on that system.
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 0cc9520666ef..6a77441565e2 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -571,8 +571,6 @@ static void __init pci_mmcfg_reject_broken(int early)
 		if (!pci_mmcfg_reserved(NULL, cfg, early)) {
 			pr_info("not using ECAM (%pR not reserved)\n",
 				&cfg->res);
-			free_all_mmcfg();
-			return;
 		}
 	}
 }
From what I can tell, this *does* make a "reservation", specifically
because pci_mmcfg_late_insert_resources() uses insert_resource() to put
the region in place. I would expect that if something else tries to
request that region later, it would get a conflict.
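
To illustrate that expectation, here is a minimal userspace sketch in C.
It is a toy model, not kernel code: toy_tree, toy_insert() and
toy_conflict() are hypothetical names, and the flat array only
approximates the idea that a range recorded in the resource tree is seen
by anyone who later checks for overlap. The real insert_resource() and
request_resource() work on a hierarchical tree and have more involved
semantics, so treat this as an assumption-laden sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RES 8

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct toy_res {
	uint64_t start;
	uint64_t end;		/* inclusive, as in the kernel's %pR output */
	const char *name;
};

/* Hypothetical flat list standing in for the iomem resource tree. */
struct toy_tree {
	struct toy_res res[TOY_MAX_RES];
	size_t count;
};

/* True if [start, end] shares at least one address with *r. */
bool toy_overlaps(const struct toy_res *r, uint64_t start, uint64_t end)
{
	return r->start <= end && start <= r->end;
}

/*
 * Record a range without checking for overlap. This only loosely models
 * insert_resource(); returns 0 on success, -1 if the toy list is full.
 */
int toy_insert(struct toy_tree *t, uint64_t start, uint64_t end,
	       const char *name)
{
	assert(start <= end);
	if (t->count == TOY_MAX_RES)
		return -1;
	t->res[t->count++] = (struct toy_res){ start, end, name };
	return 0;
}

/*
 * Return the first recorded range that overlaps [start, end], or NULL if
 * the range is free. A later requester would treat non-NULL as a conflict.
 */
const struct toy_res *toy_conflict(const struct toy_tree *t,
				   uint64_t start, uint64_t end)
{
	assert(start <= end);
	for (size_t i = 0; i < t->count; i++) {
		if (toy_overlaps(&t->res[i], start, end))
			return &t->res[i];
	}
	return NULL;
}
```

In this model, once 0xe0000000-0xefffffff has been inserted as the ECAM
window, toy_conflict() on any range inside that window returns the ECAM
entry, while a range starting at 0xf0000000 returns NULL.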