[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8d91910d-2676-477a-be62-5365afd199bb@amd.com>
Date: Thu, 14 Dec 2023 15:45:43 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
"Rafael J . Wysocki" <rjw@...ysocki.net>, linux-pci@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
Len Brown <lenb@...nel.org>, Robert Moore <robert.moore@...el.com>
Subject: Re: [PATCH] x86/pci: Stop requiring MMCONFIG to be declared in E820,
ACPI or EFI for newer systems
On 12/14/2023 14:43, Bjorn Helgaas wrote:
> [+cc Len, Robert in case I'm missing something about static tables
> reserving address space]
>
> On Tue, Dec 05, 2023 at 12:28:44PM -0600, Mario Limonciello wrote:
>> On 12/5/2023 11:31, Bjorn Helgaas wrote:
>>> On Tue, Dec 05, 2023 at 11:00:31AM -0600, Mario Limonciello wrote:
>>>> On 12/5/2023 10:17, Bjorn Helgaas wrote:
>>>>> On Tue, Dec 05, 2023 at 09:48:45AM -0600, Mario Limonciello wrote:
>>>>>> commit 7752d5cfe3d1 ("x86: validate against acpi motherboard
>>>>>> resources") introduced checks for ensuring that MCFG table
>>>>>> also has memory region reservations to ensure no conflicts
>>>>>> were introduced from a buggy BIOS.
>>>>>>
>>>>>> This has proceeded over time to add other types of
>>>>>> reservation checks for ACPI PNP resources and EFI MMIO
>>>>>> memory type. The PCI firmware spec however says that these
>>>>>> checks are only required when the operating system doesn't
>>>>>> comprehend the firmware region:
>>>>>>
>>>>>> ``` If the operating system does not natively comprehend
>>>>>> reserving the MMCFG region, the MMCFG region must be
>>>>>> reserved by firmware. The address range reported in the MCFG
>>>>>> table or by _CBA method (see Section 4.1.3) must be reserved
>>>>>> by declaring a motherboard resource. For most systems, the
>>>>>> motherboard resource would appear at the root of the ACPI
>>>>>> namespace (under \_SB) in a node with a _HID of EISAID
>>>>>> (PNP0C02), and the resources in this case should not be
>>>>>> claimed in the root PCI bus’s _CRS. The resources can
>>>>>> optionally be returned in Int15 E820h or EFIGetMemoryMap as
>>>>>> reserved memory but must always be reported through ACPI as
>>>>>> a motherboard resource. ```
>>>>>
>>>>> My understanding is that native comprehension would mean Linux
>>>>> knows how to discover and/or configure the MMCFG base address
>>>>> and size in the hardware and that Linux would then reserve
>>>>> that region so it's not used for anything else.
>>>>>
>>>>> Linux doesn't have that, at least for x86. It relies on the
>>>>> MCFG table to discover the MMCFG region, and it relies on
>>>>> PNP0C02 _CRS to reserve it.
>>>>
>>>> MCFG to discover it matches the PCI firmware spec, but as I
>>>> point out above the decision to reserve this region doesn't
>>>> require PNP0C01/PNP0C02 _CRS.
>>>
>>> Can you explain this reasoning a little more? I claim Linux does
>>> not natively comprehend reserving the MMCFG region, but it sounds
>>> like you don't agree? I think "native" comprehension would mean
>>> Linux would not need the MCFG table.
>>
>> After our thread and the spec again I think you're right Linux
>> doesn't natively comprehend (reserve this region;) particularly
>> because of the stance you have on "static table" vs _CRS.
>
> ["My stance" refers to this:
>
> Static tables like MCFG, HPET, ECDT, etc., are *not* mechanisms for
> reserving address space. The static tables are for things the OS
> needs to know early in boot, before it can parse the ACPI namespace.
> If a new table is defined, an old OS needs to operate correctly even
> though it ignores the table. _CRS allows that because it is generic
> and understood by the old OS; a static table does not.
>
> from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.6#n32]
>
> I don't think this is just my stance. The ACPI spec could be clearer
> in terms of requiring PNP0C02 devices, not static tables, to reserve
> address space, but I think that requirement is a logical consequence
> of the ACPI design.
>
> It's a goal of ACPI that an OS we release today should run on a
> platform released tomorrow. If the new platform uses a static table
> to reserve address space used by some new hardware, today's OS doesn't
> know about it and could place another device on top of it.
>
> Using _CRS of an ACPI device to reserve the new hardware address space
> is different because it works even with today's OS. Today's OS can't
> *operate* tomorrow's hardware, but at least it won't create address
> conflicts with it.
>
>> I just don't want to throw the vendor under the bus as it could have
>> been caught "sooner" and fixed by BIOS adding _CRS.
>
> The MCFG requirement for PNP0C02 _CRS has been in the PCI Firmware
> spec since r3.0 in 2005. I'm surprised that vendors still get this
> wrong.
Probably worth mentioning with the clairvoyance of the root cause of the
issue that prompted this conversation I've now discovered another system
with the exact same problem. It's a different OEM, different generation
of hardware and a different IBV that they use for their firmware.
I've also looked through the BIOS for a variety of reference designs and
I don't see a _CRS entry in any of them.
I'm fairly certain we're just getting lucky in Linux on a lot of devices
that the region is often overlapping with a region for EFI runtime services.
> Vendors definitely have an interest in making shipping OSes
> boot unchanged on new hardware.
At least the OEMs that I talk to use FWTS. FWTS catches this issue, but
it's marked as LOW. Everyone fixates on the HIGH or CRITICAL.
Given the severity of what I've seen it can do to a system I'm proposing
FWTS to move it to HIGH:
https://lists.ubuntu.com/archives/fwts-devel/2023-December/013772.html
>
>>>> Knowing Windows works without it I feel this is still something that we
>>>> should be looking at fixing from an upstream perspective though which is
>>>> what prompted my patch and discussion.
>
> The fact that Windows works doesn't mean the firmware is correct.
Of course. But it also means it's worth looking at the semantics of why
Windows works.
> Linux assigns PCI BARs from the bottom up, and ECAM is often at the
> bottom of a host bridge aperture.
>
> Windows assigns PCI BARs from the top down, so even without a _CRS
> reservation for the ECAM space, Windows is much less likely to put
> something on top of it.
>
> Bjorn
I guess I'm naïve in that I don't know how exactly to check what Windows
*really* does (gets lucky from top down or actually reserves from MCFG).
But I do feel the OS doing it from top down vs bottom up is tangential
to the decision of whether or not make a reservation in Linux for
something you know about without a _CRS entry.
I can push AMD's reference designs to improve, I can push OEMs I talk to
improve and I can try to influence FWTS but I have to ask:
What is the actual *harm* in just using this MCFG table to make a
reservation when there isn't a PNP0C02 _CRS region declared?
At worst (a buggy BIOS) you would end up with hole in the memory map
that isn't usable for devices. At best you end up with more working
devices without changing the firmware.
Powered by blists - more mailing lists