Date:   Tue, 5 Dec 2023 16:17:21 -0600
From:   Mario Limonciello <mario.limonciello@....com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        linux-pci@...r.kernel.org, linux-acpi@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/pci: Stop requiring MMCONFIG to be declared in E820,
 ACPI or EFI for newer systems

On 12/5/2023 12:28, Mario Limonciello wrote:
> On 12/5/2023 11:31, Bjorn Helgaas wrote:
>> On Tue, Dec 05, 2023 at 11:00:31AM -0600, Mario Limonciello wrote:
>>> On 12/5/2023 10:17, Bjorn Helgaas wrote:
>>>> On Tue, Dec 05, 2023 at 09:48:45AM -0600, Mario Limonciello wrote:
>>>>> commit 7752d5cfe3d1 ("x86: validate against acpi motherboard resources")
>>>>> introduced checks to ensure that the MCFG table region also has memory
>>>>> reservations, so that no conflicts are introduced by a buggy BIOS.
>>>>>
>>>>> Over time this has grown to include other types of reservation checks,
>>>>> for ACPI PNP resources and for the EFI MMIO memory type. The PCI Firmware
>>>>> spec, however, says that these checks are only required when the
>>>>> operating system doesn't comprehend the firmware region:
>>>>>
>>>>> ```
>>>>> If the operating system does not natively comprehend reserving the MMCFG
>>>>> region, the MMCFG region must be reserved by firmware. The address range
>>>>> reported in the MCFG table or by _CBA method (see Section 4.1.3) must be
>>>>> reserved by declaring a motherboard resource. For most systems, the
>>>>> motherboard resource would appear at the root of the ACPI namespace
>>>>> (under \_SB) in a node with a _HID of EISAID (PNP0C02), and the resources
>>>>> in this case should not be claimed in the root PCI bus’s _CRS. The
>>>>> resources can optionally be returned in Int15 E820h or EFIGetMemoryMap
>>>>> as reserved memory but must always be reported through ACPI as a
>>>>> motherboard resource.
>>>>> ```
>>>>
>>>> My understanding is that native comprehension would mean Linux knows
>>>> how to discover and/or configure the MMCFG base address and size in
>>>> the hardware and that Linux would then reserve that region so it's not
>>>> used for anything else.
>>>>
>>>> Linux doesn't have that, at least for x86.  It relies on the MCFG
>>>> table to discover the MMCFG region, and it relies on PNP0C02 _CRS to
>>>> reserve it.
>>>
>>> Using MCFG to discover it matches the PCI firmware spec, but as I point
>>> out above, the decision to reserve this region doesn't require
>>> PNP0C01/PNP0C02 _CRS.
>>
>> Can you explain this reasoning a little more?  I claim Linux does not
>> natively comprehend reserving the MMCFG region, but it sounds like you
>> don't agree?  I think "native" comprehension would mean Linux would
>> not need the MCFG table.
> 
> After reading our thread and the spec again, I think you're right that
> Linux doesn't natively comprehend reserving this region, particularly
> because of the stance you have on "static table" vs _CRS.
> 
>>
>>> This is a decision made by Linux historically.
>>>
>>>>> Running this check causes problems with accessing extended PCI
>>>>> configuration space on OEM laptops that don't specify the region in PNP
>>>>> resources or in the EFI memory map. That later manifests as problems
>>>>> with the dGPU and with accessing resizable BAR.
>>>>
>>>> Is there a problem report we can reference here?
>>>
>>> Nothing public to share. The AMD BIOS team is in discussion with the OEM
>>> to add the reservation in a BIOS upgrade so it works with things like the
>>> LTS kernels.
>>
>> Is there some reason this can't be made public (it's obviously fine to
>> redact proprietary details)?  It's really hard to make this code work
>> for all the cases even when we know all the details, and practically
>> impossible if we don't.
> 
> I just don't want to throw the vendor under the bus, since it could have
> been caught "sooner" and fixed by the BIOS adding _CRS.
> 
> I'll share the full dmesg below just redacting the DMI information.
> 
>>
>>> Knowing Windows works without it, I feel this is still something that we
>>> should be looking at fixing from an upstream perspective, which is what
>>> prompted my patch and discussion.
>>
>> We definitely need to change Linux so it works correctly with firmware
>> in the field, whether that means fixing a Linux defect or working
>> around a firmware defect.
>>
>>>> Does the problem still occur with this series?
>>>> https://lore.kernel.org/r/20231121183643.249006-1-helgaas@kernel.org
>>>>
>>>> This appeared in linux-next 20231130.
>>>
>>> Thanks for sharing that. If I do respin a variation of this patch, I'll
>>> rebase on top of that.
>>>
>>> I tried that series on top of 6.7-rc4, but it doesn't fix the issue
>>> (though the patch I sent obviously does).
>>>
>>> # journalctl -k | grep ECAM
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: ECAM [mem 0xe0000000-0xefffffff] (base 0xe0000000) for domain 0000 [bus 00-ff]
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: not using ECAM ([mem 0xe0000000-0xefffffff] not reserved)
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: ECAM [mem 0xe0000000-0xefffffff] (base 0xe0000000) for domain 0000 [bus 00-ff]
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: [Firmware Info]: ECAM [mem 0xe0000000-0xefffffff] not reserved in ACPI motherboard resources
>>> Dec 05 06:37:46 cl-fw-fedora kernel: PCI: not using ECAM ([mem 0xe0000000-0xefffffff] not reserved)
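For reference, the window in these log lines covers buses 00-ff. The sketch below is a minimal userspace illustration of the standard PCIe ECAM layout (1 MiB of configuration space per bus, 4 KiB per function), showing why that bus range needs exactly 0xe0000000-0xefffffff. The helpers ecam_addr() and ecam_size() are hypothetical names, not kernel functions. Extended configuration space (offsets 0x100-0xfff, where capabilities such as Resizable BAR live) is normally accessed through this window, which is why losing ECAM shows up as dGPU and resizable BAR problems.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: physical address of one config-space byte in the
 * standard ECAM layout, base + (bus << 20 | dev << 15 | fn << 12 | off).
 */
static uint64_t ecam_addr(uint64_t base, unsigned bus, unsigned dev,
                          unsigned fn, unsigned off)
{
    assert(bus < 256 && dev < 32 && fn < 8 && off < 4096);
    return base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
           ((uint64_t)fn << 12) + off;
}

/* Hypothetical helper: window size for buses [bus_start, bus_end], 1 MiB each. */
static uint64_t ecam_size(unsigned bus_start, unsigned bus_end)
{
    return ((uint64_t)bus_end - bus_start + 1) << 20;
}
```

With base 0xe0000000 and buses 00-ff, ecam_size(0, 0xff) is 0x10000000 (256 MiB), and the last byte of bus ff, device 31, function 7 is ecam_addr(0xe0000000, 0xff, 31, 7, 0xfff) = 0xefffffff, matching the logged range.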
>>
>> Can you boot with 'efi=debug dyndbg="file arch/x86/pci +p"' and share
>> the complete dmesg log (redacted if necessary) somewhere?  It's
>> important to know more about why and how this doesn't work.  I added
>> more debug logging, but possibly it's still not enough.
> 
> Here you go (6.7-rc4 + that series you linked):
> https://gist.github.com/superm1/eca87ae661793b9ab969829946adb084
> 
>>
>>>>> Similar problems don't exist in Windows 11 with exact same
>>>>> laptop/firmware stack, and in discussion with AMD's BIOS team
>>>>> Windows doesn't have similar checks.
>>>>
>>>> I would love to know AMD BIOS team's take on this.  Does the BIOS
>>>> reserve the MMCFG space in any way?
>>>
>>> On the AMD reference platform this OEM system is based on, it is reserved
>>> in the EFI memory map. So on a 6.7-based kernel on the reference system,
>>> you can see this emitted:
>>>
>>> PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved as EfiMemoryMappedIO
>>
>> The EfiMemoryMappedIO entry is not a *reservation* (this was a poor
>> choice of words in the logging, and my series changes it).  This entry
>> only means the firmware requests that the OS map this region to a
>> virtual address so it can be used by EFI runtime services (UEFI v2.9,
>> sec 7.2).
> 
> In that sense the only reason this works on the AMD reference platform 
> is because that region happens to have been reserved from a subset of 
> another region.
> 
> Given the stance on "static table", we should advocate for _CRS to be
> populated with the MCFG region on the AMD reference platform too, right?
> 
>>
>>> But on the OEM system this is not reserved by the EFI memory map or _CRS.
>>>
>>> That's why, after reading the firmware spec and seeing the behavior, my
>>> assumption is that Windows makes the reservation *based on* what's in
>>> MCFG.
>>
>> Is there some spec language that says MCFG reserves space?  I'm not
>> aware of anything about ACPI static tables reserving MMIO space.
>> Here's my reasoning around static tables vs _CRS for reservations:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/PCI/acpi-info.rst?id=v6.6#n32
> 
> Reading your stance, it makes more sense why we're where we are now.
> 
> Let me ask, though: why does the distinction of old OS vs new OS matter?
> If a vendor wants it to work with a kernel that didn't use MCFG to make a
> reservation, _CRS or some other overlapping reservation is their only
> option.
> 
> But if we changed this behavior in a newer kernel, then the stance could be
> something like:
> "upstream kernel 6.8 or newer will reserve MCFG if not specified by _CRS
> or any other overlapping reservation"
> and
> "upstream kernel 6.7 or older requires explicit reservations".
>
> It seems to me that this type of issue would go away entirely in most
> cases, and it would satisfy the spec note about 'natively comprehend'
> reserving the MMCFG region.
> 
> 

I don't think this should be any surprise, but this patch on top of your 
series fixes the issue on that system.

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 0cc9520666ef..6a77441565e2 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -571,8 +571,6 @@ static void __init pci_mmcfg_reject_broken(int early)
                 if (!pci_mmcfg_reserved(NULL, cfg, early)) {
                         pr_info("not using ECAM (%pR not reserved)\n",
                                 &cfg->res);
-                       free_all_mmcfg();
-                       return;
                 }
         }
  }
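To make the behavior change concrete, here is a minimal userspace model of the loop in pci_mmcfg_reject_broken(), before and after the diff. The struct and the usable_regions() helper are hypothetical simplifications, not the kernel's struct pci_mmcfg_region; the reserved flag stands in for the result of pci_mmcfg_reserved(). Note that the diff keeps the pr_info() call, so the patched kernel would still log "not using ECAM" for a region it goes on to keep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one ECAM region. */
struct ecam_region {
    unsigned long long start, end;  /* inclusive MMIO range */
    bool reserved;                  /* stands in for pci_mmcfg_reserved() */
};

/*
 * Returns how many ECAM regions survive the check.
 * patched == false models the current code: the first unreserved region
 * triggers free_all_mmcfg(), so no region is used at all.
 * patched == true models the diff: the message is logged but the region kept.
 */
static size_t usable_regions(const struct ecam_region *r, size_t n,
                             bool patched)
{
    for (size_t i = 0; i < n; i++) {
        if (!r[i].reserved && !patched)
            return 0;  /* models free_all_mmcfg(); return; */
    }
    return n;
}
```

On the OEM system described here there is one unreserved region, so the current code ends up with zero usable ECAM regions while the patched code keeps one.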

And from what I can tell, this *does* make a "reservation":
pci_mmcfg_late_insert_resources() uses insert_resource() to put the region
in place, so I would expect anything that later tries to request that
region to get a conflict.
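The expected conflict can be illustrated with a deliberately flat userspace model. The range type and the claim() helper below are hypothetical; the kernel's resource tree is hierarchical, and insert_resource() can adopt existing overlapping children instead of failing, which this sketch does not model. It only shows the property relied on above: once the ECAM window is recorded as busy, a later request that overlaps it is refused, while a disjoint request still succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat model of claimed MMIO ranges (inclusive bounds). */
struct range { unsigned long long start, end; };

#define MAX_RANGES 16
static struct range claimed[MAX_RANGES];
static size_t nclaimed;

static bool overlaps(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Records r and returns true only if it overlaps nothing already claimed. */
static bool claim(struct range r)
{
    if (nclaimed == MAX_RANGES)
        return false;
    for (size_t i = 0; i < nclaimed; i++)
        if (overlaps(claimed[i], r))
            return false;  /* conflict with an existing busy range */
    claimed[nclaimed++] = r;
    return true;
}
```

Claiming 0xe0000000-0xefffffff first and then 0xe8000000-0xe80fffff fails on the second call, while 0xf0000000-0xf00fffff is still accepted.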
