Message-ID:
<IA2PR22MB5609866886BA191F64D5049D90E0A@IA2PR22MB5609.namprd22.prod.outlook.com>
Date: Tue, 7 Oct 2025 17:37:56 +0000
From: Filip Barczyk <filip.barczyk@...o.net>
To: Yazen Ghannam <yazen.ghannam@....com>, "Mario Limonciello (AMD)
(kernel.org)" <superm1@...nel.org>
CC: "x86@...nel.org" <x86@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] x86/amd_node: Fix AMD root device caching
On Tue, Oct 01, 2025 at 03:46 PM +0200, Yazen Ghannam wrote:
> On Tue, Sep 30, 2025 at 01:07:47PM -0500, Mario Limonciello (AMD) (kernel.org) wrote:
> >
> >
> > On 9/30/2025 11:45 AM, Yazen Ghannam wrote:
> > > Recent AMD node rework removed the "search and count" method of caching
> > > AMD root devices. This depended on the value from a Data Fabric register
> > > that was expected to hold the PCI bus of one of the root devices
> > > attached to that fabric.
> > >
> > > However, this expectation is incorrect. The register, when read from PCI
> > > config space, returns the bitwise-OR of the buses of all attached root
> > > devices.
> > >
> > > This behavior is benign on AMD reference design boards, since the bus
> > > numbers are aligned. As a result, the bitwise-OR value matches one of
> > > the buses. For example, 0x00 | 0x40 | 0xA0 | 0xE0 = 0xE0.
> > >
> > > This behavior breaks on boards where the bus numbers are not exactly
> > > aligned. For example, 0x00 | 0x07 | 0xE0 | 0x15 = 0xF7, which matches
> > > none of the buses.
> > >
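The bitwise-OR behavior described in the patch can be checked with a small user-space C sketch. The helpers `or_of_buses()` and `matches_a_bus()` are hypothetical: they only model the value the Data Fabric register is said to return and do not read any real hardware register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the register read: the returned value is the
 * bitwise-OR of the PCI bus numbers of all root devices on the fabric. */
static uint8_t or_of_buses(const uint8_t *buses, size_t n)
{
	uint8_t val = 0;

	assert(buses != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		val |= buses[i];
	return val;
}

/* True if the OR value happens to equal one of the real bus numbers,
 * i.e. treating it as "the bus of a root device" would still work. */
static bool matches_a_bus(uint8_t val, const uint8_t *buses, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (buses[i] == val)
			return true;
	return false;
}
```

With the reference-board buses {0x00, 0x40, 0xA0, 0xE0} the OR is 0xE0, which is a real bus; with {0x00, 0x07, 0xE0, 0x15} it is 0xF7, which is not.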
> > > The bus numbering style in the reference boards is not a requirement.
> > > The numbering found in other boards is not incorrect. Therefore, the
> > > root device caching method needs to be adjusted.
> > >
> > > Go back to the "search and count" method used before the recent rework.
> > > Search for root devices using PCI class code rather than fixed PCI IDs.
> > >
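A rough user-space model of the "search and count" idea with a class-code match follows. It is a sketch under stated assumptions, not the kernel implementation: `struct fake_pci_dev` and `cache_roots()` are hypothetical, and the sketch assumes root devices are AMD (vendor ID 0x1022) host bridges (class code 0x060000) at device 0 function 0, split evenly across nodes in bus order. Kernel code would walk real devices, for example with `pci_get_class()`; here an array scan stands in for that walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VENDOR_AMD 0x1022   /* PCI vendor ID for AMD */
#define SKETCH_CLASS_HOST 0x060000 /* bridge / host bridge, prog-if 0 */

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct fake_pci_dev {
	uint8_t  bus;
	uint8_t  devfn;      /* 0 means Device 0 Function 0 */
	uint16_t vendor;
	uint32_t class_code; /* 24-bit class/subclass/prog-if */
};

/* Match by class code instead of a fixed list of PCI device IDs. */
static int is_root(const struct fake_pci_dev *d)
{
	return d->class_code == SKETCH_CLASS_HOST && d->devfn == 0 &&
	       d->vendor == SKETCH_VENDOR_AMD;
}

/* Count matching roots, assume an even split across nodes, and store
 * the first root of each node's group in out[]. Returns the number of
 * nodes filled, or -1 if the roots cannot be split evenly. */
static int cache_roots(const struct fake_pci_dev *devs, size_t ndevs,
		       size_t num_nodes, const struct fake_pci_dev **out)
{
	size_t total = 0, seen = 0, filled = 0;

	assert(out != NULL);
	for (size_t i = 0; i < ndevs; i++)
		if (is_root(&devs[i]))
			total++;

	if (num_nodes == 0 || total == 0 || total % num_nodes != 0)
		return -1;

	size_t per_node = total / num_nodes;

	for (size_t i = 0; i < ndevs && filled < num_nodes; i++) {
		if (!is_root(&devs[i]))
			continue;
		if (seen % per_node == 0)
			out[filled++] = &devs[i];
		seen++;
	}
	return (int)filled;
}
```

Because the search no longer depends on the register value, the unaligned bus layout from the patch (0x00, 0x07, 0x15, 0xE0) is handled the same way as the reference layout.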
> > > This keeps the goal of the rework (remove dependency on PCI IDs) while
> > > being able to support various board designs.
> > >
> > > Fixes: 40a5f6ffdfc8 ("x86/amd_nb: Simplify root device search")
> >
> > Was this a publicly reported failure?
> >
> > If so is there a link to LKML or a Bugzilla with the details of the failure
> > you can include here?
> >
> No, it was reported off-list.
> Thanks,
> Yazen
I can confirm that this patch fixes an issue where EDAC fails to enumerate DIMMs on a Dell PowerEdge R7725 (2x EPYC 9475F) running kernel 6.14.3.
Thanks,
Filip
________________________________
Pico Quantitative Trading LLC ("PQT"). This e-mail (including any attachments) is intended only for use by the addressee(s) named above, and may contain confidential, proprietary or legally privileged information. If you are not the intended recipient of this e-mail, any review, use, disclosure, dissemination, distribution, printing or copying of this e-mail or any attachment is strictly prohibited. If you have received this e-mail in error, please notify Pico immediately by return e-mail and permanently delete the original from your system and any hard copy printout thereof. E-mails are not encrypted and cannot be guaranteed to be secure or error-free and, as with all Internet communications, information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. Accordingly, Pico accepts no liability for any errors or omissions in the content contained herein. In compliance with applicable laws, rules and regulations and/or at its discretion, Pico may review and archive incoming and outgoing e-mail communications, copies of which may be produced at the request of regulators.