Message-ID: <54ac8a9e-e846-42eb-b09e-76e781a4375f@linux.intel.com>
Date: Wed, 14 Jan 2026 13:19:46 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: Zide Chen <zide.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>, Eranian Stephane <eranian@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Xudong Hao <xudong.hao@...el.com>, Falcon Thomas <thomas.falcon@...el.com>,
Steve Wahl <steve.wahl@....com>
Subject: Re: [PATCH V2 2/2] perf/x86/intel/uncore: Fix die ID init and look up bugs
On 1/14/2026 4:56 AM, Zide Chen wrote:
> In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path,
> uncore_device_to_die() may return -1 when all CPUs associated
> with the UBOX device are offline.
>
> Remove the WARN_ON_ONCE(die_id == -1) check for two reasons:
>
> - The current code breaks out of the loop. This is incorrect because
> pci_get_device() does not guarantee iteration in domain or bus order,
> so additional UBOX devices may be skipped during the scan.
>
> - Returning -EINVAL is incorrect, since marking offline buses with
> die_id == -1 is expected and should not be treated as an error.
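To illustrate the first point, here is a minimal userspace sketch of the scan with the `break` removed. It is not kernel code. The `sketch_*` names and the struct are hypothetical stand-ins for the UBOX devices and `map->pbus_to_dieid[]`. The only behavior it models is this: a UBOX with no online CPU yields -1, and the loop moves on to the next device, so a UBOX enumerated after an offline one still gets its die ID recorded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not real kernel APIs. */
#define SKETCH_NR_BUSES 256

struct sketch_ubox {
	int bus;      /* PCI bus number of the UBOX device */
	bool online;  /* false: every CPU behind this UBOX is offline */
	int die;      /* die ID reported when at least one CPU is online */
};

/* Stand-in for uncore_device_to_die(): -1 when all CPUs are offline. */
static int sketch_device_to_die(const struct sketch_ubox *u)
{
	return u->online ? u->die : -1;
}

/*
 * Fill pbus_to_dieid[] for every UBOX in an unordered device list.
 * An offline UBOX is recorded as -1 and the scan continues, so
 * devices enumerated after it are not skipped.
 */
static void sketch_map_init(const struct sketch_ubox *ubox, int n,
			    int pbus_to_dieid[SKETCH_NR_BUSES])
{
	for (int i = 0; i < SKETCH_NR_BUSES; i++)
		pbus_to_dieid[i] = -1;

	for (int i = 0; i < n; i++) {
		assert(ubox[i].bus >= 0 && ubox[i].bus < SKETCH_NR_BUSES);
		pbus_to_dieid[ubox[i].bus] = sketch_device_to_die(&ubox[i]);
		/* No break on -1: later entries may still be online. */
	}
}
```

If the old `break` is restored, any UBOX enumerated after an offline one keeps its initial -1, even when its CPUs are online.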
>
> Separately, when NUMA is disabled on a NUMA-capable platform,
> pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die()
> to return -1 for all PCI devices. As a result,
> spr_update_device_location(), used on Intel SPR and EMR, ignores the
> corresponding PMON units and does not add them to the RB tree.
>
> Fix this by using uncore_pcibus_to_dieid(), which retrieves topology
> from the UBOX GIDNIDMAP register and works regardless of whether NUMA
> is enabled in Linux. This requires snbep_pci2phy_map_init() to be
> added in spr_uncore_pci_init().
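For the GIDNIDMAP path, here is a rough userspace sketch of the decode and the later lookup. It assumes, as the existing snbep code appears to, that the low three bits of CPUNODEID hold the local node ID and that GIDNIDMAP packs eight 3-bit node IDs, one per group. The `sketch_*` names are hypothetical. The sketch treats the matching group index directly as the die ID, omitting any die/package translation the real code may apply, and it drops the locking around the map list. What it shows is that the lookup depends only on the populated map, not on NUMA. That is also why the map has to be initialized in spr_uncore_pci_init() before spr_update_device_location() consults it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout: 3-bit node IDs, eight groups. */
#define SKETCH_NODE_ID_MASK 0x7u
/* Extract the 3-bit node ID stored for group `id` (0..7). */
#define SKETCH_GIDNIDMAP(config, id) (((config) >> (3 * (id))) & 0x7u)

/* Return the group index whose node ID matches the local one, or -1. */
static int sketch_gidnid_to_die(uint32_t cpunodeid, uint32_t gidnidmap)
{
	uint32_t nodeid = cpunodeid & SKETCH_NODE_ID_MASK;

	for (int i = 0; i < 8; i++) {
		if (SKETCH_GIDNIDMAP(gidnidmap, i) == nodeid)
			return i;
	}
	return -1;
}

/* Hypothetical per-segment map, modeled on the pci2phy map. */
struct sketch_pci2phy_map {
	int segment;
	int pbus_to_dieid[256];
};

/* Lookup in the spirit of uncore_pcibus_to_dieid(): no NUMA data used. */
static int sketch_pcibus_to_dieid(const struct sketch_pci2phy_map *maps,
				  int nmaps, int segment, int bus)
{
	assert(bus >= 0 && bus < 256);
	for (int i = 0; i < nmaps; i++) {
		if (maps[i].segment == segment)
			return maps[i].pbus_to_dieid[bus];
	}
	return -1;  /* map never populated for this segment */
}
```

With an empty map (nothing initialized), every lookup returns -1, which matches what would happen if spr_uncore_pci_init() skipped the map initialization.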
>
> Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where
> NUMA is expected to be enabled.
>
> Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info")
> Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR")
> Tested-by: Steve Wahl <steve.wahl@....com>
> Signed-off-by: Zide Chen <zide.chen@...el.com>
> ---
> V2:
> - Fix the commit message to note that spr_update_device_location() is
> used by EMR, not GNR.
> - Rewrite the commit message for clarity.
> - Add a Tested-by tag.
>
> arch/x86/events/intel/uncore.c | 1 +
> arch/x86/events/intel/uncore_snbep.c | 13 ++++++-------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index c126a29ab729..c721042be629 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -67,6 +67,7 @@ int uncore_die_to_segment(int die)
> return bus ? pci_domain_nr(bus) : -EINVAL;
> }
>
> +/* Note: This API can only be used when NUMA information is available. */
> int uncore_device_to_die(struct pci_dev *dev)
Not everyone will read this comment and follow the rule. Could we add a
WARN_ON in this function so callers are warned when it is used
inappropriately, i.e. when no NUMA information is available?
The rest looks good to me.
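Something along these lines, perhaps. In the kernel, the check could be as small as `if (WARN_ON_ONCE(node == NUMA_NO_NODE)) return -1;` right after the pcibus_to_node() call. Below is a userspace sketch of that idea. It uses hypothetical `sketch_*` stand-ins for the PCI device, the CPU table and WARN_ON_ONCE(), and is not a tested patch. It keeps the distinction this patch relies on: a missing NUMA node warns once, while a node whose CPUs are all offline still returns -1 silently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_NUMA_NO_NODE (-1)

/* Hypothetical stand-ins; not real kernel structures. */
struct sketch_cpu {
	bool online;
	int node;
	int die;
};

struct sketch_pci_dev {
	int bus_node;  /* what pcibus_to_node() would report */
};

static int sketch_warn_count;  /* counts emitted warnings */

static int sketch_device_to_die(const struct sketch_pci_dev *dev,
				const struct sketch_cpu *cpus, int ncpus)
{
	static bool warned;  /* emulates the "once" in WARN_ON_ONCE() */
	int node;

	assert(dev != NULL && cpus != NULL && ncpus >= 0);
	node = dev->bus_node;

	/* Misuse: no NUMA info, so every device would map to -1. */
	if (node == SKETCH_NUMA_NO_NODE) {
		if (!warned) {
			warned = true;
			sketch_warn_count++;
			fprintf(stderr, "WARNING: no NUMA node for PCI bus\n");
		}
		return -1;
	}

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (cpus[cpu].online && cpus[cpu].node == node)
			return cpus[cpu].die;
	}

	/* All CPUs on this node are offline: expected, no warning. */
	return -1;
}
```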
> {
> int node = pcibus_to_node(dev->bus);
> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> index 7ca0429c4004..52dec34d18c4 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -1459,13 +1459,7 @@ static int snbep_pci2phy_map_init(int devid, int nodeid_loc, int idmap_loc, bool
> }
>
> map->pbus_to_dieid[bus] = die_id = uncore_device_to_die(ubox_dev);
> -
> raw_spin_unlock(&pci2phy_map_lock);
> -
> - if (WARN_ON_ONCE(die_id == -1)) {
> - err = -EINVAL;
> - break;
> - }
> }
> }
>
> @@ -6420,7 +6414,7 @@ static void spr_update_device_location(int type_id)
>
> while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>
> - die = uncore_device_to_die(dev);
> + die = uncore_pcibus_to_dieid(dev->bus);
> if (die < 0)
> continue;
>
> @@ -6444,6 +6438,11 @@ static void spr_update_device_location(int type_id)
>
> int spr_uncore_pci_init(void)
> {
> + int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
> +
> + if (ret)
> + return ret;
> +
> /*
> * The discovery table of UPI on some SPR variant is broken,
> * which impacts the detection of both UPI and M3UPI uncore PMON.