linux-kernel - Re: [PATCH v2 1/2] RAS: Introduce AMD Address Translation Library

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f65db953-ebf6-488b-94ac-2bf65d2a982d@amd.com>
Date:   Mon, 16 Oct 2023 09:47:09 -0400
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     yazen.ghannam@....com, linux-edac@...r.kernel.org,
        tony.luck@...el.com, linux-kernel@...r.kernel.org,
        avadhut.naik@....com, john.allen@....com, william.roche@...cle.com
Subject: Re: [PATCH v2 1/2] RAS: Introduce AMD Address Translation Library

On 10/11/23 1:35 PM, Borislav Petkov wrote:
> On Thu, Oct 05, 2023 at 12:35:25PM -0500, Yazen Ghannam wrote:
>> AMD Zen-based systems report memory errors through Machine Check banks
>> representing Unified Memory Controllers (UMCs). The address value
>> reported for DRAM ECC errors is a "normalized address" that is relative
>> to the UMC. This normalized address must be converted to a system
>> physical address to be usable by the OS.
>>
>> Support for this address translation was introduced to the MCA subsystem
>> with Zen1 systems. The code was later moved to the AMD64 EDAC module,
>> since this was the only user of the code at the time.
>>
>> However, there are uses for this translation outside of EDAC. The system
>> physical address can be used in MCA for preemptive page offlining as done
>> in some MCA notifier functions. Also, this translation is needed as the
>> basis of similar functionality needed for some CXL configurations on AMD
>> systems.
>>
>> Introduce a common address translation library that can be used for
>> multiple subsystems including MCA, EDAC, and CXL.
>>
>> Include support for UMC normalized to system physical address
>> translation for current CPU systems.
>>
>> Future development to include:
>> - DF4.5 Non-power-of-2 interleaving modes.
>> - Heterogeneous CPU+GPU system support.
>> - CXL translation support.
>> - Caching of common intermediate values and results.
>> - Leverage UEFI PRM methods as alternate backends to existing native
>>   code.
> 
> That last preview-of-things-to-come paragraph can go. Not needed here.
>

Ack.

>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 35977b269d5e..5a286cb8e7f1 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -891,6 +891,13 @@ Q:	https://patchwork.kernel.org/project/linux-rdma/list/
>>  F:	drivers/infiniband/hw/efa/
>>  F:	include/uapi/rdma/efa-abi.h
>>  
>> +AMD ADDRESS TRANSLATION LIBRARY (ATL)
>> +M:	Yazen Ghannam <Yazen.Ghannam@....com>
>> +L:	linux-edac@...r.kernel.org
>> +S:	Supported
> 
> "Maintained" right?
> 
> Otherwise
> 
> ./scripts/get_maintainer.pl -f drivers/ras/amd/atl/access.c
> Yazen Ghannam <Yazen.Ghannam@....com> (supporter:AMD ADDRESS TRANSLATION LIBRARY (ATL))
> Tony Luck <tony.luck@...el.com> (maintainer:RAS INFRASTRUCTURE)
> Borislav Petkov <bp@...en8.de> (maintainer:RAS INFRASTRUCTURE)
> 
> I'm not maintaining it. :-P
>

Of course not! :P

I do mean "Supported" though. From the top of MAINTAINERS file:

        S: *Status*, one of the following:
           Supported:   Someone is actually paid to look after this.

>> +#define DF_BROADCAST		0xFF
>> +
>> +#define DF_FICAA_INST_EN	BIT(0)
>> +#define DF_FICAA_REG_NUM	GENMASK(10, 1)
>> +#define DF_FICAA_FUNC_NUM	GENMASK(13, 11)
>> +#define DF_FICAA_INST_ID	GENMASK(23, 16)
>> +
>> +/* Register field changed in new systems. */
> 
> I don't understand that comment.

I'll make it more explicit.

The "REG_NUM" field changed. Please note the slightly different
bitmasks.

> 
>> +#define DF_FICAA_REG_NUM_LEGACY	GENMASK(10, 2)
>> +
>> +static int __df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
>> +{
>> +	u32 ficaa_addr = 0x8C, ficad_addr = 0xB8;
>> +	struct pci_dev *F4;
>> +	int err = -ENODEV;
>> +	u32 ficaa = 0;
>> +
>> +	if (node >= amd_nb_num())
>> +		goto out;
>> +
>> +	F4 = node_to_amd_nb(node)->link;
>> +	if (!F4)
>> +		goto out;
>> +
>> +	/* Enable instance-specific access. */
>> +	if (instance_id != DF_BROADCAST) {
>> +		ficaa |= FIELD_PREP(DF_FICAA_INST_EN, 1);
>> +		ficaa |= FIELD_PREP(DF_FICAA_INST_ID, instance_id);
> 
> Dunno, this is understandable to me:
> 
> 		ficaa |= (DF_FICAA_INST_EN << 1) |
> 			 (DF_FICAA_INST_ID << instance_id);
> 
> All that FIELD_* macro crap not.
> 
> What's the advantage of using them? The additional build checks?
> 

Build checks are one advantage. Another is explicitly defining bitfields
so it's not just shifts and masks that need additional comments.

> The code is a lot less readable with those things IMO.
>

Fair. It took some getting used to, but I've come to prefer the bitops.
I'd like to keep them if you don't mind.

>> +int norm_to_sys_addr(u8 socket_id, u8 die_id, u8 cs_inst_id, u64 *addr)
>> +{
>> +	struct addr_ctx ctx;
>> +
>> +	if (df_cfg.rev == UNKNOWN)
>> +		return -EINVAL;
>> +
>> +	memset(&ctx, 0, sizeof(ctx));
>> +
>> +	/* We start from the normalized address */
>> +	ctx.ret_addr = *addr;
>> +	ctx.inst_id = cs_inst_id;
>> +
>> +	if (determine_node_id(&ctx, socket_id, die_id)) {
>> +		pr_warn("Failed to determine Node ID");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (get_address_map(&ctx)) {
>> +		pr_warn("Failed to get address maps");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (denormalize_address(&ctx)) {
>> +		pr_warn("Failed to denormalize address");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!late_hole_remove(&ctx) && add_base_and_hole(&ctx)) {
>> +		pr_warn("Failed to add DRAM base address and hole");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (dehash_address(&ctx)) {
>> +		pr_warn("Failed to dehash address");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (late_hole_remove(&ctx) && add_base_and_hole(&ctx)) {
>> +		pr_warn("Failed to add DRAM base address and hole");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (addr_over_limit(&ctx)) {
>> +		pr_warn("Calculated address is over limit");
>> +		return -EINVAL;
> 
> All those error messages do not dump the address they failed to process
> and if you have multiple failures, you can't know which address they're
> talking about.
> 
> I think it would be better if you have the respective functions state
> the failure and also dump the relevant address/bits/etc which can help
> the user debug this further. And remove the warns here.
> 
> Then, if the user can't do anything about the error, then you don't need
> to say anything either but simply return -EINVAL from this function and
> the upper layers would simply report the error.
>

Okay, will rework.

>> +	}
>> +
>> +	*addr = ctx.ret_addr;
>> +	return 0;
>> +}
>> +
>> +static void check_for_legacy_df_access(void)
>> +{
>> +	/*
>> +	 * All Zen-based systems before Family 19h use the legacy
>> +	 * DF Indirect Access (FICAA/FICAD) offsets.
>> +	 */
>> +	if (boot_cpu_data.x86 < 0x19) {
>> +		df_cfg.flags.legacy_ficaa = true;
>> +		return;
>> +	}
>> +
>> +	/* All systems after Family 19h use the current offsets. */
>> +	if (boot_cpu_data.x86 > 0x19)
>> +		return;
>> +
>> +	/* Some Family 19h systems use the legacy offsets. */
>> +	switch (boot_cpu_data.x86_model) {
>> +	case 0x00 ... 0x0f:
>> +	case 0x20 ... 0x5f:
>> +	       df_cfg.flags.legacy_ficaa = true;
>> +	}
> 
>> +}
>> +
>> +static int __init amd_atl_init(void)
>> +{
>> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +		return -ENODEV;
> 
> That thing should use a MODULE_DEVICE_TABLE with a
> 
> X86_MATCH_VENDOR_FAM(AMD, X86_FAMILY_ANY, ...)
> 
> in it.
>

Is this so it loads independently?

I'm thinking it should only load as a dependency for other modules.

>> +	check_for_legacy_df_access();
>> +
>> +	/*
>> +	 * Not sure if this should return an error code.
>> +	 * That may prevent other modules from loading.
> 
> Why would this prevent other modules from loading?
>

If it fails to load, wouldn't modules that depend on it fail to load?

This is what I thought, but I'll double check.

>> +	 * For now, don't fail out. The translation function
>> +	 * will do a check and return early if the DF revision
>> +	 * is not set.
>> +	 */
>> +	if (get_df_system_info()) {
>> +		pr_warn("Failed to get DF information");
>> +		df_cfg.rev = UNKNOWN;
>> +	}
> 
> No, you want to return an error here and not load the ATL - it is dead
> code.
>

Right, but this goes back to the question above.

EDAC, for example, can still function without this code. It will just be
missing the correct "pfn/offset". This is true on many AMD systems
today.

So should EDAC not load if ATL fails to load? Or is there another way to
decouple them?

>> +	pr_info("AMD Address Translation Library initialized");
>> +	return 0;
>> +}
>> +
>> +static void __exit amd_atl_exit(void)
>> +{
>> +}
>> +
>> +module_init(amd_atl_init);
>> +module_exit(amd_atl_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> diff --git a/drivers/ras/amd/atl/dehash.c b/drivers/ras/amd/atl/dehash.c
>> new file mode 100644
>> index 000000000000..e501f2e918d7
>> --- /dev/null
>> +++ b/drivers/ras/amd/atl/dehash.c
>> @@ -0,0 +1,459 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * AMD Address Translation Library
>> + *
>> + * dehash.c : Functions to account for hashing bits
>> + *
>> + * Copyright (c) 2023, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Yazen Ghannam <Yazen.Ghannam@....com>
>> + */
>> +
>> +#include "internal.h"
>> +
>> +static int df2_dehash_addr(struct addr_ctx *ctx)
>> +{
>> +	u8 hashed_bit, intlv_bit;
>> +
>> +	/* Assert that interleave bit is 8 or 9. */
>> +	if (ctx->map.intlv_bit_pos != 8 && ctx->map.intlv_bit_pos != 9) {
>> +		pr_warn("%s: Invalid interleave bit: %u",
>> +			__func__, ctx->map.intlv_bit_pos);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Assert that die and socket interleaving are disabled. */
>> +	if (ctx->map.num_intlv_dies > 1) {
>> +		pr_warn("%s: Invalid number of interleave dies: %u",
>> +			__func__, ctx->map.num_intlv_dies);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (ctx->map.num_intlv_sockets > 1) {
>> +		pr_warn("%s: Invalid number of interleave sockets: %u",
>> +			__func__, ctx->map.num_intlv_sockets);
>> +		return -EINVAL;
>> +	}
> 
> That sanity-checking is almost identical and repeated across a bunch of
> functions. Pls carve out into a helper.
>

Ack.

> ...
> 
>> diff --git a/drivers/ras/amd/atl/denormalize.c b/drivers/ras/amd/atl/denormalize.c
>> new file mode 100644
>> index 000000000000..fe1480c8e0d8
>> --- /dev/null
>> +++ b/drivers/ras/amd/atl/denormalize.c
>> @@ -0,0 +1,644 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * AMD Address Translation Library
>> + *
>> + * denormalize.c : Functions to account for interleaving bits
>> + *
>> + * Copyright (c) 2023, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Yazen Ghannam <Yazen.Ghannam@....com>
>> + */
>> +
>> +#include "internal.h"
>> +
>> +/*
>> + * Returns the Destination Fabric ID. This is the first (lowest)
>> + * CS Fabric ID used within a DRAM Address map.
>> + */
>> +static u16 get_dst_fabric_id(struct addr_ctx *ctx)
>> +{
>> +	switch (df_cfg.rev) {
>> +	case DF2:
>> +		return FIELD_GET(DF2_DST_FABRIC_ID, ctx->map.limit);
>> +	case DF3:
>> +		return FIELD_GET(DF3_DST_FABRIC_ID, ctx->map.limit);
>> +	case DF3p5:
>> +		return FIELD_GET(DF3p5_DST_FABRIC_ID, ctx->map.limit);
>> +	case DF4:
>> +		return FIELD_GET(DF4_DST_FABRIC_ID, ctx->map.ctl);
>> +	case DF4p5:
>> +		return FIELD_GET(DF4p5_DST_FABRIC_ID, ctx->map.ctl);
> 
> You can put each case on a single line:
> 
> 	case DF2:	return... ;
> 	case DF3:	return... ;
> 
> and so on.

Ack.

> 
>> +	default:
>> +		ATL_BAD_DF_REV;
> 
> Ew, this is ugly. Make that and ATL_BAD_INTLV_MODE into simple inline
> functions which you can call.
>

Ack.

>> +		return 0;
>> +	}
>> +}
>> +
>> +/*
>> + * Make a contiguous gap in address for N bits starting at bit P.
>> + *
>> + * Example:
>> + * address bits:		[20:0]
>> + * # of interleave bits    (n):	3
>> + * starting interleave bit (p):	8
>> + *
>> + * expanded address bits:	[20+n : n+p][n+p-1 : p][p-1 : 0]
>> + *				[23   :  11][10    : 8][7   : 0]
> 
> Very nice!
>

Thanks!

>> +static u64 make_space_for_cs_id(struct addr_ctx *ctx)
>> +{
>> +	switch (ctx->map.intlv_mode) {
>> +	case NOHASH_2CHAN:
>> +	case NOHASH_4CHAN:
>> +	case NOHASH_8CHAN:
>> +	case NOHASH_16CHAN:
>> +	case NOHASH_32CHAN:
>> +	case DF2_2CHAN_HASH:
>> +		return make_space_for_cs_id_at_intlv_bit(ctx);
>> +
>> +	case DF3_COD4_2CHAN_HASH:
>> +	case DF3_COD2_4CHAN_HASH:
>> +	case DF3_COD1_8CHAN_HASH:
>> +	case DF4_NPS4_2CHAN_HASH:
>> +	case DF4_NPS2_4CHAN_HASH:
>> +	case DF4_NPS1_8CHAN_HASH:
>> +	case DF4p5_NPS4_2CHAN_1K_HASH:
>> +	case DF4p5_NPS4_2CHAN_2K_HASH:
>> +	case DF4p5_NPS2_4CHAN_2K_HASH:
>> +	case DF4p5_NPS1_8CHAN_2K_HASH:
>> +	case DF4p5_NPS1_16CHAN_2K_HASH:
>> +		return make_space_for_cs_id_split_2_1(ctx);
>> +
>> +	case DF4p5_NPS2_4CHAN_1K_HASH:
>> +		//TODO
> 
> As discussed offlist, pls fix those.
>

Ack.

>> +	case DF4p5_NPS1_8CHAN_1K_HASH:
>> +		//TODO
>> +	case DF4p5_NPS1_16CHAN_1K_HASH:
>> +		//TODO
>> +	default:
>> +		ATL_BAD_INTLV_MODE(ctx->map.intlv_mode);
>> +		return ~0ULL;
>> +	}
>> +}
>> +
> 
> ...
> 
>> +static u16 get_cs_id_df4(struct addr_ctx *ctx)
>> +{
>> +	/*
>> +	 * Start with the original component mask and the number of interleave
>> +	 * bits for the channels in this map.
>> +	 */
>> +	u8 num_intlv_bits = ilog2(ctx->map.num_intlv_chan);
>> +	u16 mask = df_cfg.component_id_mask;
>> +
>> +	u16 socket_bits;
>> +
>> +	/* Set the derived CS ID to the input CS Fabric ID. */
>> +	u16 cs_id = ctx->cs_fabric_id & mask;
>> +
>> +	/*
>> +	 * Subtract the "base" Destination Fabric ID.
>> +	 * This accounts for systems with disabled Coherent Stations.
>> +	 */
>> +	cs_id -= get_dst_fabric_id(ctx) & mask;
>> +
>> +	/*
>> +	 * Generate and use a new mask based on the number of bits
>> +	 * needed for channel interleaving in this map.
>> +	 */
>> +	mask = GENMASK(num_intlv_bits - 1, 0);
>> +	cs_id &= mask;
>> +
>> +	/* Done if socket interleaving is not enabled. */
>> +	if (ctx->map.num_intlv_sockets <= 1)
>> +		return cs_id;
>> +
>> +	/*
>> +	 * Figure out how many bits are needed for the number of
>> +	 * interleaved sockets. And shift the derived CS ID to account
>> +	 * for these.
>> +	 */
>> +	num_intlv_bits = ilog2(ctx->map.num_intlv_sockets);
>> +	cs_id <<= num_intlv_bits;
>> +
>> +	/* Generate a new mask for the socket interleaving bits. */
>> +	mask = GENMASK(num_intlv_bits - 1, 0);
>> +
>> +	/* Get the socket interleave bits from the original CS Fabric ID. */
>> +	socket_bits = (ctx->cs_fabric_id & df_cfg.socket_id_mask) >> df_cfg.socket_id_shift;
>> +
>> +	/* Apply the appropriate socket bits to the derived CS ID. */
>> +	cs_id |= socket_bits & mask;
>> +
>> +	return cs_id;
>> +}
> 
> Those are some good comments above, cool.
>

Thanks!

>> +
>> +/*
>> + * Derive the correct CS ID that represents the interleave bits
>> + * used within the system physical address. This accounts for the
>> + * interleave mode, number of interleaved channels/dies/sockets, and
>> + * other system/mode-specific bit swizzling.
> 
>     * Returns: ... on success.
>     	       ... on error.

Will add.

> 
> ...
> 
> 
>> diff --git a/drivers/ras/amd/atl/internal.h b/drivers/ras/amd/atl/internal.h
>> new file mode 100644
>> index 000000000000..f3888c8fd02d
>> --- /dev/null
>> +++ b/drivers/ras/amd/atl/internal.h
>> @@ -0,0 +1,307 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD Address Translation Library
>> + *
>> + * internal.h : Helper functions and common defines
>> + *
>> + * Copyright (c) 2023, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Yazen Ghannam <Yazen.Ghannam@....com>
>> + */
>> +
>> +#ifndef __AMD_ATL_INTERNAL_H__
>> +#define __AMD_ATL_INTERNAL_H__
>> +
>> +#include <asm/amd_nb.h>
>> +
>> +#include <linux/amd-atl.h>
> 
> Is this file in the linux/ namespace because of CXL which is somewhere
> in drivers?
>

Yes. The intention is to allow any code to use this "library" including
arch code like MCA.

>> +#include <linux/bitfield.h>
>> +#include <linux/bitops.h>
>> +
>> +#include "reg_fields.h"
> 
> Include namespaces are in that order:
> 
> 	linux/
> 	asm/
> 	"internal"
> 

Ack.

>> +
>> +/* Maximum possible number of Coherent Stations within a single Data Fabric. */
>> +#define MAX_CS_CHANNELS			32
>> +
>> +/* PCI IDs for Genoa DF Function 0. */
>> +#define DF_FUNC0_ID_GENOA		0x14AD1022
> 
> Genoa is an internal name - pls use Zen4 or so. Change "genoa_quirk"
> too.
> 

Genoa is a public name for a particular Server model group. And the
quirk applies to that group. It doesn't apply to other Zen4 systems like
Client models, etc.

But I agree the names can be more generic. I'll change them to describe
the quirk rather than name a model group.

>> +
>> +/* Shift needed for adjusting register values to true values. */
>> +#define DF_DRAM_BASE_LIMIT_LSB		28
>> +
>> +/*
>> + * Glossary of acronyms used in address translation for Zen-based systems
>> + *
>> + * CCM		= Cache Coherent Moderator
>> + * COD		= Cluster-on-Die
>> + * CS		= Coherent Station
>> + * DF		= Data Fabric
> 
> We have edac.rst for such things.
>

Sure, but I don't understand.

Should these be moved to edac.rst? This code isn't part of EDAC.

Or are you suggesting that this new "library" should have a
Documentation/ entry?

> ...
> 
>> +/*
>> + * Make a gap in 'data' that is 'num_bits' long starting at 'bit_num.
>> + * e.g. data		= 11111111'b
>> + *	bit_num		= 3
>> + *	num_bits	= 2
>> + *	result		= 1111100111'b
>> + */
>> +static inline u64 expand_bits(u8 bit_num, u8 num_bits, u64 data)
>> +{
>> +	u64 temp1, temp2;
>> +
>> +	/*
>> +	 * Return the orginal data if the "space" needed is '0'.
> 
> "original".
> 
> Run through spellchecker pls.
>

Ack.

>> +	 * This helps avoid the need to check for '0' at each
>> +	 * caller.
>> +	 */
>> +	if (!num_bits)
>> +		return data;
>> +
>> +	if (!bit_num)
>> +		return data << num_bits;
>> +
>> +	temp1 = data & GENMASK_ULL(bit_num - 1, 0);
>> +
>> +	temp2 = data & GENMASK_ULL(63, bit_num);
>> +	temp2 <<= num_bits;
>> +
>> +	return temp1 | temp2;
>> +}
>> +
> 
> ...
> 
>> +/*
>> + * Some, but not all, cases have asserts.
>> + * So use return values to indicate failure where needed.
>> + */
>> +static int get_intlv_mode(struct addr_ctx *ctx)
>> +{
>> +	switch (df_cfg.rev) {
>> +	case DF2:
>> +		return df2_get_intlv_mode(ctx);
>> +	case DF3:
>> +		return df3_get_intlv_mode(ctx);
>> +	case DF3p5:
>> +		return df3p5_get_intlv_mode(ctx);
>> +	case DF4:
>> +		return df4_get_intlv_mode(ctx);
>> +	case DF4p5:
>> +		return df4p5_get_intlv_mode(ctx);
> 
> Put each case on a single line.
>

Ack.

>> +	default:
>> +		ATL_BAD_DF_REV;
>> +		return -EINVAL;
>> +	}
>> +}
> 
> 
> ...
> 
>> +static int get_dram_addr_map(struct addr_ctx *ctx)
>> +{
>> +	switch (df_cfg.rev) {
>> +	case DF2:
>> +		return df2_get_dram_addr_map(ctx);
>> +	case DF3:
>> +	case DF3p5:
>> +		return df3_get_dram_addr_map(ctx);
>> +	case DF4:
>> +		return df4_get_dram_addr_map(ctx);
>> +	case DF4p5:
>> +		return df4p5_get_dram_addr_map(ctx);
> 
> Also on single line.
>

Ack.

>> +	default:
>> +		ATL_BAD_DF_REV;
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +static int lookup_cs_fabric_id(struct addr_ctx *ctx)
>> +{
>> +	u32 reg;
>> +
>> +	/* Read D18F0x50 (FabricBlockInstanceInformation3). */
>> +	if (df_indirect_read_instance(ctx->node_id, 0, 0x50, ctx->inst_id, &reg))
>> +		return -EINVAL;
>> +
>> +	if (df_cfg.rev < DF4p5)
>> +		ctx->cs_fabric_id = FIELD_GET(DF2_CS_FABRIC_ID, reg);
>> +	else
>> +		ctx->cs_fabric_id = FIELD_GET(DF4p5_CS_FABRIC_ID, reg);
>> +
>> +	return 0;
>> +}
>> +
>> +static int get_cs_fabric_id(struct addr_ctx *ctx)
>> +{
>> +	/* TODO: Add special path for DF3.5 heterogeneous systems. */
> 
> No TODOs - you return an error for unsupported systems until that
> support comes.
>

Ack.

>> +	return lookup_cs_fabric_id(ctx);
>> +}
> 
> ...
> 
>> +int get_address_map(struct addr_ctx *ctx)
>> +{
>> +	int ret = 0;
>> +
>> +	/* TODO: Add special path for DF3.5 heterogeneous systems. */
> 
> Ditto.
>

Ack.

>> +	ret = get_address_map_common(ctx);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (get_global_map_data(ctx))
>> +		return -EINVAL;
>> +
>> +	dump_address_map(&ctx->map);
>> +
>> +	return ret;
>> +}
> 
> ...
> 
> ...
> 
>> diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
>> new file mode 100644
>> index 000000000000..80030db6b8a5
>> --- /dev/null
>> +++ b/drivers/ras/amd/atl/umc.c
>> @@ -0,0 +1,53 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * AMD Address Translation Library
>> + *
>> + * umc.c : Unified Memory Controller (UMC) topology helpers
>> + *
>> + * Copyright (c) 2023, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Yazen Ghannam <Yazen.Ghannam@....com>
>> + */
>> +
>> +#include "internal.h"
>> +
>> +static u8 get_socket_id(struct mce *m)
>> +{
>> +	return m->socketid;
>> +}
> 
> Looks useless.
> 
>> +
>> +static u8 get_die_id(struct mce *m)
>> +{
>> +	/*
>> +	 * For CPUs, this is the AMD Node ID modulo the number
>> +	 * of AMD Nodes per socket.
>> +	 */
>> +	return topology_die_id(m->extcpu) % amd_get_nodes_per_socket();
>> +}
>> +
>> +static u64 get_norm_addr(struct mce *m)
>> +{
>> +	return m->addr;
>> +}
> 
> Ditto.
>

These are stubbed out because they get expanded for the GPU code. I
originally had it coded together to see how it all fits.

I'll remove these for now, and add them back when needed.

>> +
>> +#define UMC_CHANNEL_NUM	GENMASK(31, 20)
>> +static u8 get_cs_inst_id(struct mce *m)
>> +{
>> +	return FIELD_GET(UMC_CHANNEL_NUM, m->ipid);
>> +}
>> +
>> +int umc_mca_addr_to_sys_addr(struct mce *m, u64 *sys_addr)
>> +{
>> +	u8 cs_inst_id = get_cs_inst_id(m);
>> +	u8 socket_id = get_socket_id(m);
>> +	u64 addr = get_norm_addr(m);
>> +	u8 die_id = get_die_id(m);
>> +
>> +	if (norm_to_sys_addr(socket_id, die_id, cs_inst_id, &addr))
>> +		return -EINVAL;
>> +
>> +	*sys_addr = addr;
>> +	return 0;
>> +}
> 
> PA 0 is an invalid address, right?
>
> So can this function return 0 on error or the actual PA on success so
> that you don't need the IO *sys_addr argument?
> 

No, PA 0 is a valid address. The physical memory map (at least on x86)
starts at 0.

We can still get hardware errors for address 0 even though it's part of
a reserved space. These could be found by patrol scrubbers, etc.

>> +EXPORT_SYMBOL_GPL(umc_mca_addr_to_sys_addr);
> 
> That's an AMD-specific function so:
> 
> 	amd_convert_mca_addr_to_sys_addr();
> 
> with a verb in the name and so on.
>

Ack.

Thanks,
Yazen