Message-ID: <2f114c41-5dbb-4019-b0f1-046509521d44@amd.com>
Date: Mon, 18 Dec 2023 16:53:24 -0500
From: Yazen Ghannam <yazen.ghannam@....com>
To: Christophe JAILLET <christophe.jaillet@...adoo.fr>, bp@...en8.de,
linux-edac@...r.kernel.org
Cc: yazen.ghannam@....com, linux-kernel@...r.kernel.org,
avadhut.naik@....com, tony.luck@...el.com, john.allen@....com,
william.roche@...cle.com, muralidhara.mk@....com
Subject: Re: [PATCH v4 1/3] RAS: Introduce AMD Address Translation Library
On 12/18/2023 2:21 PM, Christophe JAILLET wrote:
> On 18/12/2023 at 20:04, Yazen Ghannam wrote:
>> AMD Zen-based systems report memory errors through Machine Check banks
>> representing Unified Memory Controllers (UMCs). The address value
>> reported for DRAM ECC errors is a "normalized address" that is relative
>> to the UMC. This normalized address must be converted to a system
>> physical address to be usable by the OS.
>>
>> Support for this address translation was introduced to the MCA subsystem
>> with Zen1 systems. The code was later moved to the AMD64 EDAC module,
>> since this was the only user of the code at the time.
>>
>> However, there are uses for this translation outside of EDAC. The system
>> physical address can be used in MCA for preemptive page offlining as done
>> in some MCA notifier functions. Also, this translation is needed as the
>> basis of similar functionality needed for some CXL configurations on AMD
>> systems.
>>
>> Introduce a common address translation library that can be used for
>> multiple subsystems including MCA, EDAC, and CXL.
>>
>> Include support for UMC normalized to system physical address
>> translation for current CPU systems.
>>
>> The Data Fabric Indirect register access offsets and one of the register
>> fields were changed. Default to the current offsets and register field
>> definition, and fall back to the older values if running on a "legacy"
>> system.
>>
>> Provide built-in code to facilitate the loading and unloading of the
>> library module without affecting other modules or built-in code.
>>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
>> ---
>
> ...
>
>> +int get_address_map(struct addr_ctx *ctx)
>> +{
>> + int ret = 0;
>
> Nit: unneeded init
>
>> +
>> + ret = get_address_map_common(ctx);
>> + if (ret)
>> + goto out;
>> +
>> + ret = get_global_map_data(ctx);
>> + if (ret)
>> + goto out;
>> +
>> + dump_address_map(&ctx->map);
>> +
>> +out:
>> + return ret;
>> +}
>> diff --git a/drivers/ras/amd/atl/reg_fields.h b/drivers/ras/amd/atl/reg_fields.h
>> new file mode 100644
>> index 000000000000..6aaa5093f42c
>> --- /dev/null
>> +++ b/drivers/ras/amd/atl/reg_fields.h
>> @@ -0,0 +1,603 @@
>
> ...
>
>> +static void get_num_maps(void)
>> +{
>> + switch (df_cfg.rev) {
>> + case DF2:
>> + case DF3:
>> + case DF3p5:
>> + df_cfg.num_coh_st_maps = 2;
>> + break;
>> + case DF4:
>> + df_cfg.num_coh_st_maps = 4;
>> + break;
>
> If 4 is the correct value in both cases, DF4 and DF4p5 cases could be
> merged.
>
> CJ
>
>> + case DF4p5:
>> + df_cfg.num_coh_st_maps = 4;
>> + break;
>> + default:
>> + atl_debug_on_bad_df_rev();
>> + }
>> +}
>
> ...
>
Yep, good points. Thanks for your feedback!
-Yazen
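
For reference, the second review comment (merging the DF4 and DF4p5 cases) could look roughly like the sketch below. This is only an illustration, not the code from a later revision of the patch: the enum, the struct layout, and the body of atl_debug_on_bad_df_rev() are stand-ins assumed here so that the snippet compiles on its own. Only the names and the switch cases come from the quoted hunk.

```c
#include <stdio.h>

/* Assumed stand-in: the real revision enum is defined in the ATL sources. */
enum df_revisions { UNKNOWN, DF2, DF3, DF3p5, DF4, DF4p5 };

/* Assumed stand-in: only the two fields used in the quoted hunk. */
struct df_config {
	enum df_revisions rev;
	unsigned int num_coh_st_maps;
};

static struct df_config df_cfg;

/* Hypothetical stub: the real helper's body is not shown in this thread. */
static void atl_debug_on_bad_df_rev(void)
{
	fprintf(stderr, "unexpected DF revision %d\n", (int)df_cfg.rev);
}

static void get_num_maps(void)
{
	switch (df_cfg.rev) {
	case DF2:
	case DF3:
	case DF3p5:
		df_cfg.num_coh_st_maps = 2;
		break;
	/* DF4 and DF4p5 share the same value, so they fall through together. */
	case DF4:
	case DF4p5:
		df_cfg.num_coh_st_maps = 4;
		break;
	default:
		atl_debug_on_bad_df_rev();
	}
}
```

The first nit is simpler still: since ret is assigned by the get_address_map_common() call before it is ever read, the declaration in get_address_map() can just be "int ret;".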