Date:   Fri, 21 Jul 2023 10:47:57 -0400
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Muralidhara M K <muralimk@....com>, linux-edac@...r.kernel.org,
        x86@...nel.org
Cc:     yazen.ghannam@....com, linux-kernel@...r.kernel.org, bp@...en8.de,
        mingo@...hat.com, mchehab@...nel.org, nchatrad@....com,
        Muralidhara M K <muralidhara.mk@....com>,
        Naveen Krishna Chatradhi <naveenkrishna.chatradhi@....com>
Subject: Re: [PATCH 6/7] EDAC/amd64: Add error instance get_err_info() to
 pvt->ops

On 7/20/2023 8:54 AM, Muralidhara M K wrote:
> From: Muralidhara M K <muralidhara.mk@....com>
> 
> On CPUs, the data fabric ID of an instance is equal to the UMC
> number. Since the UMC number and channel are equal in CPU nodes,
> the channel can be used as the data fabric ID of the instance.
> 
> A GPU node has 'X' PHYs and 'Y' channels, which results in 'X*Y'
> instances in the data fabric. Therefore, the data fabric ID of an
> instance on a GPU is computed as below:
>    df_inst_id = 'X' * number of channels per PHY + 'Y'
> 
> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@....com>
> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@....com>
> Signed-off-by: Muralidhara M K <muralidhara.mk@....com>
> ---
>   drivers/edac/amd64_edac.c | 36 +++++++++++++++++++++++++++++++++++-
>   drivers/edac/amd64_edac.h |  2 ++
>   2 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 45d8093c117a..74b2b47cc22a 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3047,6 +3047,17 @@ static inline void decode_bus_error(int node_id, struct mce *m)
>   	__log_ecc_error(mci, &err, ecc_type);
>   }
>   
> +/*
> + * On CPUs, the data fabric ID of an instance is equal to the UMC number.
> + * Since the UMC number and channel are equal in CPU nodes, the channel can be
> + * used as the data fabric ID of the instance.
> + */
> +static int umc_inst_id(struct mem_ctl_info *mci, struct amd64_pvt *pvt,
> +		       struct err_info *err)
> +{
> +	return err->channel;
> +}
> +
>   /*
>    * To find the UMC channel represented by this bank we need to match on its
>    * instance_id. The instance_id of a bank is held in the lower 32 bits of its
> @@ -3071,6 +3082,7 @@ static void decode_umc_error(int node_id, struct mce *m)
>   	struct mem_ctl_info *mci;
>   	struct amd64_pvt *pvt;
>   	struct err_info err;
> +	u8 df_inst_id;
>   	u64 sys_addr;
>   
>   	node_id = fixup_node_id(node_id, m);
> @@ -3101,8 +3113,9 @@ static void decode_umc_error(int node_id, struct mce *m)
>   	}
>   
>   	pvt->ops->get_err_info(m, &err);
> +	df_inst_id = pvt->ops->get_inst_id(mci, pvt, &err);
>   
> -	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
> +	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
>   		err.err_code = ERR_NORM_ADDR;
>   		goto log_error;
>   	}

This patch is not useful until the address translation is updated, so
let's drop it for now. These changes can be included as part of the
address translation updates.

Thanks,
Yazen
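
For reference, the instance ID mapping described in the quoted commit
message could be sketched roughly as below. This is only an illustration
under assumptions: it reads 'X' and 'Y' in the formula as the PHY index
and the channel index within that PHY, and the names
cpu_df_inst_id_sketch(), gpu_df_inst_id_sketch() and channels_per_phy are
hypothetical, not taken from the patch or from the driver.

```c
#include <stdint.h>

/*
 * CPU case, as stated in the commit message: the UMC number and the
 * channel are equal, so the channel doubles as the data fabric
 * instance ID. (Hypothetical helper name.)
 */
static uint8_t cpu_df_inst_id_sketch(uint8_t channel)
{
	return channel;
}

/*
 * GPU case, ASSUMING the formula means:
 *   df_inst_id = phy index * channels per PHY + channel index
 * channels_per_phy is a caller-supplied value here; the real count
 * is hardware specific and is not given in this thread.
 */
static uint8_t gpu_df_inst_id_sketch(uint8_t phy, uint8_t channel,
				     uint8_t channels_per_phy)
{
	return (uint8_t)(phy * channels_per_phy + channel);
}
```

With an assumed 8 channels per PHY, PHY 2 and channel 3 would map to
instance 19 under this reading, while the CPU helper simply returns the
channel it is given.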
