Message-ID: <f7e50ef5-44f1-4b39-8d60-9e271b294ea0@amd.com>
Date: Sun, 7 Apr 2024 10:17:26 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: John Allen <john.allen@....com>, rafael@...nel.org, lenb@...nel.org,
 bp@...en8.de
Cc: yazen.ghannam@....com, linux-acpi@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org
Subject: Re: [PATCH 2/2] RAS/AMD/ATL: Translate normalized to system physical
 addresses using PRM



On 3/26/24 17:26, John Allen wrote:
> Future AMD platforms will provide a UEFI PRM module that implements a
> number of address translation PRM handlers. This will provide an
> interface for the OS to call platform specific code without requiring
> the use of SMM or other heavy firmware operations.
> 
> AMD Zen-based systems report memory error addresses through Machine
> Check banks representing Unified Memory Controllers (UMCs) in the form
> of UMC relative "normalized" addresses. A normalized address must be
> converted to a system physical address to be usable by the OS.
> 
> Add support for the normalized to system physical address translation
> PRM handler in the AMD Address Translation Library and prefer it over
> native code if available. The GUID and parameter buffer structure are
> specific to the normalized to system physical address handler provided
> by the address translation PRM module included in future AMD systems.
> 
> The address translation PRM module is documented in chapter 22 of the
> publicly available "AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> ACPI v6.5 Porting Guide":
> https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/58088-0.75-pub.pdf
> 
> Signed-off-by: John Allen <john.allen@....com>
> ---
>   drivers/ras/amd/atl/Makefile   |  1 +
>   drivers/ras/amd/atl/internal.h |  2 ++
>   drivers/ras/amd/atl/prm.c      | 61 ++++++++++++++++++++++++++++++++++
>   drivers/ras/amd/atl/umc.c      |  5 +++
>   4 files changed, 69 insertions(+)
>   create mode 100644 drivers/ras/amd/atl/prm.c
> 
> diff --git a/drivers/ras/amd/atl/Makefile b/drivers/ras/amd/atl/Makefile
> index 4acd5f05bd9c..8f1afa793e3b 100644
> --- a/drivers/ras/amd/atl/Makefile
> +++ b/drivers/ras/amd/atl/Makefile
> @@ -14,5 +14,6 @@ amd_atl-y		+= denormalize.o
>   amd_atl-y		+= map.o
>   amd_atl-y		+= system.o
>   amd_atl-y		+= umc.o
> +amd_atl-y		+= prm.o
>   
>   obj-$(CONFIG_AMD_ATL)	+= amd_atl.o
> diff --git a/drivers/ras/amd/atl/internal.h b/drivers/ras/amd/atl/internal.h
> index 5de69e0bb0f9..f739dcada126 100644
> --- a/drivers/ras/amd/atl/internal.h
> +++ b/drivers/ras/amd/atl/internal.h
> @@ -234,6 +234,8 @@ int dehash_address(struct addr_ctx *ctx);
>   unsigned long norm_to_sys_addr(u8 socket_id, u8 die_id, u8 coh_st_inst_id, unsigned long addr);
>   unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
>   
> +unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr);
> +
>   /*
>    * Make a gap in @data that is @num_bits long starting at @bit_num.
>    * e.g. data		= 11111111'b
> diff --git a/drivers/ras/amd/atl/prm.c b/drivers/ras/amd/atl/prm.c
> new file mode 100644
> index 000000000000..54a69e660eb5
> --- /dev/null
> +++ b/drivers/ras/amd/atl/prm.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * AMD Address Translation Library
> + *
> + * prm.c : Plumbing code to UEFI Platform Runtime Mechanism (PRM)
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: John Allen <john.allen@....com>
> + */
> +
> +#include "internal.h"
> +
> +#if defined(CONFIG_ACPI_PRMT)
> +
> +#include <linux/prmt.h>
> +
> +struct prm_umc_param_buffer_norm {
> +	u64 norm_addr;
> +	u8 socket;
> +	u64 umc_bank_inst_id;
> +	void *output_buffer;
> +} __packed;
> +
> +const guid_t norm_to_sys_prm_handler_guid = GUID_INIT(0xE7180659, 0xA65D,

Use the static keyword since this is only used in the current file.

> +						      0x451D, 0x92, 0xCD,
> +						      0x2B, 0x56, 0xF1, 0x2B,
> +						      0xEB, 0xA6);
> +
> +unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr)
> +{
> +	struct prm_umc_param_buffer_norm param_buffer;
> +	unsigned long ret_addr;
> +	int ret;
> +
> +	param_buffer.norm_addr        = addr;
> +	param_buffer.socket           = socket_id;
> +	param_buffer.umc_bank_inst_id = umc_bank_inst_id;
> +	param_buffer.output_buffer    = &ret_addr;
> +
> +	ret = acpi_call_prm_handler(norm_to_sys_prm_handler_guid, &param_buffer);
> +	if (!ret)
> +		return ret_addr;
> +
> +	if (ret == -ENODEV)
> +		pr_info("PRM module/handler not available\n");

Make this a pr_debug(). I don't think this is something a user could do
anything about. And one goal of this library is to abstract how the
functions work. So "trying different backends" is a library developer
concern.

> +	else
> +		pr_info("PRM address translation failed\n");

Make this a pr_notice_once().

If the handler is available and fails, then this is likely a bug. It
should be reported to the system vendor. And it may be possible for the
user to update the PRM handler. This could be through a BIOS update or
the runtime update option for PRM.

Aside: is the runtime update option implemented?

"Notice" is between info and warning. I think we'd want the user to
notice, but this isn't severe enough to need a warning.

Also, *_once() will prevent duplicate messages in the case of multiple
memory errors in the system. The handler shouldn't fail on any valid
input, so a single notice is enough. Especially if the message doesn't
have any error/context-specific details.
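
To illustrate the *_once() behavior, here is a rough userspace sketch. It
is not kernel code: notice_once(), notices_printed, and
report_translation_failure() are hypothetical stand-ins, and the macro only
approximates pr_notice_once() by keeping one static flag per call site.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter, present only so the behavior can be observed. */
static int notices_printed;

/*
 * Hypothetical userspace approximation of pr_notice_once(): each call
 * site gets its own static flag, so the message is emitted at most once
 * from that site no matter how often the code path runs.
 */
#define notice_once(msg)				\
	do {						\
		static bool printed;			\
		if (!printed) {				\
			printed = true;			\
			notices_printed++;		\
			fputs(msg, stderr);		\
		}					\
	} while (0)

/* Hypothetical error path, reached once per failed translation. */
static void report_translation_failure(void)
{
	notice_once("PRM address translation failed\n");
}
```

Calling report_translation_failure() for several memory errors in a row
prints the message only the first time, which is the effect wanted here.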

Another aside: it's possible to have invalid input. This can happen in
"software/simulated" MCA errors, i.e. the user provides an arbitrary
value for MCA_ADDR. But this would be a user error. I don't think it's
worth trying to filter out this case. An expert user could provide valid
inputs, and they may want to test the full flow. And this isn't an issue
just for PRM but the ATL overall. I hit this myself while testing
another feature. I used a signature for MCA_ADDR (0xC001C0DE01ABCDEF ?)
and the translation failed. But I was more interested in the signature
than the real value. :)

> +
> +	return ret;
> +}
> +
> +#else /* ACPI_PRMT */
> +
> +unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr)
> +{
> +	return -ENODEV;
> +}
> +
> +#endif
> diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
> index 59b6169093f7..954cbe6bf465 100644
> --- a/drivers/ras/amd/atl/umc.c
> +++ b/drivers/ras/amd/atl/umc.c
> @@ -333,9 +333,14 @@ unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
>   	u8 coh_st_inst_id = get_coh_st_inst_id(err);
>   	unsigned long addr = get_addr(err->addr);
>   	u8 die_id = get_die_id(err);
> +	unsigned long ret_addr;
>   
>   	pr_debug("socket_id=0x%x die_id=0x%x coh_st_inst_id=0x%x addr=0x%016lx",
>   		 socket_id, die_id, coh_st_inst_id, addr);
>   
> +	ret_addr = prm_umc_norm_to_sys_addr(socket_id, err->ipid, addr);
> +	if (!IS_ERR_VALUE(ret_addr))
> +		return ret_addr;
> +
>   	return norm_to_sys_addr(socket_id, die_id, coh_st_inst_id, addr);
>   }
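
For reference, the fallback in the hunk above depends on a negative errno
surviving the conversion to unsigned long and being caught by
IS_ERR_VALUE(). A minimal userspace sketch of that pattern follows. The
helper names and the placeholder arithmetic are assumptions for
illustration only; is_err_value() mirrors the kernel macro's range check
(the top MAX_ERRNO values of the address space), and the two backends are
stubs rather than real translations.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_ERRNO 4095UL

/* Mirrors the IS_ERR_VALUE() range check: the top 4095 values are errors. */
static bool is_err_value(unsigned long x)
{
	return x >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical PRM backend: returns an errno-encoded value if unavailable. */
static unsigned long prm_translate(bool prm_available, unsigned long addr)
{
	if (!prm_available)
		return (unsigned long)-ENODEV;

	return addr + 0x1000UL;	/* placeholder, not a real translation */
}

/* Hypothetical native backend, used only as the fallback. */
static unsigned long native_translate(unsigned long addr)
{
	return addr + 0x2000UL;	/* placeholder, not a real translation */
}

/* Prefer the PRM result; fall back to native code on any error value. */
static unsigned long translate(bool prm_available, unsigned long addr)
{
	unsigned long ret = prm_translate(prm_available, addr);

	if (!is_err_value(ret))
		return ret;

	return native_translate(addr);
}
```

With these stubs, translate(true, addr) returns the PRM result and
translate(false, addr) falls through to the native path.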

Thanks,
Yazen
