lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240214092839.GBZcyHxzsaz9NcijyV@fat_crate.local>
Date: Wed, 14 Feb 2024 10:28:39 +0100
From: Borislav Petkov <bp@...en8.de>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: tony.luck@...el.com, linux-edac@...r.kernel.org,
	linux-kernel@...r.kernel.org, avadhut.naik@....com,
	john.allen@....com, muralidhara.mk@....com,
	naveenkrishna.chatradhi@....com, sathyapriya.k@....com
Subject: Re: [PATCH 2/2] RAS: Introduce the FRU Memory Poison Manager

On Tue, Feb 13, 2024 at 09:35:16PM -0600, Yazen Ghannam wrote:
> +config RAS_FMPM
> +	tristate "FRU Memory Poison Manager"
> +	default m
> +	depends on X86_MCE

I know this is generic-ish but it needs to be enabled only on AMD for
now. Whoever wants it somewhere else, then whoever needs to test it
there first and then enable it there.

> +	imply AMD_ATL
> +	help
> +	  Support saving and restoring memory error information across reboot
> +	  cycles using ACPI ERST as persistent storage. Error information is

s/cycles//

> +	  saved with the UEFI CPER "FRU Memory Poison" section format.
> +
> +	  Memory may be retired during boot time and run time depending on

s/may/is/

Please check all your text - too many "may"s for something which is not
a vendor doc. :)

> +	  platform-specific policies.
> +
>  endif
> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
> index 3fac80f58005..11f95d59d397 100644
> --- a/drivers/ras/Makefile
> +++ b/drivers/ras/Makefile
> @@ -3,4 +3,5 @@ obj-$(CONFIG_RAS)	+= ras.o
>  obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
>  obj-$(CONFIG_RAS_CEC)	+= cec.o
>  
> +obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
>  obj-y			+= amd/atl/
> diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
> new file mode 100644
> index 000000000000..077d9f35cc7d
> --- /dev/null
> +++ b/drivers/ras/amd/fmpm.c
> @@ -0,0 +1,776 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * FRU (Field-Replaceable Unit) Memory Poison Manager
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Authors:
> + *	Naveen Krishna Chatradhi <naveenkrishna.chatradhi@....com>
> + *	Muralidhara M K <muralidhara.mk@....com>
> + *	Yazen Ghannam <Yazen.Ghannam@....com>
> + *
> + * Implementation notes, assumptions, and limitations:
> + *
> + * - FRU Memory Poison Section and Memory Poison Descriptor definitions are not yet
> + *   included in the UEFI specification. So they are defined here. Afterwards, they
> + *   may be moved to linux/cper.h, if appropriate.
> + *
> + * - Platforms based on AMD MI300 systems will be the first to use these structures.
> + *   There are a number of assumptions made here that will need to be generalized
> + *   to support other platforms.
> + *
> + *   AMD MI300-based platform(s) assumptions:
> + *   - Memory errors are reported through x86 MCA.
> + *   - The entire DRAM row containing a memory error should be retired.
> + *   - There will be (1) FRU Memory Poison Section per CPER.
> + *   - The FRU will be the CPU Package (Processor Socket).
> + *   - The default number of Memory Poison Descriptor entries should be (8).
> + *   - The Platform will use ACPI ERST for persistent storage.
> + *   - All FRU records should be saved to persistent storage. Module init will
> + *     fail if any FRU record is not successfully written.

Please drop all that capitalized spelling.

> + * - Source code will be under 'drivers/ras/amd/' unless and until there is interest
> + *   to use this module for other vendors.

This is not needed.

> + * - Boot time memory retirement may occur later than ideal due to dependencies
> + *   on other libraries and drivers. This leaves a gap where bad memory may be
> + *   accessed during early boot stages.
> + *
> + * - Enough memory should be pre-allocated for each FRU record to be able to hold
> + *   the expected number of descriptor entries. This, mostly empty, record is
> + *   written to storage during init time. Subsequent writes to the same record
> + *   should allow the Platform to update the stored record in-place. Otherwise,
> + *   if the record is extended, then the Platform may need to perform costly memory
> + *   management operations on the storage. For example, the Platform may spend time
> + *   in Firmware copying and invalidating memory on a relatively slow SPI ROM.

That's a good thing to have here.

> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/cper.h>
> +#include <linux/ras.h>
> +
> +#include <acpi/apei.h>
> +
> +#include <asm/cpu_device_id.h>
> +#include <asm/mce.h>
> +
> +#pragma pack(1)

Is that some ugly thing to avoid adding __packed annotation to the
structure definitions below?

"GCC supports several types of pragmas, primarily in order to compile
code originally written for other compilers. Note that in general we do
not recommend the use of pragmas; See Declaring Attributes of Functions,
for further explanation. "

Oh, that 1 is something else:

-fpack-struct[=n]

    Without a value specified, pack all structure members together
    without holes. When a value is specified (which must be a small
    power of two), pack structure members according to this value,
    representing the maximum alignment (that is, objects with default
    alignment requirements larger than this are output potentially
    unaligned at the next fitting location.

So do I understand it correctly that struct members should be aligned to
2^1 bytes?

Grepping the tree, this looks like something BIOS does...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ