lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yuo3dioqb9mDAOcT@gmail.com>
Date:   Wed, 3 Aug 2022 10:53:10 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Jane Chu <jane.chu@...cle.com>
Cc:     tony.luck@...el.com, bp@...en8.de, tglx@...utronix.de,
        mingo@...hat.com, dave.hansen@...ux.intel.com, x86@...nel.org,
        linux-edac@...r.kernel.org, dan.j.williams@...el.com,
        linux-kernel@...r.kernel.org, hch@....de, nvdimm@...ts.linux.dev
Subject: Re: [PATCH v7] x86/mce: retrieve poison range from hardware


* Jane Chu <jane.chu@...cle.com> wrote:

> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine

s/Commit/commit

> poison granularity") that changed nfit_handle_mce() callback to report
> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting
> 2 back-to-back poisons and the driver ends up logging 8 badblocks,
> because 0x1000 bytes is 8 512-byte.
> 
> Dan Williams noticed that apei_mce_report_mem_error() hardcode
> the LSB field to PAGE_SHIFT instead of consulting the input
> struct cper_sec_mem_err record.  So change to rely on hardware whenever
> support is available.
> 
> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com
> 
> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
> Reviewed-by: Ingo Molnar <mingo@...nel.org>
> Signed-off-by: Jane Chu <jane.chu@...cle.com>
> ---
>  arch/x86/kernel/cpu/mce/apei.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 717192915f28..8ed341714686 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -29,15 +29,26 @@
>  void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
>  {
>  	struct mce m;
> +	int lsb;
>  
>  	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>  		return;
>  
> +	/*
> +	 * Even if the ->validation_bits are set for address mask,
> +	 * to be extra safe, check and reject an error radius '0',
> +	 * and fall back to the default page size.
> +	 */
> +	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> +		lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT);
> +	else
> +		lsb = PAGE_SHIFT;
> +
>  	mce_setup(&m);
>  	m.bank = -1;
>  	/* Fake a memory read error with unknown channel */
>  	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
> -	m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT;
> +	m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;

LGTM.

I suppose this wants to go upstream via the tree the bug came from (NVDIMM 
tree? ACPI tree?), or should we pick it up into the x86 tree?

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ