Date: Tue, 27 Feb 2024 09:23:39 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Borislav Petkov <bp@...en8.de>, "james.morse@....com"
 <james.morse@....com>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 Dan Williams <dan.j.williams@...el.com>, Ira Weiny <ira.weiny@...el.com>,
 "Luck, Tony" <tony.luck@...el.com>, rafael@...nel.org,
 wangkefeng.wang@...wei.com, tanxiaofei@...wei.com, mawupeng1@...wei.com,
 linmiaohe@...wei.com, naoya.horiguchi@....com, gregkh@...uxfoundation.org,
 will@...nel.org, jarkko@...nel.org, linux-acpi@...r.kernel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
 linux-edac@...r.kernel.org, x86@...nel.org, justin.he@....com,
 ardb@...nel.org, ying.huang@...el.com, ashish.kalra@....com,
 baolin.wang@...ux.alibaba.com, tglx@...utronix.de, mingo@...hat.com,
 dave.hansen@...ux.intel.com, lenb@...nel.org, hpa@...or.com,
 robert.moore@...el.com, lvying6@...wei.com, xiexiuqi@...wei.com,
 zhuo.song@...ux.alibaba.com
Subject: Re: [PATCH v11 1/3] ACPI: APEI: send SIGBUS to current task if
 synchronous memory error not recovered



On 2024/2/26 18:29, Borislav Petkov wrote:
> On Sat, Feb 24, 2024 at 02:08:42PM +0800, Shuai Xue wrote:
>> @Borislav, do you have any other concerns?
> 
> Yes, this change needs to be further reviewed by an ARM person: I have
> no clue what those "abnormal synchronous errors" on ARM are 

Hi, Borislav,

Maybe the word `abnormal` is inaccurate and misled you. I mean the precondition
checks performed before memory_failure_queue() is called:

- `if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))` in ghes_handle_memory_failure()
- `if (flags == -1)` in ghes_handle_memory_failure()
- `if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))` in ghes_do_memory_failure()
- `if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr))` in ghes_do_memory_failure()

If any of these preconditions is not met, memory_failure_queue() is never
reached and the user-space process will trigger the SEA again. This loop can
potentially exceed the platform firmware threshold or even trigger a kernel
hard lockup, leading to a system reboot.
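
To make the four checks listed above concrete, here is a minimal user-space
sketch of them. It is only a model, not the kernel code: `struct sea_mem_error`,
its fields, and both helper functions are hypothetical names invented for
illustration, each standing in for the kernel expression named in its comment.
The second helper assumes the behaviour stated in the patch subject (SIGBUS to
the current task when a synchronous error is not recovered).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel state consulted by the checks. */
struct sea_mem_error {
    bool pa_valid;        /* models mem_err->validation_bits & CPER_MEM_VALID_PA */
    int  flags;           /* models the flags value; -1 means no usable flags */
    bool apei_mf_enabled; /* models IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE) */
    bool pfn_is_valid;    /* models pfn_valid(pfn) */
    bool platform_page;   /* models arch_is_platform_page(physical_addr) */
};

/*
 * Returns true only when every precondition passes, i.e. the point at
 * which memory_failure_queue() would be reached.
 */
static bool would_queue_memory_failure(const struct sea_mem_error *e)
{
    if (!e->pa_valid)
        return false;   /* no valid physical address reported */
    if (e->flags == -1)
        return false;   /* no usable flags for this error */
    if (!e->apei_mf_enabled)
        return false;   /* memory failure handling compiled out */
    if (!e->pfn_is_valid && !e->platform_page)
        return false;   /* address is neither a valid pfn nor a platform page */
    return true;
}

/*
 * Assumption taken from the patch subject: a synchronous error that was
 * not queued for recovery should result in SIGBUS for the current task,
 * so the task does not re-trigger the same SEA in a loop.
 */
static bool should_send_sigbus(bool sync, const struct sea_mem_error *e)
{
    return sync && !would_queue_memory_failure(e);
}
```

In this model, any single failing check leaves a synchronous error unhandled,
which is the case that would otherwise loop.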

> and how
> they're supposed to be handled properly there:
> 
> - what happens if you get such an error when ghes is disabled there?

If ghes_disable is set, the GHES driver will not be initialized by acpi_ghes_init(),
so no error notifications will be handled. IMHO, that is expected.

> 
> - is that even the right place to handle them?
> 
> James?
> 

I will leave this to @James.

Thank you.

Best Regards,
Shuai

