linux-kernel - Re: [PATCH v11 0/3] ACPI: APEI: handle synchronous exceptions in task work to send correct SIGBUS si

Message-ID: <a1a6d101-1c08-4cd0-860f-af905869c573@linux.alibaba.com>
Date: Mon, 19 Feb 2024 09:46:57 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: bp@...en8.de, rafael@...nel.org, wangkefeng.wang@...wei.com,
 tanxiaofei@...wei.com, mawupeng1@...wei.com, tony.luck@...el.com,
 linmiaohe@...wei.com, naoya.horiguchi@....com, james.morse@....com,
 gregkh@...uxfoundation.org, will@...nel.org, jarkko@...nel.org
Cc: linux-acpi@...r.kernel.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
 linux-edac@...r.kernel.org, x86@...nel.org, justin.he@....com,
 ardb@...nel.org, ying.huang@...el.com, ashish.kalra@....com,
 baolin.wang@...ux.alibaba.com, tglx@...utronix.de, mingo@...hat.com,
 dave.hansen@...ux.intel.com, lenb@...nel.org, hpa@...or.com,
 robert.moore@...el.com, lvying6@...wei.com, xiexiuqi@...wei.com,
 zhuo.song@...ux.alibaba.com
Subject: Re: [PATCH v11 0/3] ACPI: APEI: handle synchronous exceptions in task
 work to send correct SIGBUS si_code

Hi, James and Borislav,

Gentle ping. Any feedback on this new version?

Thank you.

Best Regards,
Shuai

On 2024/2/4 16:01, Shuai Xue wrote:
> ## Changes Log
> changes since v10:
> - rebase to v6.8-rc2
> 
> changes since v9:
> - split patch 2 to address exactly one issue in one patch (per Borislav)
> - rewrite commit log according to template (per Borislav)
> - pickup reviewed-by tag of patch 1 from James Morse
> - alloc and free twcb through gen_pool_{alloc,free} (per James)
> - rewrite cover letter
> 
> changes since v8:
> - remove the bug fix tag of patch 2 (per Jarkko Sakkinen)
> - remove the declaration of memory_failure_queue_kick (per Naoya Horiguchi)
> - rewrite the return value comments of memory_failure (per Naoya Horiguchi)
> 
> changes since v7:
> - rebase to Linux v6.6-rc2 (no code changed)
> - rewrite the cover letter to explain the motivation of this patchset
> 
> changes since v6:
> - add a more explicit error message, as suggested by Xiaofei
> - pick up reviewed-by tag from Xiaofei
> - pick up internal reviewed-by tag from Baolin
> 
> changes since v5 by addressing comments from Kefeng:
> - document return value of memory_failure()
> - drop redundant comments in call site of memory_failure() 
> - make ghes_do_proc void and handle abnormal case within it
> - pick up reviewed-by tag from Kefeng Wang 
> 
> changes since v4 by addressing comments from Xiaofei:
> - do a force kill only for abnormal sync errors
> 
> changes since v3 by addressing comments from Xiaofei:
> - do a force kill for abnormal memory failure error such as invalid PA,
> unexpected severity, OOM, etc
> - pick up tested-by tag from Ma Wupeng
> 
> changes since v2 by addressing comments from Naoya:
> - rename mce_task_work to sync_task_work
> - drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
> - add steps to reproduce this problem in cover letter
> 
> changes since v1:
> - identify synchronous events by notify type
> - Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/
> 
> ## Cover Letter
> 
> There are two major types of uncorrected recoverable (UCR) errors:
> 
> - Synchronous error: The error is detected and raised at the point of the
>   consumption in the execution flow, e.g. when a CPU tries to access
>   a poisoned cache line. The CPU will take a synchronous error exception
>   such as Synchronous External Abort (SEA) on Arm64 and Machine Check
>   Exception (MCE) on X86. The OS is required to take action (for example,
>   offline the failed page or kill the failing thread) to recover from this
>   uncorrectable error.
> 
> - Asynchronous error: The error is detected out of processor execution
>   context, e.g. when an error is detected by a background scrubber. Some
>   data in memory is corrupted, but it has not been consumed yet. The OS may
>   optionally take action to recover from this uncorrectable error.
> 
> Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as
> MF_ACTION_REQUIRED on synchronous events"), the flag MF_ACTION_REQUIRED
> can be used to determine whether a synchronous exception occurred on an
> ARM64 platform. When a synchronous exception is detected, the kernel should
> terminate the current process that is accessing the poisoned page. This is
> done by sending a SIGBUS signal with the si_code BUS_MCEERR_AR, indicating
> an action-required machine check error on read.
> 
> However, memory failure recovery incorrectly sends a SIGBUS with the
> wrong si_code BUS_MCEERR_AO for synchronous errors in early kill mode,
> even when MF_ACTION_REQUIRED is set. The main problem is that on ARM64
> platforms synchronous errors are queued as memory_failure() work and
> executed in a kernel thread context, not in the context of the user-space
> process that encountered the corrupted memory. As a result, when
> kill_proc() is called to terminate the process, it sends the incorrect
> SIGBUS si_code because the context in which it operates is not the one
> where the error was triggered.
> 
> To this end, fix the problem by:
> 
> - Patch 1: performing a force kill if no memory_failure() work is queued for
> 	   synchronous errors.
> - Patch 2: a minor comment improvement.
> - Patch 3: queue memory_failure() as a task_work so that it runs in the
> 	   context of the process that is actually consuming the poisoned
> 	   data, and it will send SIGBUS with si_code BUS_MCEERR_AR.
> 
> Lv Ying and XiuQi from Huawei also proposed fixes for a similar
> problem [2][4]. Thanks to them for the discussion.
> 
> ## Steps to Reproduce This Problem
> 
> To reproduce this problem:
> 
> 	# STEP1: enable early kill mode
> 	#sysctl -w vm.memory_failure_early_kill=1
> 	vm.memory_failure_early_kill = 1
> 
> 	# STEP2: inject a UCE and consume it to trigger a synchronous error
> 	#einj_mem_uc single
> 	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
> 	injecting ...
> 	triggering ...
> 	signal 7 code 5 addr 0xffffb0d75000
> 	page not present
> 	Test passed
> 
> The si_code (code 5) from einj_mem_uc indicates a BUS_MCEERR_AO error,
> which is incorrect for a synchronous error.
> 
> After this patch set:
> 
> 	# STEP1: enable early kill mode
> 	#sysctl -w vm.memory_failure_early_kill=1
> 	vm.memory_failure_early_kill = 1
> 
> 	# STEP2: inject a UCE and consume it to trigger a synchronous error
> 	#einj_mem_uc single
> 	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
> 	injecting ...
> 	triggering ...
> 	signal 7 code 4 addr 0xffffb0d75000
> 	page not present
> 	Test passed
> 
> The si_code (code 4) from einj_mem_uc indicates a BUS_MCEERR_AR error,
> as expected.
> 
> [1] Add ARMv8 RAS virtualization support in QEMU https://patchew.org/QEMU/20200512030609.19593-1-gengdongjiu@huawei.com/
> [2] https://lore.kernel.org/lkml/20221205115111.131568-3-lvying6@huawei.com/
> [3] https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
> [4] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/
> 
> Shuai Xue (3):
>   ACPI: APEI: send SIGBUS to current task if synchronous memory error
>     not recovered
>   mm: memory-failure: move return value documentation to function
>     declaration
>   ACPI: APEI: handle synchronous exceptions in task work to send correct
>     SIGBUS si_code
> 
>  arch/x86/kernel/cpu/mce/core.c |  9 +---
>  drivers/acpi/apei/ghes.c       | 84 +++++++++++++++++++++-------------
>  include/acpi/ghes.h            |  3 --
>  mm/memory-failure.c            | 22 +++------
>  4 files changed, 59 insertions(+), 59 deletions(-)
>
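
For readers who want to check the si_code from user space without
einj_mem_uc, the following is a minimal sketch (not part of the patch set)
of a SIGBUS handler that decodes the "code 4" / "code 5" values shown in the
reproduction steps. The names mceerr_name(), sigbus_handler() and
install_sigbus_handler() are hypothetical helpers chosen for illustration;
the numeric fallbacks assume the Linux values of BUS_MCEERR_AR and
BUS_MCEERR_AO.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks in case the libc headers do not expose these names
 * (assumption: Linux values, matching "code 4" and "code 5" above). */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: map a SIGBUS si_code to a readable name. */
static const char *mceerr_name(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "BUS_MCEERR_AR (action required)";
	case BUS_MCEERR_AO:
		return "BUS_MCEERR_AO (action optional)";
	default:
		return "other SIGBUS code";
	}
}

/* Last si_code seen by the handler, readable from normal context. */
static volatile sig_atomic_t last_si_code;

/* Record and print the si_code using only write(2) and strlen(3). */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	const char *msg = mceerr_name(info->si_code);
	ssize_t n;

	(void)sig;
	(void)ucontext;
	last_si_code = info->si_code;
	n = write(STDERR_FILENO, msg, strlen(msg));
	n = write(STDERR_FILENO, "\n", 1);
	(void)n;
}

/* Install the handler with SA_SIGINFO so that siginfo_t is delivered.
 * Returns 0 on success, -1 on failure (as sigaction(2) does). */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A test program would call install_sigbus_handler() before touching the
poisoned address. For BUS_MCEERR_AR the handler typically should not simply
return, since the faulting access would be retried; a real program would
exit or jump to a recovery path instead.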