linux-kernel - [PATCH v10 0/4] ACPI: APEI: handle synchronous errors in task work with proper si

lists.openwall.net
Open Source and information security mailing list archives
Message-Id: <20231218064521.37324-1-xueshuai@linux.alibaba.com>
Date: Mon, 18 Dec 2023 14:45:17 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: bp@...en8.de,
	rafael@...nel.org,
	wangkefeng.wang@...wei.com,
	tanxiaofei@...wei.com,
	mawupeng1@...wei.com,
	tony.luck@...el.com,
	linmiaohe@...wei.com,
	naoya.horiguchi@....com,
	james.morse@....com,
	gregkh@...uxfoundation.org,
	will@...nel.org,
	jarkko@...nel.org
Cc: linux-acpi@...r.kernel.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org,
	linux-edac@...r.kernel.org,
	acpica-devel@...ts.linuxfoundation.org,
	stable@...r.kernel.org,
	x86@...nel.org,
	xueshuai@...ux.alibaba.com,
	justin.he@....com,
	ardb@...nel.org,
	ying.huang@...el.com,
	ashish.kalra@....com,
	baolin.wang@...ux.alibaba.com,
	tglx@...utronix.de,
	mingo@...hat.com,
	dave.hansen@...ux.intel.com,
	lenb@...nel.org,
	hpa@...or.com,
	robert.moore@...el.com,
	lvying6@...wei.com,
	xiexiuqi@...wei.com,
	zhuo.song@...ux.alibaba.com
Subject: [PATCH v10 0/4] ACPI: APEI: handle synchronous errors in task work with proper si_code

## Changes Log

changes since v9:
- split patch 2 so that each patch addresses exactly one issue (per Borislav)
- rewrite the commit log according to the template (per Borislav)
- pick up the reviewed-by tag for patch 1 from James Morse
- allocate and free twcb through gen_pool_{alloc,free}() (per James)
- rewrite the cover letter

changes since v8:
- remove the bug fix tag of patch 2 (per Jarkko Sakkinen)
- remove the declaration of memory_failure_queue_kick (per Naoya Horiguchi)
- rewrite the return value comments of memory_failure (per Naoya Horiguchi)

changes since v7:
- rebase to Linux v6.6-rc2 (no code changed)
- rewrite the cover letter to explain the motivation of this patchset

changes since v6:
- add a more explicit error message (suggested by Xiaofei)
- pick up reviewed-by tag from Xiaofei
- pick up internal reviewed-by tag from Baolin

changes since v5 by addressing comments from Kefeng:
- document return value of memory_failure()
- drop redundant comments at the call site of memory_failure()
- make ghes_do_proc() return void and handle the abnormal case within it
- pick up reviewed-by tag from Kefeng Wang

changes since v4 by addressing comments from Xiaofei:
- do a force kill only for abnormal sync errors

changes since v3 by addressing comments from Xiaofei:
- do a force kill for abnormal memory failure errors such as invalid PA,
unexpected severity, OOM, etc.
- pick up tested-by tag from Ma Wupeng

changes since v2 by addressing comments from Naoya:
- rename mce_task_work to sync_task_work
- drop ACPI_HEST_NOTIFY_MCE case in is_hest_sync_notify()
- add steps to reproduce this problem in cover letter

changes since v1:
- determine synchronous events by notify type
- Link: https://lore.kernel.org/lkml/20221206153354.92394-3-xueshuai@linux.alibaba.com/

## Cover Letter

There are two major types of uncorrected recoverable (UCR) errors:

- Synchronous error: The error is detected and raised at the point of
  consumption in the execution flow, e.g. when a CPU tries to access
  a poisoned cache line. The CPU takes a synchronous error exception,
  such as a Synchronous External Abort (SEA) on Arm64 or a Machine Check
  Exception (MCE) on x86. The OS is required to take action (for example,
  offlining the failed page or killing the affected thread) to recover
  from this uncorrectable error.

- Asynchronous error: The error is detected outside of the processor
  execution context, e.g. by a background scrubber. Some data in memory is
  corrupted, but it has not been consumed yet. Taking action to recover
  from this uncorrectable error is optional for the OS.
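A minimal C sketch of the distinction, assuming (as the v2 changelog above suggests) that GHES tells the two types apart by the HEST notification type, and that on Arm64 only SEA is treated as synchronous. The enum values and the helpers `is_sync_notify()` and `recovery_for()` are hypothetical stand-ins, not the kernel's ACPI_HEST_NOTIFY_* definitions.

```c
#include <stdbool.h>

/* Hypothetical subset of HEST notification types. The names are
 * illustrative and do not carry the ACPI specification's numeric values. */
enum notify_type {
	NOTIFY_POLLED,
	NOTIFY_SCI,
	NOTIFY_NMI,
	NOTIFY_SEA,	/* Arm64 Synchronous External Abort */
	NOTIFY_SEI,	/* Arm64 SError Interrupt */
};

enum recovery {
	RECOVERY_REQUIRED,	/* synchronous: poison is being consumed now */
	RECOVERY_OPTIONAL,	/* asynchronous: poison not consumed yet */
};

/* Assumption: SEA is the only synchronous notification in this model. */
static bool is_sync_notify(enum notify_type type)
{
	return type == NOTIFY_SEA;
}

/* Map the notification type to how urgently the OS must recover. */
static enum recovery recovery_for(enum notify_type type)
{
	return is_sync_notify(type) ? RECOVERY_REQUIRED : RECOVERY_OPTIONAL;
}
```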

Currently, both synchronous and asynchronous errors are queued by
ghes_handle_memory_failure() with flags set to 0 and handled by a dedicated
kernel thread in a workqueue on the ARM64 platform. As a result, in early
kill mode, memory failure recovery sends SIGBUS with the wrong si_code,
BUS_MCEERR_AO, for synchronous errors. The root cause is that the
memory_failure() work runs in kthread context rather than in the context
of the user-space process that accessed the corrupt memory location, so
kill_proc() sends SIGBUS to that process with BUS_MCEERR_AO instead of
BUS_MCEERR_AR.
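The si_code selection can be modeled in a few lines. This is a simplified sketch assuming kill_proc() reports BUS_MCEERR_AR only when MF_ACTION_REQUIRED is set and the signalled task is the one currently running; the `SKETCH_` constants, the flag bit value, the `task_id` type and `sigbus_si_code()` are hypothetical. The codes 4 and 5 match the einj_mem_uc output quoted in this letter.

```c
/* si_code values as reported in this thread: 4 is BUS_MCEERR_AR,
 * 5 is BUS_MCEERR_AO. Defined locally so the sketch stands alone. */
#define SKETCH_BUS_MCEERR_AR 4
#define SKETCH_BUS_MCEERR_AO 5

/* Hypothetical flag bit standing in for the kernel's MF_ACTION_REQUIRED. */
#define SKETCH_MF_ACTION_REQUIRED (1u << 1)

/* Hypothetical task identifier. */
typedef int task_id;

/* "Action required" is reported only when the caller asked for it AND the
 * signal goes to the task that is currently running, i.e. the one that
 * consumed the poison. Otherwise the signal is "action optional". */
static int sigbus_si_code(unsigned int flags, task_id victim, task_id current)
{
	if ((flags & SKETCH_MF_ACTION_REQUIRED) && victim == current)
		return SKETCH_BUS_MCEERR_AR;
	return SKETCH_BUS_MCEERR_AO;
}
```

With flags 0, or with a kthread as the running task, the result is BUS_MCEERR_AO; only with the flag set and the consuming task running does it become BUS_MCEERR_AR. This is why the series changes both the flags and the execution context.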

Fix the problem by:
- Patch 1: setting memory_failure() flags to MF_ACTION_REQUIRED for
	   synchronous errors.
- Patch 2: performing a force kill if no memory_failure() work is queued
	   for synchronous errors.
- Patch 3: a minor comment improvement (moving the memory_failure()
	   return value documentation to the function declaration).
- Patch 4: queueing memory_failure() as task_work so that the current
	   context in memory_failure() is exactly the process consuming
	   the poisoned data.
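The combined effect of patches 1, 2 and 4 on the dispatch decision can be summarized as a small decision function. This is a simplified model, not the GHES code: `plan_recovery()`, `enum action`, `struct plan` and `SKETCH_MF_ACTION_REQUIRED` are hypothetical, and `can_queue` stands for whatever prevents a memory_failure() work item from being queued (for example an invalid physical address, as listed in the v3 changelog).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's MF_ACTION_REQUIRED flag bit. */
#define SKETCH_MF_ACTION_REQUIRED (1u << 1)

enum action {
	NO_ACTION,		/* async, nothing queued: assumed to be only logged */
	QUEUE_WORKQUEUE,	/* async: memory_failure() later, in a kthread */
	QUEUE_TASK_WORK,	/* sync: memory_failure() in the faulting task (patch 4) */
	FORCE_SIGBUS,		/* sync, nothing queued: kill current task (patch 2) */
};

struct plan {
	enum action action;
	unsigned int flags;	/* flags memory_failure() would be called with */
};

static struct plan plan_recovery(bool sync, bool can_queue)
{
	struct plan p;

	/* Patch 1: synchronous errors require action. */
	p.flags = sync ? SKETCH_MF_ACTION_REQUIRED : 0;

	if (!can_queue)
		p.action = sync ? FORCE_SIGBUS : NO_ACTION;
	else
		p.action = sync ? QUEUE_TASK_WORK : QUEUE_WORKQUEUE;
	return p;
}
```

Asynchronous errors keep the existing workqueue path with flags 0, so only the synchronous path changes behavior in this model.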

Lv Ying and XiuQi from Huawei also proposed patches to address a similar
problem [2][4]. Thanks to them for the discussion.

## Steps to Reproduce This Problem

To reproduce this problem:

	# STEP1: enable early kill mode
	#sysctl -w vm.memory_failure_early_kill=1
	vm.memory_failure_early_kill = 1

	# STEP2: inject a UCE error and consume it to trigger a synchronous error
	#einj_mem_uc single
	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
	injecting ...
	triggering ...
	signal 7 code 5 addr 0xffffb0d75000
	page not present
	Test passed

The si_code (code 5) reported by einj_mem_uc indicates a BUS_MCEERR_AO
error, which is wrong for a synchronous error.

After this patch set:

	# STEP1: enable early kill mode
	#sysctl -w vm.memory_failure_early_kill=1
	vm.memory_failure_early_kill = 1

	# STEP2: inject a UCE error and consume it to trigger a synchronous error
	#einj_mem_uc single
	0: single   vaddr = 0xffffb0d75400 paddr = 4092d55b400
	injecting ...
	triggering ...
	signal 7 code 4 addr 0xffffb0d75000
	page not present
	Test passed

The si_code (code 4) reported by einj_mem_uc indicates a BUS_MCEERR_AR
error, as expected.
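The `signal 7 code N` lines in the transcripts show the signal number (7 is SIGBUS on Linux) and the si_code seen by the test process. A minimal userspace sketch of observing that si_code directly, assuming a Linux/POSIX environment; the `SKETCH_` constants mirror the codes 4 and 5 quoted above, and `install_sigbus_handler()` and `si_code_name()` are hypothetical helpers, not part of einj_mem_uc.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* si_code values as reported in this thread. Defined locally in case the
 * C library headers do not expose BUS_MCEERR_AR/BUS_MCEERR_AO. */
#define SKETCH_BUS_MCEERR_AR 4
#define SKETCH_BUS_MCEERR_AO 5

/* Written from the handler, read after it returns. */
static volatile sig_atomic_t last_signo;
static volatile sig_atomic_t last_code;

static void on_sigbus(int signo, siginfo_t *info, void *ctx)
{
	(void)ctx;
	/* Only record values; printing here would not be async-signal-safe. */
	last_signo = signo;
	last_code = info->si_code;
}

/* Install an SA_SIGINFO handler so si_code is available. Returns 0 on success. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

static const char *si_code_name(int code)
{
	switch (code) {
	case SKETCH_BUS_MCEERR_AR:
		return "BUS_MCEERR_AR (action required)";
	case SKETCH_BUS_MCEERR_AO:
		return "BUS_MCEERR_AO (action optional)";
	default:
		return "other";
	}
}
```

For a real BUS_MCEERR_AR, returning from the handler would retry the faulting access, so a real program would typically exit or siglongjmp() out of the handler instead; the sketch only records the values.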

[1] Add ARMv8 RAS virtualization support in QEMU https://patchew.org/QEMU/20200512030609.19593-1-gengdongjiu@huawei.com/
[2] https://lore.kernel.org/lkml/20221205115111.131568-3-lvying6@huawei.com/
[3] https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
[4] https://lore.kernel.org/lkml/20221209095407.383211-1-lvying6@huawei.com/

Shuai Xue (4):
  ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
    synchronous events
  ACPI: APEI: send SIGBUS to current task if synchronous memory error
    not recovered
  mm: memory-failure: move memory_failure() return value documentation
    to function declaration
  ACPI: APEI: handle synchronous exceptions in task work

 arch/x86/kernel/cpu/mce/core.c |   9 +--
 drivers/acpi/apei/ghes.c       | 113 ++++++++++++++++++++++-----------
 include/acpi/ghes.h            |   3 -
 mm/memory-failure.c            |  22 ++-----
 4 files changed, 82 insertions(+), 65 deletions(-)

-- 
2.39.3