Message-Id: <20221206153354.92394-1-xueshuai@linux.alibaba.com>
Date: Tue, 6 Dec 2022 23:33:52 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: rafael@...nel.org, lenb@...nel.org, james.morse@....com,
tony.luck@...el.com, bp@...en8.de, dave.hansen@...ux.intel.com,
jarkko@...nel.org, naoya.horiguchi@....com, linmiaohe@...wei.com,
akpm@...ux-foundation.org
Cc: linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
cuibixuan@...ux.alibaba.com, baolin.wang@...ux.alibaba.com,
zhuo.song@...ux.alibaba.com, xueshuai@...ux.alibaba.com
Subject: [RFC PATCH 0/2] ACPI: APEI: handle synchronous exceptions in task work

Currently, both synchronous and asynchronous errors are queued and handled by a
dedicated kthread in a workqueue. Memory failure handling for synchronous errors
is synchronized by a trick.

Although the task can be killed by a page fault, the memory failure is handled
in kthread context, so hwpoison-aware mechanisms such as PF_MCE_EARLY and
early kill do not work as expected.

To this end, separate synchronous and asynchronous error handling into
different paths, as x86 does:

- task work for synchronous errors
- workqueue for asynchronous errors
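
To make the intended split concrete, here is a minimal userspace sketch of the
dispatch decision. It is only a model under stated assumptions: every
example_* name is hypothetical and is not taken from the patches or from any
kernel API. It only illustrates that a synchronous error is routed to work
that runs in the context of the affected task, while an asynchronous error
stays on the workqueue path.

```c
#include <stdbool.h>

/* Hypothetical model of the two deferred-handling paths; not kernel code. */
enum example_path {
	EXAMPLE_PATH_TASK_WORK,	/* runs in the context of the affected task */
	EXAMPLE_PATH_WORKQUEUE,	/* runs in a kthread */
};

/* Hypothetical description of one reported memory error. */
struct example_mem_error {
	unsigned long pfn;	/* page frame that holds the poison */
	bool sync;		/* true if the current task consumed it */
};

/* Pick the handling path: task work for sync, workqueue for async. */
static enum example_path example_pick_path(const struct example_mem_error *err)
{
	return err->sync ? EXAMPLE_PATH_TASK_WORK : EXAMPLE_PATH_WORKQUEUE;
}

/* Only the task-work path sees the task that triggered the error. */
static bool example_runs_in_task_context(enum example_path path)
{
	return path == EXAMPLE_PATH_TASK_WORK;
}
```

The point of the split is the second helper: per-task policies such as
PF_MCE_EARLY can only be honoured when the handler runs in the task's own
context.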

This patch set is based on a new UEFI proposal submitted by our colleague
Yingwen [1]:

> Background:
>
> In the ARM world, two types of events (sync/async) from hardware IP require the OS/VMM to take
> different actions. The current CPER memory error record is not able to distinguish sync events
> from async events. The OS/VMM currently needs to take extra actions beyond CPER to identify
> the two types of events, which is a heavy burden.
>
> Sync event (e.g. CPU consumes poisoned data) --> Firmware --> CPER error log --> OS/VMM takes recovery action.
> Async event (e.g. memory controller detects UE event) --> Firmware --> CPER error log --> OS takes page action.
>
>
> Proposal:
>
> - In the section descriptor Flags field (UEFI spec section N.2), add a sync flag as below. The
>   OS/VMM could depend on this flag to distinguish sync/async events.
> - Bit8 – sync flag; if set, this flag indicates that this event record is synchronous (e.g.
>   a CPU core consumes poisoned data, which causes an instruction/data abort); if not set, this
>   event record is asynchronous.
>
> Best regards,
> Yingwen Chen
>
> [ Shuai Xue: The thread is only open to members of the UEFI Workgroup.
> Pasted here for discussion. ]
[1] https://members.uefi.org/wg/uswg/mail/thread/9453
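
For discussion, the proposed flag could be consumed roughly as in the sketch
below. This is a standalone illustration, not the code in the patches: the
EXAMPLE_* names are hypothetical, and EXAMPLE_MF_ACTION_REQUIRED is only a
stand-in for the kernel's MF_ACTION_REQUIRED with an illustrative value. The
only fact taken from the proposal is that bit 8 of the section Flags field
marks a synchronous event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the proposed sync flag: bit 8 of the Flags field. */
#define EXAMPLE_SEC_FLAG_SYNC		(UINT32_C(1) << 8)

/* Stand-in for MF_ACTION_REQUIRED; the value here is illustrative only. */
#define EXAMPLE_MF_ACTION_REQUIRED	(1 << 1)

/* Return true if the section flags mark the event as synchronous. */
static bool example_sec_is_sync(uint32_t section_flags)
{
	return (section_flags & EXAMPLE_SEC_FLAG_SYNC) != 0;
}

/* Map the section flags to memory-failure flags: sync needs action now. */
static int example_mf_flags(uint32_t section_flags)
{
	return example_sec_is_sync(section_flags) ? EXAMPLE_MF_ACTION_REQUIRED : 0;
}
```

Records from firmware that does not implement the proposal leave bit 8 clear,
so in this sketch they are treated as asynchronous.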

Shuai Xue (2):
  ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
    synchronous events
  ACPI: APEI: separate synchronous error handling into task work

 drivers/acpi/apei/ghes.c | 120 ++++++++++++++++++++++-----------------
 include/linux/cper.h     |  22 +++++++
 2 files changed, 89 insertions(+), 53 deletions(-)

--
2.20.1.12.g72788fdb