Message-Id: <20221206153354.92394-1-xueshuai@linux.alibaba.com>
Date:   Tue,  6 Dec 2022 23:33:52 +0800
From:   Shuai Xue <xueshuai@...ux.alibaba.com>
To:     rafael@...nel.org, lenb@...nel.org, james.morse@....com,
        tony.luck@...el.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        jarkko@...nel.org, naoya.horiguchi@....com, linmiaohe@...wei.com,
        akpm@...ux-foundation.org
Cc:     linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
        cuibixuan@...ux.alibaba.com, baolin.wang@...ux.alibaba.com,
        zhuo.song@...ux.alibaba.com, xueshuai@...ux.alibaba.com
Subject: [RFC PATCH 0/2] ACPI: APEI: handle synchronous exceptions in task work

Currently, both synchronous and asynchronous errors are queued and handled by
a dedicated kthread in a workqueue. Memory failure handling for synchronous
errors is synchronized with the affected task only by a trick.

Although the task can still be killed by a page fault, memory failure is
handled in kthread context, so hwpoison-aware mechanisms such as PF_MCE_EARLY
and early kill do not work as expected.
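
For readers unfamiliar with early kill, the sketch below shows how a userspace
thread can opt in through prctl(2) with PR_MCE_KILL. It is an illustration and
not part of this patch set; the helper names opt_in_early_kill() and
current_mce_kill_policy() are made up for the example, and the calls may fail
on kernels or sandboxes that do not support them.

```c
#include <sys/prctl.h>

/*
 * Illustration only: ask the kernel to apply the early-kill policy to
 * the calling thread. Returns 0 on success, -1 on failure (errno is
 * set by prctl).
 */
static int opt_in_early_kill(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/*
 * Query the policy of the calling thread. On success the result is one
 * of PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or PR_MCE_KILL_DEFAULT; -1
 * means the call failed.
 */
static int current_mce_kill_policy(void)
{
    return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

A thread that has opted in expects to be notified as soon as corruption is
detected in a page it maps, which depends on memory failure handling being
able to find the right task.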

To this end, separate synchronous and asynchronous error handling into
different paths, as x86 does:

- task work for synchronous errors
- workqueue for asynchronous errors
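
The split can be modelled in plain userspace C as follows. This is a sketch of
the dispatch decision only, not the code in drivers/acpi/apei/ghes.c; the type
and function names (mem_error_event, handler_path, choose_handler_path) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two handling paths described above. */
enum handler_path {
    HANDLE_IN_TASK_WORK,  /* runs in the context of the faulting task */
    HANDLE_IN_WORKQUEUE,  /* runs later in a kthread */
};

/* Hypothetical, simplified view of a reported memory error. */
struct mem_error_event {
    unsigned long pfn;    /* page frame that reported the error */
    bool sync;            /* true if the error was consumed synchronously */
};

/* Pick the handling path: task work for sync, workqueue for async. */
static enum handler_path choose_handler_path(const struct mem_error_event *ev)
{
    assert(ev != NULL);
    return ev->sync ? HANDLE_IN_TASK_WORK : HANDLE_IN_WORKQUEUE;
}
```

Handling the synchronous case in task work means the recovery code runs as
the task that consumed the poison, which is what the hwpoison-aware
mechanisms rely on.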

This patch set is based on a new UEFI proposal submitted by our colleague Yingwen.[1]

> Background:
> 
> In the ARM world, two types of events (sync/async) from hardware IP require the OS/VMM to take different actions.
> The current CPER memory error record cannot distinguish between sync and async events.
> The OS/VMM currently has to take extra steps beyond CPER to identify the two types of events,
> which is a heavy burden.
>  
> Sync event (e.g. CPU consumes poisoned data) --> Firmware --> CPER error log --> OS/VMM takes recovery action.
> Async event (e.g. memory controller detects a UE event) --> Firmware --> CPER error log --> OS takes page action.
> 
> 
> Proposal: 
>
> - In the section descriptor Flags field (UEFI spec section N.2), add a sync flag as below. The OS/VMM
>  can rely on this flag to distinguish sync/async events.
> - Bit 8 - sync flag; if set, this flag indicates that the event record is synchronous (e.g. a
>  CPU core consumes poisoned data, which causes an instruction/data abort); if not set, the event record is asynchronous.
> 
> Best regards,
> Yingwen Chen
> 
> [ Shuai Xue: The thread is only open to members of the UEFI Workgroup.
>   Pasted here for discussion.]

[1] https://members.uefi.org/wg/uswg/mail/thread/9453
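
The sync flag from the proposal quoted above could be tested as in the sketch
below. Only the bit position (bit 8 of the section descriptor Flags field)
comes from the proposal; the macro and function names are hypothetical, and
MODEL_MF_ACTION_REQUIRED is a stand-in value for this userspace model, not
the kernel's MF_ACTION_REQUIRED definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 8 of the section descriptor Flags field, as proposed. The name is
 * made up for this sketch. */
#define SEC_FLAG_SYNC (UINT32_C(1) << 8)

/* Stand-in for the kernel's MF_ACTION_REQUIRED; the value is arbitrary. */
#define MODEL_MF_ACTION_REQUIRED 0x1

static_assert(SEC_FLAG_SYNC == 0x100, "sync flag must be bit 8");

/* True if the record describes a synchronous event. */
static bool section_is_sync(uint32_t section_flags)
{
    return (section_flags & SEC_FLAG_SYNC) != 0;
}

/* Synchronous events require action; asynchronous events carry no flag. */
static int memory_failure_flags_for(uint32_t section_flags)
{
    return section_is_sync(section_flags) ? MODEL_MF_ACTION_REQUIRED : 0;
}
```

With such a flag, the handler does not need to infer the event type from
other fields of the CPER record.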

Shuai Xue (2):
  ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
    synchronous events
  ACPI: APEI: separate synchronous error handling into task work

 drivers/acpi/apei/ghes.c | 120 ++++++++++++++++++++++-----------------
 include/linux/cper.h     |  22 +++++++
 2 files changed, 89 insertions(+), 53 deletions(-)

-- 
2.20.1.12.g72788fdb
