Message-Id: <20210719182009.1409895-1-tony.luck@intel.com>
Date: Mon, 19 Jul 2021 11:20:03 -0700
From: Tony Luck <tony.luck@...el.com>
To: Sean Christopherson <seanjc@...gle.com>,
Jarkko Sakkinen <jarkko.sakkinen@...el.com>,
Dave Hansen <dave.hansen@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Tony Luck <tony.luck@...el.com>
Subject: [PATCH v2 0/6] Basic recovery for machine checks inside SGX
Very different from version 1 based on feedback.
Sean: Didn't like tracking types of SGX pages, so that's all gone now. I
do track the life cycle (in patch 1) using the "owner" field to
determine whether a page is in use vs. dirty/free. Currently
this series doesn't make use of that ... so patch 1 could be
dropped. But it is very small, and I think it is a prerequisite for
future improvements that take pre-emptive action on asynch poison
notification (rather than just hoping that the enclave will exit
without accessing poison, or that if it does consume the poison
the error will be recoverable).
I think we should defer the whole asynch action to a subsequent
series that can build on top of this (and do it properly ...
my version 1 sent out SIGBUS signals without regard for system
(/proc/sys/vm/memory_failure_early_kill) or per-task (prctl
PR_MCE_KILL) policies).
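For readers who want the "owner" idea spelled out, here is a minimal userspace sketch. It is not the code from patch 1: the struct and function names are hypothetical, and it simply assumes that allocation records an owner and that freeing clears it, so a NULL owner means the page is dirty or free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EPC page descriptor (not the kernel struct). */
struct epc_page_sketch {
	void *owner;	/* non-NULL only while an enclave is using the page */
};

/* Assumption: handing a page to an enclave records who owns it. */
static void sketch_alloc(struct epc_page_sketch *page, void *owner)
{
	assert(page != NULL && owner != NULL);
	page->owner = owner;
}

/* Assumption: returning a page to the free list clears the owner. */
static void sketch_free(struct epc_page_sketch *page)
{
	assert(page != NULL);
	page->owner = NULL;
}

/* True when the page is in use, false when it is dirty or free. */
static bool sketch_page_in_use(const struct epc_page_sketch *page)
{
	assert(page != NULL);
	return page->owner != NULL;
}
```

With a check like this, a poison handler could treat dirty/free pages (nothing to notify) differently from in-use pages (an enclave may consume the poison).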
Jarkko: Said poison pages should not just be dropped on the floor. They
should be added to a list for future tools to examine. I tried
the list approach, but safely removing pages from free/dirty
lists involved some complex locking, so I skipped ahead to the
"tools" idea and just added files in debugfs to show the count
of poison pages and a list of addresses (maybe the count is
redundant? Could just "wc -l poison_page_list"?).
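To illustrate why the count may be redundant, here is a rough userspace sketch of a poison log that prints one address per line. It is not the debugfs code from this series; the names, the fixed-size array, and the "0x..." line format are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_POISON 64	/* arbitrary limit for the sketch */

/* Hypothetical record of poisoned physical addresses. */
struct poison_log_sketch {
	uint64_t addr[SKETCH_MAX_POISON];
	size_t count;
};

/* Remember one poisoned address; returns -1 when the log is full. */
static int sketch_record_poison(struct poison_log_sketch *log, uint64_t paddr)
{
	assert(log != NULL);
	if (log->count >= SKETCH_MAX_POISON)
		return -1;
	log->addr[log->count++] = paddr;
	return 0;
}

/* Write one address per line into buf; returns the number of lines written. */
static size_t sketch_format_list(const struct poison_log_sketch *log,
				 char *buf, size_t len)
{
	size_t used = 0, lines = 0, i;

	assert(log != NULL && buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < log->count; i++) {
		int n = snprintf(buf + used, len - used, "0x%llx\n",
				 (unsigned long long)log->addr[i]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop a partially written line */
			break;
		}
		used += (size_t)n;
		lines++;
	}
	return lines;
}
```

As long as the buffer is large enough, the number of lines equals the stored count, which is the same information "wc -l" would report on the list file.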
Other: I got a complaint that after a poison page is handled Linux
spits out this message:
Could not invalidate pfn=0x2000c4d from 1:1 map
this is from set_mce_nospec() and happens because EPC pages
are not in the 1:1 map. Add code to check and ignore them.
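The check can be pictured with the following userspace sketch. It is not the code added to set_mce_nospec(): the section table, the names, and the 4 KiB page shift are assumptions, and the section base used in any example call is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical description of one EPC section's physical range. */
struct epc_section_sketch {
	uint64_t phys_addr;	/* start of the section */
	uint64_t size;		/* length in bytes */
};

/* True if the page frame number falls inside any listed EPC section. */
static bool sketch_is_epc_pfn(const struct epc_section_sketch *sections,
			      size_t nr, uint64_t pfn)
{
	uint64_t paddr = pfn << SKETCH_PAGE_SHIFT;
	size_t i;

	assert(sections != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (paddr >= sections[i].phys_addr &&
		    paddr - sections[i].phys_addr < sections[i].size)
			return true;
	}
	return false;
}

/* EPC pages are not in the 1:1 map, so skip the invalidation for them. */
static bool sketch_should_invalidate_1to1(const struct epc_section_sketch *sections,
					  size_t nr, uint64_t pfn)
{
	return !sketch_is_epc_pfn(sections, nr, pfn);
}
```

A caller would consult sketch_should_invalidate_1to1() first and only attempt the 1:1 map update, and print the warning on failure, when it returns true.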
Tony Luck (6):
x86/sgx: Provide indication of life-cycle of EPC pages
x86/sgx: Add infrastructure to identify SGX EPC pages
x86/sgx: Initial poison handling for dirty and free pages
x86/sgx: Add SGX infrastructure to recover from poison
x86/sgx: Hook sgx_memory_failure() into mainline code
x86/sgx: Add hook to error injection address validation
.../firmware-guide/acpi/apei/einj.rst | 19 +++
arch/x86/include/asm/set_memory.h | 4 +
arch/x86/kernel/cpu/sgx/encl.c | 2 +-
arch/x86/kernel/cpu/sgx/main.c | 137 +++++++++++++++++-
arch/x86/kernel/cpu/sgx/sgx.h | 6 +-
drivers/acpi/apei/einj.c | 3 +-
include/linux/mm.h | 15 ++
mm/memory-failure.c | 19 ++-
8 files changed, 195 insertions(+), 10 deletions(-)
base-commit: 2734d6c1b1a089fb593ef6a23d4b70903526fe0c
--
2.29.2