Message-ID: <725178dae805211fdcf658dee33110de8342d274.camel@intel.com>
Date: Tue, 27 Jul 2021 01:54:02 +0000
From: "Sakkinen, Jarkko" <jarkko.sakkinen@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>,
"Hansen, Dave" <dave.hansen@...el.com>,
"seanjc@...gle.com" <seanjc@...gle.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v2 0/6] Basic recovery for machine checks inside SGX

On Mon, 2021-07-19 at 11:20 -0700, Tony Luck wrote:
> Very different from version 1 based on feedback.
>
> Sean: Didn't like tracking types of SGX pages, so that's all gone now. I
> do track the life cycle (in patch 1) using the "owner" field to
> determine whether a page is in use vs. dirty/free. Currently
> this series doesn't make use of that ... so patch 1 could be
> dropped. But it is very small, and I think a pre-requisite for
> future improvements to take pre-emptive action for asynch poison
> notification (rather than just hoping that the enclave will exit
> without accessing poison, or that if it does consume the poison
> the error will be recoverable).
>
> I think we should defer the whole asynch action to a subsequent
> series that can build on top of this (and do it properly ...
> my version 1 sent out SIGBUS signals without regard for system
> (/proc/sys/vm/memory_failure_early_kill) or per-task (prctl
> PR_MCE_KILL) policies).
>
> Jarkko: Said poison pages should not just be dropped on the floor. They
> should be added to a list for future tools to examine. I tried
> the list approach, but safely removing pages from free/dirty
> lists involved some complex locking, so I skipped ahead to the
> "tools" idea and just added files in debugfs to show the count
> of poison pages and a list of addresses (maybe the count is
> redundant? Could just "wc -l poison_page_list"?).
>
> Other: I got a complaint that after a poison page is handled Linux
> spits out this message:
> Could not invalidate pfn=0x2000c4d from 1:1 map
> this is from set_mce_nospec() and happens because EPC pages
> are not in the 1:1 map. Add code to check and ignore them.
>
> Tony Luck (6):
> x86/sgx: Provide indication of life-cycle of EPC pages
> x86/sgx: Add infrastructure to identify SGX EPC pages
> x86/sgx: Initial poison handling for dirty and free pages
> x86/sgx: Add SGX infrastructure to recover from poison
> x86/sgx: Hook sgx_memory_failure() into mainline code
> x86/sgx: Add hook to error injection address validation
>
> .../firmware-guide/acpi/apei/einj.rst | 19 +++
> arch/x86/include/asm/set_memory.h | 4 +
> arch/x86/kernel/cpu/sgx/encl.c | 2 +-
> arch/x86/kernel/cpu/sgx/main.c | 137 +++++++++++++++++-
> arch/x86/kernel/cpu/sgx/sgx.h | 6 +-
> drivers/acpi/apei/einj.c | 3 +-
> include/linux/mm.h | 15 ++
> mm/memory-failure.c | 19 ++-
> 8 files changed, 195 insertions(+), 10 deletions(-)
>
>
> base-commit: 2734d6c1b1a089fb593ef6a23d4b70903526fe0c
Use jarkko@...nel.org in future versions.
/Jarkko