Message-Id: <20210719182009.1409895-1-tony.luck@intel.com>
Date:   Mon, 19 Jul 2021 11:20:03 -0700
From:   Tony Luck <tony.luck@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Jarkko Sakkinen <jarkko.sakkinen@...el.com>,
        Dave Hansen <dave.hansen@...el.com>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        Tony Luck <tony.luck@...el.com>
Subject: [PATCH v2 0/6] Basic recovery for machine checks inside SGX

This version is very different from version 1, based on feedback.

Sean:	Didn't like tracking types of SGX pages, so that's all gone now. I
	do track the life cycle (in patch 1) using the "owner" field to
	determine whether a page is in use vs. dirty/free. Currently
	this series doesn't make use of that ... so patch 1 could be
	dropped. But it is very small, and I think it is a prerequisite
	for future improvements that take pre-emptive action on
	asynchronous poison notification (rather than just hoping that
	the enclave will exit without accessing poison, or that if it
	does consume the poison the error will be recoverable).

	I think we should defer the whole asynchronous action to a
	subsequent series that can build on top of this and do it
	properly. My version 1 sent out SIGBUS signals without regard
	for the system-wide (/proc/sys/vm/memory_failure_early_kill)
	or per-task (prctl PR_MCE_KILL) policies.
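
	For readers without the patches at hand, the life-cycle idea in
	patch 1 can be sketched in user space roughly as below. This is
	only an illustration: the struct and helper names are made up,
	and treating a NULL owner as "dirty/free" is my assumption about
	the convention, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an EPC page descriptor. */
struct epc_page_sketch {
	void *owner;	/* assumed: non-NULL only while the page is in use */
};

/* Allocation records who owns the page. */
static void sketch_alloc(struct epc_page_sketch *page, void *owner)
{
	assert(page != NULL && owner != NULL);
	page->owner = owner;
}

/* Freeing clears the owner, so the page reads as dirty/free again. */
static void sketch_free(struct epc_page_sketch *page)
{
	assert(page != NULL);
	page->owner = NULL;
}

/* A poison handler could use this to tell in-use pages from dirty/free ones. */
static bool sketch_in_use(const struct epc_page_sketch *page)
{
	return page->owner != NULL;
}
```

	With something like this, a later series could decide whether a
	poisoned page needs action against an enclave or can simply be
	kept off the free list.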

Jarkko:	Said poison pages should not just be dropped on the floor. They
	should be added to a list for future tools to examine. I tried
	the list approach, but safely removing pages from free/dirty
	lists involved some complex locking, so I skipped ahead to the
	"tools" idea and just added files in debugfs to show the count
	of poison pages and a list of addresses (maybe the count is
	redundant? Could just "wc -l poison_page_list"?).

Other:	I got a complaint that after a poison page is handled Linux
	spits out this message:
		Could not invalidate pfn=0x2000c4d from 1:1 map
	This comes from set_mce_nospec() and happens because EPC pages
	are not in the 1:1 map. Add code to check for EPC pages and
	ignore them.
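
	The shape of that check, as a stand-alone sketch rather than the
	real patch: look the pfn up in a table of EPC ranges and skip
	the 1:1 map update when it matches. The section table, its
	addresses, and every name below are hypothetical; the range was
	chosen only so that the pfn from the message above falls inside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical description of one EPC section (physical range). */
struct epc_section_sketch {
	uint64_t base;
	uint64_t size;
};

/* Made-up range for illustration; real ranges come from CPUID enumeration. */
static const struct epc_section_sketch sketch_sections[] = {
	{ 0x2000c00000ULL, 0x400000ULL },
};

/* Return true if the page frame lies inside any known EPC section. */
static bool sketch_is_epc_pfn(uint64_t pfn)
{
	uint64_t paddr = pfn << SKETCH_PAGE_SHIFT;
	size_t n = sizeof(sketch_sections) / sizeof(sketch_sections[0]);

	for (size_t i = 0; i < n; i++) {
		const struct epc_section_sketch *s = &sketch_sections[i];

		if (paddr >= s->base && paddr - s->base < s->size)
			return true;
	}
	return false;
}

enum sketch_action { SKETCH_SKIP, SKETCH_INVALIDATE };

/* EPC pages are not in the 1:1 map, so there is nothing to invalidate. */
static enum sketch_action sketch_nospec_action(uint64_t pfn)
{
	return sketch_is_epc_pfn(pfn) ? SKETCH_SKIP : SKETCH_INVALIDATE;
}
```

	Returning early for EPC pages avoids the failed lookup that
	produced the warning, while ordinary memory still takes the
	normal path.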

Tony Luck (6):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/encl.c                |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                | 137 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |  19 ++-
 8 files changed, 195 insertions(+), 10 deletions(-)


base-commit: 2734d6c1b1a089fb593ef6a23d4b70903526fe0c
-- 
2.29.2
