lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3bd497be-54d7-43b8-a392-4bf82bf64679@intel.com>
Date: Tue, 11 Feb 2025 08:25:58 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Andrew Zaborowski <andrew.zaborowski@...el.com>, x86@...nel.org,
 linux-sgx@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, Tony Luck
 <tony.luck@...el.com>, Thomas Gleixner <tglx@...utronix.de>,
 Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
 "H . Peter Anvin" <hpa@...or.com>, balrogg@...il.com
Subject: Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming

I don't expect everyone to know the rules of every little part of the
kernel. But, it's really easy to see a pattern with:

	git log arch/x86/kernel/cpu/sgx/

That usually works for every little nook and cranny of the kernel and
will show you what the subject rules are.

Could you do that for this patch for v2, please?

Also, this isn't about _tracking_ pages per se. It's avoiding SGX page
reclaim, don't you think?

On 2/11/25 07:01, Andrew Zaborowski wrote:
> Pages used by an enclave only get page->poison set in

Could we please call these "epc_page" instead of "page"?

> arch_memory_failure() but stay on sgx_active_page_list.
> page->poison is not checked in the reclaimer logic meaning that a page could be
> reclaimed and go through ETRACK, EBLOCK and EWB.  This can lead to the
> firmware receiving and MCE in one of those operations and going into
> "unbreakable shutdown" and triggering a kernel panic on remaining cores.

This requires low-level SGX implementation knowledge to fully
understand. Both what "ETRACK, EBLOCK and EWB" are in the first place,
how they are involved in reclaim and also why EREMOVE doesn't lead to
the same fate.

Can it be written in a more approachable way?

During SGX reclaim, the CPU actually touches the SGX data page,
encrypting and writing its contents out to normal memory. These "EWB"
writeback operations are implemented in what are effectively big,
complicated chunks of microcode. Any machine checks encountered during
this writeback operation are usually fatal to the entire system.

If an epc_page has poison, reclaiming it is highly likely to bring the
whole system down. The SGX reclaim code does not currently check for poison.

--

> Remove the affected page from sgx_active_page_list but don't add it
> immediately to &node->sgx_poison_page_list to keep most of the current
> semantics.

What semantics are being kept? Are they important?

> Tested with CONFIG_PROVE_LOCKING as suggested by Tony Luck.

"I tested it with lockdep and it didn't blow up" is definitely better
than "I booted this and it didn't blow up" or not testing it at all.

But even better would be demonstrating in the changelog that the locking
rules were understood and respected in this patch.

> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 671c26513..7076464d4 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -719,6 +719,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
>  		goto out;
>  	}
>  
> +	sgx_unmark_page_reclaimable(page);
> +
>  	/*
>  	 * TBD: Add additional plumbing to enable pre-emptive
>  	 * action for asynchronous poison notification. Until

I'll 100% buy that this is the most expeditious fix.

But is it the _best_ one?

In the end, this patch has the semantics of avoiding SGX reclaim on
poisoned pages. Wouldn't it be most straightforward to implement that in
the SGX *reclaim* code?



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ