Message-ID: <f2b0ffc7-f8f8-4ebc-99da-9139c372bd09@intel.com>
Date: Thu, 29 Aug 2024 08:17:53 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Aaron Lu <aaron.lu@...el.com>, Jarkko Sakkinen <jarkko@...nel.org>,
 Dave Hansen <dave.hansen@...ux.intel.com>
Cc: x86@...nel.org, linux-sgx@...r.kernel.org, linux-kernel@...r.kernel.org,
 Zhimin Luo <zhimin.luo@...el.com>
Subject: Re: [PATCH] x86/sgx: Fix deadloop in __sgx_alloc_epc_page()

Generally, I think it's a bad idea to refer to function names in
subjects.  This, for instance, would be much more informative:

	x86/sgx: Fix deadlock in SGX NUMA node search

On 8/28/24 19:38, Aaron Lu wrote:
> When current node doesn't have a EPC section configured by firmware and
> all other EPC sections memory are used up, CPU can stuck inside the
> while loop in __sgx_alloc_epc_page() forever and soft lockup will happen.
> Note how nid_of_current will never equal to nid in that while loop because
> nid_of_current is not set in sgx_numa_mask.
> 
> Also worth mentioning is that it's perfectly fine for firmware to not
> seup an EPC section on a node. Setting an EPC section on each node can
> be good for performance but that's not a requirement functionality wise.

The changelog is a little rough, but I think Kai gave some good
suggestions.  The other thing you can do is dump the text into ChatGPT
(or whatever) and have it fix your grammar.  It actually does a pretty
decent job.

Also, you didn't say _how_ you fixed this.  That needs to be in here.
Something along the lines of:

	Rework the loop to start and end on *a* node that has SGX
	memory.  This avoids the deadlock looking for the current SGX-
	lacking node to show up in the loop when it never will.
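
To make the suggested wording concrete, here is a minimal userspace
sketch of a search that starts and ends on a node that has SGX memory.
It is not the patch under review: the names alloc_node_search(),
next_node_in_mask(), MAX_NODES and the free_pages[] array are all
hypothetical, and a plain bitmask stands in for sgx_numa_mask.

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical node count for this sketch */

/*
 * Return the next node after 'nid' that is set in 'mask', wrapping
 * around.  Returns 'nid' itself if it is the only node set, and -1 if
 * the mask is empty.
 */
static int next_node_in_mask(int nid, unsigned int mask)
{
	for (int i = 1; i <= MAX_NODES; i++) {
		int cand = (nid + i) % MAX_NODES;

		if (mask & (1u << cand))
			return cand;
	}
	return -1;
}

/*
 * Find a node with a free page.  'epc_mask' marks the nodes that have
 * EPC at all and free_pages[n] is the number of free pages on node n.
 * Returns the chosen node, or -1 when every EPC node is exhausted.
 */
static int alloc_node_search(int cur, unsigned int epc_mask,
			     const int free_pages[MAX_NODES])
{
	int start, nid;

	assert(cur >= 0 && cur < MAX_NODES);

	/* Prefer the current node, but only if it has EPC. */
	if (epc_mask & (1u << cur))
		start = cur;
	else
		start = next_node_in_mask(cur, epc_mask);

	if (start < 0)
		return -1;	/* no node has EPC */

	/*
	 * 'start' is always in epc_mask, so walking the mask is
	 * guaranteed to come back to it and end the loop.
	 */
	nid = start;
	do {
		if (free_pages[nid] > 0)
			return nid;
		nid = next_node_in_mask(nid, epc_mask);
	} while (nid != start);

	return -1;
}
```

With node 0 lacking EPC and nodes 1 and 2 both exhausted, the search
visits node 1, then node 2, and stops when it wraps back to node 1.  A
loop that instead waits for the walk to reach node 0 never terminates,
because node 0 is not in the mask.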

The code looks fine, so feel free to add:

Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>

Also, I do think we should probably add some kind of sanity warning to
the SGX code in another patch.  If a node on an SGX system has CPUs and
memory, it's very likely it will also have some EPC.  It can be
something soft like a pr_info(), but I think it would be nice to have.
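
A rough userspace sketch of what such a check could look like follows.
Everything in it is an assumption made for illustration: the function
name warn_nodes_without_epc(), the three bitmasks, MAX_NODES and the
message text are hypothetical, and fprintf() stands in for pr_info().

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NODES 8	/* hypothetical node count for this sketch */

/*
 * Report every node that has CPUs and memory but no EPC section.
 * Each argument is a bitmask with one bit per node.  Returns the
 * number of nodes reported.
 */
static int warn_nodes_without_epc(unsigned int cpu_mask,
				  unsigned int mem_mask,
				  unsigned int epc_mask)
{
	int missing = 0;

	/* Every node id must fit in the bitmask type. */
	assert(MAX_NODES <= 8 * (int)sizeof(unsigned int));

	/* No EPC anywhere: not an SGX system, nothing to report. */
	if (!epc_mask)
		return 0;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		unsigned int bit = 1u << nid;

		if ((cpu_mask & bit) && (mem_mask & bit) && !(epc_mask & bit)) {
			/* Soft, informational message only. */
			fprintf(stderr,
				"sgx: node %d has CPUs and memory but no EPC section\n",
				nid);
			missing++;
		}
	}
	return missing;
}
```

The check only informs.  It does not fail anything, since a node
without EPC is a valid configuration.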
