Message-ID: <ZtFgem6_2j05S0MJ@ziqianlu-kbl>
Date: Fri, 30 Aug 2024 14:02:34 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Jarkko Sakkinen <jarkko@...nel.org>, Dave Hansen
<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <linux-sgx@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Zhimin Luo <zhimin.luo@...el.com>
Subject: Re: [PATCH] x86/sgx: Fix deadloop in __sgx_alloc_epc_page()
On Thu, Aug 29, 2024 at 08:17:53AM -0700, Dave Hansen wrote:
> Generally, I think it's a bad idea to refer to function names in
> subjects. This, for instance would be much more informative:
>
> 	x86/sgx: Fix deadlock in SGX NUMA node search
Indeed, I will use this as the subject. Thanks.
> On 8/28/24 19:38, Aaron Lu wrote:
> > When current node doesn't have a EPC section configured by firmware and
> > all other EPC sections memory are used up, CPU can stuck inside the
> > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen.
> > Note how nid_of_current will never equal to nid in that while loop because
> > nid_of_current is not set in sgx_numa_mask.
> >
> > Also worth mentioning is that it's perfectly fine for firmware to not
> > seup an EPC section on a node. Setting an EPC section on each node can
> > be good for performance but that's not a requirement functionality wise.
>
> The changelog is a little rough, but I think Kai gave some good
> suggestions. The other thing you can do is dump the text in chatgpt (or
> whatever) and have it fix your grammar. It actually does a pretty
> decent job.
Thanks for the suggestion.
>
> Also, you didn't say _how_ you fixed this. That needs to be in here.
> Something along the lines of:
>
> 	Rework the loop to start and end on *a* node that has SGX
> 	memory. This avoids the deadlock looking for the current SGX-
> 	lacking node to show up in the loop when it never will.
Will add this to the changelog, thanks for the write-up.
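
To make the loop shape concrete, here is a rough userspace sketch of
the idea. It is not the patch itself: next_set_node(), MAX_NODES and
free_pages[] are made-up stand-ins for this example (next_set_node()
plays the role of next_node_in(), and free_pages[] plays the role of
the per-node EPC free lists). It only assumes that the search walks a
mask of nodes that have EPC.

```c
#define MAX_NODES 16

/*
 * Hypothetical stand-in for next_node_in(): return the next node after
 * 'nid' that is set in 'mask', wrapping around. Returns 'nid' itself if
 * it is the only node set, and -1 if the mask is empty.
 */
static int next_set_node(int nid, unsigned int mask)
{
	for (int i = 1; i <= MAX_NODES; i++) {
		int cand = (nid + i) % MAX_NODES;

		if (mask & (1u << cand))
			return cand;
	}
	return -1;
}

/*
 * Return the node a page was taken from, or -1 when every node in
 * 'epc_mask' is out of pages. 'cur' is the node of the calling CPU.
 */
static int alloc_from_any_node(int cur, unsigned int epc_mask, int *free_pages)
{
	int start, nid;

	if (!epc_mask)
		return -1;

	/*
	 * Start on a node that is in the mask: the current node if it has
	 * EPC, otherwise the next node that does. The loop then ends when
	 * it wraps back to that same node, which is always reachable.
	 */
	start = (epc_mask & (1u << cur)) ? cur : next_set_node(cur, epc_mask);
	nid = start;

	do {
		if (free_pages[nid] > 0) {
			free_pages[nid]--;
			return nid;
		}
		nid = next_set_node(nid, epc_mask);
	} while (nid != start);

	return -1;
}
```

With epc_mask = 0x6 (EPC on nodes 1 and 2 only), cur = 0 and no free
pages anywhere, the walk visits node 1, then node 2, wraps back to
node 1 and returns -1. Terminating on cur instead would spin forever
in this case, because node 0 is never returned by the mask walk.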
>
> The code looks fine, so feel free to add:
>
> Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
Thanks.
>
> Also, I do think we should probably add some kind of sanity warning to
> the SGX code in another patch. If a node on an SGX system has CPUs and
> memory, it's very likely it will also have some EPC. It can be
> something soft like a pr_info(), but I think it would be nice to have.
I think some systems have a valid reason not to set up an EPC section
on every node. For example, an 8-socket system with SNC=2 has 16 nodes
in total, so it cannot have one EPC section per node because the upper
limit on EPC sections is 8. I'm not sure a warning is appropriate here.
What do you think?