Message-ID: <Zte6ROmqrwdCSIn8@ziqianlu-kbl>
Date: Wed, 4 Sep 2024 09:39:16 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: Jarkko Sakkinen <jarkko@...nel.org>
CC: Dave Hansen <dave.hansen@...ux.intel.com>, <x86@...nel.org>,
<linux-sgx@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Zhimin Luo
<zhimin.luo@...el.com>
Subject: Re: [PATCH] x86/sgx: Fix deadloop in __sgx_alloc_epc_page()
On Tue, Sep 03, 2024 at 07:05:40PM +0300, Jarkko Sakkinen wrote:
> On Fri Aug 30, 2024 at 9:14 AM EEST, Aaron Lu wrote:
> > On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote:
> > > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote:
> > > > When current node doesn't have a EPC section configured by firmware and
> > > > all other EPC sections memory are used up, CPU can stuck inside the
> > > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen.
> > > > Note how nid_of_current will never equal to nid in that while loop because
> > > ~~~~
> > >
> > > Oh *that* while loop ;-) Please be more specific.
> >
> > What about:
> > Note how nid_of_current will never be equal to nid in the while loop that
> > searches an available EPC page from remote nodes because nid_of_current is
> > not set in sgx_numa_mask.
>
> That would work I think!

While rewriting the changelog, I found it more natural to explain this
"while loop" where it is first mentioned, i.e.:

    When the current node doesn't have an EPC section configured by firmware
    and all other EPC sections are used up, the CPU can get stuck
    indefinitely inside the while loop that looks for an available EPC page
    from remote nodes, leading to a soft lockup. Note how nid_of_current
    will never be equal to nid in that while loop because nid_of_current is
    not set in sgx_numa_mask.

I hope this looks fine to you.

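To make the failure mode concrete, here is a minimal user-space sketch of the search described above. It is a model only, not the kernel code: model_next_node_in(), model_alloc_from_node(), model_alloc_old(), model_reset(), MAX_NODES and SPIN_LIMIT are all hypothetical names made up for illustration, a plain bitmask stands in for sgx_numa_mask, and an iteration cap is added so the model returns where the real loop would keep spinning.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES  8     /* hypothetical node count for this model */
#define SPIN_LIMIT 1000  /* cap so the model returns instead of hanging */

/* Bit n set means node n has an EPC section (stand-in for sgx_numa_mask). */
static unsigned int numa_mask;
/* Free EPC pages per node (stand-in for the per-node free lists). */
static int free_pages[MAX_NODES];

/* Reset the model: install a mask and mark every node as exhausted. */
static void model_reset(unsigned int mask)
{
	numa_mask = mask;
	for (int i = 0; i < MAX_NODES; i++)
		free_pages[i] = 0;
}

/* Next node set in mask after nid, wrapping around. */
static int model_next_node_in(int nid, unsigned int mask)
{
	assert(mask != 0 && nid >= 0 && nid < MAX_NODES);
	for (int i = 1; i <= MAX_NODES; i++) {
		int n = (nid + i) % MAX_NODES;

		if (mask & (1u << n))
			return n;
	}
	return nid; /* not reached when mask != 0 */
}

/* Take one page from nid if it has any left. */
static bool model_alloc_from_node(int nid)
{
	if (free_pages[nid] > 0) {
		free_pages[nid]--;
		return true;
	}
	return false;
}

/*
 * Model of the search described in the changelog. Returns the number of
 * loop iterations, or -1 if SPIN_LIMIT was hit. *got_page reports
 * whether a page was found.
 */
static int model_alloc_old(int nid_of_current, bool *got_page)
{
	int nid = nid_of_current;
	int iters = 0;

	*got_page = false;

	/* Try the local node first, if it has an EPC section at all. */
	if ((numa_mask & (1u << nid_of_current)) &&
	    model_alloc_from_node(nid_of_current)) {
		*got_page = true;
		return 0;
	}

	/* Fall back to remote nodes. */
	while (true) {
		if (++iters > SPIN_LIMIT)
			return -1;

		nid = model_next_node_in(nid, numa_mask);
		/* Never true when nid_of_current is not set in numa_mask. */
		if (nid == nid_of_current)
			break;

		if (model_alloc_from_node(nid)) {
			*got_page = true;
			return iters;
		}
	}
	return iters;
}
```

With a mask of 0x6 (nodes 1 and 2 only), no free pages, and a caller on node 0, the model cycles 1, 2, 1, 2, ... and only stops because of the cap. With a mask of 0x7 the walk wraps back to node 0 and exits normally.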
> >
> > > > nid_of_current is not set in sgx_numa_mask.
> > > >
> > > > Also worth mentioning is that it's perfectly fine for firmware to not
> > > > set up an EPC section on a node. Setting an EPC section on each node can
> > > > be good for performance but that's not a requirement functionality wise.
> > >
> > > This lacks any description of what is done to __sgx_alloc_epc_page().
> >
> > Will add what Dave suggested on how the problem is fixed to the changelog.
>
> Great. I think the code change is correct reflecting these additions.
> I'll look at the next version as a whole but with high probability I can
> ack that as long as the commit message has these updates.
Thanks.
> >
> > > >
> > > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > > > Reported-by: Zhimin Luo <zhimin.luo@...el.com>
> > > > Tested-by: Zhimin Luo <zhimin.luo@...el.com>
> > > > Signed-off-by: Aaron Lu <aaron.lu@...el.com>
> >
> > Thanks,
> > Aaron
>
> BR, Jarkko