Message-ID: <Zte6ROmqrwdCSIn8@ziqianlu-kbl>
Date: Wed, 4 Sep 2024 09:39:16 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: Jarkko Sakkinen <jarkko@...nel.org>
CC: Dave Hansen <dave.hansen@...ux.intel.com>, <x86@...nel.org>,
	<linux-sgx@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Zhimin Luo
	<zhimin.luo@...el.com>
Subject: Re: [PATCH] x86/sgx: Fix deadloop in __sgx_alloc_epc_page()

On Tue, Sep 03, 2024 at 07:05:40PM +0300, Jarkko Sakkinen wrote:
> On Fri Aug 30, 2024 at 9:14 AM EEST, Aaron Lu wrote:
> > On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote:
> > > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote:
> > > > When the current node doesn't have an EPC section configured by firmware and
> > > > all other EPC sections' memory is used up, the CPU can get stuck inside the
> > > > while loop in __sgx_alloc_epc_page() forever, and a soft lockup will happen.
> > > > Note how nid_of_current will never equal to nid in that while loop because
> > >                                                      ~~~~
> > > 
> > > Oh *that* while loop ;-) Please be more specific.
> >
> > What about:
> > Note how nid_of_current will never be equal to nid in the while loop that
> > searches for an available EPC page on remote nodes because nid_of_current
> > is not set in sgx_numa_mask.
> 
> That would work I think!

While rewriting the changelog, I found it more natural to explain this
"while loop" where it is first mentioned, i.e.

    When the current node doesn't have an EPC section configured by firmware
    and all other EPC sections are used up, the CPU can get stuck
    indefinitely inside the while loop that looks for an available EPC page
    on remote nodes, leading to a soft lockup. Note how nid_of_current will
    never be equal to nid in that while loop because nid_of_current is not
    set in sgx_numa_mask.

I hope this looks fine to you.
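For illustration, here is a small userspace C model of the fallback loop discussed above. It is a sketch under stated assumptions, not the kernel code: MAX_NODES, model_next_node_in(), probes_buggy() and probes_fixed() are hypothetical names, a plain uint8_t bitmask stands in for sgx_numa_mask, and every allocation attempt is assumed to fail (all EPC used up). probes_buggy() mirrors the exit condition described in the changelog (stop only when the walk wraps back to nid_of_current) and shows that it never triggers when nid_of_current is not in the mask. probes_fixed() is one possible way to bound the search, starting from a node that is in the mask; it is not necessarily the change in the next version of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a small fixed node count for this model. */
#define MAX_NODES 8

/*
 * Hypothetical stand-in, loosely modeled on the kernel's next_node_in():
 * return the next node after n whose bit is set in mask, wrapping around.
 * If n is the only node set, n itself is returned. Returns MAX_NODES if
 * mask is empty.
 */
static int model_next_node_in(int n, uint8_t mask)
{
	for (int i = 1; i <= MAX_NODES; i++) {
		int cand = (n + i) % MAX_NODES;

		if (mask & (1u << cand))
			return cand;
	}
	return MAX_NODES;
}

/*
 * Model of the loop as described in the changelog: walk the EPC node mask
 * starting after nid_of_current and stop only when the walk comes back to
 * nid_of_current. All allocations are assumed to fail. Returns the number
 * of remote nodes probed, or -1 if `cap` probes pass without the exit
 * condition ever being met (the real loop would spin forever).
 */
static int probes_buggy(int nid_of_current, uint8_t epc_mask, int cap)
{
	int nid = nid_of_current;

	assert(nid_of_current >= 0 && nid_of_current < MAX_NODES);
	for (int probes = 0; probes < cap; probes++) {
		nid = model_next_node_in(nid, epc_mask);
		if (nid == nid_of_current)
			return probes;	/* wrapped back: loop exits */
		/* an allocation from node `nid` would be tried (and fail) here */
	}
	return -1;
}

/*
 * One possible bounded variant (an assumption, not necessarily the actual
 * fix): start from nid_of_current if it has EPC, otherwise from the next
 * node that does, and stop when the walk returns to that start node.
 * Returns the number of nodes probed, local node included.
 */
static int probes_fixed(int nid_of_current, uint8_t epc_mask)
{
	int start, nid, probes = 0;

	assert(nid_of_current >= 0 && nid_of_current < MAX_NODES);
	if (epc_mask == 0)
		return 0;	/* no EPC anywhere: nothing to probe */

	start = (epc_mask & (1u << nid_of_current))
		? nid_of_current
		: model_next_node_in(nid_of_current, epc_mask);
	nid = start;
	do {
		probes++;	/* an allocation from node `nid` would be tried here */
		nid = model_next_node_in(nid, epc_mask);
	} while (nid != start);
	return probes;
}
```

For example, with nid_of_current = 0 and EPC only on nodes 1 and 2 (mask 0x06), probes_buggy(0, 0x06, 100) hits its cap and returns -1, while probes_fixed(0, 0x06) returns 2 after probing each node that has EPC exactly once.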

> >
> > > > nid_of_current is not set in sgx_numa_mask.
> > > >
> > > > Also worth mentioning is that it's perfectly fine for firmware not to
> > > > set up an EPC section on a node. Setting up an EPC section on each node
> > > > can be good for performance, but it's not a functional requirement.
> > > 
> > > This lacks any description of what is done to __sgx_alloc_epc_page().
> >
> > Will add Dave's suggested description of how the problem is fixed to the changelog.
> 
> Great. I think the code change correctly reflects these additions.
> I'll look at the next version as a whole, but with high probability I can
> ack it as long as the commit message has these updates.

Thanks.

> >
> > > >
> > > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > > > Reported-by: Zhimin Luo <zhimin.luo@...el.com>
> > > > Tested-by: Zhimin Luo <zhimin.luo@...el.com>
> > > > Signed-off-by: Aaron Lu <aaron.lu@...el.com>
> >
> > Thanks,
> > Aaron
> 
> BR, Jarkko
