Message-ID: <6ccb705bc4345420e6c730245f871ba1d9413203.camel@intel.com>
Date:   Thu, 27 Jul 2023 23:21:26 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "jarkko@...nel.org" <jarkko@...nel.org>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>,
        "haitao.huang@...ux.intel.com" <haitao.huang@...ux.intel.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "mingo@...hat.com" <mingo@...hat.com>
CC:     "kristen@...ux.intel.com" <kristen@...ux.intel.com>,
        "Mehta, Sohil" <sohil.mehta@...el.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>
Subject: Re: [PATCH v5] x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race

On Thu, 2023-07-27 at 09:16 -0500, Haitao Huang wrote:
> On Wed, 26 Jul 2023 21:50:02 -0500, Huang, Kai <kai.huang@...el.com> wrote:
> 
> > On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote:
> > > Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC
> > 
> > > If I read correctly, Dave suggested not using "high" ("heavy" in this
> > > sentence) or "low" pressure:
> > 
> > https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@hhuan26-mobl.amr.corp.intel.com/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc
> > 
> > > And I agree.  For instance, consider this happening to one extremely
> > > "small" enclave while a new "big" enclave starts to run.  I don't think
> > > we should say this is "under heavy load".  Just stick to the fact that
> > > the reclaimer may reclaim the SECS page.
> > 
> Maybe I have some confusion here, but I did not think Dave had issues with
> 'heavy load'. When this happens, the last page causing the #PF (page A below)
> should be the "youngest" in PTE, and it got paged out together with the
> SECS before the #PF was even handled. Since ksgxd moves 'young' pages to
> the back of the reclaim queue, for that to happen almost all EPC pages of
> all enclaves must be paged out at that time, so it means heavy load to me.
> And that's also consistent with my tests.

I already provided an example: swapping out an "extremely small" enclave.

Anyway, no big deal to me.

> 
> > > page for an enclave and set encl->secs.epc_page to NULL. But the SECS
> > > EPC page is used for EAUG in the SGX page fault handler without checking
> > > for NULL and reloading.
> > > 
> > > Fix this by checking if SECS is loaded before EAUG and loading it if it
> > > was reclaimed.
> > > 
> > > The SECS page holds global enclave metadata. It can only be reclaimed
> > > when there are no other enclave pages remaining. At that point,
> > > virtually nothing can be done with the enclave until the SECS page is
> > > paged back in.
> ...
> > > But it is still possible for a #PF for a non-SECS page to race
> > > with paging out the SECS page: when the last resident non-SECS page A
> > > triggers a #PF in a non-resident page B, and then page A and the SECS
> > > both are paged out before the #PF on B is handled.
> > > 
> > > > Hitting this bug requires that race to be triggered by a #PF for EAUG.
> > 
> > > The above race can happen for the normal ELDU path too, so I suppose it
> > > would be better to mention why the normal ELDU path doesn't have this
> > > issue: it already does what this fix does.
> > 
> Should we focus on the bug and fix itself instead of explaining a non-bug
> case?
> And the simple changes in this patch clearly show that too if people look
> for that.

So you spent a lot of text explaining the race condition, but that race
condition applies to both ELDU and EAUG.  I personally went to check the code to
see whether ELDU has this issue too, and it turned out only EAUG does.  If you
mentioned this in the changelog, perhaps I wouldn't have needed to read the code.
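
To illustrate the difference between the two paths, here is a toy C model.  It
is only a sketch: every type and function name below (toy_encl,
toy_load_secs, and so on) is hypothetical and is not the kernel code or the
patch itself.  It only mirrors the idea that the ELDU-style path makes sure
SECS is resident before using it, while the pre-fix EAUG-style path uses the
pointer as-is after the reclaimer may have set it to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types and helpers, not kernel APIs. */
struct toy_epc_page { int id; };

struct toy_encl {
	struct toy_epc_page *secs_page;   /* NULL once SECS has been reclaimed */
	struct toy_epc_page secs_backing; /* stands in for the paged-out copy */
	int secs_reloads;                 /* counts how often SECS was reloaded */
};

/* Reclaimer side: page out SECS and clear the pointer. */
static void toy_reclaim_secs(struct toy_encl *encl)
{
	encl->secs_page = NULL;
}

/* Reload SECS only if it is not resident; return the resident page. */
static struct toy_epc_page *toy_load_secs(struct toy_encl *encl)
{
	if (!encl->secs_page) {
		encl->secs_page = &encl->secs_backing;
		encl->secs_reloads++;
	}
	return encl->secs_page;
}

/* ELDU-style fault path: ensures SECS is resident before using it. */
static bool toy_fault_eldu(struct toy_encl *encl)
{
	return toy_load_secs(encl) != NULL;
}

/*
 * EAUG-style fault path before the fix: uses the pointer without checking.
 * Returning false marks the spot where a NULL page would have been used.
 */
static bool toy_fault_eaug_unfixed(const struct toy_encl *encl)
{
	return encl->secs_page != NULL;
}

/* EAUG-style fault path after the fix: same check-and-reload as ELDU. */
static bool toy_fault_eaug_fixed(struct toy_encl *encl)
{
	return toy_load_secs(encl) != NULL;
}
```

In this model, calling toy_reclaim_secs() before toy_fault_eaug_unfixed()
reproduces the failing order of events, and the fixed variant recovers by
reloading SECS first, the same way the ELDU-style path does.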

Anyway, just my 2 cents.

And I don't want to let those block this patch, so feel free to add my tag:

Reviewed-by: Kai Huang <kai.huang@...el.com>
