Date:   Tue, 18 Jul 2023 13:11:36 -0500
From:   "Haitao Huang" <haitao.huang@...ux.intel.com>
To:     "Jarkko Sakkinen" <jarkko@...nel.org>, dave.hansen@...ux.intel.com,
        linux-kernel@...r.kernel.org, linux-sgx@...r.kernel.org,
        "Thomas Gleixner" <tglx@...utronix.de>,
        "Ingo Molnar" <mingo@...hat.com>, "Borislav Petkov" <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        "Dave Hansen" <dave.hansen@...el.com>
Cc:     kai.huang@...el.com, reinette.chatre@...el.com,
        kristen@...ux.intel.com, seanjc@...gle.com, stable@...r.kernel.org
Subject: Re: [PATCH] x86/sgx: fix a NULL pointer

On Tue, 18 Jul 2023 09:27:49 -0500, Dave Hansen <dave.hansen@...el.com>  
wrote:

> On 7/17/23 13:29, Haitao Huang wrote:
>> Under heavy load, the SGX EPC reclaimers (current ksgxd or future EPC
>> cgroup worker) may reclaim the SECS EPC page for an enclave and set
>> encl->secs.epc_page to NULL. But the SECS EPC page is used for EAUG in
>> the SGX #PF handler without checking for NULL and reloading.
>>
>> Fix this by checking if SECS is loaded before EAUG and load it if it was
>> reclaimed.
>
> It would be nice to see a _bit_ more theory of the bug in here.
>
> What is an SECS page and why is it special in a reclaim context?  Why is
> this so hard to hit?  What led you to discover this issue now?  What is
> EAUG?

Let me know if this clarifies things.

The SECS page holds the global state of an enclave, and all reclaimable pages
tracked by the SGX EPC reclaimer (ksgxd) are considered 'child' pages of
the SECS page corresponding to that enclave.  The reclaimer only reclaims
the SECS page once all of its children have been reclaimed. That can happen
on a system under high EPC pressure, where multiple large enclaves demand
many more EPC pages than are physically available. In a rare case, the
reclaimer may reclaim all EPC pages of an enclave and its SECS page,
setting encl->secs.epc_page to NULL, right before the #PF handler gets the
chance to handle a #PF for that enclave. If that #PF happens to require the
kernel to invoke the EAUG instruction to add a new EPC page to the enclave,
a NULL pointer dereference results, because the current code does not check
whether encl->secs.epc_page is NULL before using it.
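
To make the sequence concrete, here is a toy user-space model of the race
and of the check-and-reload idea. It is only a sketch, not the patch and not
kernel code: every name in it (toy_encl, toy_reclaim_all, toy_load_secs,
toy_eaug_checked) is made up for illustration, malloc()/free() stand in for
EPC allocation and reclaim, and no real SGX instruction is executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an EPC page; not a kernel structure. */
struct toy_epc_page {
    int id;
};

/* Hypothetical stand-in for the enclave state discussed above. */
struct toy_encl {
    struct toy_epc_page *secs_page; /* models encl->secs.epc_page */
    unsigned int child_pages;       /* resident child EPC pages */
};

/* Models loading (or reloading) the SECS page. Returns 0 on success. */
int toy_load_secs(struct toy_encl *encl)
{
    encl->secs_page = malloc(sizeof(*encl->secs_page));
    if (!encl->secs_page)
        return -1;
    encl->secs_page->id = 0;
    return 0;
}

/*
 * Models the reclaimer under heavy pressure: once every child page is
 * gone, the SECS page is reclaimed too and the pointer becomes NULL.
 */
void toy_reclaim_all(struct toy_encl *encl)
{
    encl->child_pages = 0;
    free(encl->secs_page);
    encl->secs_page = NULL;
}

/*
 * Models the #PF path that needs the SECS page to add a new page.
 * The check before use is the point: without it, the dereference
 * below would hit a NULL pointer after toy_reclaim_all().
 */
int toy_eaug_checked(struct toy_encl *encl)
{
    if (!encl->secs_page && toy_load_secs(encl))
        return -1;

    (void)encl->secs_page->id; /* the "EAUG" step uses the SECS page */
    encl->child_pages++;
    return 0;
}
```

In this model, calling toy_reclaim_all() and then toy_eaug_checked() on the
same toy_encl reloads the SECS page first and then adds one child page; an
unchecked variant would dereference NULL at the marked line.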

The bug is easier to reproduce with the EPC cgroup implementation when a
low EPC limit is set for a group of enclave-hosting processes. Without the
EPC cgroup, it's hard to trigger the reclaimer to reclaim all child pages
of an SECS page. It would also require a machine configured with large RAM
relative to EPC, so that the OOM killer is not triggered before this happens.

Thanks
Haitao
