linux-kernel - Re: [PATCH v5] x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c57d8cc4e4bfbef028a1f226ec2c4691a7c100fe.camel@intel.com>
Date:   Thu, 27 Jul 2023 02:50:02 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "jarkko@...nel.org" <jarkko@...nel.org>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>,
        "haitao.huang@...ux.intel.com" <haitao.huang@...ux.intel.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "mingo@...hat.com" <mingo@...hat.com>
CC:     "kristen@...ux.intel.com" <kristen@...ux.intel.com>,
        "Mehta, Sohil" <sohil.mehta@...el.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>
Subject: Re: [PATCH v5] x86/sgx: Resolves SECS reclaim vs. page fault for EAUG
 race

On Wed, 2023-07-26 at 18:02 -0700, Haitao Huang wrote:
> Under heavy load, the SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC

If I read correctly, Dave suggested to not use "high" (heavy in this sentence)
or "low" pressure:

https://lore.kernel.org/lkml/op.179a4xs0wjvjmi@hhuan26-mobl.amr.corp.intel.com/T/#m9120eac6a4a94daa7c9fcc47709f241cd181e5dc

And I agree.  For instance, consider this happens to one extremely "small"
enclave, while there's a new "big" enclave starts to run.  I don't think we
should say this is "under heavy load".  Just stick to the fact that the
reclaimer may reclaim the SECS page.

> page for an enclave and set encl->secs.epc_page to NULL. But the SECS
> EPC page is used for EAUG in the SGX page fault handler without checking
> for NULL and reloading.
> 
> Fix this by checking if SECS is loaded before EAUG and loading it if it
> was reclaimed.
> 
> The SECS page holds global enclave metadata. It can only be reclaimed
> when there are no other enclave pages remaining. At that point,
> virtually nothing can be done with the enclave until the SECS page is
> paged back in.
> 
> An enclave can not run nor generate page faults without a resident SECS
> page. 
> 

I am not sure whether "nor generate page faults without a resident SECS page" is
accurate?  When SECS is swapped out, I suppose the first EENTER should trigger a
#PF on the TSC page and in the #PF handler the SECS will be swapped in first.

I guess you can just remove this sentence?

> But it is still possible for a #PF for a non-SECS page to race
> with paging out the SECS page: when the last resident non-SECS page A
> triggers a #PF in a non-resident page B, and then page A and the SECS
> both are paged out before the #PF on B is handled.
> 
> Hitting this bug requires that race triggered with a #PF for EAUG.

The above race can happen for the normal ELDU path too, thus I suppose it will
be better to mention why the normal ELDU path doesn't have this issue: it
already does what this fix does.

> Following is a trace when it happens.
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
> Call Trace:
>  ? __kmem_cache_alloc_node+0x16a/0x440
>  ? xa_load+0x6e/0xa0
>  sgx_vma_fault+0x119/0x230
>  __do_fault+0x36/0x140
>  do_fault+0x12f/0x400
>  __handle_mm_fault+0x728/0x1110
>  handle_mm_fault+0x105/0x310
>  do_user_addr_fault+0x1ee/0x750
>  ? __this_cpu_preempt_check+0x13/0x20
>  exc_page_fault+0x76/0x180
>  asm_exc_page_fault+0x27/0x30
> 
> Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
> Cc: stable@...r.kernel.org # v6.0+
> Signed-off-by: Haitao Huang <haitao.huang@...ux.intel.com>
> Reviewed-by: Jarkko Sakkinen <jarkko@...nel.org>
> Acked-by: Reinette Chatre <reinette.chatre@...el.com>

With above addressed, feel free to add:

Reviewed-by: Kai Huang <kai.huang@...el.com>