linux-kernel - Re: [PATCH] x86/sgx: fix a NULL pointer

Message-ID: <op.18adwup7wjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date:   Tue, 18 Jul 2023 11:39:56 -0500
From:   "Haitao Huang" <haitao.huang@...ux.intel.com>
To:     "Jarkko Sakkinen" <jarkko@...nel.org>, dave.hansen@...ux.intel.com,
        linux-kernel@...r.kernel.org, linux-sgx@...r.kernel.org,
        "Thomas Gleixner" <tglx@...utronix.de>,
        "Ingo Molnar" <mingo@...hat.com>, "Borislav Petkov" <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        "Dave Hansen" <dave.hansen@...el.com>
Cc:     kai.huang@...el.com, reinette.chatre@...el.com,
        kristen@...ux.intel.com, seanjc@...gle.com, stable@...r.kernel.org
Subject: Re: [PATCH] x86/sgx: fix a NULL pointer

On Tue, 18 Jul 2023 09:30:11 -0500, Dave Hansen <dave.hansen@...el.com> wrote:

> On 7/17/23 13:29, Haitao Huang wrote:
> ...
>> @@ -248,11 +258,9 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
>>  		return entry;
>>  	}
>>
>> -	if (!(encl->secs.epc_page)) {
>> -		epc_page = sgx_encl_eldu(&encl->secs, NULL);
>> -		if (IS_ERR(epc_page))
>> -			return ERR_CAST(epc_page);
>> -	}
>> +	epc_page = sgx_encl_load_secs(encl);
>> +	if (IS_ERR(epc_page))
>> +		return ERR_CAST(epc_page);
>>
>>  	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
>>  	if (IS_ERR(epc_page))
>> @@ -339,6 +347,13 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>>
>>  	mutex_lock(&encl->lock);
>>
>> +	epc_page = sgx_encl_load_secs(encl);
>> +	if (IS_ERR(epc_page)) {
>> +		if (PTR_ERR(epc_page) == -EBUSY)
>> +			vmret =  VM_FAULT_NOPAGE;
>> +		goto err_out_unlock;
>> +	}
>
> Whenever I see one of these "make sure it isn't NULL", I always jump to
> asking what *keeps* it from becoming NULL again.  In both cases here, I
> think that's encl->lock.
>
Yes, encl->lock protects all enclave state: the xarray holding the
encl_pages, the SECS, the VA pages, etc.

> A comment would be really nice here, maybe on sgx_encl_load_secs().   
> Maybe:
>
> /*
>  * Ensure the SECS page is not swapped out.  Must be called with
>  * encl->lock to protect _____ and ensure the SECS page is not
>  * swapped out again.
>  */
>
Thanks for the suggestion. The lock should be held for the whole duration
of SECS usage, so something like this?
/*
 * Ensure the SECS page is not swapped out.  Must be called with
 * encl->lock to protect the enclave state, including the SECS, and to
 * ensure the SECS page is not swapped out again while it is being used.
 */
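
To make the invariant concrete, here is a minimal userspace sketch. It is
not kernel code, and every name in it (struct model_encl,
model_load_secs(), model_try_reclaim_secs(), model_fault()) is
hypothetical. A boolean stands in for encl->lock being held and another
for encl->secs.epc_page being non-NULL. The only point it illustrates is
that the SECS is loaded and then used within a single lock hold, so the
reclaimer cannot make it NULL in between.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model, not kernel code:
 * "locked" stands in for encl->lock being held and
 * "secs_resident" for encl->secs.epc_page != NULL.
 */
struct model_encl {
	bool locked;
	bool secs_resident; /* false models a swapped-out SECS */
	int  eldu_calls;    /* how many times the SECS was reloaded */
};

static void model_lock(struct model_encl *e)
{
	assert(!e->locked);
	e->locked = true;
}

static void model_unlock(struct model_encl *e)
{
	assert(e->locked);
	e->locked = false;
}

/* Models loading the SECS: the caller must already hold the lock. */
static int model_load_secs(struct model_encl *e)
{
	assert(e->locked);
	if (!e->secs_resident) {
		e->eldu_calls++;        /* models reloading the SECS (ELDU) */
		e->secs_resident = true;
	}
	return 0;
}

/*
 * Models the reclaimer evicting the SECS. It must take the lock first;
 * returning false stands in for blocking while someone else holds it.
 */
static bool model_try_reclaim_secs(struct model_encl *e)
{
	if (e->locked)
		return false;
	model_lock(e);
	e->secs_resident = false;
	model_unlock(e);
	return true;
}

/* Models a fault path: load the SECS and use it under one lock hold. */
static int model_fault(struct model_encl *e)
{
	int ret;

	model_lock(e);
	ret = model_load_secs(e);
	if (ret == 0) {
		/* While we hold the lock, the reclaimer cannot evict the SECS. */
		assert(!model_try_reclaim_secs(e));
		assert(e->secs_resident);
	}
	model_unlock(e);
	return ret;
}
```

Dropping the lock between model_load_secs() and the use of the SECS is
exactly the window where the reclaimer could swap it out again.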


>> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
>> index 166692f2d501..4662a364ce62 100644
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -257,6 +257,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>>
>>  	mutex_lock(&encl->lock);
>>
>> +	/* Should not be possible */
>> +	if (WARN_ON(!(encl->secs.epc_page)))
>> +		goto out;
>
> That comment isn't super helpful.  We generally don't WARN_ON() things
> that should happen.  *Why* is it not possible?
>

When this code is reached, the reclaimer holds at least one reclaimable
EPC page belonging to the enclave, and the code below reclaims the SECS
only when the enclave has no reclaimable EPC pages left (i.e., the number
of SECS children is zero). So a NULL SECS should not be possible here.
I'll remove this change because, as Kai pointed out, it is not needed to
fix the bug.
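
The argument above can be illustrated with a similar toy model. Again this
is hypothetical userspace C, not the actual reclaimer: child_cnt plays the
role of the SECS child count, and struct model_child_encl,
model_reclaim_child() and model_reclaim_all() are made-up names. The SECS
is evicted only after its last child has been written back, so as long as
the reclaimer only arrives here holding a child page (child_cnt > 0), the
"SECS already gone" branch is unreachable.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model, not kernel code: child_cnt counts
 * the enclave's EPC pages that still depend on the SECS.
 */
struct model_child_encl {
	int  child_cnt;
	bool secs_resident;
};

/*
 * Models writing back one child page. Returns false if the
 * "should not be possible" case (SECS already gone) is hit.
 */
static bool model_reclaim_child(struct model_child_encl *e)
{
	if (!e->secs_resident)
		return false; /* the case the WARN_ON() was guarding against */
	assert(e->child_cnt > 0);
	e->child_cnt--;
	/* The SECS is evicted only after its last child is written back. */
	if (e->child_cnt == 0)
		e->secs_resident = false;
	return true;
}

/* Reclaims every child; only ever called while a child page exists. */
static bool model_reclaim_all(struct model_child_encl *e)
{
	while (e->child_cnt > 0) {
		if (!model_reclaim_child(e))
			return false;
	}
	return true;
}
```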

I added this as a sanity check while implementing multiple EPC tracking
lists for cgroups. At one point there were list corruption issues when
moving EPC pages between lists was not managed well. With those
straightened out, and with clear definitions of the EPC page states for
moving pages from one list to another, I no longer see much value in
keeping this check, even in the later cgroup patches.

Thanks
Haitao