linux-kernel - Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4db8493b-35a2-474f-997c-5e6ac1b8bd11@intel.com>
Date: Fri, 23 Feb 2024 10:26:05 +1300
From: "Huang, Kai" <kai.huang@...el.com>
To: Haitao Huang <haitao.huang@...ux.intel.com>, "hpa@...or.com"
	<hpa@...or.com>, "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
	"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>, "x86@...nel.org"
	<x86@...nel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "jarkko@...nel.org" <jarkko@...nel.org>,
	"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mkoutny@...e.com" <mkoutny@...e.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "Mehta, Sohil" <sohil.mehta@...el.com>, "tj@...nel.org"
	<tj@...nel.org>, "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de"
	<bp@...en8.de>
CC: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "anakrish@...rosoft.com"
	<anakrish@...rosoft.com>, "Zhang, Bo" <zhanb@...rosoft.com>,
	"kristen@...ux.intel.com" <kristen@...ux.intel.com>, "yangjie@...rosoft.com"
	<yangjie@...rosoft.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
	"chrisyan@...rosoft.com" <chrisyan@...rosoft.com>
Subject: Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup
 try_charge()



On 23/02/2024 6:09 am, Haitao Huang wrote:
> On Wed, 21 Feb 2024 05:06:02 -0600, Huang, Kai <kai.huang@...el.com> wrote:
> 
>>
>>> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
>>> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool 
>>> reclaim)
>>>  {
>>> -    return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, 
>>> PAGE_SIZE);
>>> +    for (;;) {
>>> +        if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
>>> +                    PAGE_SIZE))
>>> +            break;
>>> +
>>> +        if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
>>> +            return -ENOMEM;
>>> +
>>> +        if (signal_pending(current))
>>> +            return -ERESTARTSYS;
>>> +
>>> +        if (!reclaim) {
>>> +            queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
>>> +            return -EBUSY;
>>> +        }
>>> +
>>> +        if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
>>> +            /* All pages were too young to reclaim, try again a 
>>> little later */
>>> +            schedule();
>>> +    }
>>> +
>>> +    return 0;
>>>  }
>>>
>>
>> Seems this code change is 90% similar to the existing code in the
>> sgx_alloc_epc_page():
>>
>>     ...
>>     for ( ; ; ) {
>>                 page = __sgx_alloc_epc_page();
>>                 if (!IS_ERR(page)) {
>>                         page->owner = owner;
>>                         break;
>>                 }
>>
>>                 if (list_empty(&sgx_active_page_list))
>>                         return ERR_PTR(-ENOMEM);
>>
>>                 if (!reclaim) {
>>                         page = ERR_PTR(-EBUSY);
>>                         break;
>>                 }
>>
>>                 if (signal_pending(current)) {
>>                         page = ERR_PTR(-ERESTARTSYS);
>>                         break;
>>                 }
>>
>>                 sgx_reclaim_pages();
>>                 cond_resched();
>>         }
>>     ...
>>
>> Is it better to move the logic/code change in try_charge() out to
>> sgx_alloc_epc_page() to unify them?
>>
>> IIUC, the logic is quite similar: When you either failed to allocate 
>> one page,
>> or failed to charge one page, you try to reclaim EPC page(s) from the 
>> current
>> EPC cgroup, either directly or indirectly.
>>
>> No?
> 
> Only these lines are the same:
>                  if (!reclaim) {
>                          page = ERR_PTR(-EBUSY);
>                          break;
>                  }
> 
>                  if (signal_pending(current)) {
>                          page = ERR_PTR(-ERESTARTSYS);
>                          break;
>                  }
> 
> In sgx_alloc_epc_page() we do global reclamation but here we do 
> per-cgroup reclamation. 

But why?  If we failed to allocate, shouldn't we try to reclaim from the 
_current_ EPC cgroup instead of global?  E.g., I thought one enclave in 
one EPC cgroup requesting insane amount of EPC shouldn't impact enclaves 
inside other cgroups?

That's why the logic of other lines is different
> though they look similar due to similar function names. For the global 
> reclamation we need consider case in that cgroup is not enabled. 
> Similarly list_empty(&sgx_active_page_list) would have to be changed to 
> check root cgroup if cgroups enabled otherwise check global LRU.  The 
> (!reclaim) case is also different.  

W/o getting clear on my above question, so far I am not convinced why 
such difference cannot be hide inside wrapper function(s).

So I don't see an obvious good way
> to abstract those to get meaningful savings.
> 
> Thanks
> Haitao