linux-kernel - Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <op.2jkfeezjwjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Thu, 22 Feb 2024 16:57:40 -0600
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: "hpa@...or.com" <hpa@...or.com>, "tim.c.chen@...ux.intel.com"
 <tim.c.chen@...ux.intel.com>, "linux-sgx@...r.kernel.org"
 <linux-sgx@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>,
 "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
 "jarkko@...nel.org" <jarkko@...nel.org>, "cgroups@...r.kernel.org"
 <cgroups@...r.kernel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, "mkoutny@...e.com" <mkoutny@...e.com>,
 "tglx@...utronix.de" <tglx@...utronix.de>, "Mehta, Sohil"
 <sohil.mehta@...el.com>, "tj@...nel.org" <tj@...nel.org>, "mingo@...hat.com"
 <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>, "Huang, Kai"
 <kai.huang@...el.com>
Cc: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
 "seanjc@...gle.com" <seanjc@...gle.com>, "anakrish@...rosoft.com"
 <anakrish@...rosoft.com>, "Zhang, Bo" <zhanb@...rosoft.com>,
 "kristen@...ux.intel.com" <kristen@...ux.intel.com>, "yangjie@...rosoft.com"
 <yangjie@...rosoft.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
 "chrisyan@...rosoft.com" <chrisyan@...rosoft.com>
Subject: Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup
 try_charge()

On Thu, 22 Feb 2024 15:26:05 -0600, Huang, Kai <kai.huang@...el.com> wrote:

>
>
> On 23/02/2024 6:09 am, Haitao Huang wrote:
>> On Wed, 21 Feb 2024 05:06:02 -0600, Huang, Kai <kai.huang@...el.com>  
>> wrote:
>>
>>>
>>>> -int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
>>>> +int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool  
>>>> reclaim)
>>>>  {
>>>> -    return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,  
>>>> PAGE_SIZE);
>>>> +    for (;;) {
>>>> +        if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
>>>> +                    PAGE_SIZE))
>>>> +            break;
>>>> +
>>>> +        if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
>>>> +            return -ENOMEM;
>>>> +
>>>> +        if (signal_pending(current))
>>>> +            return -ERESTARTSYS;
>>>> +
>>>> +        if (!reclaim) {
>>>> +            queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
>>>> +            return -EBUSY;
>>>> +        }
>>>> +
>>>> +        if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
>>>> +            /* All pages were too young to reclaim, try again a  
>>>> little later */
>>>> +            schedule();
>>>> +    }
>>>> +
>>>> +    return 0;
>>>>  }
>>>>
>>>
>>> Seems this code change is 90% similar to the existing code in the
>>> sgx_alloc_epc_page():
>>>
>>>     ...
>>>     for ( ; ; ) {
>>>                 page = __sgx_alloc_epc_page();
>>>                 if (!IS_ERR(page)) {
>>>                         page->owner = owner;
>>>                         break;
>>>                 }
>>>
>>>                 if (list_empty(&sgx_active_page_list))
>>>                         return ERR_PTR(-ENOMEM);
>>>
>>>                 if (!reclaim) {
>>>                         page = ERR_PTR(-EBUSY);
>>>                         break;
>>>                 }
>>>
>>>                 if (signal_pending(current)) {
>>>                         page = ERR_PTR(-ERESTARTSYS);
>>>                         break;
>>>                 }
>>>
>>>                 sgx_reclaim_pages();
>>>                 cond_resched();
>>>         }
>>>     ...
>>>
>>> Is it better to move the logic/code change in try_charge() out to
>>> sgx_alloc_epc_page() to unify them?
>>>
>>> IIUC, the logic is quite similar: When you either failed to allocate  
>>> one page,
>>> or failed to charge one page, you try to reclaim EPC page(s) from the  
>>> current
>>> EPC cgroup, either directly or indirectly.
>>>
>>> No?
>>  Only these lines are the same:
>>                  if (!reclaim) {
>>                          page = ERR_PTR(-EBUSY);
>>                          break;
>>                  }
>>                   if (signal_pending(current)) {
>>                          page = ERR_PTR(-ERESTARTSYS);
>>                          break;
>>                  }
>>  In sgx_alloc_epc_page() we do global reclamation but here we do  
>> per-cgroup reclamation.
>
> But why?  If we failed to allocate, shouldn't we try to reclaim from the  
> _current_ EPC cgroup instead of global?  E.g., I thought one enclave in  
> one EPC cgroup requesting insane amount of EPC shouldn't impact enclaves  
> inside other cgroups?
>
Right. When code reaches to here, we already passed reclaim per cgroup.  
The cgroup may not at or reach limit but system has run out of physical  
EPC.

Thanks
Haitao