Message-ID: <op.2d0n8tjtwjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Mon, 06 Nov 2023 20:08:43 -0600
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: "mingo@...hat.com" <mingo@...hat.com>,
"jarkko@...nel.org" <jarkko@...nel.org>,
"x86@...nel.org" <x86@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
"mkoutny@...e.com" <mkoutny@...e.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"tj@...nel.org" <tj@...nel.org>,
"Mehta, Sohil" <sohil.mehta@...el.com>,
"bp@...en8.de" <bp@...en8.de>, "Huang, Kai" <kai.huang@...el.com>,
"Haitao Huang" <haitao.huang@...ux.intel.com>
Cc: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
"Christopherson,, Sean" <seanjc@...gle.com>,
"Zhang, Bo" <zhanb@...rosoft.com>,
"kristen@...ux.intel.com" <kristen@...ux.intel.com>,
"anakrish@...rosoft.com" <anakrish@...rosoft.com>,
"sean.j.christopherson@...el.com" <sean.j.christopherson@...el.com>,
"Li, Zhiquan1" <zhiquan1.li@...el.com>,
"yangjie@...rosoft.com" <yangjie@...rosoft.com>
Subject: Re: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup
functionality
On Mon, 06 Nov 2023 19:16:30 -0600, Haitao Huang
<haitao.huang@...ux.intel.com> wrote:
> On Mon, 06 Nov 2023 16:18:30 -0600, Huang, Kai <kai.huang@...el.com>
> wrote:
>
>>>
>>> > > +/**
>>> > > + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single EPC page
>>> > > + *
>>> > > + * Returns EPC cgroup or NULL on success, -errno on failure.
>>> > > + */
>>> > > +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void)
>>> > > +{
>>> > > + struct sgx_epc_cgroup *epc_cg;
>>> > > + int ret;
>>> > > +
>>> > > + if (sgx_epc_cgroup_disabled())
>>> > > + return NULL;
>>> > > +
>>> > > + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg());
>>> > > + ret = misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
>>> > > +
>>> > > + if (!ret) {
>>> > > + /* No epc_cg returned, release ref from get_current_misc_cg() */
>>> > > + put_misc_cg(epc_cg->cg);
>>> > > + return ERR_PTR(-ENOMEM);
>>> >
>>> > misc_cg_try_charge() returns 0 when successfully charged, no?
>>>
>>> Right. I really made some mess in rebasing :-(
>>>
>>> >
>>> > > + }
>>> > > +
>>> > > + /* Ref released in sgx_epc_cgroup_uncharge() */
>>> > > + return epc_cg;
>>> > > +}
>>> >
>>> > IMHO the above _try_charge() returning a pointer of EPC cgroup is a
>>> > little bit odd, because it doesn't match the existing
>>> > misc_cg_try_charge() which returns whether the charge is successful
>>> > or not. sev_misc_cg_try_charge() matches misc_cg_try_charge() too.
>>> >
>>> > I think it's better to split "getting EPC cgroup" part out as a
>>> > separate helper, and make this _try_charge() match existing pattern:
>>> >
>>> > struct sgx_epc_cgroup *sgx_get_current_epc_cg(void)
>>> > {
>>> > if (sgx_epc_cgroup_disabled())
>>> > return NULL;
>>> >
>>> > return sgx_epc_cgroup_from_misc_cg(get_current_misc_cg());
>>> > }
>>> >
>>> > int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
>>> > {
>>> > if (!epc_cg)
>>> > return -EINVAL;
>>> >
>>> > return misc_cg_try_charge(epc_cg->cg);
>>> > }
>>> >
>>> > Having sgx_get_current_epc_cg() also makes the caller easier to read,
>>> > because we can immediately know we are going to charge the *current*
>>> > EPC cgroup, but not some cgroup hidden within
>>> > sgx_epc_cgroup_try_charge().
>>> >
>>>
>>> Actually, unlike other misc controllers, we need charge and get the
>>> epc_cg reference at the same time.
>>
>> Can you elaborate?
>>
>> And in practice you always call sgx_epc_cgroup_try_charge() right after
>> sgx_get_current_epc_cg() anyway. The only difference is the whole
>> thing is done in one function or in separate functions.
>>
>> [...]
>>
>
> That's true. I was thinking no need to have them done in separate calls.
> The caller has to check the return value for epc_cg instance first, then
> check result of try_charge. But there is really only one caller,
> sgx_alloc_epc_page() below, so I don't have strong opinions now.
>
> With them separate, the checks will look like this:
> if (epc_cg = sgx_get_current_epc_cg()) // NULL means cgroup disabled,
> should continue for allocation
> {
> if (ret = sgx_epc_cgroup_try_charge())
> return ret
> }
> // continue...
>
> I can go either way.
>
>>
>>> > > struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>>> > > {
>>> > > struct sgx_epc_page *page;
>>> > > + struct sgx_epc_cgroup *epc_cg;
>>> > > +
>>> > > + epc_cg = sgx_epc_cgroup_try_charge();
>>> > > + if (IS_ERR(epc_cg))
>>> > > + return ERR_CAST(epc_cg);
>>> > >
>>> > > for ( ; ; ) {
>>> > > page = __sgx_alloc_epc_page();
>>> > > @@ -580,10 +587,21 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>>> > > break;
>>> > > }
>>> > >
>>> > > + /*
>>> > > + * Need to do a global reclamation if cgroup was not full but free
>>> > > + * physical pages run out, causing __sgx_alloc_epc_page() to fail.
>>> > > + */
>>> > > sgx_reclaim_pages();
>>> >
>>> > What's the final behaviour? IIUC it should be reclaiming from the
>>> > *current* EPC cgroup? If so shouldn't we just pass the @epc_cg to it
>>> > here?
>>> >
>>> > I think we can make this patch as "structure" patch w/o actually
>>> > having EPC cgroup enabled, i.e., sgx_get_current_epc_cg() always
>>> > return NULL.
>>> >
>>> > And we can have one patch to change sgx_reclaim_pages() to take the
>>> > 'struct sgx_epc_lru_list *' as argument:
>>> >
>>> > void sgx_reclaim_pages_lru(struct sgx_epc_lru_list * lru)
>>> > {
>>> > ...
>>> > }
>>> >
>>> > Then here we can have something like:
>>> >
>>> > void sgx_reclaim_pages(struct sgx_epc_cg *epc_cg)
>>> > {
>>> > struct sgx_epc_lru_list *lru = epc_cg ? &epc_cg->lru : &sgx_global_lru;
>>> >
>>> > sgx_reclaim_pages_lru(lru);
>>> > }
>>> >
>>> > Makes sense?
>>> >
>>>
>>> This is purely global reclamation. No cgroup involved.
>>
>> Again why? Here you are allocating one EPC page for enclave in a
>> particular EPC cgroup. When that fails, shouldn't you try only to
>> reclaim from the *current* EPC cgroup? Or at least you should try to
>> reclaim from the *current* EPC cgroup first?
>>
>
> Later sgx_epc_cg_try_charge will take a 'reclaim' flag, if true, cgroup
> reclaims synchronously, otherwise in background and returns -EBUSY in
> that case. This function also returns if no valid epc_cg pointer
> returned.
>
> All reclamation for *current* cgroup is done in sgx_epc_cg_try_charge().
>
> So, by reaching to this point, a valid epc_cg pointer was returned,
> that means allocation is allowed for the cgroup (it has reclaimed if
> necessary, and its usage is not above limit after charging).
>
> But the system level free count may be low (e.g., limits of all cgroups
> may add up to be more than capacity). So we need to do a global
> reclamation here, which may involve reclaiming a few pages (from current
> or other groups) so the system can be at a performant state with minimal
> free count (the current behavior of ksgxd).
>
I should have stuck to the original comment added in the code. Actually,
__sgx_alloc_epc_page() can fail if the system runs out of EPC. That's the
real reason for the global reclaim. The free count enforcement is near the
end of this function, after the should_reclaim() check.
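
To make the two points in this thread concrete, here is a rough user-space
sketch in C. It is not the kernel code. It models the split Kai suggested
(a lookup helper plus a _try_charge() that returns 0 or -errno), followed by
an allocation loop that falls back to global reclaim only when the physical
allocation fails. All names here (struct epc_cg, get_current_epc_cg(),
epc_cg_try_charge(), epc_cg_uncharge(), alloc_physical_page(),
reclaim_global(), alloc_epc_page()) and the counters are hypothetical
stand-ins, and the error codes on the failure path are my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in, not the kernel's struct sgx_epc_cgroup. */
struct epc_cg {
	unsigned long usage;	/* pages currently charged */
	unsigned long limit;	/* per-cgroup page limit */
};

/* Toy state (assumed values, chosen only to exercise each path). */
bool cg_enabled = true;
struct epc_cg current_cg = { .usage = 0, .limit = 2 };
unsigned long free_pages = 1;		/* system-wide free pages */
unsigned long reclaimable = 1;		/* pages global reclaim can still free */
unsigned long global_reclaims;		/* number of global reclaim attempts */

/* Step 1: look up the current cgroup; NULL means the controller is off. */
struct epc_cg *get_current_epc_cg(void)
{
	return cg_enabled ? &current_cg : NULL;
}

/* Step 2: charge one page; 0 on success, -errno on failure. */
int epc_cg_try_charge(struct epc_cg *cg)
{
	if (!cg)
		return -EINVAL;
	if (cg->usage >= cg->limit)
		return -ENOMEM;
	cg->usage++;
	return 0;
}

/* Undo a successful charge. */
void epc_cg_uncharge(struct epc_cg *cg)
{
	assert(cg->usage > 0);
	cg->usage--;
}

/* Models the physical allocation: fails when the system has no free page. */
bool alloc_physical_page(void)
{
	if (!free_pages)
		return false;
	free_pages--;
	return true;
}

/* Models global reclaim: frees one page from any owner, if one is left. */
bool reclaim_global(void)
{
	global_reclaims++;
	if (!reclaimable)
		return false;
	reclaimable--;
	free_pages++;
	return true;
}

/* Charge the cgroup first, then allocate; reclaim globally on failure. */
int alloc_epc_page(bool reclaim)
{
	struct epc_cg *cg = get_current_epc_cg();
	int ret;

	if (cg) {
		ret = epc_cg_try_charge(cg);
		if (ret)
			return ret;
	}

	for (;;) {
		if (alloc_physical_page())
			return 0;

		/* The cgroup was under its limit, but the system ran out. */
		if (!reclaim || !reclaim_global()) {
			if (cg)
				epc_cg_uncharge(cg);
			return reclaim ? -ENOMEM : -EBUSY;
		}
	}
}
```

With the initial values above (limit 2, one free page, one reclaimable page),
the first alloc_epc_page(true) succeeds without any reclaim, the second
succeeds only after one global reclaim, and the third fails at the charge
step with -ENOMEM before any allocation is attempted.
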
Haitao