linux-kernel - Re: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup functionality

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <op.2e0xhigjwjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date:   Mon, 27 Nov 2023 00:01:56 +0800
From:   "Haitao Huang" <haitao.huang@...ux.intel.com>
To:     "tj@...nel.org" <tj@...nel.org>,
        "jarkko@...nel.org" <jarkko@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "hpa@...or.com" <hpa@...or.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mkoutny@...e.com" <mkoutny@...e.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
        "Mehta, Sohil" <sohil.mehta@...el.com>,
        "bp@...en8.de" <bp@...en8.de>, "Huang, Kai" <kai.huang@...el.com>
Cc:     "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
        "seanjc@...gle.com" <seanjc@...gle.com>,
        "Zhang, Bo" <zhanb@...rosoft.com>,
        "kristen@...ux.intel.com" <kristen@...ux.intel.com>,
        "yangjie@...rosoft.com" <yangjie@...rosoft.com>,
        "sean.j.christopherson@...el.com" <sean.j.christopherson@...el.com>,
        "Li, Zhiquan1" <zhiquan1.li@...el.com>,
        "anakrish@...rosoft.com" <anakrish@...rosoft.com>
Subject: Re: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup
 functionality

On Mon, 20 Nov 2023 11:16:42 +0800, Huang, Kai <kai.huang@...el.com> wrote:

>> > >
>> >
>> > That's true. I was thinking no need to have them done in separate  
>> calls.
>> > The caller has to check the return value for epc_cg instance first,  
>> then
>> > check result of try_charge. But there is really only one caller,
>> > sgx_alloc_epc_page() below, so I don't have strong opinions now.
>> >
>> > With them separate, the checks will look like this:
>> > if (epc_cg = sgx_get_current_epc_cg()) // NULL means cgroup disabled,
>> > should continue for allocation
>> > {
>> > 	if (ret =  sgx_epc_cgroup_try_charge())
>> > 		return ret
>> > }
>> > // continue...
>> >
>> > I can go either way.
>
> Let's keep this aligned with other _try_charge() variants: return 'int'  
> to
> indicate whether the charge is done or not.
>

Fine with me if no objections from maintainers.

>> >
>> > >
>> > > > > >  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool  
>> reclaim)
>> > > > > >  {
>> > > > > >  	struct sgx_epc_page *page;
>> > > > > > +	struct sgx_epc_cgroup *epc_cg;
>> > > > > > +
>> > > > > > +	epc_cg = sgx_epc_cgroup_try_charge();
>> > > > > > +	if (IS_ERR(epc_cg))
>> > > > > > +		return ERR_CAST(epc_cg);
>> > > > > >
>> > > > > >  	for ( ; ; ) {
>> > > > > >  		page = __sgx_alloc_epc_page();
>> > > > > > @@ -580,10 +587,21 @@ struct sgx_epc_page  
>> *sgx_alloc_epc_page(void
>> > > > > > *owner, bool reclaim)
>> > > > > >  			break;
>> > > > > >  		}
>> > > > > >
>> > > > > > +		/*
>> > > > > > +		 * Need to do a global reclamation if cgroup was not full  
>> but
>> > > > free
>> > > > > > +		 * physical pages run out, causing __sgx_alloc_epc_page()  
>> to
>> > > > fail.
>> > > > > > +		 */
>> > > > > >  		sgx_reclaim_pages();
>> > > > >
>> > > > > What's the final behaviour?  IIUC it should be reclaiming from  
>> the
>> > > > > *current* EPC
>> > > > > cgroup?  If so shouldn't we just pass the @epc_cg to it here?
>> > > > >
>> > > > > I think we can make this patch as "structure" patch w/o actually
>> > > > having
>> > > > > EPC
>> > > > > cgroup enabled, i.e., sgx_get_current_epc_cg() always return  
>> NULL.
>> > > > >
>> > > > > And we can have one patch to change sgx_reclaim_pages() to take  
>> the
>> > > > > 'struct
>> > > > > sgx_epc_lru_list *' as argument:
>> > > > >
>> > > > > 	void sgx_reclaim_pages_lru(struct sgx_epc_lru_list * lru)
>> > > > > 	{
>> > > > > 		...
>> > > > > 	}
>> > > > >
>> > > > > Then here we can have something like:
>> > > > >
>> > > > > 	void sgx_reclaim_pages(struct sgx_epc_cg *epc_cg)
>> > > > > 	{
>> > > > > 		struct sgx_epc_lru_list *lru =			epc_cg ? &epc_cg->lru :
>> > > > > &sgx_global_lru;
>> > > > >
>> > > > > 		sgx_reclaim_pages_lru(lru);
>> > > > > 	}
>> > > > >
>> > > > > Makes sense?
>> > > > >

The reason we 'isolate' first then do real 'reclaim' is that the actual  
reclaim is expensive and especially for eblock, etrack, etc.

>> > > >
>> > > > This is purely global reclamation. No cgroup involved.
>> > >
>> > > Again why?  Here you are allocating one EPC page for enclave in a
>> > > particular EPC
>> > > cgroup.  When that fails, shouldn't you try only to reclaim from the
>> > > *current*
>> > > EPC cgroup?  Or at least you should try to reclaim from the  
>> *current*
>> > > EPC cgroup
>> > > first?
>> > >
>> >
>> > Later sgx_epc_cg_try_charge will take a 'reclaim' flag, if true,  
>> cgroup
>> > reclaims synchronously, otherwise in background and returns -EBUSY in
>> > that case. This function also returns if no valid epc_cg pointer
>> > returned.
>> >
>> > All reclamation for *current* cgroup is done in  
>> sgx_epc_cg_try_charge().
>
> This is fine, but I believe my question above is about where to reclaim  
> when
> "allocation" fails,  but not "try charge" fails.
>
I mean "will be done" :-) Currently no reclaim in try_charge.

> And for "reclaim for current cgroup when charge fails", I don't think  
> its even
> necessary in this initial implementation of EPC cgroup.  You can just  
> fail the
> allocation when charge fails (reaching the limit).  Trying to reclaim  
> when limit
> is hit can be done later.
>

Yes. It is done later.

> Please see Dave and Michal's replies here:
>
> https://lore.kernel.org/lkml/7a1a5125-9da2-47b6-ba0f-cf24d84df16b@intel.com/#t
> https://lore.kernel.org/lkml/yz44wukoic3syy6s4fcrngagurkjhe2hzka6kvxbajdtro3fwu@zd2ilht7wcw3/
>
>> >
>> > So, by reaching to this point,  a valid epc_cg pointer was returned,
>> > that means allocation is allowed for the cgroup (it has reclaimed if
>> > necessary, and its usage is not above limit after charging).
>
> I found memory cgroup uses different logic -- allocation first and then  
> charge:
>
> For instance:
>
> static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> {
> 	......
>
>         folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
>         if (!folio)
>                 goto oom;
>        if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
>                 goto oom_free_page;
> 	
> 	......    }
>
> Why EPC needs to "charge first" and "then allocate"?
>

EPC allocation can involve reclaiming which is more expensive than regular  
RAM reclamation. Also misc only has max hard limit.
Thanks
Haitao
>> >
>> > But the system level free count may be low (e.g., limits of all  
>> cgroups
>> > may add up to be more than capacity). so we need to do a global
>> > reclamation here, which may involve reclaiming a few pages (from  
>> current
>> > or other groups) so the system can be at a performant state with  
>> minimal
>> > free count. (current behavior of ksgxd).
>> >
>> I should have sticked to the orignial comment added in code. Actually
>> __sgx_alloc_epc_page() can fail if system runs out of EPC. That's the
>> really reason for global reclaim.
>
> I don't see how this can work.  With EPC cgroup likely all EPC pages  
> will go to
> the individual LRU of each cgroup, and the sgx_global_lru will basically  
> empty.
> How can you reclaim from the sgx_global_lru?

Currently, nothing in cgroup LRU, all EPCs pages are in global list.