linux-kernel - Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for cgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aaaa54ed-7fd7-404c-853f-90f2e32ae004@intel.com>
Date: Fri, 23 Feb 2024 11:24:47 +1300
From: "Huang, Kai" <kai.huang@...el.com>
To: Haitao Huang <haitao.huang@...ux.intel.com>, "Mehta, Sohil"
	<sohil.mehta@...el.com>, "mingo@...hat.com" <mingo@...hat.com>,
	"jarkko@...nel.org" <jarkko@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
	"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>, "hpa@...or.com"
	<hpa@...or.com>, "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
	"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>, "mkoutny@...e.com"
	<mkoutny@...e.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"tj@...nel.org" <tj@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>
CC: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "anakrish@...rosoft.com"
	<anakrish@...rosoft.com>, "Zhang, Bo" <zhanb@...rosoft.com>,
	"kristen@...ux.intel.com" <kristen@...ux.intel.com>, "yangjie@...rosoft.com"
	<yangjie@...rosoft.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
	"chrisyan@...rosoft.com" <chrisyan@...rosoft.com>
Subject: Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for
 cgroup



On 23/02/2024 9:12 am, Haitao Huang wrote:
> On Wed, 21 Feb 2024 04:48:58 -0600, Huang, Kai <kai.huang@...el.com> wrote:
> 
>> On Wed, 2024-02-21 at 00:23 -0600, Haitao Huang wrote:
>>> StartHi Kai
>>> On Tue, 20 Feb 2024 03:52:39 -0600, Huang, Kai <kai.huang@...el.com> 
>>> wrote:
>>> [...]
>>> >
>>> > So you introduced the work/workqueue here but there's no place which
>>> > actually
>>> > queues the work.  IMHO you can either:
>>> >
>>> > 1) move relevant code change here; or
>>> > 2) focus on introducing core functions to reclaim certain pages from a
>>> > given EPC
>>> > cgroup w/o workqueue and introduce the work/workqueue in later patch.
>>> >
>>> > Makes sense?
>>> >
>>>
>>> Starting in v7, I was trying to split the big patch, #10 in v6 as you 
>>> and
>>> others suggested. My thought process was to put infrastructure needed 
>>> for
>>> per-cgroup reclaim in the front, then turn on per-cgroup reclaim in [v9
>>> 13/15] in the end.
>>
>> That's reasonable for sure.
>>
> 
> Thanks for the confirmation :-)
> 
>>>
>>> Before that, all reclaimables are tracked in the global LRU so really
>>> there is no "reclaim certain pages from a  given EPC cgroup w/o 
>>> workqueue"
>>> or reclaim through workqueue before that point, as suggested in #2. This
>>> patch puts down the implementation for both flows but neither used 
>>> yet, as
>>> stated in the commit message.
>>
>> I know it's not used yet.  The point is how to split patches to make 
>> them more
>> self-contain and easy to review.
> 
> I would think this patch already self-contained in that all are 
> implementation of cgroup reclamation building blocks utilized later. But 
> I'll try to follow your suggestions below to split further (would prefer 
> not to merge in general unless there is strong reasons).
> 
>>
>> For #2, sorry for not being explicit -- I meant it seems it's more 
>> reasonable to
>> split in this way:
>>
>> Patch 1)
>>   a). change to sgx_reclaim_pages();
> 
> I'll still prefer this to be a separate patch. It is self-contained IMHO.
> We were splitting the original patch because it was too big. I don't 
> want to merge back unless there is a strong reason.
> 
>>   b). introduce sgx_epc_cgroup_reclaim_pages();
> 
> Ok.

If I got you right, I believe you want to have a cgroup variant function 
following the same behaviour of the one for global reclaim, i.e., the 
_current_ sgx_reclaim_pages(), which always tries to scan and reclaim 
SGX_NR_TO_SCAN pages each time.

And this cgroup variant function, sgx_epc_cgroup_reclaim_pages(), tries 
to scan and reclaim SGX_NR_TO_SCAN pages each time "_across_ the cgroup 
and all the descendants".

And you want to implement sgx_epc_cgroup_reclaim_pages() in this way due 
to WHATEVER reasons.

In that case, the change to sgx_reclaim_pages() and the introduce of 
sgx_epc_cgroup_reclaim_pages() should really be together because they 
are completely tied together in terms of implementation.

In this way you can just explain clearly in _ONE_ patch why you choose 
this implementation, and for reviewer it's also easier to review because 
we can just discuss in one patch.

Makes sense?

> 
>>   c). introduce sgx_epc_cgroup_reclaim_work_func() (use a better 
>> name),     which just takes an EPC cgroup as input w/o involving any 
>> work/workqueue.
> 
> This is for the workqueue use only. So I think it'd be better be with 
> patch #2 below?

There are multiple levels of logic here IMHO:

   1. a) and b) above focus on "each reclaim" a given EPC cgroup
   2. c) is about a loop of above to bring given cgroup's usage to limit
   3. workqueue is one (probably best) way to do c) in async way
   4. the logic where 1) (direct reclaim) and 3) (indirect) are triggered

To me, it's clear 1) should be in one patch as stated above.

Also, to me 3) and 4) are better to be together since they give you a 
clear view on how the direct/indirect reclaim are triggered.

2) could be flexible depending on how you see it.  If you prefer viewing 
it from low-level implementation of reclaiming pages from cgroup, then 
it's also OK to be together with 1).  If you want to treat it as a part 
of _async_ way of bring down usage to limit, then _MAYBE_ it's also OK 
to be with 3) and 4).

But to me 2) can be together with 1) or even a separate patch because 
it's still kinda of low-level reclaiming details.  3) and 4) shouldn't 
contain such detail but should focus on how direct/indirect reclaim is done.

[...]

> 
> To be honest, the part I'm feeling most confusing is this 
> self-contained-ness. It seems depend on how you look at things.

Completely understand.  But I think our discussion should be helpful to 
both of us and others.