Message-ID: <op.2jrquskiwjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Mon, 26 Feb 2024 15:48:18 -0600
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: "Huang, Kai" <kai.huang@...el.com>, "tj@...nel.org" <tj@...nel.org>,
"jarkko@...nel.org" <jarkko@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>, "hpa@...or.com"
<hpa@...or.com>, "mingo@...hat.com" <mingo@...hat.com>,
"tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>, "mkoutny@...e.com"
<mkoutny@...e.com>, "Mehta, Sohil" <sohil.mehta@...el.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>, "bp@...en8.de" <bp@...en8.de>,
"Dave Hansen" <dave.hansen@...el.com>
Cc: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "anakrish@...rosoft.com"
<anakrish@...rosoft.com>, "Zhang, Bo" <zhanb@...rosoft.com>,
"kristen@...ux.intel.com" <kristen@...ux.intel.com>, "yangjie@...rosoft.com"
<yangjie@...rosoft.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
"chrisyan@...rosoft.com" <chrisyan@...rosoft.com>
Subject: Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup
try_charge()
Hi Dave,
On Mon, 26 Feb 2024 08:04:54 -0600, Dave Hansen <dave.hansen@...el.com>
wrote:
> On 2/26/24 03:36, Huang, Kai wrote:
>>> In case of overcommitting, even if we always reclaim from the same
>>> cgroup for each fault, one group may still interfere with the other:
>>> e.g., consider an extreme case in which group A has used up almost
>>> all EPC at the time group B takes a fault; B has to fail the
>>> allocation and kill enclaves.
>> If the admin allows group A to use almost all EPC, to me it's fair
>> to say he/she doesn't want to run anything inside B at all, and it
>> is acceptable for enclaves in B to be killed.
>
> Folks, I'm having a really hard time following this thread. It sounds
> like there's disagreement about when to do system-wide reclaim. Could
> someone remind me of the choices that we have? (A proposed patch would
> go a _long_ way to helping me understand)
>
In case of overcommitting, i.e., the sum of limits is greater than the
EPC capacity: if one group has a fault and its usage is not above its
own limit (try_charge() passes), yet total usage of the system has
exceeded the capacity, the question is whether we do global reclaim or
only reclaim pages in the current faulting group.
> Also, what does the core mm memcg code do?
>
I'm not sure. I'll try to find out, but it'd be appreciated if someone
more knowledgeable could comment on this. memcg also has a protection
mechanism (i.e., the min and low settings) to guarantee some allocation
per group, so its approach might not be applicable to the misc
controller here.
> Last, what is the simplest (least amount of code) thing that the SGX
> cgroup controller could implement here?
>
>
I still think the current approach of doing global reclaim is
reasonable and simple: try_charge() checks the cgroup limit and
reclaims within the group if needed, then EPC page allocation is
attempted, with global reclaim if allocation fails because global usage
has reached the capacity.
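
To make this concrete, below is a stand-alone user-space model of that
flow. All names (try_charge(), reclaim_in_group(), reclaim_global()),
the capacity/limit numbers, and the victim selection are illustrative
assumptions only, not the actual code in this series:

	/*
	 * Stand-alone model of the flow above, compilable in user space.
	 * Names, numbers and victim selection are illustrative only.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define EPC_CAPACITY	100U

	struct epc_cgroup {
		unsigned int usage;
		unsigned int limit;
	};

	static struct epc_cgroup group_a = { .limit = 100 };
	static struct epc_cgroup group_b = { .limit = 90 };
	static struct epc_cgroup *groups[] = { &group_a, &group_b };
	static unsigned int global_usage;

	/* Per-group reclaim: free one page charged to this group only. */
	static bool reclaim_in_group(struct epc_cgroup *cg)
	{
		if (!cg->usage)
			return false;
		cg->usage--;
		global_usage--;
		return true;
	}

	/* Global reclaim: free one page from the group using the most EPC
	 * (a trivial stand-in for walking the global LRU). */
	static bool reclaim_global(void)
	{
		struct epc_cgroup *victim = groups[0];

		if (groups[1]->usage > victim->usage)
			victim = groups[1];
		return reclaim_in_group(victim);
	}

	/* try_charge(): enforce the group's own limit, reclaiming within
	 * the group when it is already at its limit. */
	static bool try_charge(struct epc_cgroup *cg)
	{
		while (cg->usage >= cg->limit)
			if (!reclaim_in_group(cg))
				return false;
		cg->usage++;
		return true;
	}

	/* Allocation path: charge the group first, then take a page from
	 * the global pool, reclaiming globally once capacity is reached. */
	static bool alloc_epc_page(struct epc_cgroup *cg)
	{
		if (!try_charge(cg))
			return false;

		while (global_usage >= EPC_CAPACITY) {
			if (!reclaim_global()) {
				cg->usage--;	/* uncharge on failure */
				return false;
			}
		}
		global_usage++;
		return true;
	}

	int main(void)
	{
		/* Overcommit: limits sum to 190 > 100. Group A fills the
		 * EPC, then group B faults; global reclaim lets B's
		 * allocation succeed by reclaiming (here, from A). */
		for (int i = 0; i < 100; i++)
			alloc_epc_page(&group_a);
		printf("A usage=%u global=%u\n", group_a.usage, global_usage);
		printf("B allocation %s\n",
		       alloc_epc_page(&group_b) ?
		       "succeeds via global reclaim" : "fails");
		return 0;
	}

In this model the overcommit case quoted above still lets B make
progress, because the global pass can take pages back from A.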
I'm not sure how skipping global reclaim in this case would bring any
benefit. Please see my response to Kai's example cases.
Thanks
Haitao