Message-ID: <f242cfb2-e8ab-49f3-9b8e-1236dd361b64@intel.com>
Date: Fri, 23 Feb 2024 11:31:18 +1300
From: "Huang, Kai" <kai.huang@...el.com>
To: Haitao Huang <haitao.huang@...ux.intel.com>, "Mehta, Sohil"
<sohil.mehta@...el.com>, "mingo@...hat.com" <mingo@...hat.com>,
"jarkko@...nel.org" <jarkko@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>, "hpa@...or.com"
<hpa@...or.com>, "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>, "mkoutny@...e.com"
<mkoutny@...e.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"tj@...nel.org" <tj@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>
CC: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "anakrish@...rosoft.com"
<anakrish@...rosoft.com>, "Zhang, Bo" <zhanb@...rosoft.com>,
"kristen@...ux.intel.com" <kristen@...ux.intel.com>, "yangjie@...rosoft.com"
<yangjie@...rosoft.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
"chrisyan@...rosoft.com" <chrisyan@...rosoft.com>
Subject: Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for
cgroup
On 23/02/2024 6:20 am, Haitao Huang wrote:
> On Wed, 21 Feb 2024 05:00:27 -0600, Huang, Kai <kai.huang@...el.com> wrote:
>
>> On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:
>>> [...]
>>> >
>>> > Here the @nr_to_scan is reduced by the number of pages that are
>>> > isolated, but not actually reclaimed (which is reflected by @cnt).
>>> >
>>> > IIUC, it looks like you want to make this function do "each cycle" as
>>> > you mentioned in the v8 [1]:
>>> >
>>> > 	I tested with that approach and found we can only target the number
>>> > 	of pages we attempt to reclaim, not the number of pages actually
>>> > 	reclaimed, due to the uncertainty of how long it takes to reclaim
>>> > 	pages.  Besides, targeting the number of scanned pages per cycle is
>>> > 	also what ksgxd does.
>>> >
>>> > 	If we target the actual number of pages, sometimes it just takes
>>> > 	too long.  I saw more timeouts with the default time limit when
>>> > 	running parallel selftests.
>>> >
>>> > I am not sure what "sometimes it just takes too long" means, but what
>>> > I am thinking is that you are trying to write some perfect yet
>>> > complicated code here.
>>>
>>> I think what I observed was that try_charge() would block for too long
>>> before getting a chance to schedule() and yield, causing more timeouts
>>> than necessary.
>>> I'll re-test to be sure.
>>
>> Looks like this is valid information that can be used to justify
>> whatever you are implementing in the EPC cgroup reclaiming function(s).
>>
> I'll add some comments. I was assuming this just follows the old design
> of ksgxd.
> There were some comments at the beginning of
> sgx_epc_cgroup_reclaim_page():
>
> 	/*
> 	 * Attempting to reclaim only a few pages will often fail and is
> 	 * inefficient, while reclaiming a huge number of pages can result
> 	 * in soft lockups due to holding various locks for an extended
> 	 * duration.
> 	 */
> 	unsigned int nr_to_scan = SGX_NR_TO_SCAN;
>
> I think it can be improved to emphasize that we only "attempt" to finish
> scanning a fixed number of pages for reclamation, not enforce the number
> of pages successfully reclaimed.
It doesn't need to be this exact comment, but somewhere just state that you
are trying to follow ksgxd() (the current sgx_reclaim_pages()), but to do
it "_across_ a given cgroup and all its descendants".  That's the reason
you made @nr_to_scan a pointer.

And also add some text to explain why you follow ksgxd() -- not wanting to
block for longer due to looping over the descendants etc. -- so we can
focus on discussing whether such justification is reasonable.