linux-kernel - Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for cgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <op.2jjzsnz0wjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Thu, 22 Feb 2024 11:20:37 -0600
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: "Mehta, Sohil" <sohil.mehta@...el.com>, "mingo@...hat.com"
 <mingo@...hat.com>, "jarkko@...nel.org" <jarkko@...nel.org>, "x86@...nel.org"
 <x86@...nel.org>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
 "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>, "hpa@...or.com"
 <hpa@...or.com>, "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
 "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>, "mkoutny@...e.com"
 <mkoutny@...e.com>, "tglx@...utronix.de" <tglx@...utronix.de>, "tj@...nel.org"
 <tj@...nel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>, "Huang, Kai"
 <kai.huang@...el.com>
Cc: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
 "seanjc@...gle.com" <seanjc@...gle.com>, "anakrish@...rosoft.com"
 <anakrish@...rosoft.com>, "Zhang, Bo" <zhanb@...rosoft.com>,
 "kristen@...ux.intel.com" <kristen@...ux.intel.com>, "yangjie@...rosoft.com"
 <yangjie@...rosoft.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
 "chrisyan@...rosoft.com" <chrisyan@...rosoft.com>
Subject: Re: [PATCH v9 08/15] x86/sgx: Implement EPC reclamation flows for
 cgroup

On Wed, 21 Feb 2024 05:00:27 -0600, Huang, Kai <kai.huang@...el.com> wrote:

> On Wed, 2024-02-21 at 00:44 -0600, Haitao Huang wrote:
>> [...]
>> >
>> > Here the @nr_to_scan is reduced by the number of pages that are
>> > isolated, but
>> > not actually reclaimed (which is reflected by @cnt).
>> >
>> > IIUC, looks you want to make this function do "each cycle" as what you
>> > mentioned
>> > in the v8 [1]:
>> >
>> > 	I tested with that approach and found we can only target number of
>> > pages
>> > 	attempted to reclaim not pages actually reclaimed due to the
>> > uncertainty
>> > 	of how long it takes to reclaim pages. Besides targeting number of
>> > 	scanned pages for each cycle is also what the ksgxd does.
>> >
>> > 	If we target actual number of pages, sometimes it just takes too  
>> long.
>> > I
>> > 	saw more timeouts with the default time limit when running parallel
>> > 	selftests.
>> >
>> > I am not sure what does "sometimes it just takes too long" mean, but
>> > what I am
>> > thinking is you are trying to do some perfect but yet complicated code
>> > here.
>>
>> I think what I observed was that the try_charge() would block too long
>> before getting chance of schedule() to yield, causing more timeouts than
>> necessary.
>> I'll do some re-test to be sure.
>
> Looks this is a valid information that can be used to justify whatever  
> you are
> implementing in the EPC cgroup reclaiming function(s).
>
I'll add some comments. Was assuming this is just following the old design  
as ksgxd.
There were some comments at the beginning of  
sgx_epc_cgrooup_reclaim_page().
         /*
          * Attempting to reclaim only a few pages will often fail and is
          * inefficient, while reclaiming a huge number of pages can result  
in
          * soft lockups due to holding various locks for an extended  
duration.
          */
         unsigned int nr_to_scan = SGX_NR_TO_SCAN;

I think it can be improved to emphasize we only "attempt" to finish  
scanning fixed number of pages for reclamation, not enforce number of  
pages successfully reclaimed.

>>
>> >
>> > For instance, I don't think selftest reflect the real workload, and I
>> > believe
>> > adjusting the limit of a given EPC cgroup shouldn't be a frequent
>> > operation,
>> > thus it is acceptable to use some easy-maintain code but less perfect
>> > code.
>> >
>> > Here I still think having @nr_to_scan as a pointer is  
>> over-complicated.
>> > For
>> > example, we can still let sgx_reclaim_pages() to always scan
>> > SGX_NR_TO_SCAN
>> > pages, but give up when there's enough pages reclaimed or when the EPC
>> > cgroup
>> > and its descendants have been looped:
>> >
>> > unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
>> > {
>> > 	unsigned int cnt = 0;
>> > 	...
>> >
>> > 	css_for_each_descendant_pre(pos, css_root) {
>> > 		...
>> > 		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
>> > 		cnt += sgx_reclaim_pages(&epc_cg->lru);
>> >
>> > 		if (cnt >= SGX_NR_TO_SCAN)
>> > 			break;
>> > 	}
>> >
>> > 	...
>> > 	return cnt;
>> > }
>> >
>> > Yeah it may reclaim more than SGX_NR_TO_SCAN when the loop actually
>> > reaches any
>> > descendants, but that should be rare and we don't care that much, do  
>> we?
>> >
>> I assume you meant @cnt here to be number of pages actually reclaimed.
>
> Yes.
>
>> This could cause  sgx_epc_cgroup_reclaim_pages() block too long as @cnt
>> may always be zero (all pages are too young) and you have to loop all
>> descendants.
>
> I am not sure whether this is a valid point.
>
> For example, your change in patch 10 "x86/sgx: Add EPC reclamation in  
> cgroup
> try_charge()" already loops all descendants in below code:
>
> +		if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
> +			return -ENOMEM;
>

I meant looping all descendants for reclamation which is expensive and we  
want to avoid. Not just checking emptiness of LRUs.

> Anyway, I can see all these can be justification to your  
> design/implementation.
> My point is please put these justification in changelog/comments so that  
> we can
> actually understand.
>
Yes, will add clarifying comments.
Thanks