linux-kernel - Re: [PATCH v16 09/16] x86/sgx: Add basic EPC reclamation flow for cgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <D3QWDRKVYJ25.2TPQTKQ3CWDDZ@kernel.org>
Date: Tue, 27 Aug 2024 21:15:38 +0300
From: "Jarkko Sakkinen" <jarkko@...nel.org>
To: "Haitao Huang" <haitao.huang@...ux.intel.com>,
 <dave.hansen@...ux.intel.com>, <kai.huang@...el.com>, <tj@...nel.org>,
 <mkoutny@...e.com>, <chenridong@...wei.com>,
 <linux-kernel@...r.kernel.org>, <linux-sgx@...r.kernel.org>,
 <x86@...nel.org>, <cgroups@...r.kernel.org>, <tglx@...utronix.de>,
 <mingo@...hat.com>, <bp@...en8.de>, <hpa@...or.com>,
 <sohil.mehta@...el.com>, <tim.c.chen@...ux.intel.com>
Cc: <zhiquan1.li@...el.com>, <kristen@...ux.intel.com>, <seanjc@...gle.com>,
 <zhanb@...rosoft.com>, <anakrish@...rosoft.com>,
 <mikko.ylinen@...ux.intel.com>, <yangjie@...rosoft.com>,
 <chrisyan@...rosoft.com>
Subject: Re: [PATCH v16 09/16] x86/sgx: Add basic EPC reclamation flow for
 cgroup

On Wed Aug 21, 2024 at 4:53 AM EEST, Haitao Huang wrote:
> Currently in the EPC page allocation, the kernel simply fails the
> allocation when the current EPC cgroup fails to charge due to its usage
> reaching limit.  This is not ideal.  When that happens, a better way is
> to reclaim EPC page(s) from the current EPC cgroup to reduce its usage
> so the new allocation can succeed.
>
> Currently, all EPC pages are tracked in a single global LRU, and the
> "global EPC reclamation" supports the following 3 cases:
>
> 1) On-demand asynchronous reclamation: For allocation requests that can
>    not wait for reclamation but can be retried, an asynchronous
>    reclamation is triggered, in which the global reclaimer, ksgxd, keeps
>    reclaiming EPC pages until the free page count is above a minimal
>    threshold.
>
> 2) On-demand synchronous reclamation: For allocations that can wait for
>    reclamation, the EPC page allocator, sgx_alloc_epc_page() reclaims
>    EPC page(s) immediately until at least one free page is available for
>    allocation.
>
> 3) Preemptive reclamation: For some allocation requests, e.g.,
>    allocation for reloading a reclaimed page to change its permissions
>    or page type, the kernel invokes sgx_reclaim_direct() to preemptively
>    reclaim EPC page(s) as a best effort to minimize on-demand
>    reclamation for subsequent allocations.
>
> Similarly, a "per-cgroup reclamation" is needed to support the above 3
> cases as well:
>
> 1) For on-demand asynchronous reclamation, a per-cgroup reclamation
>    needs to be invoked to maintain a minimal difference between the
>    usage and the limit for each cgroup, analogous to the minimal free
>    page threshold  maintained by the global reclaimer.
>
> 2) For on-demand synchronous reclamation, sgx_cgroup_try_charge() needs
>    to invoke the per-cgroup reclamation until the cgroup usage become
>    at least one page lower than its limit.
>
> 3) For preemptive reclamation, sgx_reclaim_direct() needs to invoke the
>    per-cgroup reclamation to minimize per-cgroup on-demand reclamation
>    for subsequent allocations.
>
> To support the per-cgroup reclamation, introduce a "per-cgroup LRU" to
> track all EPC pages belong to the owner cgroup to utilize the existing
> sgx_reclaim_pages().
>
> Currently, the global reclamation treats all EPC pages equally as it
> scans all EPC pages in FIFO order in the global LRU.  The "per-cgroup
> reclamation" needs to somehow achieve the same fairness of all EPC pages
> that are tracked in the multiple LRUs of the given cgroup and all the
> descendants to reflect the nature of the cgroup.
>
> The idea is to achieve such fairness by scanning "all EPC cgroups" of
> the subtree (the given cgroup and all the descendants) equally in turns,
> and in the scan to each cgroup, apply the existing sgx_reclaim_pages()
> to its LRU. This basic flow is encapsulated in a new function,
> sgx_cgroup_reclaim_pages().
>
> Export sgx_reclaim_pages() for use in sgx_cgroup_reclaim_pages(). And
> modify sgx_reclaim_pages() to return the number of pages scanned so
> sgx_cgroup_reclaim_pages() can track scanning progress and determine
> whether enough scanning is done or to continue the scanning for next
> descendant.
>
> Whenever reclaiming in a subtree of a given root is needed, start the
> scanning from the next descendant where scanning was stopped at last
> time.  To keep track of the next descendant cgroup to scan, add a new
> field, next_cg, in the sgx_cgroup struct.  Create an iterator function,
> sgx_cgroup_next_get(), atomically returns a valid reference of the
> descendant for next round of scanning and advances @next_cg to next
> valid descendant in a preorder walk. This iterator function is used in
> sgx_cgroup_reclaim_pages() to iterate descendants for scanning.
> Separately also advances @next_cg to next valid descendant when the
> cgroup referenced by @next_cg is to be freed.
>
> Add support for on-demand synchronous reclamation in
> sgx_cgroup_try_charge(), applying sgx_cgroup_reclaim_pages() iteratively
> until cgroup usage is lower than its limit.
>
> Later patches will reuse sgx_cgroup_reclaim_pages() to add support for
> asynchronous and preemptive reclamation.
>
> Note all reclaimable EPC pages are still tracked in the global LRU thus
> no per-cgroup reclamation is actually active at the moment: -ENOMEM is
> returned by __sgx_cgroup_try_charge() when LRUs are empty. Per-cgroup
> tracking and reclamation will be turned on in the end after all
> necessary infrastructure is in place.
>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@...el.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@...el.com>
> Co-developed-by: Kristen Carlson Accardi <kristen@...ux.intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@...ux.intel.com>
> Signed-off-by: Haitao Huang <haitao.huang@...ux.intel.com>

Reviewed-by: Jarkko Sakkinen <jarkko@...nel.org>

BR, Jarkko