[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <op.2dz4d5b2wjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Mon, 06 Nov 2023 12:59:55 -0600
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: "hpa@...or.com" <hpa@...or.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"jarkko@...nel.org" <jarkko@...nel.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mkoutny@...e.com" <mkoutny@...e.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"Mehta, Sohil" <sohil.mehta@...el.com>,
"tj@...nel.org" <tj@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>, "Huang, Kai" <kai.huang@...el.com>
Cc: "mikko.ylinen@...ux.intel.com" <mikko.ylinen@...ux.intel.com>,
"Christopherson,, Sean" <seanjc@...gle.com>,
"Zhang, Bo" <zhanb@...rosoft.com>,
"kristen@...ux.intel.com" <kristen@...ux.intel.com>,
"yangjie@...rosoft.com" <yangjie@...rosoft.com>,
"sean.j.christopherson@...el.com" <sean.j.christopherson@...el.com>,
"Li, Zhiquan1" <zhiquan1.li@...el.com>,
"anakrish@...rosoft.com" <anakrish@...rosoft.com>
Subject: Re: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup
functionality
On Mon, 06 Nov 2023 06:09:45 -0600, Huang, Kai <kai.huang@...el.com> wrote:
> On Mon, 2023-10-30 at 11:20 -0700, Haitao Huang wrote:
>> From: Kristen Carlson Accardi <kristen@...ux.intel.com>
>>
>> Implement support for cgroup control of SGX Enclave Page Cache (EPC)
>> memory using the misc cgroup controller. EPC memory is independent
>> from normal system memory, e.g. must be reserved at boot from RAM and
>> cannot be converted between EPC and normal memory while the system is
>> running. EPC is managed by the SGX subsystem and is not accounted by
>> the memory controller.
>>
>> Much like normal system memory, EPC memory can be overcommitted via
>> virtual memory techniques and pages can be swapped out of the EPC to
>> their backing store (normal system memory, e.g. shmem). The SGX EPC
>> subsystem is analogous to the memory subsystem and the SGX EPC
>> controller
>> is in turn analogous to the memory controller; it implements limit and
>> protection models for EPC memory.
>
> Nit:
>
> The above two paragraphs talk about what is EPC and EPC resource control
> needs
> to be done separately, etc, but IMHO it lacks some background about
> "why" EPC
> resource control is needed, e.g, from use case's perspective.
>
>>
>> The misc controller provides a mechanism to set a hard limit of EPC
>> usage via the "sgx_epc" resource in "misc.max". The total EPC memory
>> available on the system is reported via the "sgx_epc" resource in
>> "misc.capacity".
>
> Please separate what the current misc cgroup provides, and how this
> patch is
> going to utilize.
>
> Please describe the changes in imperative mood. E.g, "report total EPC
> memory
> via ...", instead of "... is reported via ...".
>
Will update
>>
>> This patch was modified from the previous version to only add basic EPC
>> cgroup structure, accounting allocations for cgroup usage
>> (charge/uncharge), setup misc cgroup callbacks, set total EPC capacity.
>
> This isn't changelog material. Please focus on describing the high
> level design
> and why you chose such design.
>
>>
>> For now, the EPC cgroup simply blocks additional EPC allocation in
>> sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are
>> still tracked in the global active list, only reclaimed by the global
>> reclaimer when the total free page count is lower than a threshold.
>>
>> Later patches will reorganize the tracking and reclamation code in the
>> globale reclaimer and implement per-cgroup tracking and reclaiming.
>>
>> Co-developed-by: Sean Christopherson <sean.j.christopherson@...el.com>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@...el.com>
>> Signed-off-by: Kristen Carlson Accardi <kristen@...ux.intel.com>
>> Co-developed-by: Haitao Huang <haitao.huang@...ux.intel.com>
>> Signed-off-by: Haitao Huang <haitao.huang@...ux.intel.com>
>> ---
>> V6:
>> - Split the original large patch"Limit process EPC usage with misc
>> cgroup controller" and restructure it (Kai)
>> ---
>> arch/x86/Kconfig | 13 ++++
>> arch/x86/kernel/cpu/sgx/Makefile | 1 +
>> arch/x86/kernel/cpu/sgx/epc_cgroup.c | 103 +++++++++++++++++++++++++++
>> arch/x86/kernel/cpu/sgx/epc_cgroup.h | 36 ++++++++++
>> arch/x86/kernel/cpu/sgx/main.c | 28 ++++++++
>> arch/x86/kernel/cpu/sgx/sgx.h | 3 +
>> 6 files changed, 184 insertions(+)
>> create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 66bfabae8814..e17c5dc3aea4 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1921,6 +1921,19 @@ config X86_SGX
>>
>> If unsure, say N.
>>
>> +config CGROUP_SGX_EPC
>> + bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC)
>> for Intel SGX"
>> + depends on X86_SGX && CGROUP_MISC
>> + help
>> + Provides control over the EPC footprint of tasks in a cgroup via
>> + the Miscellaneous cgroup controller.
>> +
>> + EPC is a subset of regular memory that is usable only by SGX
>> + enclaves and is very limited in quantity, e.g. less than 1%
>> + of total DRAM.
>> +
>> + Say N if unsure.
>> +
>> config X86_USER_SHADOW_STACK
>> bool "X86 userspace shadow stack"
>> depends on AS_WRUSS
>> diff --git a/arch/x86/kernel/cpu/sgx/Makefile
>> b/arch/x86/kernel/cpu/sgx/Makefile
>> index 9c1656779b2a..12901a488da7 100644
>> --- a/arch/x86/kernel/cpu/sgx/Makefile
>> +++ b/arch/x86/kernel/cpu/sgx/Makefile
>> @@ -4,3 +4,4 @@ obj-y += \
>> ioctl.o \
>> main.o
>> obj-$(CONFIG_X86_SGX_KVM) += virt.o
>> +obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o
>> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> new file mode 100644
>> index 000000000000..500627d0563f
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
>> @@ -0,0 +1,103 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright(c) 2022 Intel Corporation.
>> +
>> +#include <linux/atomic.h>
>> +#include <linux/kernel.h>
>> +#include "epc_cgroup.h"
>> +
>> +static inline struct sgx_epc_cgroup
>> *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg)
>> +{
>> + return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv);
>> +}
>> +
>> +static inline bool sgx_epc_cgroup_disabled(void)
>> +{
>> + return !cgroup_subsys_enabled(misc_cgrp_subsys);
>
> From below, the root EPC cgroup is dynamically allocated. Shouldn't it
> also
> check whether the root EPC cgroup is valid?
>
Good point. I think I'll go with the static instance approach below.
>> +}
>> +
>> +/**
>> + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single
>> EPC page
>> + *
>> + * Returns EPC cgroup or NULL on success, -errno on failure.
>> + */
>> +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void)
>> +{
>> + struct sgx_epc_cgroup *epc_cg;
>> + int ret;
>> +
>> + if (sgx_epc_cgroup_disabled())
>> + return NULL;
>> +
>> + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg());
>> + ret = misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
>> +
>> + if (!ret) {
>> + /* No epc_cg returned, release ref from get_current_misc_cg() */
>> + put_misc_cg(epc_cg->cg);
>> + return ERR_PTR(-ENOMEM);
>
> misc_cg_try_charge() returns 0 when successfully charged, no?
Right. I really made some mess in rebasing :-(
>
>> + }
>> +
>> + /* Ref released in sgx_epc_cgroup_uncharge() */
>> + return epc_cg;
>> +}
>
> IMHO the above _try_charge() returning a pointer of EPC cgroup is a
> little bit
> odd, because it doesn't match the existing misc_cg_try_charge() which
> returns
> whether the charge is successful or not. sev_misc_cg_try_charge()
> matches
> misc_cg_try_charge() too.
>
> I think it's better to split "getting EPC cgroup" part out as a separate
> helper,
> and make this _try_charge() match existing pattern:
>
> struct sgx_epc_cgroup *sgx_get_current_epc_cg(void)
> {
> if (sgx_epc_cgroup_disabled())
> return NULL;
>
> return sgx_epc_cgroup_from_misc_cg(get_current_misc_cg());
> }
>
> int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
> {
> if (!epc_cg)
> return -EINVAL;
>
> return misc_cg_try_charge(epc_cg->cg);
> }
>
> Having sgx_get_current_epc_cg() also makes the caller easier to read,
> because we
> can immediately know we are going to charge the *current* EPC cgroup,
> but not
> some cgroup hidden within sgx_epc_cgroup_try_charge().
>
Actually, unlike other misc controllers, we need charge and get the epc_cg
reference at the same time. That's why it was returning the pointer. How
about rename them sgx_{charge_and_get, uncharge_and_put}_epc_cg()? In
final version, there is a __sgx_epc_cgroup_try_charge() that wraps
misc_cg_try_charge().
>> +
>> +/**
>> + * sgx_epc_cgroup_uncharge() - hierarchically uncharge EPC pages
>> + * @epc_cg: the charged epc cgroup
>> + */
>> +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg)
>> +{
>> + if (sgx_epc_cgroup_disabled())
>> + return;
>
> If with above change, check !epc_cg instead.
>
>> +
>> + misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
>> +
>> + /* Ref got from sgx_epc_cgroup_try_charge() */
>> + put_misc_cg(epc_cg->cg);
>> +}
>>
>> +
>> +static void sgx_epc_cgroup_free(struct misc_cg *cg)
>> +{
>> + struct sgx_epc_cgroup *epc_cg;
>> +
>> + epc_cg = sgx_epc_cgroup_from_misc_cg(cg);
>> + if (!epc_cg)
>> + return;
>> +
>> + kfree(epc_cg);
>> +}
>> +
>> +static int sgx_epc_cgroup_alloc(struct misc_cg *cg);
>> +
>> +const struct misc_operations_struct sgx_epc_cgroup_ops = {
>> + .alloc = sgx_epc_cgroup_alloc,
>> + .free = sgx_epc_cgroup_free,
>> +};
>> +
>> +static int sgx_epc_cgroup_alloc(struct misc_cg *cg)
>> +{
>> + struct sgx_epc_cgroup *epc_cg;
>> +
>> + epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL);
>> + if (!epc_cg)
>> + return -ENOMEM;
>> +
>> + cg->res[MISC_CG_RES_SGX_EPC].misc_ops = &sgx_epc_cgroup_ops;
>> + cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg;
>> + epc_cg->cg = cg;
>> + return 0;
>> +}
>> +
>> +static int __init sgx_epc_cgroup_init(void)
>> +{
>> + struct misc_cg *cg;
>> +
>> + if (!boot_cpu_has(X86_FEATURE_SGX))
>> + return 0;
>> +
>> + cg = misc_cg_root();
>> + BUG_ON(!cg);
>
> BUG_ON() will catch some eyeball, but it cannot be NULL in practice IIUC.
>
> I am not sure whether you can just make misc @root_cg visible (instead
> of having
> the misc_cg_root() helper) and directly use @root_cg here to avoid using
> the
> BUG(). No opinion here.
>
I can remove BUG_ON(). It should never happen anyways.
>> +
>> + return sgx_epc_cgroup_alloc(cg);
>
> As mentioned above the memory allocation can fail, in which case EPC
> cgroup is
> effectively disabled IIUC?
>
> One way is to manually check whether root EPC cgroup is valid in
> sgx_epc_cgroup_disabled(). Alternatively, you can have a static root
> EPC cgroup
> here:
>
> static struct sgx_epc_cgroup root_epc_cg;
>
> In this way you can have a sgx_epc_cgroup_init(&epc_cg), and call it from
> sgx_epc_cgroup_alloc().
>
Yeah, I think that is reasonable.
>> +}
>> +subsys_initcall(sgx_epc_cgroup_init);
>> diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
>> b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
>> new file mode 100644
>> index 000000000000..c3abfe82be15
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
>> @@ -0,0 +1,36 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2022 Intel Corporation. */
>> +#ifndef _INTEL_SGX_EPC_CGROUP_H_
>> +#define _INTEL_SGX_EPC_CGROUP_H_
>> +
>> +#include <asm/sgx.h>
>> +#include <linux/cgroup.h>
>> +#include <linux/list.h>
>> +#include <linux/misc_cgroup.h>
>> +#include <linux/page_counter.h>
>> +#include <linux/workqueue.h>
>> +
>> +#include "sgx.h"
>> +
>> +#ifndef CONFIG_CGROUP_SGX_EPC
>> +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES
>
> Do you need this macro?
I remember I got some compiling error without it but I don't see why it
should be needed. I'll double check next round. thanks.
>
>> +struct sgx_epc_cgroup;
>> +
>> +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void)
>> +{
>> + return NULL;
>> +}
>> +
>> +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup
>> *epc_cg) { }
>> +#else
>> +struct sgx_epc_cgroup {
>> + struct misc_cg *cg;
>> +};
>> +
>> +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void);
>> +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
>> +bool sgx_epc_cgroup_lru_empty(struct misc_cg *root);
>
> Why do you need sgx_epc_cgroup_lru_empty() here?
>
leftover from rebasing. Will remove.
>> +
>> +#endif
>> +
>> +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */
>> diff --git a/arch/x86/kernel/cpu/sgx/main.c
>> b/arch/x86/kernel/cpu/sgx/main.c
>> index 166692f2d501..07606f391540 100644
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -6,6 +6,7 @@
>> #include <linux/highmem.h>
>> #include <linux/kthread.h>
>> #include <linux/miscdevice.h>
>> +#include <linux/misc_cgroup.h>
>> #include <linux/node.h>
>> #include <linux/pagemap.h>
>> #include <linux/ratelimit.h>
>> @@ -17,6 +18,7 @@
>> #include "driver.h"
>> #include "encl.h"
>> #include "encls.h"
>> +#include "epc_cgroup.h"
>>
>> struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>> static int sgx_nr_epc_sections;
>> @@ -559,6 +561,11 @@ int sgx_unmark_page_reclaimable(struct
>> sgx_epc_page *page)
>> struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>> {
>> struct sgx_epc_page *page;
>> + struct sgx_epc_cgroup *epc_cg;
>> +
>> + epc_cg = sgx_epc_cgroup_try_charge();
>> + if (IS_ERR(epc_cg))
>> + return ERR_CAST(epc_cg);
>>
>> for ( ; ; ) {
>> page = __sgx_alloc_epc_page();
>> @@ -580,10 +587,21 @@ struct sgx_epc_page *sgx_alloc_epc_page(void
>> *owner, bool reclaim)
>> break;
>> }
>>
>> + /*
>> + * Need to do a global reclamation if cgroup was not full but free
>> + * physical pages run out, causing __sgx_alloc_epc_page() to fail.
>> + */
>> sgx_reclaim_pages();
>
> What's the final behaviour? IIUC it should be reclaiming from the
> *current* EPC
> cgroup? If so shouldn't we just pass the @epc_cg to it here?
>
> I think we can make this patch as "structure" patch w/o actually having
> EPC
> cgroup enabled, i.e., sgx_get_current_epc_cg() always return NULL.
>
> And we can have one patch to change sgx_reclaim_pages() to take the
> 'struct
> sgx_epc_lru_list *' as argument:
>
> void sgx_reclaim_pages_lru(struct sgx_epc_lru_list * lru)
> {
> ...
> }
>
> Then here we can have something like:
>
> void sgx_reclaim_pages(struct sgx_epc_cg *epc_cg)
> {
> struct sgx_epc_lru_list *lru = epc_cg ? &epc_cg->lru :
> &sgx_global_lru;
>
> sgx_reclaim_pages_lru(lru);
> }
>
> Makes sense?
>
This is purely global reclamation. No cgroup involved. You can see it
later in changes in patch 10/12. For now I just make a comment there but
no real changes. Cgroup reclamation will be done as part of _try_charge
call.
>> cond_resched();
>> }
>>
>> + if (!IS_ERR(page)) {
>> + WARN_ON_ONCE(page->epc_cg);
>> + page->epc_cg = epc_cg;
>> + } else {
>> + sgx_epc_cgroup_uncharge(epc_cg);
>> + }
>> +
>> if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
>> wake_up(&ksgxd_waitq);
>>
>> @@ -604,6 +622,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>> struct sgx_epc_section *section = &sgx_epc_sections[page->section];
>> struct sgx_numa_node *node = section->node;
>>
>> + if (page->epc_cg) {
>> + sgx_epc_cgroup_uncharge(page->epc_cg);
>> + page->epc_cg = NULL;
>> + }
>> +
>> spin_lock(&node->lock);
>>
>> page->owner = NULL;
>> @@ -643,6 +666,7 @@ static bool __init sgx_setup_epc_section(u64
>> phys_addr, u64 size,
>> section->pages[i].flags = 0;
>> section->pages[i].owner = NULL;
>> section->pages[i].poison = 0;
>> + section->pages[i].epc_cg = NULL;
>> list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list);
>> }
>>
>> @@ -787,6 +811,7 @@ static void __init arch_update_sysfs_visibility(int
>> nid) {}
>> static bool __init sgx_page_cache_init(void)
>> {
>> u32 eax, ebx, ecx, edx, type;
>> + u64 capacity = 0;
>> u64 pa, size;
>> int nid;
>> int i;
>> @@ -837,6 +862,7 @@ static bool __init sgx_page_cache_init(void)
>>
>> sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
>> sgx_numa_nodes[nid].size += size;
>> + capacity += size;
>>
>> sgx_nr_epc_sections++;
>> }
>> @@ -846,6 +872,8 @@ static bool __init sgx_page_cache_init(void)
>> return false;
>> }
>>
>> + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity);
>> +
>> return true;
>> }
>
> I would separate setting up capacity as a separate patch.
I thought about that, but again it was only 3-4 lines all in this function
and it's also necessary part of basic setup for misc controller...
>
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h
>> b/arch/x86/kernel/cpu/sgx/sgx.h
>> index d2dad21259a8..b1786774b8d2 100644
>> --- a/arch/x86/kernel/cpu/sgx/sgx.h
>> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
>> @@ -29,12 +29,15 @@
>> /* Pages on free list */
>> #define SGX_EPC_PAGE_IS_FREE BIT(1)
>>
>> +struct sgx_epc_cgroup;
>> +
>> struct sgx_epc_page {
>> unsigned int section;
>> u16 flags;
>> u16 poison;
>> struct sgx_encl_page *owner;
>> struct list_head list;
>> + struct sgx_epc_cgroup *epc_cg;
>> };
>>
>
> Adding @epc_cg unconditionally means even with !CONFIG_CGROUP_SGX_EPC
> the memory
> is still occupied. IMHO that would bring non-trivial memory waste as
> it's 8-
> bytes for each EPC page.
>
Ok, I'll add ifdef
Powered by blists - more mailing lists