[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <op.2d2fqmpswjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Tue, 07 Nov 2023 19:00:12 -0600
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: dave.hansen@...ux.intel.com, tj@...nel.org, mkoutny@...e.com,
linux-kernel@...r.kernel.org, linux-sgx@...r.kernel.org,
x86@...nel.org, cgroups@...r.kernel.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, hpa@...or.com,
sohil.mehta@...el.com, "Jarkko Sakkinen" <jarkko@...nel.org>,
"Haitao Huang" <haitao.huang@...ux.intel.com>
Cc: zhiquan1.li@...el.com, kristen@...ux.intel.com, seanjc@...gle.com,
zhanb@...rosoft.com, anakrish@...rosoft.com,
mikko.ylinen@...ux.intel.com, yangjie@...rosoft.com
Subject: Re: [PATCH v6 00/12] Add Cgroup support for SGX EPC memory
On Mon, 06 Nov 2023 09:48:36 -0600, Haitao Huang
<haitao.huang@...ux.intel.com> wrote:
> On Sun, 05 Nov 2023 21:26:44 -0600, Jarkko Sakkinen <jarkko@...nel.org>
> wrote:
>
>> On Mon, 2023-10-30 at 11:20 -0700, Haitao Huang wrote:
>>> SGX Enclave Page Cache (EPC) memory allocations are separate from
>>> normal RAM allocations, and
>>> are managed solely by the SGX subsystem. The existing cgroup memory
>>> controller cannot be used
>>> to limit or account for SGX EPC memory, which is a desirable feature
>>> in some environments,
>>> e.g., support for pod level control in a Kubernates cluster on a VM or
>>> baremetal host [1,2].
>>> This patchset implements the support for sgx_epc memory within the
>>> misc cgroup controller. The
>>> user can use the misc cgroup controller to set and enforce a max limit
>>> on total EPC usage per
>>> cgroup. The implementation reports current usage and events of
>>> reaching the limit per cgroup as
>>> well as the total system capacity.
>>> With the EPC misc controller enabled, every EPC page allocation is
>>> accounted for a cgroup's
>>> usage, reflected in the 'sgx_epc' entry in the 'misc.current'
>>> interface file of the cgroup.
>>> Much like normal system memory, EPC memory can be overcommitted via
>>> virtual memory techniques
>>> and pages can be swapped out of the EPC to their backing store (normal
>>> system memory allocated
>>> via shmem, accounted by the memory controller). When the EPC usage of
>>> a cgroup reaches its hard
>>> limit ('sgx_epc' entry in the 'misc.max' file), the cgroup starts a
>>> reclamation process to swap
>>> out some EPC pages within the same cgroup and its descendant to their
>>> backing store. Although
>>> the SGX architecture supports swapping for all pages, to avoid extra
>>> complexities, this
>>> implementation does not support swapping for certain page types, e.g.
>>> Version Array(VA) pages,
>>> and treat them as unreclaimable pages. When the limit is reached but
>>> nothing left in the
>>> cgroup for reclamation, i.e., only unreclaimable pages left, any new
>>> EPC allocation in the
>>> cgroup will result in an ENOMEM error.
>>>
>>> The EPC pages allocated for guest VMs by the virtual EPC driver are
>>> not reclaimable by the host
>>> kernel [5]. Therefore they are also treated as unreclaimable from
>>> cgroup's point of view. And
>>> the virtual EPC driver translates an ENOMEM error resulted from an EPC
>>> allocation request into
>>> a SIGBUS to the user process.
>>>
>>> This work was originally authored by Sean Christopherson a few years
>>> ago, and previously
>>> modified by Kristen C. Accardi to utilize the misc cgroup controller
>>> rather than a custom
>>> controller. I have been updating the patches based on review comments
>>> since V2 [3, 4, 10],
>>> simplified the implementation/design and fixed some stability issues
>>> found from testing.
>>> The patches are organized as following:
>>> - Patches 1-3 are prerequisite misc cgroup changes for adding new
>>> APIs, structs, resource
>>> types.
>>> - Patch 4 implements basic misc controller for EPC without reclamation.
>>> - Patches 5-9 prepare for per-cgroup reclamation.
>>> * Separate out the existing infrastructure of tracking reclaimable
>>> pages
>>> from the global reclaimer(ksgxd) to a newly created LRU list
>>> struct.
>>> * Separate out reusable top-level functions for reclamation.
>>> - Patch 10 adds support for per-cgroup reclamation.
>>> - Patch 11 adds documentation for the EPC cgroup.
>>> - Patch 12 adds test scripts.
>>>
>>> I appreciate your review and providing tags if appropriate.
>>>
>>> ---
>>> V6:
>>> - Dropped OOM killing path, only implement non-preemptive enforcement
>>> of max limit (Dave, Michal)
>>> - Simplified reclamation flow by taking out sgx_epc_reclaim_control,
>>> forced reclamation by
>>> ignoring 'age".
>>> - Restructured patches: split misc API + resource types patch and the
>>> big EPC cgroup patch
>>> (Kai, Michal)
>>> - Dropped some Tested-by/Reviewed-by tags due to significant changes
>>> - Added more selftests
>>>
>>> v5:
>>> - Replace the manual test script with a selftest script.
>>> - Restore the "From" tag for some patches to Sean (Kai)
>>> - Style fixes (Jarkko)
>>>
>>> v4:
>>> - Collected "Tested-by" from Mikko. I kept it for now as no functional
>>> changes in v4.
>>> - Rebased on to v6.6_rc1 and reordered patches as described above.
>>> - Separated out the bug fixes [7,8,9]. This series depend on those
>>> patches. (Dave, Jarkko)
>>> - Added comments in commit message to give more preview what's to come
>>> next. (Jarkko)
>>> - Fixed some documentation error, gap, style (Mikko, Randy)
>>> - Fixed some comments, typo, style in code (Mikko, Kai)
>>> - Patch format and background for reclaimable vs unreclaimable (Kai,
>>> Jarkko)
>>> - Fixed typo (Pavel)
>>> - Exclude the previous fixes/enhancements for self-tests. Patch 18 now
>>> depends on series [6]
>>> - Use the same to list for cover and all patches. (Solo)
>>> v3:
>>> - Added EPC states to replace flags in sgx_epc_page struct. (Jarkko)
>>> - Unrolled wrappers for cond_resched, list (Dave)
>>> - Separate patches for adding reclaimable and unreclaimable lists.
>>> (Dave)
>>> - Other improvements on patch flow, commit messages, styles. (Dave,
>>> Jarkko)
>>> - Simplified the cgroup tree walking with plain
>>> css_for_each_descendant_pre.
>>> - Fixed race conditions and crashes.
>>> - OOM killer to wait for the victim enclave pages being reclaimed.
>>> - Unblock the user by handling misc_max_write callback asynchronously.
>>> - Rebased onto 6.4 and no longer base this series on the MCA patchset.
>>> - Fix an overflow in misc_try_charge.
>>> - Fix a NULL pointer in SGX PF handler.
>>> - Updated and included the SGX selftest patches previously reviewed.
>>> Those
>>> patches fix issues triggered in high EPC pressure required for cgroup
>>> testing.
>>> - Added test scripts to help setup and test SGX EPC cgroups.
>>>
>>> [1]https://lore.kernel.org/all/DM6PR21MB11772A6ED915825854B419D6C4989@DM6PR21MB1177.namprd21.prod.outlook.com/
>>> [2]https://lore.kernel.org/all/ZD7Iutppjj+muH4p@himmelriiki/
>>> [3]https://lore.kernel.org/all/20221202183655.3767674-1-kristen@linux.intel.com/
>>> [4]https://lore.kernel.org/linux-sgx/20230712230202.47929-1-haitao.huang@linux.intel.com/
>>> [5]Documentation/arch/x86/sgx.rst, Section "Virtual EPC"
>>> [6]https://lore.kernel.org/linux-sgx/20220905020411.17290-1-jarkko@kernel.org/
>>> [7]https://lore.kernel.org/linux-sgx/ZLcXmvDKheCRYOjG@slm.duckdns.org/
>>> [8]https://lore.kernel.org/linux-sgx/20230721120231.13916-1-haitao.huang@linux.intel.com/
>>> [9]https://lore.kernel.org/linux-sgx/20230728051024.33063-1-haitao.huang@linux.intel.com/
>>> [10]https://lore.kernel.org/all/20230923030657.16148-1-haitao.huang@linux.intel.com/
>>>
>>> Haitao Huang (2):
>>> x86/sgx: Introduce EPC page states
>>> selftests/sgx: Add scripts for EPC cgroup testing
>>>
>>> Kristen Carlson Accardi (5):
>>> cgroup/misc: Add per resource callbacks for CSS events
>>> cgroup/misc: Export APIs for SGX driver
>>> cgroup/misc: Add SGX EPC resource type
>>> x86/sgx: Implement basic EPC misc cgroup functionality
>>> x86/sgx: Implement EPC reclamation for cgroup
>>>
>>> Sean Christopherson (5):
>>> x86/sgx: Add sgx_epc_lru_list to encapsulate LRU list
>>> x86/sgx: Use sgx_epc_lru_list for existing active page list
>>> x86/sgx: Use a list to track to-be-reclaimed pages
>>> x86/sgx: Restructure top-level EPC reclaim function
>>> Docs/x86/sgx: Add description for cgroup support
>>>
>>> Documentation/arch/x86/sgx.rst | 74 ++++
>>> arch/x86/Kconfig | 13 +
>>> arch/x86/kernel/cpu/sgx/Makefile | 1 +
>>> arch/x86/kernel/cpu/sgx/encl.c | 2 +-
>>> arch/x86/kernel/cpu/sgx/epc_cgroup.c | 319 ++++++++++++++++++
>>> arch/x86/kernel/cpu/sgx/epc_cgroup.h | 49 +++
>>> arch/x86/kernel/cpu/sgx/main.c | 245 +++++++++-----
>>> arch/x86/kernel/cpu/sgx/sgx.h | 88 ++++-
>>> include/linux/misc_cgroup.h | 42 +++
>>> kernel/cgroup/misc.c | 52 ++-
>>> .../selftests/sgx/run_epc_cg_selftests.sh | 196 +++++++++++
>>> .../selftests/sgx/watch_misc_for_tests.sh | 13 +
>>> 12 files changed, 996 insertions(+), 98 deletions(-)
>>> create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
>>> create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h
>>> create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
>>> create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh
>>>
>>
>> Is this expected to work on NUC7?
>>
>> Planning to test this next week (no time this week).
>>
>> BR, Jarkko
>
> I don't see a reason why it would not be working on a NUC. I'll try to
> get access to one and test it too.
Tried on a NUC with about 90M EPC. The selftests worked fine.
BR
Haitao
Powered by blists - more mailing lists