[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <34b1acd1-e769-0dc2-a225-8ce3d2b6a085@intel.com>
Date: Tue, 12 Jan 2021 16:24:01 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Jarkko Sakkinen <jarkko@...nel.org>, x86@...nel.org
Cc: linux-kernel@...r.kernel.org, linux-sgx@...r.kernel.org,
Sean Christopherson <seanjc@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, Jiri Kosina <trivial@...nel.org>
Subject: Re: [PATCH RFC] x86/sgx: Add trivial NUMA allocation
On 12/16/20 5:50 AM, Jarkko Sakkinen wrote:
> Create a pointer array for each NUMA node with the references to the
> contained EPC sections. Use this in __sgx_alloc_epc_page() to knock the
> current NUMA node before the others.
It makes it harder to comment when I'm not on cc.
Hint, hint... ;)
We need a bit more information here as well. What's the relationship
between NUMA nodes and sections? How does the BIOS tell us which NUMA
nodes a section is in? Is it the same or different from normal RAM and
PMEM?
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index c519fc5f6948..0da510763c47 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -13,6 +13,13 @@
> #include "encl.h"
> #include "encls.h"
>
> +struct sgx_numa_node {
> + struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS];
> + int nr_sections;
> +};
So, we have a 'NUMA node' structure already: pg_data_t. Why don't we
just hang the epc sections off there?
> +static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES];
Hmm... Time to see if I can still do math.
#define SGX_MAX_EPC_SECTIONS 8
(sizeof(struct sgx_epc_section *) + sizeof(int)) * 8 * MAX_NUMNODES
CONFIG_NODES_SHIFT=10 (on Fedora)
#define MAX_NUMNODES (1 << NODES_SHIFT)
12*8*1024 = ~100k. Yikes. For *EVERY* system that enables SGX,
regardless if they are NUMA or not.'
Trivial is great, this may be too trivial.
Adding a list_head to pg_data_t and sgx_epc_section would add
SGX_MAX_EPC_SECTIONS*sizeof(list_head)=192 bytes plus 16 bytes per
*present* NUMA node.
> /**
> * __sgx_alloc_epc_page() - Allocate an EPC page
> *
> @@ -485,14 +511,19 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
> */
> struct sgx_epc_page *__sgx_alloc_epc_page(void)
> {
> - struct sgx_epc_section *section;
> struct sgx_epc_page *page;
> + int nid = numa_node_id();
> int i;
>
> - for (i = 0; i < sgx_nr_epc_sections; i++) {
> - section = &sgx_epc_sections[i];
> + page = __sgx_alloc_epc_page_from_node(nid);
> + if (page)
> + return page;
>
> - page = __sgx_alloc_epc_page_from_section(section);
> + for (i = 0; i < sgx_nr_numa_nodes; i++) {
> + if (i == nid)
> + continue;
Yikes. That's a horribly inefficient loop. Consider if nodes 0 and
1023 were the only ones with EPC. What would this loop do? I think
it's a much better idea to keep a nodemask_t of nodes that have EPC.
Then, just do bitmap searches.
> + page = __sgx_alloc_epc_page_from_node(i);
> if (page)
> return page;
> }
> @@ -661,11 +692,28 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
> ((high & GENMASK_ULL(19, 0)) << 32);
> }
>
> +static int __init sgx_pfn_to_nid(unsigned long pfn)
> +{
> + pg_data_t *pgdat;
> + int nid;
> +
> + for (nid = 0; nid < nr_node_ids; nid++) {
> + pgdat = NODE_DATA(nid);
> +
> + if (pfn >= pgdat->node_start_pfn &&
> + pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))
> + return nid;
> + }
> +
> + return 0;
> +}
I'm not positive this works. I *thought* these ->node_start_pfn and
->node_spanned_pages are really only guaranteed to cover memory which is
managed by the kernel and has 'struct page' for it.
EPC doesn't have a 'struct page', so won't necessarily be covered by the
pgdat-> and zone-> ranges. I *think* you may have to go all the way
back to the ACPI SRAT for this.
It would also be *possible* to have an SRAT constructed like this:
0->1GB System RAM - Node 0
1->2GB Reserved - Node 1
2->3GB System RAM - Node 0
Where the 1->2GB is EPC. The Node 0 pg_data_t would be:
pgdat->node_start_pfn = 0
pgdat->node_spanned_pages = 3GB
Powered by blists - more mailing lists