Message-ID: <16b26a00-eb6b-7c19-6c33-144efe516b6b@intel.com>
Date:   Thu, 14 Jan 2021 10:35:03 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Jarkko Sakkinen <jarkko@...nel.org>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        linux-sgx@...r.kernel.org, Sean Christopherson <seanjc@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, Jiri Kosina <trivial@...nel.org>
Subject: Re: [PATCH RFC] x86/sgx: Add trivial NUMA allocation

On 1/14/21 9:54 AM, Jarkko Sakkinen wrote:
> On Tue, Jan 12, 2021 at 04:24:01PM -0800, Dave Hansen wrote:
>> We need a bit more information here as well.  What's the relationship
>> between NUMA nodes and sections?  How does the BIOS tell us which NUMA
>> nodes a section is in?  Is it the same or different from normal RAM and
>> PMEM?
> 
> How does it go with pmem?

I just wanted to point out PMEM as being referred to by the SRAT, but as
something which is *not* "System RAM".  There might be some overlap in
NUMA for PMEM and NUMA for SGX memory since neither is enumerated as
"System RAM".

...
>> I'm not positive this works.  I *thought* these ->node_start_pfn and
>> ->node_spanned_pages are really only guaranteed to cover memory which is
>> managed by the kernel and has 'struct page' for it.
>>
>> EPC doesn't have a 'struct page', so won't necessarily be covered by the
>> pgdat-> and zone-> ranges.  I *think* you may have to go all the way
>> back to the ACPI SRAT for this.
>>
>> It would also be *possible* to have an SRAT constructed like this:
>>
>> 0->1GB System RAM - Node 0
>> 1->2GB Reserved   - Node 1
>> 2->3GB System RAM - Node 0
>>
>> Where the 1->2GB is EPC.  The Node 0 pg_data_t would be:
>>
>> 	pgdat->node_start_pfn = 0
>> 	pgdat->node_spanned_pages = 3GB
> 
> If I've understood the current Linux memory architecture correctly:
> 
> - Memory is made available through mm/memory_hotplug.c, which is populated
>   by drivers/acpi/acpi_memhotplug.c.
> - drivers/acpi/numa/srat.c provides the conversion API from proximity node to
>   logical node but I'm not *yet* sure how the interaction goes with memory
>   hot plugging
> 
> I'm not sure if I'm following the idea of the alternative SRAT construction.
> So are you saying that srat.c would somehow group pxm's with EPC to
> specific node numbers?

Basically, go look at the "SRAT:" messages in boot.  Are there SRAT
entries that cover all the EPC?  For instance, take this SRAT:

[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x100000000-0x82fffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 1 [mem 0x830000000-0xe2fffffff]

If EPC were at 0x100000000, we would be in good shape.  It is covered by
an SRAT entry that Linux parses as RAM.  But, if it were at 0xd0000000,
it would be in an SRAT "hole", uncovered by any SRAT entry.  In this
case, since "Node 1" has memory on both sides of that hole, the "Node 1"
pgdat would span the hole.  But, if some memory were removed from the
system, "Node 1" might no longer span that hole and EPC in this hole
would not be assignable to Node 1.

Please just make sure that there *ARE* SRAT entries that cover EPC
memory ranges.
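
To make the check above concrete, here is a rough userspace sketch (not
kernel code) that parses the three "SRAT:" boot messages quoted above and
asks whether a given physical range sits fully inside one SRAT entry.  The
function names (parse_srat, covering_node, node_span) and the 128 MiB EPC
size are assumptions made up for illustration.  node_span() is only a crude
stand-in for the pgdat span: it takes the min/max over a node's SRAT
entries, which is a simplification of how the kernel actually derives
->node_start_pfn and ->node_spanned_pages.

```python
import re

# The SRAT boot messages quoted in this mail.
SRAT_LOG = """\
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x00000000-0xcfffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 2 [mem 0x100000000-0x82fffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 1 [mem 0x830000000-0xe2fffffff]
"""

SRAT_RE = re.compile(
    r"SRAT: Node (\d+) PXM (\d+) \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]"
)

# Hypothetical EPC size, chosen only for the example.
EPC_SIZE = 0x8000000  # 128 MiB


def parse_srat(log):
    """Return a list of (start, end, node) tuples; end is inclusive."""
    entries = []
    for m in SRAT_RE.finditer(log):
        node, _pxm, start, end = m.groups()
        entries.append((int(start, 16), int(end, 16), int(node)))
    return entries


def covering_node(entries, start, end):
    """Node of the SRAT entry that fully contains [start, end], else None."""
    for s, e, node in entries:
        if s <= start and end <= e:
            return node
    return None


def node_span(entries, node):
    """Simplified 'span' of a node: lowest start to highest end, or None."""
    ranges = [(s, e) for s, e, n in entries if n == node]
    if not ranges:
        return None
    return min(s for s, _ in ranges), max(e for _, e in ranges)


if __name__ == "__main__":
    entries = parse_srat(SRAT_LOG)
    for base in (0x100000000, 0xd0000000):
        node = covering_node(entries, base, base + EPC_SIZE - 1)
        print(f"EPC at {base:#x}: covering node = {node}")
```

With these entries, a range starting at 0x100000000 is covered by a Node 1
entry, while one starting at 0xd0000000 is covered by nothing, even though
it lies inside the simplified Node 1 span.  Dropping the second Node 1
entry from the list shrinks that span so it no longer reaches the hole,
which is the failure mode described above.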
