Message-ID: <874jfl90y3.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 10 Jan 2024 14:06:44 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Gregory Price <gregory.price@...verge.com>, Srinivasulu Thanneeru
<sthanneeru@...ron.com>, Srinivasulu Opensrc
<sthanneeru.opensrc@...ron.com>, "linux-cxl@...r.kernel.org"
<linux-cxl@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"aneesh.kumar@...ux.ibm.com" <aneesh.kumar@...ux.ibm.com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>, "mhocko@...e.com"
<mhocko@...e.com>, "tj@...nel.org" <tj@...nel.org>,
"john@...alactic.com" <john@...alactic.com>, Eishan Mirakhur
<emirakhur@...ron.com>, Vinicius Tavares Petrucci
<vtavarespetr@...ron.com>, Ravis OpenSrc <Ravis.OpenSrc@...ron.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Johannes
Weiner" <hannes@...xchg.org>, Wei Xu <weixugc@...gle.com>, Hao Xiang
<hao.xiang@...edance.com>, "Ho-Ren (Jack) Chuang"
<horenchuang@...edance.com>
Subject: Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers
Jonathan Cameron <Jonathan.Cameron@...wei.com> writes:
> On Tue, 09 Jan 2024 11:41:11 +0800
> "Huang, Ying" <ying.huang@...el.com> wrote:
>
>> Gregory Price <gregory.price@...verge.com> writes:
>>
>> > On Thu, Jan 04, 2024 at 02:05:01PM +0800, Huang, Ying wrote:
>> >> >
>> >> > From https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf
>> >> > abstract_distance_offset: override by users to deal with firmware issue.
>> >> >
>> >> > Say firmware configures a CXL node into the wrong tier; similarly, it
>> >> > may configure all CXL nodes into a single memtype, so all of these
>> >> > nodes fall into a single wrong tier. In that case, wouldn't a per-node
>> >> > adistance_offset be good to have?
>> >>
>> >> I think that it's better to fix the firmware error if possible. And
>> >> these are only theoretical issues, not practical ones. Have you run
>> >> into this in practice?
>> >>
>> >> I understand that users may want to move nodes between memory tiers for
>> >> different policy choices. For that, a memory_type-based
>> >> adistance_offset should be sufficient.
>> >>
>> >
>> > There's actually an affirmative case for changing memory tiering to
>> > allow either movement of nodes between tiers, or at least to base
>> > placement on HMAT information. Preferably, membership would be
>> > changeable so that hotplug/DCD can be managed (there's no guarantee
>> > that the memory passed through will always match what HMAT says at
>> > initial boot).
>>
>> IIUC, per Jonathan Cameron's message below, the performance of memory
>> shouldn't change even for DCD devices.
>>
>> https://lore.kernel.org/linux-mm/20231103141636.000007e4@Huawei.com/
>>
>> It's possible for the performance of a NUMA node to change if we
>> hot-remove a memory device and then hot-add a different one. In that
>> case, it's hoped that the CDAT changes too.
>
> Not supported, but ACPI has _HMA methods that in theory allow changing
> HMAT values based on firmware notifications... So we 'could' make it
> work for HMAT-based description.
>
> Ultimately my current thinking is that we'll end up emulating CXL type3
> devices (hiding topology complexity), and you can update CDAT, but IIRC
> that is only meant for degraded situations - so if you want multiple
> performance regions, CDAT should describe them from the start.

Thank you very much for the input! So, to support degraded performance,
we will need to move a NUMA node between memory tiers. And, per my
understanding, we should do that in the kernel.
>>
>> So, all in all, HMAT + CDAT can help us put memory devices in the
>> appropriate memory tiers. We have HMAT support upstream now, and we
>> will be working on CDAT support.
>>
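As a side note, the resulting tier assignment is already visible from user
space through the memory tiering sysfs interface (a minimal sketch; the
path assumes a kernel with upstream memory tiering support, v6.1 or later):

```shell
#!/bin/sh
# List each memory tier and the NUMA nodes currently assigned to it.
# Assumes /sys/devices/virtual/memory_tiering exists; prints a note
# otherwise.
for tier in /sys/devices/virtual/memory_tiering/memory_tier*; do
    if [ ! -d "$tier" ]; then
        echo "memory tiering sysfs interface not present on this kernel"
        break
    fi
    printf '%s: nodes %s\n' "$(basename "$tier")" "$(cat "$tier/nodelist")"
done
```

Moving a node between tiers would then show up as a change in these
nodelist files.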
--
Best Regards,
Huang, Ying