Message-ID: <87czglhaso.fsf@linux.ibm.com>
Date: Tue, 10 May 2022 17:14:23 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
To: Wei Xu <weixugc@...gle.com>,
Hesham Almatary <hesham.almatary@...wei.com>
Cc: Yang Shi <shy828301@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Huang Ying <ying.huang@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Linux MM <linux-mm@...ck.org>,
Greg Thelen <gthelen@...gle.com>,
Jagdish Gediya <jvgediya@...ux.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Alistair Popple <apopple@...dia.com>,
Davidlohr Bueso <dave@...olabs.net>,
Michal Hocko <mhocko@...nel.org>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Brice Goglin <brice.goglin@...il.com>,
Feng Tang <feng.tang@...el.com>,
Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: RFC: Memory Tiering Kernel Interfaces
Wei Xu <weixugc@...gle.com> writes:
> On Mon, May 9, 2022 at 7:32 AM Hesham Almatary
> <hesham.almatary@...wei.com> wrote:
>>
....
> > nearest lower tier before demoting to lower lower tiers.
>> There might still be simple cases/topologies where we might want to "skip"
>> the very next lower tier. For example, assume we have a 3 tiered memory
>> system as follows:
>>
>> node 0 has a CPU and DDR memory in tier 0, node 1 has a GPU and DDR
>> memory in tier 0, node 2 has NVMM memory in tier 1, and node 3 has some
>> sort of bigger memory (could be a bigger DDR or something) in tier 2.
>> The distances are as follows:
>>
>>   --------------          --------------
>>   |   Node 0   |          |   Node 1   |
>>   |  -------   |          |  -------   |
>>   | |  DDR  |  |          | |  DDR  |  |
>>   |  -------   |          |  -------   |
>>   |            |          |            |
>>   --------------          --------------
>>     | 20                    | 120    |
>>     v                       v        |
>>   ----------------------------       |
>>   | Node 2     PMEM          |       | 100
>>   ----------------------------       |
>>     | 100                            |
>>     v                                v
>>   --------------------------------------
>>   | Node 3    Large mem                |
>>   --------------------------------------
>>
>> node distances:
>> node   0    1    2    3
>>    0  10   20   20  120
>>    1  20   10  120  100
>>    2  20  120   10  100
>>    3 120  100  100   10
>>
>> /sys/devices/system/node/memory_tiers
>> 0-1
>> 2
>> 3
>>
>> N_TOPTIER_MEMORY: 0-1
>>
>>
>> In this case, we want to be able to "skip" the demotion path from Node 1
>> to Node 2 and make demotion go directly to Node 3, as it is closer
>> distance-wise. How can we accommodate this scenario (or at least not
>> rule it out as future work) with the current RFC?
>
> This is an interesting example. I think one way to support this is to
> allow all the lower tier nodes to be the demotion targets of a node in
> the higher tier. We can then use the allocation fallback order to
> select the best demotion target.
>
> For this example, we will have the demotion targets of each node as:
>
> node 0: allowed=2-3, order (based on allocation fallback order): 2, 3
> node 1: allowed=2-3, order (based on allocation fallback order): 3, 2
> node 2: allowed = 3, order (based on allocation fallback order): 3
> node 3: allowed = empty
>
> What do you think?
>
Can we simplify this further with the following tiers?

tier 0 -> empty (no HBM/GPU)
tier 1 -> Node0, Node1
tier 2 -> Node2, Node3

Hence:

node 0: allowed=2-3, order (based on allocation fallback order): 2, 3
node 1: allowed=2-3, order (based on allocation fallback order): 3, 2
node 2: allowed = empty
node 3: allowed = empty
-aneesh
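
The demotion orders listed in this thread can be reproduced with a small sketch. It is not kernel code: the function name `demotion_targets` and the tier dictionaries are hypothetical, and the allocation fallback order is approximated here by raw node distance (ties broken by node id), which is a simplifying assumption. The distance matrix is the one from the example above.

```python
# Node distance matrix from the example in this thread (row = source node).
DISTANCE = [
    [10, 20, 20, 120],
    [20, 10, 120, 100],
    [20, 120, 10, 100],
    [120, 100, 100, 10],
]


def demotion_targets(distance, tier_of):
    """Map each node to its allowed demotion targets, nearest first.

    Assumption: every node in a strictly lower tier (larger tier number)
    is an allowed target, and the fallback order is approximated by
    sorting on node distance, then node id.
    """
    targets = {}
    for node, tier in tier_of.items():
        lower = [n for n, t in tier_of.items() if t > tier]
        targets[node] = sorted(lower, key=lambda n: (distance[node][n], n))
    return targets


# Three-tier assignment: nodes 0-1 in tier 0, node 2 in tier 1, node 3 in tier 2.
three_tiers = {0: 0, 1: 0, 2: 1, 3: 2}

# Two-tier assignment: tier 0 empty, nodes 0-1 in tier 1, nodes 2-3 in tier 2.
two_tiers = {0: 1, 1: 1, 2: 2, 3: 2}

if __name__ == "__main__":
    for name, tiers in (("three tiers", three_tiers), ("two tiers", two_tiers)):
        print(name)
        for node, order in demotion_targets(DISTANCE, tiers).items():
            print(f"  node {node}: order = {order}")
```

Under these assumptions both assignments give node 0 the order 2, 3 and node 1 the order 3, 2. The only difference is that node 2 loses node 3 as a target once the two share a tier.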