Message-ID: <87o7gbz5h9.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 03 Nov 2023 15:00:18 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Ravi Jonnalagadda <ravis.opensrc@...ron.com>
Cc: <akpm@...ux-foundation.org>, <aneesh.kumar@...ux.ibm.com>,
<apopple@...dia.com>, <dave.hansen@...el.com>,
<gourry.memverge@...il.com>, <gregkh@...uxfoundation.org>,
<gregory.price@...verge.com>, <hannes@...xchg.org>,
<linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-mm@...ck.org>, <mhocko@...e.com>, <rafael@...nel.org>,
<shy828301@...il.com>, <tim.c.chen@...el.com>, <weixugc@...gle.com>
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Ravi Jonnalagadda <ravis.opensrc@...ron.com> writes:
> Whether the node based interleave solution should be considered complex or not
> would probably depend on the number of NUMA nodes present in the system and on
> whether we are able to set up the default weights correctly to obtain optimum
> bandwidth expansion.
Node based interleave is more complex than tier based interleave,
because in general you have fewer tiers than nodes.
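
As a rough illustration of that point, here is a minimal sketch that counts
how many weights have to be set under each scheme. The topology (five nodes
in two tiers) and the helper name weights_to_configure are made up for the
example; they do not come from the patch set.

```python
# Hypothetical topology: NUMA node id -> memory tier id (values are made up).
node_to_tier = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}


def weights_to_configure(node_to_tier):
    """Return (per_node, per_tier): the number of weights to set in each scheme."""
    per_node = len(node_to_tier)                # one weight for every node
    per_tier = len(set(node_to_tier.values()))  # one weight for every distinct tier
    return per_node, per_tier


per_node, per_tier = weights_to_configure(node_to_tier)
print(f"node based: {per_node} weights, tier based: {per_tier} weights")
```

The tier count can never exceed the node count, since every tier holds at
least one node.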
>>
>>> Pros and Cons of Memory Tier based interleave:
>>> Pros:
>>> 1. Programming weight per initiator would apply for all the nodes in the tier.
>>> 2. Weights can be calculated considering the cumulative bandwidth of all
>>> the nodes in the tier and need to be programmed once for all the nodes in a
>>> given tier.
>>> 3. It may be useful in cases where the number of NUMA nodes with similar
>>> latency and bandwidth characteristics increases, possibly with pooling use cases.
>>
>>4. simpler.
>>
>>> Cons:
>>> 1. If nodes with different bandwidth and latency characteristics are placed
>>> in the same tier, as seen in the current mainline kernel, it will be difficult
>>> to apply a correct interleave weight policy.
>>> 2. There will be a need for functionality to move nodes between different tiers
>>> or create new tiers to place such nodes for programming correct interleave weights.
>>> We are working on a patch to support it currently.
>>
>>Thanks! If we have such a system, we will need this.
>>
>>> 3. For systems where each numa node is having different characteristics,
>>> a single node might end up existing in different memory tier, which would be
>>> equivalent to node based interleaving.
>>
>>No. A node can only exist in one memory tier.
>
> Sorry for the confusion. What I meant was: if each node has different
> characteristics, then to program the memory tier weights correctly we need to
> place each node in a separate tier of its own. So each memory tier will contain
> only a single node, and the solution would resemble node based interleaving.
>
>>
>>> On newer systems where all CXL memory from different devices under a
>>> port are combined to form single numa node, this scenario might be
>>> applicable.
>>
>>You mean the different memory ranges of a NUMA node may have different
>>performance? I don't think that we can deal with this.
>
> Example configuration: on a server that we are using now, four different
> CXL cards are combined to form a single NUMA node and two other cards are
> exposed as two individual NUMA nodes.
> So if we have the ability to combine multiple CXL memory ranges into a
> single NUMA node, the number of NUMA nodes in the system would potentially
> decrease even if we can't combine the entire range to form a single node.
Sorry, I misunderstood your words. Yes, it's possible that there is one
tier for each node in some systems. But I guess we will have fewer
tiers than nodes in general.
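
To make the one-tier-per-node case concrete, here is a small sketch. It
assumes, for illustration only, that a tier's weight applies to every node
in that tier and that pages are handed out in a simple weighted round-robin
order. The function names and all weight values are hypothetical.

```python
from itertools import islice


def effective_node_weights(node_to_tier, tier_weights):
    """Assumption: each node simply inherits the weight of its tier."""
    return {node: tier_weights[tier] for node, tier in node_to_tier.items()}


def interleave_order(node_weights):
    """Yield node ids in weighted round-robin order, indefinitely."""
    if not any(w > 0 for w in node_weights.values()):
        raise ValueError("at least one node needs a positive weight")
    while True:
        for node in sorted(node_weights):
            for _ in range(node_weights[node]):
                yield node


# One node per tier: the tier weights are in effect per-node weights.
one_per_tier = effective_node_weights({0: 0, 1: 1, 2: 2}, {0: 3, 1: 2, 2: 1})

# Nodes 1 and 2 share tier 1: a single weight covers both of them.
shared_tier = effective_node_weights({0: 0, 1: 1, 2: 1}, {0: 2, 1: 1})

print(list(islice(interleave_order(one_per_tier), 6)))
print(list(islice(interleave_order(shared_tier), 4)))
```

In the first configuration there are as many tier weights as nodes, so the
tier based scheme has the same number of knobs as the node based one. In the
second, two weights cover three nodes.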
--
Best Regards,
Huang, Ying
>>
>>> 4. Users may need to keep track of different memory tiers and what nodes are present
>>> in each tier for invoking interleave policy.
>>
>>I don't think this is a con. With a node based solution, you need to know
>>your system too.
>>
>>>>
>>>>> Could you elaborate on the 'get what you pay for' usecase you
>>>>> mentioned?
>>>>
>>
>>--
>>Best Regards,
>>Huang, Ying
> --
> Best Regards,
> Ravi Jonnalagadda