[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b049e0d8-a9d2-456f-aa97-148f4a6d8071@sk.com>
Date: Tue, 4 Mar 2025 21:53:13 +0900
From: Honggyu Kim <honggyu.kim@...com>
To: Joshua Hahn <joshua.hahnjy@...il.com>
Cc: kernel_team@...ynix.com, gourry@...rry.net, harry.yoo@...cle.com,
ying.huang@...ux.alibaba.com, gregkh@...uxfoundation.org, rakie.kim@...com,
akpm@...ux-foundation.org, rafael@...nel.org, lenb@...nel.org,
dan.j.williams@...el.com, Jonathan.Cameron@...wei.com, dave.jiang@...el.com,
horen.chuang@...ux.dev, hannes@...xchg.org, linux-kernel@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-mm@...ck.org, kernel-team@...a.com,
yunjeong.mun@...com
Subject: Re: [PATCH 2/2 v6] mm/mempolicy: Don't create weight sysfs for
memoryless nodes
Hi Joshua,
On 3/4/2025 6:56 AM, Joshua Hahn wrote:
> On Thu, 27 Feb 2025 12:20:03 +0900 Honggyu Kim <honggyu.kim@...com> wrote:
>
> Hi Honggyu, thank you for taking time to review my patch, as always!
My pleasure!
> I thought I had sent this, but it seems like it was left in my draft
> without being sent.
>
> I will follow Gregory's advice and we will drop the patch from this series,
> and send the first patch only (with Yunjeong's changes). Thanks again!
It'd be great if you could add her with the following.
Co-developed-by: Yunjeong Mun <yunjeong.mun@...com>
>
>>
>> On 2/27/2025 11:32 AM, Honggyu Kim wrote:
>>> Hi Joshua,
>>>
>>> On 2/27/2025 6:35 AM, Joshua Hahn wrote:
>>>> We should never try to allocate memory from a memoryless node. Creating a
>>>> sysfs knob to control its weighted interleave weight does not make sense,
>>>> and can be unsafe.
>>>>
>>>> Only create weighted interleave weight knobs for nodes with memory.
>>>>
>>>> Signed-off-by: Joshua Hahn <joshua.hahnjy@...il.com>
>>>> ---
>>>> mm/mempolicy.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>>> index 4cc04ff8f12c..50cbb7c047fa 100644
>>>> --- a/mm/mempolicy.c
>>>> +++ b/mm/mempolicy.c
>>>> @@ -3721,7 +3721,7 @@ static int add_weighted_interleave_group(struct
>>>> kobject *root_kobj)
>>>> return err;
>>>> }
>>>> - for_each_node_state(nid, N_POSSIBLE) {
>>>
>>> Actually, we're aware of this issue and currently trying to fix this.
>>> In our system, we've attached 4ch of CXL memory for each socket as
>>> follows.
>>>
>>> node0 node1
>>> +-------+ UPI +-------+
>>> | CPU 0 |-+-----+-| CPU 1 |
>>> +-------+ +-------+
>>> | DRAM0 | | DRAM1 |
>>> +---+---+ +---+---+
>>> | |
>>> +---+---+ +---+---+
>>> | CXL 0 | | CXL 4 |
>>> +---+---+ +---+---+
>>> | CXL 1 | | CXL 5 |
>>> +---+---+ +---+---+
>>> | CXL 2 | | CXL 6 |
>>> +---+---+ +---+---+
>>> | CXL 3 | | CXL 7 |
>>> +---+---+ +---+---+
>>> node2 node3
>>>
>>> The 4ch of CXL memory are detected as a single NUMA node in each socket,
>>> but it shows as follows with the current N_POSSIBLE loop.
>>>
>>> $ ls /sys/kernel/mm/mempolicy/weighted_interleave/
>>> node0 node1 node2 node3 node4 node5
>>> node6 node7 node8 node9 node10 node11
FYI, we used to set node2 and node3 only for weights for CXL memory here
and ignored node{4-11}. That sounds silly but it worked.
>
> I see. For my education, would you mind explaining how the numbering works
> here? I am not very familiar with this setup, and not sure how you would
> figure out what node is which, just by looking at the numbering.
Regarding the numbering, I'm not 100% sure, but I guess there could be a
logical NUMA node that combines 4ch of CXL memory and 4 nodes for CXL
memory so in total 5 nodes per socket.
I don't have much knowledge on this but maybe this is related to PXM
(Proximity Domain).
>
>>>> + for_each_node_state(nid, N_MEMORY) {
>>
>> Thinking it again, we can leave it as a separate patch but add our patch
>> on top of it.
>
> That sounds good to me.
>
>> The only concern I have is having only N_MEMORY patch hides weight
>> setting knobs for CXL memory and it makes there is no way to set weight
>> values to CXL memory in my system.
>
> You can use weighted interleave auto-tuning : -)
Not possible because using N_MEMORY doesn't provide "node" knobs for CXL
memory at all as follows.
$ ls /sys/kernel/mm/mempolicy/weighted_interleave/
node0 node1
We need node2 and node3 for CXL memory here.
> In all seriousness, this makes sense. It seems pretty problematic that
> the knobs aren't created for the CXL channels,
Yeah, it's even worse than the current status.
> and I'm not sure that hiding> it is the correct approach here (it was not my intent, either).
It isn't your problem but we shouldn't hide those nodes until it is
correctly fixed with hot plugging event handler.
>
>> IMHO, this and our patch is better to be submitted together.
>
> That sounds good. We can hold off on this patch then, and just consider
> the first patch of this series. Thank you for letting me know!
The N_POSSIBLE and N_MEMORY stuffs should had been fixed earlier than
this work. I will take a few days if we can submit it together.
>
> Thank you for always reviewing my patches. Have a great day!
> Joshua
Thanks for your work and have a great day you too!
Kind regards,
Honggyu
>
> Sent using hkml (https://github.com/sjp38/hackermail)
>
Powered by blists - more mailing lists