[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240117114926.1895-1-honggyu.kim@sk.com>
Date: Wed, 17 Jan 2024 20:49:25 +0900
From: Honggyu Kim <honggyu.kim@...com>
To: sj@...nel.org
Cc: akpm@...ux-foundation.org,
apopple@...dia.com,
baolin.wang@...ux.alibaba.com,
damon@...ts.linux.dev,
dave.jiang@...el.com,
honggyu.kim@...com,
hyeongtak.ji@...com,
kernel_team@...ynix.com,
linmiaohe@...wei.com,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org,
lizhijian@...fujitsu.com,
mathieu.desnoyers@...icios.com,
mhiramat@...nel.org,
rakie.kim@...com,
rostedt@...dmis.org,
surenb@...gle.com,
yangx.jy@...itsu.com,
ying.huang@...el.com,
ziy@...dia.com
Subject: Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
Hi SeongJae,
Thanks very much for your comments in details.
On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park <sj@...nel.org> wrote:
> Thank you so much for this great patches and the above nice test results. I
> believe the test setup and results make sense, and merging a revised version of
> this patchset would provide real benefits to the users.
Glad to hear that!
> In a high level, I think it might better to separate DAMON internal changes
> from DAMON external changes.
I agree. I can't guarantee but I can move all the external changes
inside mm/damon, but will try that as much as possible.
> For DAMON part changes, I have no big concern other than trivial coding style
> level comments.
Sure. I will fix those.
> For DAMON-external changes that implementing demote_pages() and
> promote_pages(), I'm unsure if the implementation is reusing appropriate
> functions, and if those are placee in right source file. Especially, I'm
> unsure if vmscan.c is the right place for promotion code. Also I don't know if
> there is a good agreement on the promotion/demotion target node decision. That
> should be because I'm not that familiar with the areas and the files, but I
> feel this might because our discussions on the promotion and the demotion
> operations are having rooms for being more matured. Because I'm not very
> faimiliar with the part, I'd like to hear others' comments, too.
I would also like to hear others' comments, but this might not be needed
if most of external code can be moved to mm/damon.
> To this end, I feel the problem might be able te be simpler, because this
> patchset is trying to provide two sophisticated operations, while I think a
> simpler approach might be possible. My humble simpler idea is adding a DAMOS
> operation for moving pages to a given node (like sys_move_phy_pages RFC[1]),
> instead of the promote/demote. Because the general pages migration can handle
> multiple cases including the promote/demote in my humble assumption.
My initial implementation was similar but I found that it's not accurate
enough due to the nature of inaccuracy of DAMON regions. I saw that
many pages were demoted and promoted back and forth because migration
target regions include both hot and cold pages together.
So I have implemented the demotion and promotion logics based on the
shrink_folio_list, which contains many corner case handling logics for
reclaim.
Having the current demotion and promotion logics makes the hot/cold
migration pretty accurate as expected. We made a simple program called
"hot_cold" and it receives 2 arguments for hot size and cold size in MB.
For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
of cold memory. It basically allocates 2 large blocks of memory with
mmap, then repeat memset for the initial 200MB to make it accessed in an
infinite loop.
Let's say there are 3 nodes in the system and the first node0 and node1
are the first tier, and node2 is the second tier.
$ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-1
$ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
2
Here is the result of partitioning hot/cold memory and I put execution
command at the right side of numastat result. I initially ran each
hot_cold program with preferred setting so that they initially allocate
memory on one of node0 or node2, but they gradually migrated based on
their access frequencies.
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
--------------- ------ ------ ------ -----
754 (hot_cold) 1800 0 2000 3800 <- hot_cold 1800 2000
1184 (hot_cold) 300 0 500 800 <- hot_cold 300 500
1818 (hot_cold) 801 0 3199 4000 <- hot_cold 800 3200
30289 (hot_cold) 4 0 5 10 <- hot_cold 3 5
30325 (hot_cold) 31 0 51 81 <- hot_cold 30 50
--------------- ------ ------ ------ -----
Total 2938 0 5756 8695
The final node placement result shows that DAMON accurately migrated
pages by their hotness for multiple processes.
> In more detail, users could decide which is the appropriate node for promotion
> or demotion and use the new DAMOS action to do promotion and demotion. Users
> would requested to decide which node is the proper promotion/demotion target
> nodes, but that decision wouldn't be that hard in my opinion.
>
> For this, 'struct damos' would need to be updated for such argument-dependent
> actions, like 'struct damos_filter' is haing a union.
That might be a better solution. I will think about it.
> In future, we could extend the operation to the promotion and the demotion
> after the dicussion around the promotion and demotion is matured, if required.
> And assuming DAMON be extended for originating CPU-aware access monitoring, the
> new DAMOS action would also cover more use cases such as general NUMA nodes
> balancing (extending DAMON for CPU-aware monitoring would required), and some
> complex configurations where having both CPU affinity and tiered memory. I
> also think that may well fit with my RFC idea[2] for tiered memory management.
>
> Looking forward to opinions from you and others. I admig I miss many things,
> and more than happy to be enlightened.
>
> [1] https://lwn.net/Articles/944007/
> [2] https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/
Thanks very much for your comments. I will need a few more days for the
update but will try to address your concerns as much as possible.
Thanks,
Honggyu
Powered by blists - more mailing lists