linux-kernel - Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240117114926.1895-1-honggyu.kim@sk.com>
Date: Wed, 17 Jan 2024 20:49:25 +0900
From: Honggyu Kim <honggyu.kim@...com>
To: sj@...nel.org
Cc: akpm@...ux-foundation.org,
	apopple@...dia.com,
	baolin.wang@...ux.alibaba.com,
	damon@...ts.linux.dev,
	dave.jiang@...el.com,
	honggyu.kim@...com,
	hyeongtak.ji@...com,
	kernel_team@...ynix.com,
	linmiaohe@...wei.com,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	linux-trace-kernel@...r.kernel.org,
	lizhijian@...fujitsu.com,
	mathieu.desnoyers@...icios.com,
	mhiramat@...nel.org,
	rakie.kim@...com,
	rostedt@...dmis.org,
	surenb@...gle.com,
	yangx.jy@...itsu.com,
	ying.huang@...el.com,
	ziy@...dia.com
Subject: Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

Thanks very much for your comments in details.

On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park <sj@...nel.org> wrote:

> Thank you so much for this great patches and the above nice test results.  I
> believe the test setup and results make sense, and merging a revised version of
> this patchset would provide real benefits to the users.

Glad to hear that!

> In a high level, I think it might better to separate DAMON internal changes
> from DAMON external changes.

I agree.  I can't guarantee but I can move all the external changes
inside mm/damon, but will try that as much as possible.

> For DAMON part changes, I have no big concern other than trivial coding style
> level comments.

Sure.  I will fix those.

> For DAMON-external changes that implementing demote_pages() and
> promote_pages(), I'm unsure if the implementation is reusing appropriate
> functions, and if those are placee in right source file.  Especially, I'm
> unsure if vmscan.c is the right place for promotion code.  Also I don't know if
> there is a good agreement on the promotion/demotion target node decision.  That
> should be because I'm not that familiar with the areas and the files, but I
> feel this might because our discussions on the promotion and the demotion
> operations are having rooms for being more matured.  Because I'm not very
> faimiliar with the part, I'd like to hear others' comments, too.

I would also like to hear others' comments, but this might not be needed
if most of external code can be moved to mm/damon.

> To this end, I feel the problem might be able te be simpler, because this
> patchset is trying to provide two sophisticated operations, while I think a
> simpler approach might be possible.  My humble simpler idea is adding a DAMOS
> operation for moving pages to a given node (like sys_move_phy_pages RFC[1]),
> instead of the promote/demote.  Because the general pages migration can handle
> multiple cases including the promote/demote in my humble assumption.

My initial implementation was similar but I found that it's not accurate
enough due to the nature of inaccuracy of DAMON regions.  I saw that
many pages were demoted and promoted back and forth because migration
target regions include both hot and cold pages together.

So I have implemented the demotion and promotion logics based on the
shrink_folio_list, which contains many corner case handling logics for
reclaim.

Having the current demotion and promotion logics makes the hot/cold
migration pretty accurate as expected.  We made a simple program called
"hot_cold" and it receives 2 arguments for hot size and cold size in MB.
For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
of cold memory.  It basically allocates 2 large blocks of memory with
mmap, then repeat memset for the initial 200MB to make it accessed in an
infinite loop.

Let's say there are 3 nodes in the system and the first node0 and node1
are the first tier, and node2 is the second tier.

  $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
  0-1

  $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
  2

Here is the result of partitioning hot/cold memory and I put execution
command at the right side of numastat result.  I initially ran each
hot_cold program with preferred setting so that they initially allocate
memory on one of node0 or node2, but they gradually migrated based on
their access frequencies.

  $ numastat -c -p hot_cold
  Per-node process memory usage (in MBs) 
  PID              Node 0 Node 1 Node 2 Total 
  ---------------  ------ ------ ------ ----- 
  754 (hot_cold)     1800      0   2000  3800    <- hot_cold 1800 2000 
  1184 (hot_cold)     300      0    500   800    <- hot_cold 300 500 
  1818 (hot_cold)     801      0   3199  4000    <- hot_cold 800 3200 
  30289 (hot_cold)      4      0      5    10    <- hot_cold 3 5 
  30325 (hot_cold)     31      0     51    81    <- hot_cold 30 50 
  ---------------  ------ ------ ------ ----- 
  Total              2938      0   5756  8695

The final node placement result shows that DAMON accurately migrated
pages by their hotness for multiple processes.

> In more detail, users could decide which is the appropriate node for promotion
> or demotion and use the new DAMOS action to do promotion and demotion.  Users
> would requested to decide which node is the proper promotion/demotion target
> nodes, but that decision wouldn't be that hard in my opinion.
> 
> For this, 'struct damos' would need to be updated for such argument-dependent
> actions, like 'struct damos_filter' is haing a union.

That might be a better solution.  I will think about it.

> In future, we could extend the operation to the promotion and the demotion
> after the dicussion around the promotion and demotion is matured, if required.
> And assuming DAMON be extended for originating CPU-aware access monitoring, the
> new DAMOS action would also cover more use cases such as general NUMA nodes
> balancing (extending DAMON for CPU-aware monitoring would required), and some
> complex configurations where having both CPU affinity and tiered memory.  I
> also think that may well fit with my RFC idea[2] for tiered memory management.
> 
> Looking forward to opinions from you and others.  I admig I miss many things,
> and more than happy to be enlightened.
> 
> [1] https://lwn.net/Articles/944007/
> [2] https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/

Thanks very much for your comments.  I will need a few more days for the
update but will try to address your concerns as much as possible.

Thanks,
Honggyu