Message-ID: <20260204073133.16471-1-sj@kernel.org>
Date: Tue,  3 Feb 2026 23:31:32 -0800
From: SeongJae Park <sj@...nel.org>
To: Gutierrez Asier <gutierrez.asier@...wei-partners.com>
Cc: SeongJae Park <sj@...nel.org>,
	artem.kuzin@...wei.com,
	stepanov.anatoly@...wei.com,
	wangkefeng.wang@...wei.com,
	yanquanmin1@...wei.com,
	zuoze1@...wei.com,
	damon@...ts.linux.dev,
	akpm@...ux-foundation.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/4] mm/damon: Support hot application detections

On Tue, 3 Feb 2026 16:03:04 +0300 Gutierrez Asier <gutierrez.asier@...wei-partners.com> wrote:

> Hi SeongJae!
> 
> On 2/3/2026 4:10 AM, SeongJae Park wrote:
> > Hello Asier,
> > 
> > 
> > Thank you for sharing this nice RFC patch series!
> > 
> > On Mon, 2 Feb 2026 14:56:45 +0000 <gutierrez.asier@...wei-partners.com> wrote:
> > 
> >> From: Asier Gutierrez <gutierrez.asier@...wei-partners.com>
> >>
> >> Overview
> >> ----------
> >>
> >> This patch set introduces a new dynamic mechanism for detecting hot applications
> >> and hot regions in those applications.
> >>
> >> Motivation
> >> -----------
> >>
> >> Currently, DAMON requires the system administrator to specify which
> >> applications should be monitored, along with all of the monitoring
> >> parameters. Ideally, this should happen automatically, with minimal
> >> intervention from the system administrator.
> >>
> >>
> >> Since the TLB is a bottleneck on many systems, one way to reduce TLB misses (or
> >> increase hits) is to use huge pages. Unfortunately, setting THP to "always" leads
> >> to memory fragmentation and memory waste. For this reason, many application
> >> guides and system administrators suggest disabling THP.
> >>
> >> We would like to detect: 1. which applications in the system are hot, and 2.
> >> which of their memory regions are hot, so that those regions can be collapsed.
> >>
> >>
> >> Solution
> >> -----------
> >>
> >>      ┌────────────┐           ┌────────────┐
> >>      │Damon_module│           │Task_monitor│
> >>      └──────┬─────┘           └──────┬─────┘
> >>             │         start          │
> >>             │───────────────────────>│
> >>             │                        │
> >>             │                        │────┐
> >>             │                        │    │ calculate task load
> >>             │                        │<───┘
> >>             │                        │
> >>             │                        │────┐
> >>             │                        │    │ sort tasks
> >>             │                        │<───┘
> >>             │                        │
> >>             │                        │────┐
> >>             │                        │    │ start kdamond for top 3 tasks
> >>             │                        │<───┘
> >>      ┌──────┴─────┐           ┌──────┴─────┐
> >>      │Damon_module│           │Task_monitor│
> >>      └────────────┘           └────────────┘
> >>
> >>
> >> We calculate the task load based on the sum of utime across all the threads of a
> >> given task. Once we have the total utime, we feed it into the exponential load
> >> average provided by calc_load(). For tasks that become cold, their kdamond is
> >> stopped.
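
If I read this right, the per-task load sampling would look roughly like the
sketch below.  This is only my guess at the shape of the code: for_each_thread()
and calc_load() are the existing kernel facilities mentioned above, while the
helper name, the EXP_1 choice and the raw-nanosecond units are my assumptions,
not necessarily what the patches do.

#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/sched/loadavg.h>
#include <linux/sched/signal.h>

/* Hypothetical helper: fold one task's summed utime into an exponential
 * load average.  Illustration only, not the actual patch code. */
static unsigned long task_utime_loadavg(struct task_struct *p,
					unsigned long prev_load)
{
	struct task_struct *t;
	u64 total_utime = 0;

	/* Sum user time over every thread of the task. */
	rcu_read_lock();
	for_each_thread(p, t)
		total_utime += t->utime;
	rcu_read_unlock();

	/* One-minute-style exponential decay, as calc_load() provides. */
	return calc_load(prev_load, EXP_1, (unsigned long)total_utime);
}
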
> > 
> > Sounds interesting, and this high level idea makes sense to me. :)
> > 
> > I'd like to further learn a few things.  Is there a reason to think the top 3
> > tasks are enough number of tasks?  Also, what if a region was hot and
> > successfully promoted to use huge pages, but later be cold?  Should we also
> > have a DAMOS scheme for splitting such no-more-hot huge pages?
> 
> No specific reason. This was just for the RFC. We could move this to a parameter
> somehow.

Makes sense.  Depending on the test results with a default of 3 tasks, I think
we could simply keep it as a hard-coded default.  If it turns out not to work
well for different cases, we could make it a tunable parameter or tune it
internally and automatically.
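
If a knob does turn out to be needed, even something as small as a module
parameter would do.  The sketch below only illustrates the size of the change;
the parameter name and default are made up, not from your patches.

#include <linux/cache.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical knob: how many of the hottest tasks get their own kdamond. */
static unsigned int nr_hot_tasks __read_mostly = 3;
module_param(nr_hot_tasks, uint, 0644);
MODULE_PARM_DESC(nr_hot_tasks,
		 "Number of hottest tasks to monitor with DAMON");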

> 
> In the case of a region turning cold, I haven't worked on it. If turning hot
> means that we collapse the hot region, we should do the opposite (split) when
> the area turns cold. I hadn't thought about it, but that's a good catch. Thanks!

You're welcome!

> 
> >>
> >> In each kdamond, we start with a high min_access value. Our goal is to find the
> >> "maximum" min_access value at which the DAMON action is still applied. In each
> >> cycle, if no action is applied, we lower min_access.
> > 
> > Sounds like a nice auto-tuning.  And we have DAMOS quota goal for that kind of
> > auto-tuning.  Have you considered using that?

Maybe you missed the above question?
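
In case it helps, below is roughly what I have in mind, using the quota goal
API from include/linux/damon.h.  The scheme itself and the target value are
placeholders; only damos_new_quota_goal() and damos_add_quota_goal() are the
existing interfaces.

#include <linux/damon.h>
#include <linux/errno.h>

/* Hypothetical: attach an auto-tuning goal to an existing DAMOS scheme so
 * DAMON adjusts how aggressively the action is applied, instead of the
 * module stepping min_nr_accesses down by hand every cycle. */
static int hot_region_set_quota_goal(struct damos *s)
{
	struct damos_quota_goal *goal;

	/* Metric and target value here are placeholders for illustration. */
	goal = damos_new_quota_goal(DAMOS_QUOTA_USER_INPUT, 10000);
	if (!goal)
		return -ENOMEM;
	damos_add_quota_goal(&s->quota, goal);
	return 0;
}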

> > 
> >>
> >> Regarding the action, we introduce a new one: DAMOS_COLLAPSE. This allows us to
> >> collapse synchronously and avoids polluting khugepaged and other parts of the MM
> >> subsystem with DAMON-specific code. DAMOS_HUGEPAGE, in contrast, eventually calls
> >> hugepage_madvise(), which needs the correct vm_flags_t to be set.
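
For my own understanding, I imagine the synchronous path is conceptually as
small as calling madvise_collapse() on the VMAs covering the region, something
like the sketch below.  The function itself, the single-VMA handling and the
locking simplifications are my assumptions, not the actual DAMOS_COLLAPSE
implementation, and exact prototypes may differ between kernel versions.

#include <linux/huge_mm.h>
#include <linux/mm.h>

/* Rough idea of synchronously collapsing one monitored virtual region. */
static unsigned long collapse_region_sync(struct mm_struct *mm,
					  unsigned long start, unsigned long end)
{
	struct vm_area_struct *vma, *prev;
	unsigned long s, e, applied = 0;

	mmap_read_lock(mm);
	vma = find_vma(mm, start);
	if (vma && vma->vm_start < end) {
		s = max(start, vma->vm_start);
		e = min(end, vma->vm_end);
		/* Does the MADV_COLLAPSE work directly, without involving
		 * khugepaged's background scanning. */
		if (!madvise_collapse(vma, &prev, s, e))
			applied = e - s;
	}
	mmap_read_unlock(mm);
	return applied;
}
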
> >>
> >> Benchmark
> >> -----------
> > 
> > Seems you forgot to write this section up.  Or do you not have benchmark results
> > yet, and only mistakenly left the above section header in?  Either is fine, as
> > this is just an RFC.  Nevertheless, test results and your expected use case for
> > this patch series will be very helpful.
> > 
> > 
> > Thanks,
> > SJ
> > 
> > [...]
> > 
> 
> Sure, will add the benchmark results in the next RFC version.

Looking forward to it!  I'm particularly interested in your expected or planned
use case, including why you implement the top-N processes logic inside the
kernel instead of in user space.  I'm also interested in how well the test setup
represents the realistic use case, and how good the results are.  That will help
us decide important things earlier, including whether this can be merged, and
whether handling for some corner cases should be done before or after merging.


Thanks,
SJ
