linux-kernel - Fwd: Two simple ideas for DAMON accuracy improvement

Message-Id: <20241027204910.155254-1-sj@kernel.org>
Date: Sun, 27 Oct 2024 13:49:10 -0700
From: SeongJae Park <sj@...nel.org>
To: damon@...ts.linux.dev
Cc: SeongJae Park <sj@...nel.org>,
	kernel-team@...a.com,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Fwd: Two simple ideas for DAMON accuracy improvement

Forgot Cc-ing linux-mm@ and linux-kernel@.  Forwarding.  Sorry for the noise.

Thanks,
SJ

=== >8 ===
From: SeongJae Park <sj@...nel.org>
To: damon@...ts.linux.dev
CC: SeongJae Park <sj@...nel.org>, kernel-team@...a.com
Subject: Two simple ideas for DAMON accuracy improvement
Message-Id: <20241026215311.148363-1-sj@...nel.org>
Date: Sat, 26 Oct 2024 14:53:11 -0700
Local-Date: 2024-10-26 14:53:11-07:00

Hello DAMON community,

There have been a number of valuable questions, concerns, and improvement ideas
around the monitoring output accuracy of DAMON.  I always admitted that DAMON
has much room for improvement, but was a bit wary of changes for some reasons.
I now think this caused an unnecessarily long delay.  Sorry about that.  I now
want to invest some time in the topic, starting by sharing the two simple ideas
below.

User-defined Regions Split Factor
---------------------------------

DAMON's "Adaptive Regions Adjustment (ARA)" mechanism splits each region into
randomly sized sub-regions, checks their access temperature, and merges
adjacent regions having similar temperature back together.  The split factor is
hard-coded as two.  Increasing it makes DAMON regions converge to the right
shape more quickly.  However, it also increases the number of DAMON regions in
usual situations, and therefore induces more overhead.  The user-defined upper
limit on the number of regions (max_nr_regions) will still be kept, though.

The optimum value of the split factor would depend on the use case.  We will
therefore add another knob to let users set the factor at runtime.  The default
value will be two, so this will not introduce any regression or behavioral
change to existing users.

Periodic Fine-grain Split of Aged Regions
-----------------------------------------

If a region is continuously changing its boundary and access temperature, it
means the region is still converging, or the access pattern of the workload is
not yet stabilized.  In either case, this is a healthy signal.

If a region is consistently showing the same access pattern for a long time, it
may be because the access pattern is stabilized and the region has correctly
converged.  However, it might instead be because the access pattern has
changed, but the region is converging too slowly.

To avoid too slow convergence of such aged regions, we will let users
periodically apply a higher split factor to regions that have kept their
current access pattern for a long time (high 'age').  Users will be able to set
the 'age' threshold, the split factor for the aged regions, and the time
interval between the periodic fine-grain splits.  For example, users can ask
DAMON to "split regions keeping their current access pattern for ten minutes or
more into five sub-regions every minute".

The feature will be ignored unless users explicitly set those knobs, so it does
not introduce any regression or behavioral change to existing users.

Discussions
-----------

Someone might worry that these add too many knobs.  As I shared in the long-term
plan at the last LPC[1], we will keep supporting these new knobs in the long
term, and may introduce an auto-tuning feature in the future.  By making these
user-tunable first, we can collect experiment results and use them for future
improvements.  Anyway, by design these changes will not introduce any
regression or behavioral change to existing users, so I believe they are safe
to add.

One of the factors that slowed my work on this topic was the absence of a
formal DAMON accuracy evaluation method.  Using damon-tests, we were able to
evaluate accuracy by drawing heatmaps of test workloads and comparing those
from different versions of DAMON.  Comparing the results of several DAMOS
schemes on test workloads was another way.  But those are not formal, and we
still don't have a formal way to evaluate accuracy.  However, the two features
will introduce no regression to existing users, so I believe this is the path
forward for now.

I believe implementing the features would not be difficult.  So unless someone
voluntarily steps up, I will start implementing them, targeting the v6.14 merge
window.

I'm looking forward to any comments.

[1] https://lpc.events/event/18/contributions/1768/

Thanks,
SJ