Message-Id: <20220318104948.26387-1-sj@kernel.org>
Date: Fri, 18 Mar 2022 10:49:48 +0000
From: sj@...nel.org
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: sj@...nel.org, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/damon: Make the sampling more accurate
On Fri, 18 Mar 2022 18:01:19 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>
> On 3/18/2022 5:40 PM, sj@...nel.org wrote:
> > Hi Baolin,
> >
> > On Fri, 18 Mar 2022 17:23:13 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
> >
> >> When I tried to sample physical addresses with DAMON to migrate pages
> >> on a tiered memory system, I found that it mistakenly demotes some
> >> regions it considers cold. Currently we choose a physical address in the
> >> region randomly, but if its corresponding page is not an online LRU page,
> >> we ignore the access status in this sampling cycle, and the region is
> >> effectively treated as non-accessed. If a region includes some non-LRU
> >> pages, it will be treated as a cold region with a high probability and
> >> may be merged with adjacent cold regions, even though some of its pages
> >> may have been accessed without our noticing.
> >>
> >> So instead of ignoring the access status of this region when we do not
> >> find a valid page for the current sampling address, we can use the last
> >> valid sampling address to make the sampling more accurate, so that we can
> >> make a better decision.
> >
> > Well... Offlined pages are also a valid part of the memory region, so treating
> > those as not accessed and making the memory region containing the offlined
> > pages look colder seems legitimate to me. IOW, this approach could make memory
> > regions containing many non-online-LRU pages look hot.
>
> IMO this is not a problem: if a region containing many non-online-LRU
> pages is treated as hot, that means there are some hot pages in it, right?
> We can find them and promote them to fast memory (or apply other
> schemes). Meanwhile, we can filter out the non-online-LRU pages and do
> nothing with them, since we cannot get a valid page struct for them.
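To make the two sampling policies discussed in the quoted exchange concrete, here is a toy C simulation. It is not DAMON code: `struct toy_region`, `toy_sample()` and the LCG are hypothetical, a region is modelled as an array of per-page "valid" (online LRU) and "accessed" flags, and one sample is taken per sampling interval. With `reuse_last_valid == false`, a sample that lands on an invalid page counts as not accessed; with `true`, it falls back to the last valid sampled page, as the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of one monitoring region (not DAMON's struct). */
struct toy_region {
	size_t nr_pages;
	const bool *valid;    /* page is an online LRU page */
	const bool *accessed; /* page was accessed */
};

/* Small deterministic LCG so runs are reproducible. */
static uint32_t toy_rand(uint32_t *state)
{
	*state = *state * 1103515245u + 12345u;
	return (*state >> 16) & 0x7fff;
}

/*
 * Take nr_samples random samples of r and return how many were counted
 * as accessed.
 *   reuse_last_valid == false: a sample on an invalid page is skipped,
 *     i.e. it counts as "not accessed".
 *   reuse_last_valid == true: such a sample reuses the last sampled page
 *     that was valid, if there is one.
 */
static unsigned int toy_sample(const struct toy_region *r,
			       unsigned int nr_samples, uint32_t seed,
			       bool reuse_last_valid)
{
	uint32_t state = seed;
	size_t last_valid = SIZE_MAX; /* no valid page sampled yet */
	unsigned int nr_accesses = 0;

	assert(r->nr_pages > 0);
	for (unsigned int i = 0; i < nr_samples; i++) {
		size_t idx = toy_rand(&state) % r->nr_pages;

		if (r->valid[idx]) {
			last_valid = idx;
		} else if (reuse_last_valid && last_valid != SIZE_MAX) {
			idx = last_valid;
		} else {
			continue; /* ignored: counts as not accessed */
		}
		if (r->accessed[idx])
			nr_accesses++;
	}
	return nr_accesses;
}
```

With the same seed, both policies visit the same pages, so for a region whose valid pages are all hot the fallback policy can only report as many or more accesses. That is the concern raised above: a region dominated by non-online-LRU pages may then look hot.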

For some of the DAMOS actions you mentioned, that could make sense. However,
it wouldn't make much sense in other cases, especially for manual DAMON-based
access pattern profiling.

After all, we already have a mechanism for this case: adaptive regions
adjustment (i.e., regions split/merge). That mechanism will eventually separate
out the hot online-LRU pages in the memory regions. Until the region is
adjusted, reporting the whole region as hot looks like the right result to me.

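A minimal sketch of the merge half of adaptive regions adjustment, under simplifying assumptions: a region is just `[start, end)` plus an access counter, adjacent regions are merged when their counters differ by at most a threshold, and the merged counter is the size-weighted average. `struct toy_mregion` and `toy_merge_regions()` are hypothetical names, not DAMON's API, and real DAMON applies further limits (for example on region counts and sizes) that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified region: [start, end) with an access counter. */
struct toy_mregion {
	unsigned long start, end;
	unsigned int nr_accesses;
};

static unsigned int absdiff(unsigned int a, unsigned int b)
{
	return a > b ? a - b : b - a;
}

/*
 * Merge adjacent regions (r[] sorted by address, all non-empty) whose
 * access counters differ by at most thres. The merged counter is the
 * size-weighted average. Works in place; returns the new region count.
 */
static size_t toy_merge_regions(struct toy_mregion *r, size_t nr,
				unsigned int thres)
{
	size_t out = 0;

	if (nr == 0)
		return 0;
	for (size_t i = 1; i < nr; i++) {
		struct toy_mregion *prev = &r[out];

		if (prev->end == r[i].start &&
		    absdiff(prev->nr_accesses, r[i].nr_accesses) <= thres) {
			unsigned long sz_prev = prev->end - prev->start;
			unsigned long sz_cur = r[i].end - r[i].start;

			assert(sz_prev + sz_cur > 0);
			prev->nr_accesses = (unsigned int)
				((prev->nr_accesses * sz_prev +
				  r[i].nr_accesses * sz_cur) /
				 (sz_prev + sz_cur));
			prev->end = r[i].end;
		} else {
			r[++out] = r[i]; /* keep as a separate region */
		}
	}
	return out + 1;
}
```

A region whose counter stands out from its neighbours by more than the threshold survives the merge, so once a split has isolated a hot sub-range, repeated split/merge rounds can keep it separate from the surrounding cold memory.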
Of course, I admit that it could take too much time to converge to the optimal
regions, and there is much room for improvement in the regions adjustment
mechanism. I think we should pursue that direction (improving the regions
adjustment mechanism).

FYI, I have some rough ideas for improving the mechanism, including
partitioning regions into more than two sub-regions if we believe it is not
making good progress. Nevertheless, I'd like to first establish a methodology
for evaluating the current accuracy. For that, I am planning to implement
page-granularity access monitoring.

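The idea of partitioning a region into more than two sub-regions can be illustrated with a generic split helper. This is only a sketch under stated assumptions: `struct toy_sregion`, `toy_split_region()` and the LCG are hypothetical, boundaries are chosen at random at page granularity, and nothing here reflects how DAMON actually picks split points or decides that progress is poor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region: [start, end), both assumed page-aligned. */
struct toy_sregion {
	unsigned long start, end;
};

/* Small deterministic LCG so the split points are reproducible. */
static uint32_t toy_split_rand(uint32_t *state)
{
	*state = *state * 1103515245u + 12345u;
	return (*state >> 16) & 0x7fff;
}

/*
 * Split r into up to nr_subs contiguous, non-empty sub-regions at random
 * page-aligned boundaries, writing them to out[] (room for nr_subs
 * entries). If r has fewer pages than nr_subs, it is split into
 * single-page pieces instead. Returns the number of sub-regions written.
 */
static size_t toy_split_region(struct toy_sregion r, size_t nr_subs,
			       unsigned long page_sz, uint32_t seed,
			       struct toy_sregion *out)
{
	uint32_t state = seed;
	unsigned long start = r.start;
	unsigned long total_pages;

	assert(nr_subs > 0 && page_sz > 0 && r.end > r.start);
	total_pages = (r.end - r.start) / page_sz;
	assert(total_pages > 0);
	if (nr_subs > total_pages)
		nr_subs = (size_t)total_pages;

	for (size_t i = 0; i + 1 < nr_subs; i++) {
		unsigned long remaining = (r.end - start) / page_sz;
		/* Leave at least one page for each piece still to be cut. */
		unsigned long later = (unsigned long)(nr_subs - i - 1);
		unsigned long pages =
			1 + toy_split_rand(&state) % (remaining - later);

		out[i].start = start;
		out[i].end = start + pages * page_sz;
		start = out[i].end;
	}
	/* The last piece takes whatever is left. */
	out[nr_subs - 1].start = start;
	out[nr_subs - 1].end = r.end;
	return nr_subs;
}
```

Calling it with `nr_subs == 2` gives a two-way split; larger values may let a small hot sub-range be isolated in fewer adjustment rounds, at the cost of more regions to sample and merge.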
Thanks,
SJ
[...]