Message-ID: <ecc17ef5-1ae8-823c-e4f6-0a1dc4d71201@linux.alibaba.com>
Date: Fri, 18 Mar 2022 22:12:00 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: sj@...nel.org
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/damon: Make the sampling more accurate

On 3/18/2022 8:15 PM, sj@...nel.org wrote:
> On Fri, 18 Mar 2022 19:58:07 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>
>>
>>
>> On 3/18/2022 6:49 PM, sj@...nel.org wrote:
>>> On Fri, 18 Mar 2022 18:01:19 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>>>
>>>>
>>>> On 3/18/2022 5:40 PM, sj@...nel.org wrote:
>>>>> Hi Baolin,
>>>>>
>>>>> On Fri, 18 Mar 2022 17:23:13 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>>>>>
>>>>>> When I tried to sample physical addresses with DAMON to migrate pages
>>>>>> on a tiered memory system, I found it mistakenly treats some regions as
>>>>>> cold and demotes them. Currently we choose a physical address in the
>>>>>> region randomly, but if its corresponding page is not an online LRU
>>>>>> page, we ignore the access status in this sampling cycle, so the region
>>>>>> is effectively treated as not accessed. A region that includes some
>>>>>> non-LRU pages will therefore be treated as cold with high probability,
>>>>>> and may be merged with adjacent cold regions, even though it contains
>>>>>> pages that may be accessed and that we missed.
>>>>>>
>>>>>> So instead of ignoring the access status of the region when we do not find
>>>>>> a valid page at the current sampling address, we can use the last valid
>>>>>> sampling address to make the sampling more accurate, and then make
>>>>>> a better decision.
>>>>>
>>>>> Well... Offlined pages are also a valid part of the memory region, so treating
>>>>> those as not accessed and making the memory region containing the offlined
>>>>> pages look colder seems legitimate to me. IOW, this approach could make memory
>>>>> regions containing many non-online-LRU pages look hot.
>>>>
>>>> IMO this is not a problem: if a region containing many non-online-LRU
>>>> pages is treated as hot, that means there are some pages in it that
>>>> are hot, right? We can find them and promote them to fast memory
>>>> (or apply other schemes). Meanwhile, we can filter out the
>>>> non-online-LRU pages and do nothing for them, since we cannot get a
>>>> valid page struct for them.
>>>
>>> For some of the DAMOS actions that you mentioned, that could make sense.
>>> However, that wouldn't make much sense for some other cases, especially for
>>> manual DAMON-based access pattern profiling.
>>
>> I am not sure about this case. Could you elaborate on how this can worsen
>> the case you mentioned?
>
> For example, let's suppose a user is using DAMON to learn the working set
> size of the system. And further suppose there is a region containing many
> offlined pages and one online hot page. With this patch, once DAMON has
> sampled the one hot page, the entire region will be reported as hot, though
> the other offlined pages have not been accessed. As a result, the user will
> think the working set size is bigger than it really is.

OK, sounds reasonable. It seems I need to add a flag to indicate whether we
should ignore offline or non-LRU pages when monitoring for some schemes, which
can help to make a good decision.
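
To make the trade-off concrete, here is a toy user-space sketch in Python. It
is not kernel code and not the patch itself. The function name
count_accesses, the (is_online_lru, is_accessed) page model and the
reuse_last_valid parameter are all made up for illustration;
reuse_last_valid stands in for the kind of flag discussed above.

```python
import random


def count_accesses(pages, samples, reuse_last_valid, seed=0):
    """Count how many sampling checks observe an access in one region.

    pages: list of (is_online_lru, is_accessed) tuples, one per page
           (a hypothetical model, not a kernel data structure).
    reuse_last_valid: hypothetical flag. If True, a sample that lands on
           a page that is not an online LRU page falls back to the last
           valid sampled page (the idea of the patch). If False, such a
           sample is ignored, i.e. counted as "not accessed".
    """
    rng = random.Random(seed)  # fixed seed: both modes see the same picks
    last_valid = None
    nr_accesses = 0
    for _ in range(samples):
        idx = rng.randrange(len(pages))
        if pages[idx][0]:
            last_valid = idx              # remember the valid sampling page
        elif reuse_last_valid and last_valid is not None:
            idx = last_valid              # fall back to the last valid page
        else:
            continue                      # no usable page for this sample
        if pages[idx][1]:
            nr_accesses += 1
    return nr_accesses


# One online hot page plus 99 offlined pages, as in the example above.
region = [(True, True)] + [(False, False)] * 99
ignored = count_accesses(region, 1000, reuse_last_valid=False)
reused = count_accesses(region, 1000, reuse_last_valid=True)
print("ignore invalid samples:", ignored)
print("reuse last valid page: ", reused)
```

In this model, once the single hot page has been sampled, every later sample
that lands on an offlined page is also counted as an access when the flag is
set. The region then looks uniformly hot, which suits promotion-style schemes
but inflates a working set size estimate.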