linux-kernel - Re: [PATCH] mm/damon: Make the sampling more accurate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <ecc17ef5-1ae8-823c-e4f6-0a1dc4d71201@linux.alibaba.com>
Date:   Fri, 18 Mar 2022 22:12:00 +0800
From:   Baolin Wang <baolin.wang@...ux.alibaba.com>
To:     sj@...nel.org
Cc:     akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/damon: Make the sampling more accurate



On 3/18/2022 8:15 PM, sj@...nel.org wrote:
> On Fri, 18 Mar 2022 19:58:07 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
> 
>>
>>
>> On 3/18/2022 6:49 PM, sj@...nel.org wrote:
>>> On Fri, 18 Mar 2022 18:01:19 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>>>
>>>>
>>>> On 3/18/2022 5:40 PM, sj@...nel.org wrote:
>>>>> Hi Baolin,
>>>>>
>>>>> On Fri, 18 Mar 2022 17:23:13 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>>>>>
>>>>>> When I try to sample the physical address with DAMON to migrate pages
>>>>>> on tiered memory system, I found it will demote some cold regions mistakenly.
>>>>>> Now we will choose an physical address in the region randomly, but if
>>>>>> its corresponding page is not an online LRU page, we will ignore the
>>>>>> accessing status in this cycle of sampling, and actually will be treated
>>>>>> as a non-accessed region. Suppose a region including some non-LRU pages,
>>>>>> it will be treated as a cold region with a high probability, and may be
>>>>>> merged with adjacent cold regions, but there are some pages may be
>>>>>> accessed we missed.
>>>>>>
>>>>>> So instead of ignoring the access status of this region if we did not find
>>>>>> a valid page according to current sampling address, we can use last valid
>>>>>> sampling address to help to make the sampling more accurate, then we can do
>>>>>> a better decision.
>>>>>
>>>>> Well...  Offlined pages are also a valid part of the memory region, so treating
>>>>> those as not accessed and making the memory region containing the offlined
>>>>> pages looks colder seems legal to me.  IOW, this approach could make memory
>>>>> regions containing many non-online-LRU pages as hot.
>>>>
>>>> IMO I don't think this is a problem, since if this region containing
>>>> many non-online-LRU pages is treated as hot, which means threre are aome
>>>> pages are hot, right? We can find them and promote them to fast memory
>>>> (or do other schemes). Meanwhile, for non-online-LRU pages, we can
>>>> filter them and do nothing for them, since we can not get a valid page
>>>> struct for them.
>>>
>>> For some of DAMOS actions that you mentioned, that could make sense.  However,
>>> that wouldn't make much sense for some other cases, especially for manual
>>> DAMON-based access pattern profiling.
>>
>> I am not sure about this case, could you elaborate on how this can worse
>> the case you mentioned?
> 
> For an example, let's suppose a user using DAMON to know the working set size
> of the system.  And further suppose there is a region that containing many
> offlined pages and one online hot page.  With this patch, once DAMON sampled
> the one hot page, the entire region will be reported as hot, though the other
> offlined pages has not accessed.  As a result, the user will think the working
> set size is bigger than real.

OK, sounds reasonable. Seems I need add a flag to indicate if we should 
ignore offline or non-lru pages when monitoring for some schemes, which 
can help to do a good decision.