Message-ID: <9db4baa1-2efe-8c01-e2e0-f275cc192fec@linux.alibaba.com>
Date: Wed, 22 Dec 2021 17:15:18 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: SeongJae Park <sj@...nel.org>
Cc: akpm@...ux-foundation.org, ying.huang@...el.com,
dave.hansen@...ux.intel.com, ziy@...dia.com, shy828301@...il.com,
zhongjiang-ali@...ux.alibaba.com, xlpang@...ux.alibaba.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] mm/damon: Add a new scheme to support demotion on
tiered memory system
On 12/22/2021 4:43 PM, SeongJae Park wrote:
> On Tue, 21 Dec 2021 22:18:06 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>
>> Hi SeongJae,
>>
>> On 12/21/2021 9:24 PM, SeongJae Park wrote:
>>> Hi Baolin,
>>>
>>>
>>> Thank you for this great patch! I left some comments below.
>>>
>>> On Tue, 21 Dec 2021 17:18:04 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>>>
>>>> On a tiered memory system, the reclaim path in shrink_page_list() already
>>>> supports demoting pages to a slow memory node instead of discarding them.
>>>> However, by that time the fast memory node's watermark is already under
>>>> pressure, which will increase the memory allocation latency during page
>>>> demotion.
>>>>
>>>> We can rely on DAMON in user space to monitor the cold memory on the
>>>> fast memory node, and proactively demote the cold pages to the slow
>>>> memory node to keep the fast memory node in a healthy state.
>>>> Thus this patch introduces a new scheme named DAMOS_DEMOTE to support
>>>> this feature.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>> ---
[snip]
>>
>>>
>>>> +		list_add(&page->lru, &pagelist);
>>>> +		target_nid = next_demotion_node(page_to_nid(page));
>>>> +		nr_pages = thp_nr_pages(page);
>>>> +
>>>> +		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
>>>> +				target_nid, MIGRATE_SYNC, MR_DEMOTION,
>>>> +				&nr_succeeded);
>>>> +		if (ret) {
>>>> +			if (!list_empty(&pagelist)) {
>>>> +				list_del(&page->lru);
>>>> +				mod_node_page_state(page_pgdat(page),
>>>> +					NR_ISOLATED_ANON + page_is_file_lru(page),
>>>> +					-nr_pages);
>>>> +				putback_lru_page(page);
>>>> +			}
>>>> +		} else {
>>>> +			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
>>>> +			demoted_pages += nr_succeeded;
>>>> +		}
>>>
>>> Why should we do the above work for each page on our own instead of
>>> exporting and calling 'demote_page_list()'?
>>
>> Because demote_page_list() is used to demote cold pages that all come from
>> the same fast memory node, they can share the same target demotion node.
>>
>> But for the regions of a process, we can get cold pages from different
>> fast memory nodes (suppose we have multiple DRAM nodes on the system), so
>> their target demotion nodes may differ. Thus we should migrate each cold
>> page using its own correct target demotion node.
>
> Thank you for the kind explanation. But I still want to reuse the code rather
> than copying it, if possible. How about the below dumb ideas off the top of my head?
>
> 1. Call demote_page_list() for each page from here
Sounds reasonable.
> 2. Call demote_page_list() for each page from damos_migrate_pmd_entry()
We are under the mmap lock in damos_migrate_pmd_entry(), so it is not a
suitable place to do the page migration.
> 3. Make damos_migrate_pages_walk_ops configure multiple demote_pages lists,
>     one per fast memory node, and call demote_page_list() here for each
>     per-fast-memory-node demote_pages list.
I think that would bring more complexity, and we should still avoid doing
the migration under the mmap lock.
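To make the locking point concrete, the overall shape of damos_demote() in
my current patch is roughly like below (a simplified sketch from memory, not
the exact code; the walk ops only isolate the cold pages into the private
demote_pages list):

/*
 * Sketch: the page table walk only isolates cold pages into demote_pages
 * while the mmap lock is held; the actual demotion runs after the lock
 * (and the mm reference) are dropped.
 */
static unsigned long damos_demote(struct damon_target *target,
				  struct damon_region *r)
{
	struct mm_struct *mm;
	LIST_HEAD(demote_pages);
	struct page *page, *next;
	unsigned int demoted_pages = 0;

	mm = damon_get_mm(target);
	if (!mm)
		return 0;

	/* Phase 1: isolate candidate pages, under the mmap lock. */
	mmap_read_lock(mm);
	walk_page_range(mm, r->ar.start, r->ar.end,
			&damos_migrate_pages_walk_ops, &demote_pages);
	mmap_read_unlock(mm);
	mmput(mm);

	/* Phase 2: demote the isolated pages, outside the mmap lock. */
	list_for_each_entry_safe(page, next, &demote_pages, lru) {
		/* ... per-page demotion as in the hunk below ... */
	}

	return demoted_pages;
}
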
> 4. Make demote_page_list() handle this case and use it. e.g., a NULL pgdat
>     parameter means the pages are not from the same node.
Thanks for your suggestion. After more thought, I think we can reuse
demote_page_list(), and the change is easy. Something like below on top of
my current patch; what do you think? Thanks.
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -796,7 +796,7 @@ static unsigned long damos_demote(struct damon_target *target,
 	struct mm_struct *mm;
 	LIST_HEAD(demote_pages);
 	LIST_HEAD(pagelist);
-	int target_nid, nr_pages, ret = 0;
+	int nr_pages;
 	unsigned int nr_succeeded, demoted_pages = 0;
 	struct page *page, *next;
@@ -818,14 +818,11 @@ static unsigned long damos_demote(struct damon_target *target,
 		return 0;
 
 	list_for_each_entry_safe(page, next, &demote_pages, lru) {
-		list_add(&page->lru, &pagelist);
-		target_nid = next_demotion_node(page_to_nid(page));
 		nr_pages = thp_nr_pages(page);
+		list_add(&page->lru, &pagelist);
 
-		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
-				target_nid, MIGRATE_SYNC, MR_DEMOTION,
-				&nr_succeeded);
-		if (ret) {
+		nr_succeeded = demote_page_list(&pagelist, page_pgdat(page));
+		if (!nr_succeeded) {
 			if (!list_empty(&pagelist)) {
 				list_del(&page->lru);
 				mod_node_page_state(page_pgdat(page),
@@ -834,11 +831,9 @@ static unsigned long damos_demote(struct damon_target *target,
 				putback_lru_page(page);
 			}
 		} else {
-			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
 			demoted_pages += nr_succeeded;
 		}
-		INIT_LIST_HEAD(&pagelist);
 
 		cond_resched();
 	}
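
For context, demote_page_list() in mm/vmscan.c looks roughly like below
(simplified, not verbatim; it would need to be made non-static for DAMON to
call it). Since it derives the target node from the pgdat it is given,
passing page_pgdat(page) per page handles cold pages coming from different
DRAM nodes, and it already counts the PGDEMOTE_* events, so the explicit
__count_vm_events() call can be dropped on the DAMON side:

/* Simplified from mm/vmscan.c (not verbatim). */
static unsigned int demote_page_list(struct list_head *demote_pages,
				     struct pglist_data *pgdat)
{
	int target_nid = next_demotion_node(pgdat->node_id);
	unsigned int nr_succeeded;

	if (list_empty(demote_pages))
		return 0;

	if (target_nid == NUMA_NO_NODE)
		return 0;

	/* Demotion ignores all cpuset and mempolicy settings */
	migrate_pages(demote_pages, alloc_demote_page, NULL,
		      target_nid, MIGRATE_ASYNC, MR_DEMOTION,
		      &nr_succeeded);

	if (current_is_kswapd())
		__count_vm_events(PGDEMOTE_KSWAPD, nr_succeeded);
	else
		__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);

	return nr_succeeded;
}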