Message-ID: <9db4baa1-2efe-8c01-e2e0-f275cc192fec@linux.alibaba.com>
Date: Wed, 22 Dec 2021 17:15:18 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: SeongJae Park <sj@...nel.org>
Cc: akpm@...ux-foundation.org, ying.huang@...el.com,
dave.hansen@...ux.intel.com, ziy@...dia.com, shy828301@...il.com,
zhongjiang-ali@...ux.alibaba.com, xlpang@...ux.alibaba.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] mm/damon: Add a new scheme to support demotion on
tiered memory system
On 12/22/2021 4:43 PM, SeongJae Park wrote:
> On Tue, 21 Dec 2021 22:18:06 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>
>> Hi SeongJae,
>>
>> On 12/21/2021 9:24 PM, SeongJae Park wrote:
>>> Hi Baolin,
>>>
>>>
>>> Thank you for this great patch! I left some comments below.
>>>
>>> On Tue, 21 Dec 2021 17:18:04 +0800 Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
>>>
>>>> On a tiered memory system, the reclaim path in shrink_page_list() already
>>>> supports demoting pages to a slow memory node instead of discarding them.
>>>> However, by that time the fast memory node's watermark is already under
>>>> pressure, which will increase the memory allocation latency during page
>>>> demotion.
>>>>
>>>> We can rely on DAMON in user space to monitor the cold memory on the
>>>> fast memory node, and proactively demote the cold pages to the slow
>>>> memory node to keep the fast memory node in a healthy state.
>>>> Thus this patch introduces a new scheme named DAMOS_DEMOTE to support
>>>> this feature.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>> ---
[snip]
>>
>>>
>>>> +		list_add(&page->lru, &pagelist);
>>>> +		target_nid = next_demotion_node(page_to_nid(page));
>>>> +		nr_pages = thp_nr_pages(page);
>>>> +
>>>> +		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
>>>> +				target_nid, MIGRATE_SYNC, MR_DEMOTION,
>>>> +				&nr_succeeded);
>>>> +		if (ret) {
>>>> +			if (!list_empty(&pagelist)) {
>>>> +				list_del(&page->lru);
>>>> +				mod_node_page_state(page_pgdat(page),
>>>> +					NR_ISOLATED_ANON + page_is_file_lru(page),
>>>> +					-nr_pages);
>>>> +				putback_lru_page(page);
>>>> +			}
>>>> +		} else {
>>>> +			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
>>>> +			demoted_pages += nr_succeeded;
>>>> +		}
>>>
>>> Why should we do the above work for each page on our own instead of
>>> exporting and calling 'demote_page_list()'?
>>
>> Because demote_page_list() is used to demote cold pages that all come from
>> the same fast memory node, they can share the same target demotion node.
>>
>> But for the regions of a process, we can get cold pages from different
>> fast memory nodes (suppose we have multiple DRAM nodes on the system), so
>> their target demotion nodes may differ. Thus we should migrate each cold
>> page using its own correct target demotion node.
>
> Thank you for the kind explanation. But I still want to reuse the code rather
> than copying it, if possible. How about the below dumb ideas off the top of my head?
>
> 1. Call demote_page_list() for each page from here
Sounds reasonable.
> 2. Call demote_page_list() for each page from damos_migrate_pmd_entry()
We are under the mmap lock in damos_migrate_pmd_entry(), so it is not a
suitable place to do the page migration.
> 3. Make damos_migrate_pages_walk_ops configure multiple demote_pages lists,
>     one per fast memory node, and call demote_page_list() here for each
>     per-fast-memory-node demote_pages list.
I think that would bring more complexity, and we should still avoid doing
the migration under the mmap lock.
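To make the locking point concrete, the overall shape of damos_demote() in
my current patch is roughly like below (a simplified sketch from memory, not
the exact code; the walk ops only isolate the cold pages into the private
demote_pages list):

/*
 * Sketch: the page table walk only isolates cold pages into demote_pages
 * while the mmap lock is held; the actual demotion runs after the lock
 * (and the mm reference) are dropped.
 */
static unsigned long damos_demote(struct damon_target *target,
				  struct damon_region *r)
{
	struct mm_struct *mm;
	LIST_HEAD(demote_pages);
	struct page *page, *next;
	unsigned int demoted_pages = 0;

	mm = damon_get_mm(target);
	if (!mm)
		return 0;

	/* Phase 1: isolate candidate pages, under the mmap lock. */
	mmap_read_lock(mm);
	walk_page_range(mm, r->ar.start, r->ar.end,
			&damos_migrate_pages_walk_ops, &demote_pages);
	mmap_read_unlock(mm);
	mmput(mm);

	/* Phase 2: demote the isolated pages, outside the mmap lock. */
	list_for_each_entry_safe(page, next, &demote_pages, lru) {
		/* ... per-page demotion as in the hunk below ... */
	}

	return demoted_pages;
}
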
> 4. Make demote_page_list() handle this case and use it. e.g., a NULL pgdat
>     parameter means the pages are not from the same node.
Thanks for your suggestion. After more thought, I think we can reuse
demote_page_list(), and the change is easy. Something like below on top of
my current patch; what do you think? Thanks.
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -796,7 +796,7 @@ static unsigned long damos_demote(struct damon_target *target,
 	struct mm_struct *mm;
 	LIST_HEAD(demote_pages);
 	LIST_HEAD(pagelist);
-	int target_nid, nr_pages, ret = 0;
+	int nr_pages;
 	unsigned int nr_succeeded, demoted_pages = 0;
 	struct page *page, *next;
@@ -818,14 +818,11 @@ static unsigned long damos_demote(struct damon_target *target,
 		return 0;
 
 	list_for_each_entry_safe(page, next, &demote_pages, lru) {
-		list_add(&page->lru, &pagelist);
-		target_nid = next_demotion_node(page_to_nid(page));
 		nr_pages = thp_nr_pages(page);
+		list_add(&page->lru, &pagelist);
 
-		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
-				target_nid, MIGRATE_SYNC, MR_DEMOTION,
-				&nr_succeeded);
-		if (ret) {
+		nr_succeeded = demote_page_list(&pagelist, page_pgdat(page));
+		if (!nr_succeeded) {
 			if (!list_empty(&pagelist)) {
 				list_del(&page->lru);
 				mod_node_page_state(page_pgdat(page),
@@ -834,11 +831,9 @@ static unsigned long damos_demote(struct damon_target *target,
 				putback_lru_page(page);
 			}
 		} else {
-			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
 			demoted_pages += nr_succeeded;
 		}
-		INIT_LIST_HEAD(&pagelist);
 
 		cond_resched();
 	}
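
For context, demote_page_list() in mm/vmscan.c looks roughly like below
(simplified, not verbatim; it would need to be made non-static for DAMON to
call it). Since it derives the target node from the pgdat it is given,
passing page_pgdat(page) per page handles cold pages coming from different
DRAM nodes, and it already counts the PGDEMOTE_* events, so the explicit
__count_vm_events() call can be dropped on the DAMON side:

/* Simplified from mm/vmscan.c (not verbatim). */
static unsigned int demote_page_list(struct list_head *demote_pages,
				     struct pglist_data *pgdat)
{
	int target_nid = next_demotion_node(pgdat->node_id);
	unsigned int nr_succeeded;

	if (list_empty(demote_pages))
		return 0;

	if (target_nid == NUMA_NO_NODE)
		return 0;

	/* Demotion ignores all cpuset and mempolicy settings */
	migrate_pages(demote_pages, alloc_demote_page, NULL,
		      target_nid, MIGRATE_ASYNC, MR_DEMOTION,
		      &nr_succeeded);

	if (current_is_kswapd())
		__count_vm_events(PGDEMOTE_KSWAPD, nr_succeeded);
	else
		__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);

	return nr_succeeded;
}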