Date:   Thu, 20 Aug 2020 08:21:55 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     "Huang, Ying" <ying.huang@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     linux-kernel@...r.kernel.org, yang.shi@...ux.alibaba.com,
        rientjes@...gle.com, dan.j.williams@...el.com,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [RFC][PATCH 5/9] mm/migrate: demote pages during reclaim

On 8/20/20 1:06 AM, Huang, Ying wrote:
>> +	/* Migrate pages selected for demotion */
>> +	nr_reclaimed += demote_page_list(&ret_pages, &demote_pages, pgdat, sc);
>> +
>>  	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
>>  
>>  	mem_cgroup_uncharge_list(&free_pages);
>> _
> Generally, it's good to batch the page migration.  But one side effect
> is that pages that fail to migrate are placed back on the LRU list
> instead of falling back to actually being reclaimed.  This may cause
> problems in some situations.  For example, if there isn't enough space
> in the PMEM (slow) node and page migration fails, OOM may be
> triggered, because direct reclaim on the DRAM (fast) node may make no
> progress, whereas before it could actually reclaim some pages.

Yes, agreed.

There are a couple of ways we could fix this.  Instead of splicing
'demote_pages' back into 'ret_pages', we could try to get them back on
'page_list' and goto the beginning of shrink_page_list().  This will
probably yield the best behavior, but might be a bit ugly.
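
To make the retry idea concrete, here is a rough user-space sketch, not
kernel code.  Everything prefixed 'toy_' is hypothetical and only models
the control flow: pages that fail demotion stay on the list, and the
loop is restarted with demotion disabled so they take the ordinary
reclaim path instead of going back to the LRU.  Ordinary reclaim is
assumed to always succeed here, which is not true of the real thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a page on the reclaim list (hypothetical, not struct page). */
struct toy_page {
	bool demotable;	/* would be selected for demotion */
	bool demoted;	/* migrated to the slow node */
	bool reclaimed;	/* no longer occupies the fast node */
};

/* Hypothetical stand-in for migration: succeeds while the slow node has room. */
static bool toy_migrate(size_t *slow_free)
{
	if (*slow_free == 0)
		return false;
	(*slow_free)--;
	return true;
}

/*
 * Returns the number of pages freed from the fast node.
 * Pass 1 demotes where it can and reclaims everything else.
 * If any demotion failed, pass 2 walks the leftovers again with
 * demotion turned off, so they are reclaimed the ordinary way.
 */
static size_t toy_shrink_page_list(struct toy_page *pages, size_t n,
				   size_t slow_free)
{
	size_t nr_reclaimed = 0;
	bool allow_demotion = true;
	bool retry_needed;

	assert(pages != NULL || n == 0);
retry:
	retry_needed = false;
	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		if (p->reclaimed)
			continue;
		if (allow_demotion && p->demotable) {
			if (toy_migrate(&slow_free)) {
				p->demoted = true;
				p->reclaimed = true;
				nr_reclaimed++;
			} else {
				retry_needed = true; /* leave it for pass 2 */
			}
			continue;
		}
		/* Ordinary reclaim; assumed to always succeed in this model. */
		p->reclaimed = true;
		nr_reclaimed++;
	}
	if (retry_needed && allow_demotion) {
		allow_demotion = false;
		goto retry;
	}
	return nr_reclaimed;
}
```

With three demotable pages, one ordinary page and room for only one page
on the slow node, this model demotes one page and reclaims the other
three, so the fast node still makes progress.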

We could also add a field to 'struct scan_control' and just stop trying
to migrate after it has failed one or more times.  The trick will be
picking a threshold that doesn't mess with either the normal reclaim
rate or the migration rate.
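
As a minimal sketch of the counter idea, again in user space:
'toy_scan_control', 'nr_demote_failed' and TOY_DEMOTE_FAIL_LIMIT are all
made-up names, and the limit of 2 is an arbitrary placeholder, not a
proposed value.  The real 'struct scan_control' has no such field today.

```c
#include <stdbool.h>

/* Arbitrary placeholder; picking the real threshold is the hard part. */
#define TOY_DEMOTE_FAIL_LIMIT 2

/* Hypothetical stand-in for the one new field in struct scan_control. */
struct toy_scan_control {
	unsigned int nr_demote_failed;
};

/* Keep trying demotion only while failures are under the limit. */
static bool toy_may_demote(const struct toy_scan_control *sc)
{
	return sc->nr_demote_failed < TOY_DEMOTE_FAIL_LIMIT;
}

/* Record one demotion attempt; successes do not reset the counter. */
static void toy_note_demote_result(struct toy_scan_control *sc, bool ok)
{
	if (!ok)
		sc->nr_demote_failed++;
}
```

Whether a success should reset the counter is one of the open choices;
this sketch never resets it, so demotion stays off for the rest of that
reclaim pass once the limit is hit.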

This is on my list to fix up next.
