Date:   Thu, 20 Dec 2018 22:33:26 -0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     mhocko@...nel.org, vbabka@...e.cz, hannes@...xchg.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if
 priority is low



On 12/20/18 10:04 PM, Hugh Dickins wrote:
> On Thu, 20 Dec 2018, Andrew Morton wrote:
>> Is anyone interested in reviewing this?  Seems somewhat serious.
>> Thanks.
> Somewhat serious, but no need to rush.
>
>> From: Yang Shi <yang.shi@...ux.alibaba.com>
>> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
>>
>> When running a stress test, we occasionally run into the below hang issue:
> Artificial load presumably.
>
>> INFO: task ksmd:205 blocked for more than 360 seconds.
>>        Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
> 4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
> ("ksm: introduce ksm_max_page_sharing per page deduplication limit").
>
> The patch below is more economical than Andrea's, but I don't think
> a second workaround should be added, unless Andrea's is shown to be
> insufficient, even with its ksm_max_page_sharing tuned down to suit.
>
> Yang, please try to reproduce on upstream, or backport Andrea's to
> 4.9-stable - thanks.

I believe Andrea's commit could work around this problem too, by limiting 
the number of pages that share a single KSM page.

However, IMHO, even if only a few hundred pages share one KSM page, it 
still does not seem worth walking its rmap in low-priority direct 
reclaim. According to Andrea's commit log, it still takes a few msec to 
walk the rmap for 256 shared pages.
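
For context, that cost comes from the nested loops in rmap_walk_ksm() 
in mm/ksm.c. Below is a simplified sketch of its structure (abridged 
from the v4.9-era source; the search_new_forks pass, cond_resched() 
calls and some checks are omitted, so this is illustrative, not the 
verbatim code):

    /* Abridged sketch of rmap_walk_ksm(), not the verbatim source. */
    hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
            struct anon_vma *anon_vma = rmap_item->anon_vma;
            struct anon_vma_chain *vmac;
            struct vm_area_struct *vma;

            anon_vma_lock_read(anon_vma);
            /* One inner pass per VMA (e.g. per forked child) mapping the page. */
            anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
                                           0, ULONG_MAX) {
                    vma = vmac->vma;
                    if (rmap_item->address < vma->vm_start ||
                        rmap_item->address >= vma->vm_end)
                            continue;
                    if (rwc->rmap_one(page, vma, rmap_item->address,
                                      rwc->arg) != SWAP_AGAIN) {
                            anon_vma_unlock_read(anon_vma);
                            return;
                    }
            }
            anon_vma_unlock_read(anon_vma);
    }

So the total work is roughly (# rmap_items on the stable node) * 
(# VMAs per anon_vma), matching the "# rmap_items * # child processes" 
estimate in the quoted changelog.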

Thanks,
Yang

>
> Hugh
>
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> ksmd            D    0   205      2 0x00000000
>>   ffff882fa00418c0 0000000000000000 ffff882fa4b10000 ffff882fbf059d00
>>   ffff882fa5bc1800 ffffc900190c7c28 ffffffff81725e58 ffffffff810777c0
>>   00ffc900190c7c88 ffff882fbf059d00 ffffffff8138cc09 ffff882fa4b10000
>> Call Trace:
>>   [<ffffffff81725e58>] ? __schedule+0x258/0x720
>>   [<ffffffff810777c0>] ? do_flush_tlb_all+0x30/0x30
>>   [<ffffffff8138cc09>] ? free_cpumask_var+0x9/0x10
>>   [<ffffffff81726356>] schedule+0x36/0x80
>>   [<ffffffff81729916>] schedule_timeout+0x206/0x4b0
>>   [<ffffffff81077d0f>] ? native_flush_tlb_others+0x11f/0x180
>>   [<ffffffff8110ca40>] ? ktime_get+0x40/0xb0
>>   [<ffffffff81725b6a>] io_schedule_timeout+0xda/0x170
>>   [<ffffffff81726c50>] ? bit_wait+0x60/0x60
>>   [<ffffffff81726c6b>] bit_wait_io+0x1b/0x60
>>   [<ffffffff81726759>] __wait_on_bit_lock+0x59/0xc0
>>   [<ffffffff811aff76>] __lock_page+0x86/0xa0
>>   [<ffffffff810d53e0>] ? wake_atomic_t_function+0x60/0x60
>>   [<ffffffff8121a269>] ksm_scan_thread+0xeb9/0x1430
>>   [<ffffffff810d5340>] ? prepare_to_wait_event+0x100/0x100
>>   [<ffffffff812193b0>] ? try_to_merge_with_ksm_page+0x850/0x850
>>   [<ffffffff810ac226>] kthread+0xe6/0x100
>>   [<ffffffff810ac140>] ? kthread_park+0x60/0x60
>>   [<ffffffff8172b196>] ret_from_fork+0x46/0x60
>>
>> ksmd found a suitable KSM page on the stable tree and is trying to lock
>> it.  But the page is locked by the direct reclaim path, which is walking
>> its rmap to count the referenced PTEs.
>>
>> The KSM page rmap walk needs to iterate all rmap_items of the page and all
>> rmap anon_vmas of each rmap_item.  So it may take (# rmap_items * # child
>> processes) loops.  In the worst case this number can be very large (for
>> example, 256 rmap_items each mapped by 100 forked children already means
>> ~25,600 inner iterations), so the walk may take a long time.
>>
>> Typically, direct reclaim does not intend to reclaim many pages, and it
>> is latency sensitive.  So it is not worth doing the long KSM page rmap
>> walk to reclaim just one page.
>>
>> Skip KSM pages in direct reclaim if the reclaim priority is low, but still
>> try to reclaim KSM pages with high priority.
>>
>> Link: http://lkml.kernel.org/r/1541618201-120667-1-git-send-email-yang.shi@linux.alibaba.com
>> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
>> Cc: Vlastimil Babka <vbabka@...e.cz>
>> Cc: Johannes Weiner <hannes@...xchg.org>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: Michal Hocko <mhocko@...nel.org>
>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
>> ---
>>
>>   mm/vmscan.c |   23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> --- a/mm/vmscan.c~mm-vmscan-skip-ksm-page-in-direct-reclaim-if-priority-is-low
>> +++ a/mm/vmscan.c
>> @@ -1260,8 +1260,17 @@ static unsigned long shrink_page_list(st
>>   			}
>>   		}
>>   
>> -		if (!force_reclaim)
>> -			references = page_check_references(page, sc);
>> +		if (!force_reclaim) {
>> +			/*
>> +			 * Don't try to reclaim KSM page in direct reclaim if
>> +			 * the priority is not high enough.
>> +			 */
>> +			if (PageKsm(page) && !current_is_kswapd() &&
>> +			    sc->priority > (DEF_PRIORITY - 2))
>> +				references = PAGEREF_KEEP;
>> +			else
>> +				references = page_check_references(page, sc);
>> +		}
>>   
>>   		switch (references) {
>>   		case PAGEREF_ACTIVATE:
>> @@ -2136,6 +2145,16 @@ static void shrink_active_list(unsigned
>>   			}
>>   		}
>>   
>> +		/*
>> +		 * Skip KSM page in direct reclaim if priority is not
>> +		 * high enough.
>> +		 */
>> +		if (PageKsm(page) && !current_is_kswapd() &&
>> +		    sc->priority > (DEF_PRIORITY - 2)) {
>> +			putback_lru_page(page);
>> +			continue;
>> +		}
>> +
>>   		if (page_referenced(page, 0, sc->target_mem_cgroup,
>>   				    &vm_flags)) {
>>   			nr_rotated += hpage_nr_pages(page);
>> _
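
For reference, DEF_PRIORITY is 12 (include/linux/mmzone.h) and 
sc->priority counts down from there as reclaim grows more aggressive, 
so the check in both hunks works out as follows:

    /*
     * sc->priority > (DEF_PRIORITY - 2)  =>  sc->priority > 10
     *
     * i.e. KSM pages are skipped only on the first two (least
     * aggressive) direct reclaim passes, priorities 12 and 11; once
     * priority drops to 10 or below they are scanned as usual, and
     * kswapd (current_is_kswapd()) is never affected.
     */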
