Message-ID: <20181119124033.GJ22247@dhcp22.suse.cz>
Date:   Mon, 19 Nov 2018 13:40:33 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Baoquan He <bhe@...hat.com>
Cc:     David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
        pifang@...hat.com, linux-kernel@...r.kernel.org,
        akpm@...ux-foundation.org, aarcange@...hat.com,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: Memory hotplug softlock issue

On Mon 19-11-18 18:52:02, Baoquan He wrote:
[...]

There are a few stacks directly in the offline path, but those should be
OK.
The real culprit seems to be the swap-in code:

> [  +1.734416] CPU: 255 PID: 5558 Comm: stress Tainted: G             L    4.20.0-rc2+ #7
> [  +0.007927] Hardware name:  9008/IT91SMUB, BIOS BLXSV512 03/22/2018
> [  +0.006297] Call Trace:
> [  +0.002537]  dump_stack+0x46/0x60
> [  +0.003386]  __migration_entry_wait.cold.65+0x5/0x14
> [  +0.005043]  do_swap_page+0x84e/0x960
> [  +0.003727]  ? arch_tlb_finish_mmu+0x29/0xc0
> [  +0.006412]  __handle_mm_fault+0x933/0x1330
> [  +0.004265]  handle_mm_fault+0xc4/0x250
> [  +0.003915]  __do_page_fault+0x2b7/0x510
> [  +0.003990]  do_page_fault+0x2c/0x110
> [  +0.003729]  ? page_fault+0x8/0x30
> [  +0.003462]  page_fault+0x1e/0x30

There are many traces to this path. We are 
	/*
	 * Once page cache replacement of page migration started, page_count
	 * *must* be zero. And, we don't want to call wait_on_page_locked()
	 * against a page without get_page().
	 * So, we use get_page_unless_zero(), here. Even failed, page fault
	 * will occur again.
	 */
	if (!get_page_unless_zero(page))
		goto out;
	pte_unmap_unlock(ptep, ptl);
	wait_on_page_locked(page);

taking a reference to the page under migration. I have to think
about this much more, but I suspect this is just asking for trouble.

Cc'ing migration experts. For your background information: we are seeing
memory offline fail to converge because a few heavily used pages
fail to migrate away - e.g. http://lkml.kernel.org/r/20181116012433.GU2653@MiWiFi-R3L-srv
A debugging patch to dump the stack for these pages http://lkml.kernel.org/r/20181116091409.GD14706@dhcp22.suse.cz
shows that the references are taken from the swap-in code (above). How are
we supposed to converge when the swap-in code waits for the migration to
finish with the reference count elevated?
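
To make the concern concrete, here is a toy userspace model of the
interaction. It is a sketch, not kernel code: every toy_* name is
hypothetical, and the assumption that migration expects exactly one
reference, with all faulting tasks arriving between locking the page and
the migration attempt, is a simplification made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only what this sketch needs. */
struct toy_page {
	int refcount;	/* models page_count() */
	bool locked;	/* models PageLocked() */
};

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
static bool toy_get_page_unless_zero(struct toy_page *page)
{
	if (page->refcount == 0)
		return false;
	page->refcount++;
	return true;
}

/* Models one migration attempt: it fails if anybody holds an extra reference. */
static bool toy_try_migrate(const struct toy_page *page, int expected_refs)
{
	return page->refcount == expected_refs;
}

/*
 * Returns the attempt number on which the page migrated, or -1 if it
 * never did within max_attempts. faults_per_attempt tasks hit the
 * migration entry while the page is locked; each takes a reference and
 * keeps it until the page is unlocked.
 */
static int toy_offline_attempts(int faults_per_attempt, int max_attempts)
{
	const int expected_refs = 1;	/* assumption: one reference held by the migration path */
	struct toy_page page = { .refcount = expected_refs, .locked = false };

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		int waiters = 0;

		page.locked = true;	/* migration locks the page */
		for (int i = 0; i < faults_per_attempt; i++)
			if (toy_get_page_unless_zero(&page))
				waiters++;	/* now "sleeping" in wait_on_page_locked() */

		bool migrated = toy_try_migrate(&page, expected_refs);

		page.locked = false;		/* unlock wakes the waiters */
		page.refcount -= waiters;	/* each waiter drops its reference */
		assert(page.refcount == expected_refs);

		if (migrated)
			return attempt;
	}
	return -1;
}
```

In this model toy_offline_attempts(0, 10) migrates on the first attempt,
while toy_offline_attempts(3, 10) never does: as long as at least one task
faults on the page during every attempt, the count is always elevated at
the moment migration checks it.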
-- 
Michal Hocko
SUSE Labs
