linux-kernel - Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim

Message-ID: <20140718163843.GK29639@cmpxchg.org>
Date:	Fri, 18 Jul 2014 12:38:43 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	Richard Yao <ryao@...too.org>
Cc:	linux-kernel@...r.kernel.org, mthode@...ode.org, kernel@...too.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.cz>,
	Glauber Costa <glommer@...nvz.org>,
	Rik van Riel <riel@...hat.com>,
	Vladimir Davydov <vdavydov@...allels.com>,
	Dave Chinner <dchinner@...hat.com>,
	"open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>
Subject: Re: [PATCH] mm: vmscan: unlock_page page when forcing reclaim

On Fri, Jul 18, 2014 at 11:48:02AM -0400, Richard Yao wrote:
> A small userland program I wrote to assist me in drive forensic
> operations soft deadlocked on Linux 3.14.4. The stack trace from /proc
> was:
> 
> [<ffffffff8112968e>] sleep_on_page_killable+0xe/0x40
> [<ffffffff81129829>] wait_on_page_bit_killable+0x79/0x80
> [<ffffffff811299a5>] __lock_page_or_retry+0x95/0xc0
> [<ffffffff8112a95b>] filemap_fault+0x21b/0x420
> [<ffffffff8115685e>] __do_fault+0x6e/0x520
> [<ffffffff81156de3>] handle_pte_fault+0xd3/0x1f0
> [<ffffffff81157073>] __handle_mm_fault+0x173/0x290
> [<ffffffff811571d2>] handle_mm_fault+0x42/0xb0
> [<ffffffff81587a11>] __do_page_fault+0x191/0x490
> [<ffffffff81587dec>] do_page_fault+0xc/0x10
> [<ffffffff81584622>] page_fault+0x22/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> The program used mmap() to do a linear scan of the device on 64-bit
> hardware. The block device in question was 200GB in size and the system
> had only 8GB of RAM. All IO operations stopped following pageout.
> 
> shrink_page_list() seemed to have raced with filemap_fault() by evicting
> a page when we had an active fault handler. This is possible only
> because 02c6de8d757cb32c0829a45d81c3dfcbcafd998b altered the behavior of
> shrink_page_list() to ignore references. Consequently, we must call
> unlock_page() instead of __clear_page_locked() when doing this so that
> waiters are notified. unlock_page() here will cause active page fault
> handlers to retry (depending on the architecture), which avoids the soft
> deadlock.

I don't really understand how the scenario you describe can happen.

Successfully reclaiming a page means that __remove_mapping() was able
to freeze a page count of 2 (page cache and LRU isolation), but
filemap_fault() increases the refcount on the page before trying to
lock it.  If __remove_mapping() wins, find_get_page() fails to get
the page and the fault never locks it.  If find_get_page() wins,
__remove_mapping() fails and the reclaimer aborts and does a
regular unlock_page().
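
The mutual exclusion argued above can be sketched in user space. This is
only a rough model under stated assumptions, not the kernel code: struct
fake_page, fake_ref_freeze(), fake_get_unless_zero() and
exactly_one_wins() are hypothetical names. They stand in loosely for the
refcount freeze done on the reclaim side and the speculative reference
taken by the page cache lookup on the fault side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference count kept in struct page. */
struct fake_page {
	atomic_int refcount;
};

/*
 * Models the reclaim side: succeed only if the count is exactly
 * `expected`, and in that case atomically drop it to 0 so that no new
 * reference can be taken afterwards.
 */
static bool fake_ref_freeze(struct fake_page *p, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&p->refcount, &old, 0);
}

/*
 * Models the fault side lookup: take a reference unless the count has
 * already been frozen to 0.
 */
static bool fake_get_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Runs the two sides in the given order on a page that starts with two
 * references (page cache and LRU isolation) and reports whether exactly
 * one of them succeeded.
 */
static bool exactly_one_wins(bool reclaim_first)
{
	struct fake_page page;
	bool reclaimed, faulted;

	atomic_init(&page.refcount, 2);

	if (reclaim_first) {
		reclaimed = fake_ref_freeze(&page, 2);
		faulted = fake_get_unless_zero(&page);
	} else {
		faulted = fake_get_unless_zero(&page);
		reclaimed = fake_ref_freeze(&page, 2);
	}
	return reclaimed != faulted;
}
```

In either order only one side succeeds: if the freeze goes first the
count is 0 and the lookup gets nothing, and if the lookup goes first the
count is 3 and the freeze fails. The sketch runs the two sides
sequentially; it does not model the page lock or the waitqueue.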

page_check_references() is purely about reclaim strategy; it should
not be essential for correctness.