linux-kernel - [RFC PATCH 0/2] mempool vs. page allocator interaction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <1468831164-26621-1-git-send-email-mhocko@kernel.org>
Date:	Mon, 18 Jul 2016 10:39:22 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	<linux-mm@...ck.org>
Cc:	Mikulas Patocka <mpatocka@...hat.com>,
	Ondrej Kozina <okozina@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	Mel Gorman <mgorman@...e.de>, Neil Brown <neilb@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, dm-devel@...hat.com
Subject: [RFC PATCH 0/2] mempool vs. page allocator interaction

Hi,
there have been two issues identified when investigating dm-crypt
backed swap recently [1]. The first one looks like a regression from
f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free
elements") because swapout path can now deplete all the available memory
reserves. The first patch tries to address that issue by dropping
__GFP_NOMEMALLOC only to TIF_MEMDIE tasks.

The second issue is that dm writeout path which relies on mempool
allocator gets throttled by the direct reclaim in throttle_vm_writeout
which just makes the whole memory pressure problem even worse. The
patch2 just makes sure that we annotate mempool users to be throttled
less by PF_LESS_THROTTLE flag and prevent from throttle_vm_writeout for
that path. mempool users are usually the IO path and throttle them less
sounds like a reasonable way to go.

I do not have any more complicated dm setup available so I would
appreciate if dm people (CCed) could give these two a try.

Also it would be great to iron out concerns from David. He has posted a
deadlock stack trace [2] which has led to f9054c70d28b which is bio
allocation lockup because the TIF_MEMDIE process cannot make a forward
progress without access to memory reserve. This case should be fixed by
patch 1 AFAICS. There are other potential cases when the stuck mempool
is called from PF_MEMALLOC context and blocks the oom victim indirectly
(over a lock) but I believe those are much less likely and we have the
oom reaper to make a forward progress.

Sorry of pulling the discussion outside of the original email thread
but there were more lines of dicussion there and I felt discussing
particualr solution with its justification has a greater chance of
moving towards a solution. I am sending this as an RFC because this
needs a deep review as there might be other side effects I do not see
(especially about patch 2).

Any comments, suggestions are welcome.

---
[1] http://lkml.kernel.org/r/alpine.LRH.2.02.1607111027080.14327@file01.intranet.prod.int.rdu2.redhat.com
[2] http://lkml.kernel.org/r/alpine.DEB.2.10.1607131644590.92037@chino.kir.corp.google.com