Date:	Fri, 27 Nov 2015 16:40:03 +0300
From:	Vladimir Davydov <vdavydov@...tuozzo.com>
To:	Michal Hocko <mhocko@...nel.org>
CC:	Vlastimil Babka <vbabka@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Mel Gorman <mgorman@...hsingularity.net>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated

On Fri, Nov 27, 2015 at 01:50:05PM +0100, Michal Hocko wrote:
> On Thu 26-11-15 11:16:24, Vladimir Davydov wrote:
> > On Wed, Nov 25, 2015 at 07:27:57PM +0300, Vladimir Davydov wrote:
> > > On Wed, Nov 25, 2015 at 04:45:13PM +0100, Vlastimil Babka wrote:
> > > > On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> > > > > Block device drivers often hand off io request processing to kernel
> > > > > threads (example: device mapper). If such a thread calls kmalloc, it can
> > > > > dive into the direct reclaim path and end up waiting for too_many_isolated
> > > > > to return false, blocking writeback. This can lead to a deadlock if the
> > > > 
> > > > Shouldn't such allocation lack __GFP_IO to prevent this and other kinds of
> > > > deadlocks? And/or have mempools?
> > > 
> > > Not necessarily. loopback is an example: it can call
> > > grab_cache_page_write_begin -> add_to_page_cache_lru with GFP_KERNEL.
> 
> AFAIR the loop driver reduces the gfp_mask via the inode mapping.

Yeah, it does; I missed that, thanks for pointing it out. But it doesn't
make much difference: the kthread can still get stuck in too_many_isolated,
although reducing the gfp_mask does lower the chance of that happening. When
I hit it, the DMA zone had only 3 inactive file pages against 68 isolated
file pages, as I mentioned in the comment to the patch, so even the ">> 3"
adjustment in too_many_isolated wouldn't save us (see the sketch below).
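
For reference, the heuristic in question looks roughly like this (an
abridged, paraphrased sketch of too_many_isolated() from mm/vmscan.c of
that era, not the verbatim code):

static int too_many_isolated(struct zone *zone, int file,
			     struct scan_control *sc)
{
	unsigned long inactive, isolated;

	/* kswapd itself is never throttled here */
	if (current_is_kswapd())
		return 0;

	if (file) {
		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
	} else {
		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
	}

	/*
	 * Callers that may do IO/FS during reclaim get the ">> 3" threshold,
	 * i.e. they are throttled earlier, so that GFP_NOIO/GFP_NOFS callers
	 * (which cannot help writeback) are allowed to isolate more pages
	 * before being blocked.
	 */
	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
		inactive >>= 3;

	return isolated > inactive;
}

With only 3 inactive file pages against 68 isolated ones, this check
throttles the kthread no matter which threshold applies.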

>  
> > Anyway, kthreads that use GFP_NOIO and/or mempool aren't safe either,
> > because it isn't an allocation context problem: the reclaimer locks up
> > not because it tries to take an fs/io lock the caller holds, but because
> > it waits for isolated pages to be put back, which will never happen,
> > since processes that isolated them depend on the kthread making
> > progress. This is purely a reclaimer heuristic, which kmalloc users are
> > not aware of.
> > 
> > My point is that, in contrast to userspace processes, it is dangerous to
> > throttle kthreads in the reclaimer, because they might be responsible
> > for reclaimer progress (e.g. performing writeback).
> 
> Wouldn't it be better if your writeback kthread used PF_MEMALLOC/__GFP_MEMALLOC
> instead? It is in fact a reclaimer, so it wouldn't even get into direct
> reclaim in the first place.

The driver we use is similar to loop: it works as a proxy to the
filesystem it sits on top of. Allowing it to access the emergency
reserves would deplete them quickly, just as with plain loop.
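
(Just to make sure we mean the same thing, I read your suggestion as
something like the following around the kthread's allocation-heavy
paths; do_io_work() is a made-up placeholder:)

	/*
	 * Sketch of the PF_MEMALLOC approach: mark the task as a reclaimer
	 * so the page allocator lets it dip into the emergency reserves and
	 * never recurses into direct reclaim from its allocations.
	 */
	unsigned int pflags = current->flags & PF_MEMALLOC;

	current->flags |= PF_MEMALLOC;
	do_io_work(req);		/* hypothetical request processing */
	if (!pflags)
		current->flags &= ~PF_MEMALLOC;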

In fact, the problem is not specific to our driver. I'm pretty sure one
can hit it when using memcg along with loop or dm-crypt, for instance.

> 
> There are way too many allocations done from kernel thread context for
> them all to go unthrottled (just look at worker threads).

What about throttling them only once then?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 97ba9e1cde09..9253f4531b9c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1578,6 +1578,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
 			return SWAP_CLUSTER_MAX;
+
+		if (current->flags & PF_KTHREAD)
+			break;
 	}
 
 	lru_add_drain();