linux-kernel - RE: [PATCH] mm/vmscan: Do not block forever at shrink_inactive

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <6B2BA408B38BA1478B473C31C3D2074E31D59D8673@SV-EXCHANGE1.Corp.FC.LOCAL>
Date:	Tue, 20 May 2014 09:12:07 -0700
From:	Motohiro Kosaki <Motohiro.Kosaki@...fujitsu.com>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	"david@...morbit.com" <david@...morbit.com>,
	"riel@...hat.com" <riel@...hat.com>
CC:	Motohiro Kosaki JP <kosaki.motohiro@...fujitsu.com>,
	"fengguang.wu@...el.com" <fengguang.wu@...el.com>,
	"kamezawa.hiroyu@...fujitsu.com" <kamezawa.hiroyu@...fujitsu.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"hch@...radead.org" <hch@...radead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"xfs@....sgi.com" <xfs@....sgi.com>
Subject: RE: [PATCH] mm/vmscan: Do not block forever at
 shrink_inactive_list().



> -----Original Message-----
> From: Tetsuo Handa [mailto:penguin-kernel@...ove.SAKURA.ne.jp]
> Sent: Tuesday, May 20, 2014 11:58 PM
> To: david@...morbit.com; riel@...hat.com
> Cc: Motohiro Kosaki JP; fengguang.wu@...el.com; kamezawa.hiroyu@...fujitsu.com; akpm@...ux-foundation.org;
> hch@...radead.org; linux-kernel@...r.kernel.org; xfs@....sgi.com
> Subject: Re: [PATCH] mm/vmscan: Do not block forever at shrink_inactive_list().
> 
> Today I discussed this issue with Kosaki-san at LinuxCon Japan 2014.
> He does not like the idea of adding a timeout to the throttle loop. As Dave posted a patch that fixes a bug in XFS delayed allocation, I
> updated my patch accordingly.
> 
> Although Dave's patch fixed the bug in XFS, other kernel code may have bugs that fall into this infinite throttle loop.
> But to keep the possibility of triggering the OOM killer to a minimum, can we agree on this updated patch (and, in the future, on adding some
> warning mechanism like /proc/sys/kernel/hung_task_timeout_secs for detecting memory allocation stalls)?
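The warning mechanism proposed above would report a long allocation stall without ending the wait. Below is a minimal user-space sketch, assuming a threshold in seconds that behaves like hung_task_timeout_secs (a value of 0 disables the check). `struct alloc_stall`, `alloc_stall_begin()` and `alloc_stall_check()` are hypothetical names for illustration, not kernel interfaces.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-task stall tracker (a model, not a kernel structure). */
struct alloc_stall {
	time_t start;   /* when the task entered the throttle loop */
	bool warned;    /* report each stall only once */
};

/* Record the moment the task started waiting. */
static void alloc_stall_begin(struct alloc_stall *s, time_t now)
{
	s->start = now;
	s->warned = false;
}

/*
 * Call on every iteration of the throttle loop. Returns true the first
 * time the stall reaches timeout_secs and prints a warning. It never
 * tells the caller to stop waiting, so throttling itself is unchanged.
 * A timeout of 0 disables the check.
 */
static bool alloc_stall_check(struct alloc_stall *s, time_t now,
			      unsigned long timeout_secs)
{
	if (timeout_secs == 0 || s->warned)
		return false;
	if ((unsigned long)(now - s->start) < timeout_secs)
		return false;
	s->warned = true;
	fprintf(stderr, "memory allocation stalled for %ld seconds\n",
		(long)(now - s->start));
	return true;
}
```

Because the check only reports, a throttled reclaimer keeps waiting exactly as before; the stall merely becomes visible.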
> 
> Dave, if you are OK with this updated patch, please let me know the commit ID of your patch.
> 
> Regards.
> ----------
> From 408e65d9025e8e24838e7bf6ac9066ba8a9391a6 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
> Date: Tue, 20 May 2014 23:34:34 +0900
> Subject: [PATCH] mm/vmscan: Do not throttle kswapd at shrink_inactive_list().
> 
> I can observe that commit 35cd7815 "vmscan: throttle direct reclaim when too many pages are isolated already" causes a RHEL7
> environment to stall at 0% CPU usage under a certain type of memory pressure.
> This is because nobody can reclaim memory, due to the rules listed below.
> 
>   (a) XFS uses a kernel worker thread for delayed allocation
>   (b) kswapd wakes up the kernel worker thread for delayed allocation
>   (c) the kernel worker thread is throttled due to commit 35cd7815
> 
> This patch and commit XXXXXXXX "xfs: block allocation work needs to be kswapd aware" will solve rule (c).
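Rule (c) depends on what the throttle check actually looks at. The following is a simplified user-space model, assuming the heuristic introduced by commit 35cd7815 compares the number of isolated pages with the number of inactive pages for the LRU type being scanned. `struct lru_counts` and `model_too_many_isolated()` are hypothetical names. The model sees only page counts, so nothing in it distinguishes a worker running on kswapd's behalf from any other direct reclaimer.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one zone's LRU counters (model only). */
struct lru_counts {
	unsigned long inactive_file, isolated_file;
	unsigned long inactive_anon, isolated_anon;
};

/*
 * Model of the throttle heuristic: throttle when more pages of the
 * scanned type are isolated than remain on the inactive list.
 * Only page counts are consulted, not on whose behalf the caller runs.
 */
static bool model_too_many_isolated(const struct lru_counts *c, bool file)
{
	unsigned long inactive = file ? c->inactive_file : c->inactive_anon;
	unsigned long isolated = file ? c->isolated_file : c->isolated_anon;

	return isolated > inactive;
}
```

If the pages counted as isolated belong to tasks that are themselves waiting for the allocation worker, the counts would never fall and the worker would sleep indefinitely, which is consistent with the 0% CPU stall described above.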
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
> ---
>  mm/vmscan.c |   20 +++++++++++++++-----
>  1 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 32c661d..5c6960e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1460,12 +1460,22 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	struct zone *zone = lruvec_zone(lruvec);
>  	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> 
> -	while (unlikely(too_many_isolated(zone, file, sc))) {
> -		congestion_wait(BLK_RW_ASYNC, HZ/10);
> +	/*
> +	 * Throttle only direct reclaimers. Allocations by kswapd (and
> +	 * allocation workqueue on behalf of kswapd) should not be
> +	 * throttled here; otherwise memory allocation will deadlock.
> +	 */
> +	if (!sc->hibernation_mode && !current_is_kswapd()) {
> +		while (unlikely(too_many_isolated(zone, file, sc))) {
> +			congestion_wait(BLK_RW_ASYNC, HZ/10);
> 
> -		/* We are about to die and free our memory. Return now. */
> -		if (fatal_signal_pending(current))
> -			return SWAP_CLUSTER_MAX;
> +			/*
> +			 * We are about to die and free our memory.
> +			 * Return now.
> +			 */
> +			if (fatal_signal_pending(current))
> +				return SWAP_CLUSTER_MAX;
> +		}
>  	}
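The patched logic can be modelled in user space to show which callers enter the throttle loop and how they leave it. This is a sketch under stated assumptions: the booleans stand in for current_is_kswapd(), sc->hibernation_mode and fatal_signal_pending(); the `isolated > inactive` comparison stands in for too_many_isolated(); and the caller-supplied `wait` callback stands in for congestion_wait(BLK_RW_ASYNC, HZ/10). `struct throttle_model`, `wait_fn` and `model_throttle()` are hypothetical names, and MODEL_SWAP_CLUSTER_MAX mirrors the mainline SWAP_CLUSTER_MAX value of 32.

```c
#include <stdbool.h>

/* Hypothetical model state; none of these names are kernel APIs. */
struct throttle_model {
	bool is_kswapd;          /* stands in for current_is_kswapd() */
	bool hibernation_mode;   /* stands in for sc->hibernation_mode */
	bool fatal_signal;       /* stands in for fatal_signal_pending() */
	unsigned long isolated;  /* isolated pages of the scanned type */
	unsigned long inactive;  /* inactive pages of the scanned type */
	unsigned int waits;      /* number of modelled congestion waits */
};

#define MODEL_SWAP_CLUSTER_MAX 32UL

/* Caller-supplied model of one congestion_wait(); may update the counts. */
typedef void (*wait_fn)(struct throttle_model *m);

/*
 * Returns MODEL_SWAP_CLUSTER_MAX if the task bailed out because of a
 * fatal signal, or 0 if it may go on to isolate and reclaim pages.
 */
static unsigned long model_throttle(struct throttle_model *m, wait_fn wait)
{
	/* Only direct reclaimers outside hibernation are throttled. */
	if (m->hibernation_mode || m->is_kswapd)
		return 0;

	while (m->isolated > m->inactive) {
		wait(m);
		m->waits++;
		/* About to die and free our memory: return now. */
		if (m->fatal_signal)
			return MODEL_SWAP_CLUSTER_MAX;
	}
	return 0;
}
```

The loop stays unbounded for direct reclaimers, in line with Kosaki's objection to a timeout; the patch only narrows the set of callers that enter it.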


Acked-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>


Dave, I don't like Tetsuo's first patch because too_many_isolated() exists to prevent false OOM kills, so a simple timeout
would resurrect them. Please let me know if you need further MM enhancements to solve the XFS issue; I'd like to join and help with this.

Thanks.
