linux-ext4 - Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.

Message-ID: <1304010654.2081.25.camel@lenovo>
Date:	Thu, 28 Apr 2011 18:10:54 +0100
From:	Colin Ian King <colin.king@...onical.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	James Bottomley <James.Bottomley@...e.de>, Jan Kara <jack@...e.cz>,
	Chris Mason <chris.mason@...cle.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-ext4 <linux-ext4@...r.kernel.org>, mgorman@...ell.com
Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback
 related.

On Thu, 2011-04-28 at 16:08 +0100, Mel Gorman wrote:

[ text deleted ]

> Another consequence of this patch is that when high order allocations
> are in progress (is the test case fork heavy in any way for
> example? alternatively, it might be something in the storage stack
> that requires high-order allocs) we are no longer necessarily going
> to sleep because of the should_continue_reclaim() check. This could
> explain kswapd-at-99% but would only apply if CONFIG_COMPACTION is
> set (does unsetting CONFIG_COMPACTION help?). If the bug only triggers
> for CONFIG_COMPACTION, does the following *untested* patch help any?

I'm afraid to report that this patch didn't help either.
> 
> (as a warning, I'm offline Friday until Tuesday morning. I'll try
> check mail over the weekend but it's unlikely I'll find a terminal
> or be allowed to use it without an ass kicking)

Ditto for me too; I will pick this up on Tuesday.
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..c74a501 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1842,15 +1842,22 @@ static inline bool should_continue_reclaim(struct zone *zone,
>  		return false;
>  
>  	/*
> -	 * If we failed to reclaim and have scanned the full list, stop.
> -	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
> -	 *       faster but obviously would be less likely to succeed
> -	 *       allocation. If this is desirable, use GFP_REPEAT to decide
> -	 *       if both reclaimed and scanned should be checked or just
> -	 *       reclaimed
> +	 * For direct reclaimers
> +	 *   If we failed to reclaim and have scanned the full list, stop.
> +	 *   The caller will check congestion and sleep if necessary until
> +	 *   some IO completes.
> +	 * For kswapd
> +	 *   Check just nr_reclaimed. If we are failing to reclaim, we
> +	 *   want to stop this reclaim loop, increase the priority and
> +	 *   go to sleep if necessary to allow IO a chance to complete.
> +	 *   This avoids kswapd going into a busy loop in shrink_zone()
>  	 */
> -	if (!nr_reclaimed && !nr_scanned)
> -		return false;
> +	if (!nr_reclaimed) {
> +		if (current_is_kswapd())
> +			return false;
> +		else if (!nr_scanned)
> +			return false;
> +	}
>  
>  	/*
>  	 * If we have not reclaimed enough pages for compaction and the
> @@ -1924,8 +1931,13 @@ restart:
>  
>  	/* reclaim/compaction might need reclaim to continue */
>  	if (should_continue_reclaim(zone, nr_reclaimed,
> -					sc->nr_scanned - nr_scanned, sc))
> +					sc->nr_scanned - nr_scanned, sc)) {
> +		/* Throttle direct reclaimers if congested */
> +		if (!current_is_kswapd())
> +			wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
> +
>  		goto restart;
> +	}
>  
>  	throttle_vm_writeout(sc->gfp_mask);
>  }
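
For anyone following the thread, the behavioural change in the first hunk
can be modelled outside the kernel. What follows is only a userspace
sketch of the early-exit test in should_continue_reclaim(), not the kernel
code itself: the function names stop_reclaim_old() and stop_reclaim_new()
are hypothetical, the is_kswapd parameter stands in for the kernel's
current_is_kswapd(), and the other checks in the real function (before and
after this test) are left out.

```c
#include <stdbool.h>

/*
 * Userspace model of the early-exit test only. Both function names are
 * hypothetical and exist purely for illustration.
 * Return value: true means "stop this reclaim loop".
 */

/* Before the patch: stop only if nothing was reclaimed AND nothing scanned. */
static bool stop_reclaim_old(unsigned long nr_reclaimed,
			     unsigned long nr_scanned)
{
	return !nr_reclaimed && !nr_scanned;
}

/*
 * After the patch: is_kswapd stands in for current_is_kswapd().
 * kswapd stops as soon as a pass reclaims nothing; a direct reclaimer
 * still needs the scan count to be zero as well.
 */
static bool stop_reclaim_new(bool is_kswapd,
			     unsigned long nr_reclaimed,
			     unsigned long nr_scanned)
{
	if (!nr_reclaimed) {
		if (is_kswapd)
			return true;
		else if (!nr_scanned)
			return true;
	}
	return false;
}
```

The case that differs is a pass that scans pages but reclaims none, for
example nr_reclaimed == 0 and nr_scanned == 32. The old test keeps looping
for every caller, while the new test stops for kswapd and keeps going only
for direct reclaimers, which the second hunk then throttles with
wait_iff_congested().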

