Message-ID: <1304010654.2081.25.camel@lenovo>
Date: Thu, 28 Apr 2011 18:10:54 +0100
From: Colin Ian King <colin.king@...onical.com>
To: Mel Gorman <mgorman@...e.de>
Cc: James Bottomley <James.Bottomley@...e.de>, Jan Kara <jack@...e.cz>,
Chris Mason <chris.mason@...cle.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
linux-ext4 <linux-ext4@...r.kernel.org>, mgorman@...ell.com
Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback
related.
On Thu, 2011-04-28 at 16:08 +0100, Mel Gorman wrote:
[ text deleted ]
> Another consequence of this patch is that when high order allocations
> are in progress (is the test case fork heavy in any way for
> example? alternatively, it might be something in the storage stack
> that requires high-order allocs) we are no longer necessarily going
> to sleep because of the should_continue_reclaim() check. This could
> explain kswapd-at-99% but would only apply if CONFIG_COMPACTION is
> set (does unsetting CONFIG_COMPACTION help?). If the bug only triggers
> for CONFIG_COMPACTION, does the following *untested* patch help any?
I'm afraid to report that this patch didn't help either.
>
> (as a warning, I'm offline Friday until Tuesday morning. I'll try to
> check mail over the weekend but it's unlikely I'll find a terminal
> or be allowed to use it without an ass kicking)
Ditto for me too; I will pick this up on Tuesday.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..c74a501 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1842,15 +1842,22 @@ static inline bool should_continue_reclaim(struct zone *zone,
>  		return false;
>
>  	/*
> -	 * If we failed to reclaim and have scanned the full list, stop.
> -	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
> -	 *       faster but obviously would be less likely to succeed
> -	 *       allocation. If this is desirable, use GFP_REPEAT to decide
> -	 *       if both reclaimed and scanned should be checked or just
> -	 *       reclaimed
> +	 * For direct reclaimers
> +	 *   If we failed to reclaim and have scanned the full list, stop.
> +	 *   The caller will check congestion and sleep if necessary until
> +	 *   some IO completes.
> +	 * For kswapd
> +	 *   Check just nr_reclaimed. If we are failing to reclaim, we
> +	 *   want to stop this reclaim loop, increase the priority and
> +	 *   go to sleep if necessary to allow IO a chance to complete.
> +	 *   This avoids kswapd going into a busy loop in shrink_zone()
>  	 */
> -	if (!nr_reclaimed && !nr_scanned)
> -		return false;
> +	if (!nr_reclaimed) {
> +		if (current_is_kswapd())
> +			return false;
> +		else if (!nr_scanned)
> +			return false;
> +	}
>
>  	/*
>  	 * If we have not reclaimed enough pages for compaction and the
> @@ -1924,8 +1931,13 @@ restart:
>
>  	/* reclaim/compaction might need reclaim to continue */
>  	if (should_continue_reclaim(zone, nr_reclaimed,
> -					sc->nr_scanned - nr_scanned, sc))
> +					sc->nr_scanned - nr_scanned, sc)) {
> +		/* Throttle direct reclaimers if congested */
> +		if (!current_is_kswapd())
> +			wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
> +
>  		goto restart;
> +	}
>
>  	throttle_vm_writeout(sc->gfp_mask);
>  }
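
For anyone following the first hunk, the changed early-exit test can be
modelled in user space roughly as below. This is only a sketch of the
decision logic and not kernel code: the name reclaim_should_stop() and
the is_kswapd parameter are hypothetical stand-ins (the patch itself
calls current_is_kswapd()), and the return value is inverted relative
to should_continue_reclaim(), so true here means "stop at this check".

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the patched early-exit test.
 * Returns true when the reclaim loop should stop at this check,
 * i.e. where the patched should_continue_reclaim() returns false.
 */
static bool reclaim_should_stop(bool is_kswapd,
				unsigned long nr_reclaimed,
				unsigned long nr_scanned)
{
	/* Some pages were reclaimed: this check does not stop the loop. */
	if (nr_reclaimed)
		return false;

	/* kswapd: failing to reclaim is enough to stop, so it can sleep. */
	if (is_kswapd)
		return true;

	/* Direct reclaimer: stop only if nothing was scanned either. */
	return nr_scanned == 0;
}
```

With nr_reclaimed == 0 and nr_scanned == 32 the model stops for kswapd
but not for a direct reclaimer, which is the behavioural difference the
patch introduces. The later checks in should_continue_reclaim() are
not modelled here.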