Message-ID: <20100726030813.GA7668@localhost>
Date: Mon, 26 Jul 2010 11:08:13 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Mel Gorman <mel@....ul.ie>, Christoph Hellwig <hch@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Dave Chinner <david@...morbit.com>,
Chris Mason <chris.mason@...cle.com>,
Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Minchan Kim <minchan.kim@...il.com>
Subject: Re: [PATCH 7/8] writeback: sync old inodes first in background
writeback
KOSAKI,
> My review didn't find any bug. However, I think the original thread
> has too many guesses; we need a known way to reproduce the issue and
> confirm it.
>
> At least, we need to confirm three things:
> o Is the original issue still there?
As long as the root cause is still there :)
> o Is DEF_PRIORITY/3 the best value?
There is no single best value. I suspect the whole PAGEOUT_IO_SYNC and
wait_on_page_writeback() approach is a terrible workaround and should
be avoided as much as possible. That is why I lifted the bar from
DEF_PRIORITY/2 to DEF_PRIORITY/3.
wait_on_page_writeback() is bad because on a typical desktop a single
call may block for 1-10 seconds (remember we are under memory pressure,
which is almost always accompanied by busy disk IO, so the page will
wait a noticeable time in the IO queue). To make it worse, it is very
likely there are 10 more dirty/writeback pages among the isolated pages
(dirty pages are often clustered). This ends up as 10-100 seconds of
stall time.
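
To see why the stalls serialize, here is a minimal sketch of the
shrink_page_list() pattern under PAGEOUT_IO_SYNC (simplified from the
mainline code, not a verbatim copy):

	list_for_each_entry(page, &page_list, lru) {
		/*
		 * Under lumpy sync reclaim, each writeback page in the
		 * isolated batch is waited for in turn, so the 1-10s
		 * stalls accumulate serially.
		 */
		if (PageWriteback(page) && sync_writeback == PAGEOUT_IO_SYNC)
			wait_on_page_writeback(page);
	}

With ~10 such pages per isolated batch, the 10-100s figure follows.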
We do need some throttling under memory pressure. However, stall times
of more than 1s are not acceptable. A simple congestion_wait() may be
better, since it waits on _any_ IO completion (which will likely
release a whole batch of PG_reclaim pages) rather than on one specific
IO completion. This makes for much smoother stall times.
wait_on_page_writeback() shall really be the last resort.
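
For illustration, the throttling could be as simple as this (the HZ/10
timeout is my assumption; congestion_wait() itself is the existing
API):

	/*
	 * Back off until any queued IO completes, or the timeout
	 * expires, instead of pinning on one specific page:
	 */
	congestion_wait(BLK_RW_ASYNC, HZ/10);

Any completed write will likely release a whole batch of PG_reclaim
pages, so the wait tends to return sooner and more evenly.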
DEF_PRIORITY/3 means scanning 1/16 = 6.25% of the LRU before the sync
waits kick in, which is closer to a last resort.
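
The arithmetic, assuming the mainline value DEF_PRIORITY = 12 and the
usual rule that the scan target is the LRU size shifted right by the
priority:

	int  priority   = DEF_PRIORITY / 3;      /* = 4 */
	long nr_to_scan = lru_size >> priority;  /* = lru_size / 16 */

So the sync waits only kick in once 6.25% of the LRU has been scanned
without making enough progress.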
Since dirty/writeback pages are such a bad factor under memory
pressure, it may be worth adaptively shrinking dirty_limit as well.
When short on memory, why not reduce the dirty/writeback page cache?
That would not only free memory, but also considerably improve IO
efficiency and responsiveness. When the LRU lists are scanned fast
(under memory pressure), it is likely that many of the dirty pages are
caught by pageout(). Reducing the number of dirty pages reduces the
pageout() invocations.
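
A hypothetical sketch of such adaptive shrinking (the helper, its name
and the scaling rule are illustrative assumptions, not an existing
kernel interface):

	/*
	 * Hypothetical: scale a dirty threshold down as the reclaim
	 * priority drops, so background writeback starts earlier when
	 * the system is under memory pressure.
	 */
	static unsigned long scaled_dirty_thresh(unsigned long dirty_thresh,
						 int priority)
	{
		/* halve the threshold for every 2 extra priority levels */
		return dirty_thresh >> ((DEF_PRIORITY - priority) / 2);
	}

At DEF_PRIORITY the threshold is unchanged; by priority DEF_PRIORITY/3
it has dropped to 1/16 of the configured value.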
> o Does the current approach have better performance than Wu's original proposal? (below)
I guess it will have better user experience :)
> Anyway, please feel free to use my reviewed-by tag.
Thanks,
Fengguang
> --- linux-next.orig/mm/vmscan.c 2010-06-24 14:32:03.000000000 +0800
> +++ linux-next/mm/vmscan.c 2010-07-22 16:12:34.000000000 +0800
> @@ -1650,7 +1650,7 @@ static void set_lumpy_reclaim_mode(int p
> */
> if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> sc->lumpy_reclaim_mode = 1;
> - else if (sc->order && priority < DEF_PRIORITY - 2)
> + else if (sc->order && priority < DEF_PRIORITY / 2)
> sc->lumpy_reclaim_mode = 1;
> else
> sc->lumpy_reclaim_mode = 0;
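
(For reference: with DEF_PRIORITY = 12, the original test
"priority < DEF_PRIORITY - 2" enables lumpy reclaim mode once the
priority drops below 10, while the hunk above defers that until it
drops below DEF_PRIORITY / 2 = 6, i.e. only after scanning has
escalated considerably further.)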