Date:	Fri, 30 Jul 2010 15:58:19 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Chris Mason <chris.mason@...cle.com>,
	Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Christoph Hellwig <hch@...radead.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Mel Gorman <mel@....ul.ie>, Minchan Kim <minchan.kim@...il.com>
Subject: Re: [PATCH 0/5]  [RFC] transfer ASYNC vmscan writeback IO to the
 flusher threads

On Fri, Jul 30, 2010 at 07:23:30AM +0800, Dave Chinner wrote:
> On Thu, Jul 29, 2010 at 07:51:42PM +0800, Wu Fengguang wrote:
> > Andrew,
> > 
> > It's possible to transfer ASYNC vmscan writeback IOs to the flusher threads.
> > This simple patchset shows the basic idea. Since it's a big behavior change,
> > there are inevitably lots of details to sort out. I don't know where it will
> > go after tests and discussions, so the patches are intentionally kept simple.
> > 
> > sync livelock avoidance (more is needed to be complete, but this is the minimum required for the last two patches)
> > 	[PATCH 1/5] writeback: introduce wbc.for_sync to cover the two sync stages
> > 	[PATCH 2/5] writeback: stop periodic/background work on seeing sync works
> > 	[PATCH 3/5] writeback: prevent sync livelock with the sync_after timestamp
> > 
> > let the flusher threads do ASYNC writeback for pageout()
> > 	[PATCH 4/5] writeback: introduce bdi_start_inode_writeback()
> > 	[PATCH 5/5] vmscan: transfer async file writeback to the flusher
> 
> I really do not like this - all it does is transfer random page writeback
> from vmscan to the flusher threads rather than avoiding random page
> writeback altogether. Random page writeback is nasty - just say no.

There are cases where we have to do pageout().

- a stressed memcg with lots of dirty pages
- a large NUMA system whose nodes have unbalanced vmscan rates and dirty pages

In the above cases, the whole system may not be that stressed,
except that some local LRU list is being heavily scanned.  If that
local memory stress leads to lots of pageout() calls, it can bring
down the whole system by congesting the disks with many small, seeky IOs.

It may be overkill to push global writeback (i.e. it's silly to sync
1GB of dirty data because of a small, stressed 100MB LRU list). The
obvious solution is to keep the pageout() calls but make them more
IO-efficient by doing write-around at the same time.  The write-around
pages will likely sit in the same stressed LRU list, and hence help
page reclaim as well.
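
A rough sketch of that write-around idea, just to make it concrete
(the helper name, the 512KB window and the batch size are made up for
illustration, not a concrete proposal):

/*
 * Illustrative only: write out a window of dirty pages around the
 * target page via do_writepages(), instead of a single-page
 * ->writepage().  WRITE_AROUND_SIZE is a made-up knob.
 */
#define WRITE_AROUND_SIZE	(512 << 10)

static int pageout_around(struct page *page, struct address_space *mapping)
{
	loff_t off = page_offset(page);
	struct writeback_control wbc = {
		.sync_mode	= WB_SYNC_NONE,
		.nr_to_write	= WRITE_AROUND_SIZE >> PAGE_SHIFT,
		.range_start	= off & ~((loff_t)WRITE_AROUND_SIZE - 1),
		.range_end	= off | (WRITE_AROUND_SIZE - 1),
		.for_reclaim	= 1,
	};

	return do_writepages(mapping, &wbc);
}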

Transferring the ASYNC work to the flushers also helps with the
kswapd-vs-flusher priority problem. Currently kswapd/direct reclaim
must either skip dirty pages on congestion or risk being blocked in
get_request_wait(); neither is a good option. However, the use of
bdi_start_inode_writeback() does call for a good vmscan throttling
scheme to prevent reclaim from falsely OOMing before the flusher is
able to clean the transferred pages. This would be tricky.
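
For illustration only, the dumbest possible form of such a throttle
could look like the sketch below; the threshold and the
congestion_wait() backoff are placeholders, and finding the real
criterion is exactly the tricky part:

/*
 * Illustrative only: instead of declaring OOM, back off while most of
 * the zone's file pages handed to the flusher are still under
 * writeback.  "Half of the inactive file list" is an arbitrary
 * placeholder threshold.
 */
static void throttle_vmscan_on_writeback(struct zone *zone)
{
	while (zone_page_state(zone, NR_WRITEBACK) >
	       zone_page_state(zone, NR_INACTIVE_FILE) / 2)
		congestion_wait(BLK_RW_ASYNC, HZ/10);
}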

If the system is globally memory stressed and runs into pageout(), we
can safely kick the flusher threads for more writeback. There are
three possible schemes:

- to kick writeback for N pages, e.g. via the existing
  wakeup_flusher_threads() calls (see the sketch after this list)

- to lower dirty_expire_interval, e.g. to enqueue the current inode
  (the one containing the dirty page seen by pageout()) _plus_ all
  older inodes for writeback. This can be done when servicing the
  for_reclaim writeback work.

- to lower the dirty throttle limit (trying to find a criterion...)
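
The first scheme is roughly what wakeup_flusher_threads() already
gives us when called from reclaim; as a sketch (the batch size is an
arbitrary placeholder):

/*
 * Scheme 1, roughly: when global reclaim keeps hitting dirty pages,
 * ask the flusher threads for a batch of background writeback rather
 * than issuing the IO from vmscan itself.
 */
static void reclaim_kick_flushers(void)
{
	wakeup_flusher_threads(SWAP_CLUSTER_MAX * 8);
}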

Thanks,
Fengguang