Date:	Wed, 17 Nov 2010 15:08:14 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Jan Kara <jack@...e.cz>, Chris Mason <chris.mason@...cle.com>,
	Dave Chinner <david@...morbit.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
	"Theodore Ts'o" <tytso@....edu>, Mel Gorman <mel@....ul.ie>,
	Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-mm <linux-mm@...ck.org>, <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/13] writeback: IO-less balance_dirty_pages()

On Wed, 17 Nov 2010 12:27:21 +0800
Wu Fengguang <fengguang.wu@...el.com> wrote:

> Since the task will be soft throttled earlier than before, it may be
> perceived by end users as a performance "slow down" if their
> application happens to dirty more than ~15% of memory.

writeback has always had these semi-bogus assumptions that all pages
are the same, and it can sometimes go very wrong.

A chronic case would be a 4GB i386 machine where only 1/4 of memory is
usable for GFP_KERNEL allocations, filesystem metadata, and /dev/sdX
pagecache.
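
(Roughly: with the usual 3G/1G split, lowmem is about 896MB, and
896/4096 is ~22%, hence the "only 1/4" above.  The 896MB figure is the
usual i386 default, quoted here just for the arithmetic.)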

When you think about it, a lot of the throttling work being done in
writeback is really being done on behalf of the page allocator (and
hence page reclaim).  But what happens if the workload is mainly
hammering away at ZONE_NORMAL, but writeback is considering ZONE_NORMAL
to be the same thing as ZONE_HIGHMEM?

Or vice versa, where page-dirtyings are all happening in lowmem?  Can
writeback then think that there are plenty of clean pages (because it's
looking at highmem as well) so little or no throttling is happening? 
If so, what effect does this have upon GFP_KERNEL/GFP_USER allocation?
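
To put rough numbers on that second case, here's a toy userspace model
(this is not the kernel's accounting code; the zone sizes and the 20%
dirty_ratio below are assumptions picked purely for illustration) of
what a single global dirty threshold sees when nearly every dirty page
lives in lowmem:

	/*
	 * Toy model, not kernel code: one global dirty threshold over
	 * all dirtyable pages, while the dirty pages themselves are
	 * confined to lowmem.  Assumes a 4GB i386 box with ~896MB of
	 * lowmem, 4KB pages and dirty_ratio=20.
	 */
	#include <stdio.h>

	int main(void)
	{
		unsigned long lowmem  = 896UL  << 8;	/* 896MB in 4KB pages */
		unsigned long highmem = 3200UL << 8;	/* ~3.2GB in 4KB pages */
		unsigned long total   = lowmem + highmem;
		unsigned long dirty   = lowmem * 9 / 10; /* 90% of lowmem dirty */
		unsigned int  ratio   = 20;		/* vm.dirty_ratio */

		printf("lowmem dirty: %lu%%  global dirty: %lu%%  threshold: %u%%\n",
		       100 * dirty / lowmem, 100 * dirty / total, ratio);
		/*
		 * Prints about 89% / 19% / 20%: lowmem is almost entirely
		 * dirty, yet the global check sees dirty < threshold and
		 * does not throttle.
		 */
		return 0;
	}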

And bear in mind that the user can tune the dirty levels.  If they're
set to 10% on a machine on which 25% of memory is lowmem then ill
effects might be rare.  But if the user tweaks the thresholds to 30%
then can we get into problems?  Such as a situation where 100% of
lowmem is dirty and throttling isn't cutting in?
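
(Rough numbers for the 4GB case: a 10% threshold is ~400MB of dirty
memory, comfortably below ~896MB of lowmem, so even the worst placement
can't fill lowmem with dirty pages.  At 30% the threshold is ~1.2GB,
more than all of lowmem, so "100% of lowmem dirty with no throttling"
becomes reachable.)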



So please have a think about that and see if you can think of ways in
which this assumption can cause things to go bad.  I'd suggest
writing some targeted tests which write to /dev/sdX (to generate
lowmem-only dirty pages) and which read from /dev/sdX (to request
allocation of lowmem pages).  Run these tests in conjunction with tests
which exercise the highmem zone as well and check that everything
behaves as expected.
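
One possible shape for the write half of such a test (a sketch only:
the device path and sizes are placeholders, error handling is minimal,
and it will destroy whatever is on the device, so point it at a scratch
disk):

	/*
	 * Dirty lowmem-only pagecache by doing buffered writes to a raw
	 * block device node (per the above, blockdev pagecache is lowmem
	 * on i386).  Deliberately no fsync: we want the pages left dirty.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	static char buf[1 << 20];			/* 1MB of junk */

	int main(int argc, char **argv)
	{
		const char *dev = argc > 1 ? argv[1] : "/dev/sdX"; /* placeholder */
		long mb = argc > 2 ? atol(argv[2]) : 2048;	   /* default 2GB */
		int fd = open(dev, O_WRONLY);	/* buffered: no O_DIRECT */

		if (fd < 0) {
			perror(dev);
			return 1;
		}
		memset(buf, 0xaa, sizeof(buf));
		for (long i = 0; i < mb; i++)
			if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
				perror("write");
				break;
			}
		close(fd);
		return 0;
	}

The read half would just read() the same device back to force lowmem
page allocations, with the highmem exerciser running alongside.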

Of course, this all assumes that you have a 4GB i386 box :( It's almost
getting to the stage where we need a fake-zone-highmem option for
x86_64 boxes just so we can test this stuff.

