linux-kernel - Re: [PATCH] mm: disallow direct reclaim page writeback

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <op.vbdgxkherwwil4@sfaibish1.corp.emc.com>
Date:	Sun, 18 Apr 2010 15:11:34 -0400
From:	"Sorin Faibish" <sfaibish@....com>
To:	"Christoph Hellwig" <hch@...radead.org>,
	"Andrew Morton" <akpm@...ux-foundation.org>
Cc:	"Mel Gorman" <mel@....ul.ie>, "Dave Chinner" <david@...morbit.com>,
	"KOSAKI Motohiro" <kosaki.motohiro@...fujitsu.com>,
	"Chris Mason" <chris.mason@...cle.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback

On Sun, 18 Apr 2010 15:05:26 -0400, Christoph Hellwig <hch@...radead.org>  
wrote:

> On Sat, Apr 17, 2010 at 08:32:39PM -0400, Andrew Morton wrote:
>> The poor IO patterns thing is a regression.  Some time several years
>> ago (around 2.6.16, perhaps), page reclaim started to do a LOT more
>> dirty-page writeback than it used to.  AFAIK nobody attempted to work
>> out why, nor attempted to try to fix it.
>
> I just know that we XFS guys have been complaining about it a lot..
I know also that the ext3 and reisefs guys complained about this issue
as well.

>
> But that was mostly a tuning issue - before writeout mostly happened
> from pdflush.  If we got into kswapd or direct reclaim we already
> did get horrible I/O patterns - it just happened far less often.
>
>> Regarding simply not doing any writeout in direct reclaim (Dave's
>> initial proposal): the problem is that pageout() will clean a page in
>> the target zone.  Normal writeout won't do that, so we could get into a
>> situation where vast amounts of writeout is happening, but none of it
>> is cleaning pages in the zone which we're trying to allocate from.
>> It's quite possibly livelockable, too.
>
> As Chris mentioned currently btrfs and ext4 do not actually do delalloc
> conversions from this path, so for typical workloads the amount of
> writeout that can happen from this path is extremly limited.  And unless
> we get things fixed we will have to do the same for XFS.  I'd be much
> more happy if we could just sort it out at the VM level, because this
> means we have one sane place for this kind of policy instead of three
> or more hacks down inside the filesystems.  It's rather interesting
> that all people on the modern fs side completely agree here what the
> problem is, but it seems rather hard to convince the VM side to do
> anything about it.
>
>> To solve the stack-usage thing: dunno, really.  One could envisage code
>> which skips pageout() if we're using more than X amount of stack, but
>> that sucks.
>
> And it doesn't solve other issues, like the whole lock taking problem.
>
>> Another possibility might be to hand the target page over
>> to another thread (I suppose kswapd will do) and then synchronise with
>> that thread - get_page()+wait_on_page_locked() is one way.  The helper
>> thread could of course do writearound.
>
> Allowing the flusher threads to do targeted writeout would be the
> best from the FS POV.  We'll still have one source of the I/O, just
> with another know on how to select the exact region to write out.
> We can still synchronously wait for the I/O for lumpy reclaim if really
> nessecary.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"  
> in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



-- 
Best Regards
Sorin Faibish
Corporate Distinguished Engineer
Network Storage Group

        EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : sfaibish@....com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/