Message-ID: <20110928071154.GA23535@redhat.com>
Date: Wed, 28 Sep 2011 09:11:54 +0200
From: Johannes Weiner <jweiner@...hat.com>
To: Minchan Kim <minchan.kim@...il.com>
Cc: Andrew Morton <akpm@...gle.com>, Mel Gorman <mgorman@...e.de>,
Christoph Hellwig <hch@...radead.org>,
Dave Chinner <david@...morbit.com>,
Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>,
Rik van Riel <riel@...hat.com>,
Chris Mason <chris.mason@...cle.com>,
"Theodore Ts'o" <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>, xfs@....sgi.com,
linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [patch 2/2/4] mm: try to distribute dirty pages fairly across zones
On Wed, Sep 28, 2011 at 02:56:40PM +0900, Minchan Kim wrote:
> On Fri, Sep 23, 2011 at 04:42:48PM +0200, Johannes Weiner wrote:
> > The maximum number of dirty pages that exist in the system at any time
> > is determined by a number of pages considered dirtyable and a
> > user-configured percentage of those, or an absolute number in bytes.
>
> It's an explanation of the old approach.
What do you mean? This does not change with this patch. We still
have a number of dirtyable pages and a limit that is applied
relative to this number.
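
To make the relationship concrete, here is a minimal sketch of how such
a global limit can be derived from the number of dirtyable pages. It is
an illustration only, not the kernel's code: the function name, the
SKETCH_PAGE_SIZE constant and the parameter names are assumptions made
for this example. They stand in for the percentage and byte settings
mentioned above, with the byte value taking precedence when it is set.

```c
#include <assert.h>

/* Assumed page size for this sketch only; real systems may differ. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: global dirty limit in pages.
 *
 * If an absolute limit in bytes is configured, it wins and is rounded
 * up to whole pages.  Otherwise the limit is a percentage of the pages
 * considered dirtyable.  Overflow of the multiplication is ignored
 * here to keep the sketch short.
 */
unsigned long sketch_global_dirty_limit(unsigned long dirtyable_pages,
					unsigned long dirty_bytes,
					unsigned int dirty_ratio)
{
	assert(dirty_ratio <= 100);

	if (dirty_bytes)
		return (dirty_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return dirtyable_pages * dirty_ratio / 100;
}
```

With 1000000 dirtyable pages and a ratio of 20, this yields a limit of
200000 pages; a non-zero byte limit overrides the ratio entirely.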
> > This number of dirtyable pages is the sum of memory provided by all
> > the zones in the system minus their lowmem reserves and high
> > watermarks, so that the system can retain a healthy number of free
> > pages without having to reclaim dirty pages.
>
> It's an explanation of the new approach.
Same here, this aspect is also not changed with this patch!
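
As a rough illustration of the sum described in the quoted paragraph,
the following sketch subtracts each zone's lowmem reserve and high
watermark from the memory it provides and adds up the results. The
struct, its field names and both functions are hypothetical
simplifications for this example and do not mirror the kernel's data
structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a zone for this sketch. */
struct sketch_zone {
	unsigned long pages;          /* memory provided by the zone */
	unsigned long lowmem_reserve; /* pages held back as lowmem reserve */
	unsigned long high_wmark;     /* high watermark of free pages */
};

/* Dirtyable pages of one zone, clamped at zero for tiny zones. */
unsigned long sketch_zone_dirtyable(const struct sketch_zone *z)
{
	unsigned long reserved = z->lowmem_reserve + z->high_wmark;

	return z->pages > reserved ? z->pages - reserved : 0;
}

/* Global dirtyable memory: the sum over all zones. */
unsigned long sketch_global_dirtyable(const struct sketch_zone *zones,
				      size_t nr_zones)
{
	unsigned long sum = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);

	for (i = 0; i < nr_zones; i++)
		sum += sketch_zone_dirtyable(&zones[i]);

	return sum;
}
```

The clamp matters for small zones whose reserve plus watermark exceeds
their size: they contribute nothing rather than a wrapped-around value.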
> > But there is a flaw in that we have a zoned page allocator which does
> > not care about the global state but rather the state of individual
> > memory zones. And right now there is nothing that prevents one zone
> > from filling up with dirty pages while other zones are spared, which
> > frequently leads to situations where kswapd, in order to restore the
> > watermark of free pages, does indeed have to write pages from that
> > zone's LRU list. This can interfere so badly with IO from the flusher
> > threads that major filesystems (btrfs, xfs, ext4) mostly ignore write
> > requests from reclaim already, taking away the VM's only possibility
> > to keep such a zone balanced, aside from hoping the flushers will soon
> > clean pages from that zone.
>
> It's an explanation of the old approach, again!
> Shouldn't we move the above description of the new approach further down?
Everything above describes the current behaviour and its problems (as
of this patch, so respecting lowmem_reserve, for example, is already
part of the current behaviour). What follows below is a description
of how the patch tries to fix it.
> > Enter per-zone dirty limits. They are to a zone's dirtyable memory
> > what the global limit is to the global amount of dirtyable memory, and
> > try to make sure that no single zone receives more than its fair share
> > of the globally allowed dirty pages in the first place. As the number
> > of pages considered dirtyable excludes the zones' lowmem reserves and
> > high watermarks, the maximum number of dirty pages in a zone is such
> > that the zone can always be balanced without requiring page cleaning.
> >
> > As this is a placement decision in the page allocator and pages are
> > dirtied only after the allocation, this patch allows allocators to
> > pass __GFP_WRITE when they know in advance that the page will be
> > written to and become dirty soon. The page allocator will then
> > attempt to allocate from the first zone of the zonelist - which on
> > NUMA is determined by the task's NUMA memory policy - that has not
> > exceeded its dirty limit.
> >
> > At first glance, it would appear that the diversion to lower zones can
> > increase pressure on them, but this is not the case. With a full high
> > zone, allocations will be diverted to lower zones eventually, so it is
> > more of a shift in timing of the lower zone allocations. Workloads
> > that previously could fit their dirty pages completely in the higher
> > zone may be forced to allocate from lower zones, but the number of
> > pages that 'spill over' is itself limited by the lower zones'
> > dirty constraints, and thus unlikely to become a problem.
>
> That's a good justification.
>
> > For now, the problem of unfair dirty page distribution remains for
> > NUMA configurations where the zones allowed for allocation are in sum
> > not big enough to trigger the global dirty limits, wake up the flusher
> > threads and remedy the situation. Because of this, an allocation that
> > could not succeed on any of the considered zones is allowed to ignore
> > the dirty limits before going into direct reclaim or even failing the
> > allocation, until a future patch changes the global dirty throttling
> > and flusher thread activation so that they take individual zone states
> > into account.
> >
> > Signed-off-by: Johannes Weiner <jweiner@...hat.com>
>
> Otherwise, looks good to me.
> Reviewed-by: Minchan Kim <minchan.kim@...il.com>
Thanks!
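
For readers following the thread, the placement policy from the quoted
changelog can be sketched roughly as below: a per-zone limit that
applies the global ratio to the zone's own dirtyable memory, a zonelist
walk that skips zones over their limit for __GFP_WRITE allocations, and
a second pass that ignores the dirty limits before the caller would
fall into direct reclaim. All names here are hypothetical, and "has a
free page" stands in for the real watermark checks, so treat this as a
toy model rather than the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone state for this sketch. */
struct sketch_zone_state {
	unsigned long dirtyable; /* pages minus lowmem reserve and high wmark */
	unsigned long nr_dirty;  /* pages currently counted as dirty */
	unsigned long nr_free;   /* free pages; stand-in for watermark checks */
};

/* The global ratio applied to this zone's dirtyable memory. */
unsigned long sketch_zone_dirty_limit(const struct sketch_zone_state *z,
				      unsigned int dirty_ratio)
{
	return z->dirtyable * dirty_ratio / 100;
}

/* A zone may take more soon-to-be-dirty pages while within its limit. */
bool sketch_zone_dirty_ok(const struct sketch_zone_state *z,
			  unsigned int dirty_ratio)
{
	return z->nr_dirty <= sketch_zone_dirty_limit(z, dirty_ratio);
}

/*
 * Walk the zonelist in preference order and return the index of the
 * zone to allocate from, or -1 if no zone has a free page.
 *
 * Pass 1: allocations flagged as writes skip zones over their limit.
 * Pass 2: if that found nothing, the dirty limits are ignored.
 */
int sketch_pick_zone(const struct sketch_zone_state *zonelist,
		     size_t nr_zones, bool gfp_write,
		     unsigned int dirty_ratio)
{
	size_t i;

	assert(dirty_ratio <= 100);

	for (i = 0; i < nr_zones; i++) {
		if (zonelist[i].nr_free == 0)
			continue;
		if (gfp_write && !sketch_zone_dirty_ok(&zonelist[i], dirty_ratio))
			continue;
		return (int)i;
	}

	if (gfp_write) {
		for (i = 0; i < nr_zones; i++) {
			if (zonelist[i].nr_free != 0)
				return (int)i;
		}
	}

	return -1;
}
```

In this model a write allocation is diverted to a lower zone only while
that zone is within its own limit, which is the bound on 'spill over'
that the changelog refers to.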