Message-Id: <20100312093755.b2393b33.akpm@linux-foundation.org>
Date: Fri, 12 Mar 2010 09:37:55 -0500
From: Andrew Morton <akpm@...ux-foundation.org>
To: Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
Cc: Mel Gorman <mel@....ul.ie>, linux-mm@...ck.org,
Nick Piggin <npiggin@...e.de>,
Chris Mason <chris.mason@...cle.com>,
Jens Axboe <jens.axboe@...cle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure

On Fri, 12 Mar 2010 13:15:05 +0100 Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com> wrote:
> > It still feels a bit unnatural though that the page allocator waits on
> > congestion when what it really cares about is watermarks. Even if this
> > patch works for Christian, I think it still has merit so will kick it a
> > few more times.
>
> Whichever way I look at it, watermark_wait should be superior to
> congestion_wait, because, as Mel points out, waiting for watermarks is
> what is semantically correct there.

If a direct-reclaimer waits for some thresholds to be achieved then which
task is doing the reclaim?

Ultimately, kswapd. This will introduce a hard dependency upon kswapd
activity. This might introduce scalability problems. And latency
problems if kswapd is off doodling with a slow device (say), or doing a
journal commit. And perhaps deadlocks if kswapd tries to take a lock
which one of the waiting-for-watermark direct reclaimers holds.

Generally, kswapd is an optional, best-effort latency optimisation
thing and we haven't designed for it to be a critical service.
Probably stuff would break were we to do so.

This is one of the reasons why we avoided creating such dependencies in
reclaim. Instead, what we do when a reclaimer is encountering lots of
dirty or in-flight pages is

	msleep(100);

then try again. We're waiting for the disks, not kswapd.
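
The sleep-then-retry pattern described above can be sketched in plain userspace C. This is only an illustration and not kernel code: struct fake_reclaim, fake_msleep(), fake_try_reclaim() and reclaim_with_fixed_sleep() are all hypothetical names, and the "disk" is modelled as a counter that retires a fixed number of pages per 100ms of simulated sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what a direct reclaimer is waiting on. */
struct fake_reclaim {
	unsigned int pages_in_flight;   /* writes the "disk" has not retired yet */
	unsigned int retired_per_100ms; /* assumed disk throughput */
	unsigned int slept_ms;          /* total simulated sleep time */
};

/* Stand-in for msleep(): advance simulated time and let the disk retire writes. */
void fake_msleep(struct fake_reclaim *r, unsigned int ms)
{
	unsigned int retired = r->retired_per_100ms * ms / 100;

	if (retired > r->pages_in_flight)
		retired = r->pages_in_flight;
	r->pages_in_flight -= retired;
	r->slept_ms += ms;
}

/* In this model reclaim only makes progress once nothing is in flight. */
bool fake_try_reclaim(const struct fake_reclaim *r)
{
	return r->pages_in_flight == 0;
}

/*
 * Try, sleep a fixed 100ms, try again. Nobody else is waited upon: only
 * the passage of time (i.e. the disk) unblocks the caller.
 * Returns the attempt number on which reclaim succeeded, or -1 if it
 * still failed after max_tries attempts.
 */
int reclaim_with_fixed_sleep(struct fake_reclaim *r, int max_tries)
{
	assert(r != NULL && max_tries > 0);

	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (fake_try_reclaim(r))
			return attempt;
		fake_msleep(r, 100);
	}
	return -1;
}
```

With 250 pages in flight and 100 pages retired per 100ms, this model succeeds on the fourth attempt after sleeping 300ms in total. The waiter never depends on another task making progress on its behalf.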

Only the hard-wired 100 is a bit silly, so we made the "100" variable,
inversely dependent upon the number of disks and their speed. If you
have more and faster disks then you sleep for less time.

And that's what congestion_wait() does, in a very simplistic fashion.
It's a facility which direct-reclaimers use to ratelimit themselves in
inverse proportion to the speed with which the system can retire writes.
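
To make the "inversely dependent" part concrete, here is a rough userspace sketch of a sleep that shrinks as disks are added or get faster. It is not how congestion_wait() is implemented; scaled_wait_ms() and its parameters are hypothetical, and the model simply assumes the sleeper wakes as soon as the backlog of writes would have been retired, or at the timeout, whichever comes first.

```c
#include <assert.h>

/*
 * Hypothetical model: how long a reclaimer sleeps before retrying.
 *
 *   timeout_ms            upper bound on the sleep (the old hard-wired 100)
 *   backlog_pages         dirty or in-flight pages that must be retired
 *   ndisks                number of disks retiring writes in parallel
 *   pages_per_ms_per_disk assumed per-disk write throughput
 *
 * The sleep is the time needed to retire the backlog, rounded up to a
 * whole millisecond and capped at timeout_ms.
 */
unsigned int scaled_wait_ms(unsigned int timeout_ms,
			    unsigned int backlog_pages,
			    unsigned int ndisks,
			    unsigned int pages_per_ms_per_disk)
{
	unsigned long long rate, need;

	assert(timeout_ms > 0);

	rate = (unsigned long long)ndisks * pages_per_ms_per_disk;
	if (rate == 0)
		return timeout_ms; /* nothing retires writes: sleep the full timeout */

	need = (backlog_pages + rate - 1) / rate; /* ceiling division */
	return need < timeout_ms ? (unsigned int)need : timeout_ms;
}
```

In this model a 400-page backlog on one disk at 2 pages/ms hits the 100ms cap, four such disks bring the sleep down to 50ms, and doubling their speed halves it again to 25ms.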