[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100309182228.GJ4883@csn.ul.ie>
Date: Tue, 9 Mar 2010 18:22:28 +0000
From: Mel Gorman <mel@....ul.ie>
To: Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@...e.de>, linux-mm@...ck.org,
Chris Mason <chris.mason@...cle.com>,
Jens Axboe <jens.axboe@...cle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] page-allocator: Under memory pressure, wait on
pressure to relieve instead of congestion
On Tue, Mar 09, 2010 at 04:42:50PM +0100, Christian Ehrhardt wrote:
>
>
> Nick Piggin wrote:
>> On Tue, Mar 09, 2010 at 02:17:13PM +0000, Mel Gorman wrote:
>>> On Wed, Mar 10, 2010 at 12:35:13AM +1100, Nick Piggin wrote:
>>>> On Mon, Mar 08, 2010 at 11:48:21AM +0000, Mel Gorman wrote:
>>>>> Under heavy memory pressure, the page allocator may call congestion_wait()
>>>>> to wait for IO congestion to clear or a timeout. This is not as sensible
>>>>> a choice as it first appears. There is no guarantee that BLK_RW_ASYNC is
>>>>> even congested as the pressure could have been due to a large number of
>>>>> SYNC reads and the allocator waits for the entire timeout, possibly uselessly.
>>>>>
>>>>> At the point of congestion_wait(), the allocator is struggling to get the
>>>>> pages it needs and it should back off. This patch puts the allocator to sleep
>>>>> on a zone->pressure_wq for either a timeout or until a direct reclaimer or
>>>>> kswapd brings the zone over the low watermark, whichever happens first.
>>>>>
>>>>> Signed-off-by: Mel Gorman <mel@....ul.ie>
>>>>> ---
>>>>> include/linux/mmzone.h | 3 ++
>>>>> mm/internal.h | 4 +++
>>>>> mm/mmzone.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
>>>>> mm/page_alloc.c | 50 +++++++++++++++++++++++++++++++++++++++++++----
>>>>> mm/vmscan.c | 2 +
>>>>> 5 files changed, 101 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>>> index 30fe668..72465c1 100644
>>>>> --- a/include/linux/mmzone.h
>>>>> +++ b/include/linux/mmzone.h
> [...]
>>>>> +{
>>>>> + /* If no process is waiting, nothing to do */
>>>>> + if (!waitqueue_active(zone->pressure_wq))
>>>>> + return;
>>>>> +
>>>>> + /* Check if the high watermark is ok for order 0 */
>>>>> + if (zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0))
>>>>> + wake_up_interruptible(zone->pressure_wq);
>>>>> +}
>>>> If you were to do this under the zone lock (in your subsequent patch),
>>>> then it could avoid races. I would suggest doing it all as a single
>>>> patch and not doing the pressure checks in reclaim at all.
>>>>
>>> That is reasonable. I've already dropped the checks in reclaim because as you
>>> say, if the free path check is cheap enough, it's also sufficient. Checking
>>> in the reclaim paths as well is redundant.
>>>
>>> I'll move the call to check_zone_pressure() within the zone lock to avoid
>>> races.
>>>
>
> Mel, we talked about a thundering herd issue that might come up here in
> very constraint cases.
> So wherever you end up putting that wake_up call, how about being extra
> paranoid about a thundering herd flagging them WQ_FLAG_EXCLUSIVE and
> waking them with something like that:
>
> wake_up_interruptible_nr(zone->pressure_wq, #nrofpagesabovewatermark#);
>
> That should be an easy to calculate sane max of waiters to wake up.
> On the other hand it might be over-engineered and it implies the need to
> reconsider when it would be best to wake up the rest.
>
It seems over-engineering considering that they wake up after a timeout
unconditionally. I think it's best for the moment to let them wake up in
a herd and recheck their zonelists as they'll go back to sleep if
necessary.
> Get me right - I don't really have a hard requirement or need for that,
> I just wanted to mention it early on to hear your opinions about it.
>
> looking forward to test the v2 patch series, adapted to all the good
> stuff already discussed.
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists