linux-kernel - Re: [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4B966C7A.5040706@linux.vnet.ibm.com>
Date:	Tue, 09 Mar 2010 16:42:50 +0100
From:	Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
To:	Nick Piggin <npiggin@...e.de>, Mel Gorman <mel@....ul.ie>
CC:	linux-mm@...ck.org, Chris Mason <chris.mason@...cle.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure
 to relieve instead of congestion



Nick Piggin wrote:
> On Tue, Mar 09, 2010 at 02:17:13PM +0000, Mel Gorman wrote:
>> On Wed, Mar 10, 2010 at 12:35:13AM +1100, Nick Piggin wrote:
>>> On Mon, Mar 08, 2010 at 11:48:21AM +0000, Mel Gorman wrote:
>>>> Under heavy memory pressure, the page allocator may call congestion_wait()
>>>> to wait for IO congestion to clear or a timeout. This is not as sensible
>>>> a choice as it first appears. There is no guarantee that BLK_RW_ASYNC is
>>>> even congested as the pressure could have been due to a large number of
>>>> SYNC reads and the allocator waits for the entire timeout, possibly uselessly.
>>>>
>>>> At the point of congestion_wait(), the allocator is struggling to get the
>>>> pages it needs and it should back off. This patch puts the allocator to sleep
>>>> on a zone->pressure_wq for either a timeout or until a direct reclaimer or
>>>> kswapd brings the zone over the low watermark, whichever happens first.
>>>>
>>>> Signed-off-by: Mel Gorman <mel@....ul.ie>
>>>> ---
>>>>  include/linux/mmzone.h |    3 ++
>>>>  mm/internal.h          |    4 +++
>>>>  mm/mmzone.c            |   47 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  mm/page_alloc.c        |   50 +++++++++++++++++++++++++++++++++++++++++++----
>>>>  mm/vmscan.c            |    2 +
>>>>  5 files changed, 101 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index 30fe668..72465c1 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
[...]
>>>> +{
>>>> +	/* If no process is waiting, nothing to do */
>>>> +	if (!waitqueue_active(zone->pressure_wq))
>>>> +		return;
>>>> +
>>>> +	/* Check if the high watermark is ok for order 0 */
>>>> +	if (zone_watermark_ok(zone, 0, low_wmark_pages(zone), 0, 0))
>>>> +		wake_up_interruptible(zone->pressure_wq);
>>>> +}
>>> If you were to do this under the zone lock (in your subsequent patch),
>>> then it could avoid races. I would suggest doing it all as a single
>>> patch and not doing the pressure checks in reclaim at all.
>>>
>> That is reasonable. I've already dropped the checks in reclaim because as you
>> say, if the free path check is cheap enough, it's also sufficient. Checking
>> in the reclaim paths as well is redundant.
>>
>> I'll move the call to check_zone_pressure() within the zone lock to avoid
>> races.
>>

Mel, we talked about a thundering herd issue that might come up here in 
very constraint cases.
So wherever you end up putting that wake_up call, how about being extra 
paranoid about a thundering herd flagging them WQ_FLAG_EXCLUSIVE and 
waking them with something like that:

wake_up_interruptible_nr(zone->pressure_wq, #nrofpagesabovewatermark#);

That should be an easy to calculate sane max of waiters to wake up.
On the other hand it might be over-engineered and it implies the need to 
reconsider when it would be best to wake up the rest.

Get me right - I don't really have a hard requirement or need for that, 
I just wanted to mention it early on to hear your opinions about it.

looking forward to test the v2 patch series, adapted to all the good 
stuff already discussed.

-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/