linux-kernel - Re: [PATCH 12/12] mm, page_alloc: Only enforce watermarks for order-0 allocations

Message-ID: <20150921105141.GB3068@techsingularity.net>
Date:	Mon, 21 Sep 2015 11:51:41 +0100
From:	Mel Gorman <mgorman@...hsingularity.net>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	David Rientjes <rientjes@...gle.com>,
	Michal Hocko <mhocko@...nel.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 12/12] mm, page_alloc: Only enforce watermarks for
 order-0 allocations

On Fri, Sep 18, 2015 at 03:56:21PM +0900, Joonsoo Kim wrote:
> On Wed, Sep 09, 2015 at 01:39:01PM +0100, Mel Gorman wrote:
> > On Tue, Sep 08, 2015 at 05:26:13PM +0900, Joonsoo Kim wrote:
> > > 2015-08-24 21:30 GMT+09:00 Mel Gorman <mgorman@...hsingularity.net>:
> > > > The primary purpose of watermarks is to ensure that reclaim can always
> > > > make forward progress in PF_MEMALLOC context (kswapd and direct reclaim).
> > > > These assume that order-0 allocations are all that is necessary for
> > > > forward progress.
> > > >
> > > > High-order watermarks serve a different purpose. Kswapd had no high-order
> > > > awareness before they were introduced (https://lkml.org/lkml/2004/9/5/9).
> > > > This was particularly important when there were high-order atomic requests.
> > > > The watermarks both gave kswapd awareness and made a reserve for those
> > > > atomic requests.
> > > >
> > > > There are two important side-effects of this. The most important is that
> > > > a non-atomic high-order request can fail even though free pages are available
> > > > and the order-0 watermarks are ok. The second is that high-order watermark
> > > > checks are expensive as the free list counts up to the requested order must
> > > > be examined.
> > > >
> > > > With the introduction of MIGRATE_HIGHATOMIC it is no longer necessary to
> > > > have high-order watermarks. Kswapd and compaction still need high-order
> > > > awareness which is handled by checking that at least one suitable high-order
> > > > page is free.
> > > 
> > > I still don't think that this one suitable high-order page is enough.
> > > If fragmentation happens, there would be no order-2 freepage. If kswapd
> > > prepares only 1 order-2 freepage, one of two successive process forks
> > > (AFAIK, fork in x86 and ARM require order 2 page) must go to direct reclaim
> > > to make order-2 freepage. Kswapd cannot make order-2 freepage in that
> > > short time. It causes latency to many high-order freepage requestor
> > > in fragmented situation.
> > > 
> > 
> > So what do you suggest instead? A fixed number, some other heuristic?
> > You have pushed several times now for the series to focus on the latency
> > of standard high-order allocations but again I will say that it is outside
> > the scope of this series. If you want to take steps to reduce the latency
> > of ordinary high-order allocation requests that can sleep then it should
> > be a separate series.
> 
> I don't understand why you think it should be a separate series.

Because atomic high-order allocation success and normal high-order
allocation stall latency are different problems. Atomic high-order
allocation successes are about reserves, normal high-order allocations
are about reclaim.

> I don't know exact reason why high order watermark check is
> introduced, but, based on your description, it is for high-order
> allocation request in atomic context.

Mostly yes, the initial motivation is described in the linked mail --
give kswapd high-order awareness because otherwise (higher-order && !wait)
allocations that fail would wake kswapd but it would go back to sleep.

> And, it would accidentally take care
> of latency.

Except all it does is defer the problem. If kswapd frees N high-order
pages then it disrupts the system to do so, potentially reclaiming hot
pages for allocation attempts that *may* occur, and one of them will
still stall if there are N+1 allocation requests.

Kswapd reclaiming additional pages is definite system disruption and
potentially increases thrashing *now* to help an event that *might* occur
in the future.

> It has been used for a long time and your patch tries to remove it
> while only taking care of success rate. That means that your patch
> could cause a regression. I think that if this actually happens, it should
> be handled in this patchset instead of a separate series.
> 

Except it doesn't really.

Current situation
o A high-order watermark check might fail for a normal high-order
  allocation request. On failure, stall to reclaim more pages which may
  or may not succeed
o An atomic allocation may use a lower watermark but it can still fail
  even if there are free pages on the list

Patched situation

o A watermark check might fail for a normal high-order allocation
  request and cannot use one of the reserved pages. On failure, stall to
  reclaim more pages which may or may not succeed.
  Functionally, this is very similar to current behaviour
o An atomic allocation may use the reserves so if a free page exists, it
  will be used
  Functionally, this is more reliable than current behaviour although
  there is still potential for disruption
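
As a rough illustration of the two situations above, here is a userspace
sketch of the check before and after the patch. It is not the kernel code:
struct zone_model, watermark_ok_old() and watermark_ok_patched() are
hypothetical names, the "atomic" flag stands in for ALLOC_HARDER, and
lowmem_reserve and ALLOC_HIGH are left out. The patched variant also
subtracts the free pages sitting in the reserve, which only approximates
the reserve accounting.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 11	/* assumption: number of orders tracked */

/* Hypothetical, simplified stand-in for a zone's free lists. */
struct zone_model {
	long nr_free[MODEL_MAX_ORDER];			/* free blocks per order, ordinary lists */
	long nr_free_highatomic[MODEL_MAX_ORDER];	/* free blocks per order, high-atomic reserve */
};

/* Total free base pages over all orders and both kinds of list. */
static long total_free_pages(const struct zone_model *z)
{
	long pages = 0;

	for (int o = 0; o < MODEL_MAX_ORDER; o++)
		pages += (z->nr_free[o] + z->nr_free_highatomic[o]) << o;
	return pages;
}

/* Old scheme: the watermark is re-checked for every order below the request. */
static bool watermark_ok_old(const struct zone_model *z, unsigned int order,
			     long mark, bool atomic)
{
	long min = mark;
	long free_pages = total_free_pages(z) - ((1L << order) - 1);

	assert(order < MODEL_MAX_ORDER);
	if (atomic)
		min -= min / 4;		/* atomic callers get a lower watermark */
	if (free_pages <= min)
		return false;

	for (unsigned int o = 0; o < order; o++) {
		/* Pages of this order are too small to be of use */
		free_pages -= (z->nr_free[o] + z->nr_free_highatomic[o]) << o;
		/* Require fewer higher-order pages to be free */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

/* Patched scheme: order-0 watermark, then "is one suitable page free?". */
static bool watermark_ok_patched(const struct zone_model *z, unsigned int order,
				 long mark, bool atomic)
{
	long min = mark;
	long free_pages = total_free_pages(z) - ((1L << order) - 1);

	assert(order < MODEL_MAX_ORDER);
	if (atomic) {
		min -= min / 4;
	} else {
		/* Normal requests may not count pages in the reserve */
		for (int o = 0; o < MODEL_MAX_ORDER; o++)
			free_pages -= z->nr_free_highatomic[o] << o;
	}
	if (free_pages <= min)
		return false;

	/* An order-0 request only needs the order-0 watermark */
	if (order == 0)
		return true;

	/* A high-order request needs at least one suitable free block */
	for (unsigned int o = order; o < MODEL_MAX_ORDER; o++) {
		if (z->nr_free[o] > 0)
			return true;
		if (atomic && z->nr_free_highatomic[o] > 0)
			return true;
	}
	return false;
}
```

With 1000 free order-0 pages, a single free order-3 block and a watermark
of 100, the old check refuses an order-3 request even though a suitable
block is free, while the patched check allows it. If that single block
lives in the reserve instead, the patched check allows only the atomic
request.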

> In review of previous version, I suggested that removing watermark
> check only for higher than PAGE_ALLOC_COSTLY_ORDER.

It increases complexity for reasons that are not quantified.

> You didn't accept
> that and I still don't agree with your approach. You can show me that
> my concern is wrong via some number.
> 
> One candidate test for this is that making system fragmented and
> run hackbench which uses a lot of high-order allocation and measure
> elapsed-time.
> 

o There is no difference in high-order success rates for normal
  allocations with this series applied
o With the series applied, such tests complete in approximately the same
  time
o For the tests with parallel high-order allocation requests, there was
  no significant difference in the elapsed times although success rates
  were slightly higher

Each full run of the tests takes about 4 days to complete on this
series and so far no problems of the type you describe have been found.
If such a test case is found then there would be a clear workload to
justify either having kswapd reclaim multiple pages or applying the old
watermark scheme for lower orders.

-- 
Mel Gorman
SUSE Labs