Date:	Thu, 23 Jul 2015 14:21:29 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Vlastimil Babka <vbabka@...e.cz>
cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [RFC 1/4] mm, compaction: introduce kcompactd

On Thu, 23 Jul 2015, Vlastimil Babka wrote:

> > When a khugepaged allocation fails for a node, it could easily kick off 
> > background compaction on that node and revisit the range later, very 
> > similar to how we can kick off background compaction in the page allocator 
> > when async or sync_light compaction fails.
> 
> The revisiting sounds rather complicated. Page allocator doesn't have to do that.
> 

I'm referring to khugepaged having a hugepage allocation fail, the page 
allocator kicking off background compaction, and khugepaged rescanning the 
same memory for which the allocation failed later.

> > The distinction I'm trying to draw is between "periodic" and "background" 
> > compaction.  I think there're usecases for both and we shouldn't be 
> > limiting ourselves to one or the other.
> 
> OK, I understand you think we can have both, and the periodic one would be in
> khugepaged. My main concern is that if we do the periodic one in khugepaged,
> people might oppose adding yet another one as kcompactd. I hope we agree that
> khugepaged is not suitable for all the use cases of the background one.
> 

Yes, absolutely.  I agree that we need the ability to do background 
compaction without requiring CONFIG_TRANSPARENT_HUGEPAGE.

> My secondary concern/opinion is that I would hope that the background compaction
> would be good enough to remove the need for the periodic one. So I would try the
> background one first. But I understand the periodic one is simpler to implement.
> On the other hand, it's not as urgent if you can simulate it from userspace.
> With the 15min period you use, there's likely not much overhead saved when
> invoking it from within the kernel? Sure there wouldn't be the synchronization
> with khugepaged activity, but I still wonder if waiting for up to 1 minute
> before khugepaged wakes up can make much difference with the 15min period.
> Hm, your cron job could also perhaps adjust the khugepaged sleep tunable when
> compaction is done, which IIRC results in immediate wakeup.
> 

There are certainly ways to do this from userspace, but the premise is 
that this issue, specifically for users of thp, is significant for 
everyone ;)
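
For reference, the userspace variant discussed above might look roughly like the sketch below. It assumes root privileges, CONFIG_COMPACTION, and the usual locations of the compact_memory sysctl and the khugepaged scan_sleep_millisecs tunable. That rewriting the sleep tunable wakes khugepaged immediately is taken from the quoted "IIRC" and is not verified here. The function name is hypothetical.

```python
from pathlib import Path

# Assumed default locations; both require root and a kernel built with
# CONFIG_COMPACTION and CONFIG_TRANSPARENT_HUGEPAGE respectively.
COMPACT_MEMORY = Path("/proc/sys/vm/compact_memory")
KHUGEPAGED_SCAN_SLEEP = Path(
    "/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs"
)


def compact_and_kick_khugepaged(compact_path=COMPACT_MEMORY,
                                sleep_path=KHUGEPAGED_SCAN_SLEEP):
    """Compact all zones, then rewrite the khugepaged sleep tunable.

    Returns the (unchanged) sleep value in milliseconds.
    """
    # Writing 1 asks the kernel to compact all zones.
    compact_path.write_text("1\n")
    # Write the current value back unchanged. Assumption from the thread:
    # the write itself is what nudges khugepaged out of its sleep.
    current = sleep_path.read_text().strip()
    sleep_path.write_text(current + "\n")
    return int(current)
```

A cron entry calling this every 15 minutes would approximate the period mentioned above. The paths are parameters so the function can be exercised against ordinary files.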

The problem that I've encountered with a background-only approach is that 
it doesn't help when you exec a large process that wants to fault most of 
its text and thp immediately cannot be allocated.  This can be a result of 
never having done any compaction at all other than from the page 
allocator, which terminates when a page of the given order is available.  
So on a fragmented machine, all memory faulted is shown in 
thp_fault_fallback and we rely on khugepaged to (slowly) fix this problem 
up for us.  We have shown great improvement in cpu utilization by 
periodically compacting memory today.

Background compaction arguably wouldn't help that situation because it's 
not fast enough to compact memory concurrently with the large number of page 
faults, and you can't wait for it to complete at exec().  The result is 
the same: large thp_fault_fallback.
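
To put a number on "large thp_fault_fallback", one option is to compare it against thp_fault_alloc in /proc/vmstat. The sketch below is only an illustration: the helper names are hypothetical, and the ratio is a convenience metric, not something the rfc defines.

```python
def parse_vmstat(text):
    """Parse "name value" lines as found in /proc/vmstat into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def thp_fallback_ratio(counters):
    """Fraction of thp page faults that fell back to small pages."""
    alloc = counters.get("thp_fault_alloc", 0)
    fallback = counters.get("thp_fault_fallback", 0)
    total = alloc + fallback
    return fallback / total if total else 0.0


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

The counters are cumulative since boot, so sampling read_vmstat() before and after the exec of a large process and taking the difference gives the figure for that workload alone.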

So I can understand the need for both periodic and background compaction 
(and direct compaction for non-thp non-atomic high-order allocations 
today) and I'm perhaps not as convinced as you are that we can eventually 
do without periodic compaction.


It seems to me that the vast majority of this discussion has centered 
around the vehicle that performs the compaction.  We certainly require 
kcompactd for background compaction, and we both agree that we need that 
functionality.

Two issues I want to bring up:

 (1) do non-thp configs benefit from periodic compaction?

     In my experience, no, but perhaps there are other use cases where
     this has been a pain.  The primary candidates, in my opinion,
     would be the networking stack and slub.  Joonsoo reports having to
     work around issues with high-order slub allocations being too
     expensive.  I'm not sure that would be better served by periodic
     compaction, but it seems like a candidate for background compaction.

     This is why my rfc tied periodic compaction to khugepaged, and we
     have strong evidence that this helps thp and cpu utilization.  For
     periodic compaction to be possible outside of thp, we'd need a use
     case for it.

 (2) does kcompactd have to be per-node?

     I don't see the immediate benefit since direct compaction can
     already scan remote memory and migrate it, khugepaged can do the
     same.  Is there evidence that suggests that a per-node kcompactd
     is significantly better than a single kthread?  I think others
     would be more receptive to a single kthread addition.

My theory is that periodic compaction is only significantly beneficial for 
thp per my rfc, and I think there's a significant advantage for khugepaged 
to be able to trigger this periodic compaction immediately before scanning 
and allocating to avoid waiting potentially for the lengthy 
alloc_sleep_millisecs.  I don't see a problem with defining the period 
with a khugepaged tunable for that reason.
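
The ordering argued for here (compact first when the period has elapsed, then scan and allocate) can be modelled in a few lines. This is a toy userspace model, not the rfc's code: the class, the function, and the period parameter are all hypothetical stand-ins for whatever khugepaged tunable would define the period.

```python
class PeriodicCompactionGate:
    """Tracks whether a compaction period has elapsed."""

    def __init__(self, period_secs):
        self.period_secs = period_secs  # assumption: <= 0 disables
        self._last = None

    def due(self, now):
        if self.period_secs <= 0:
            return False
        return self._last is None or now - self._last >= self.period_secs

    def mark(self, now):
        self._last = now


def khugepaged_iteration(gate, now, compact, scan):
    """One wakeup: compact if the period elapsed, then scan.

    Returns True if compaction ran on this wakeup.
    """
    compacted = False
    if gate.due(now):
        compact()       # runs immediately before scanning and allocating
        gate.mark(now)
        compacted = True
    scan()
    return compacted
```

With a 900 second period and wakeups every 60 seconds, compaction runs on the first wakeup and then on every fifteenth one, always just ahead of the scan.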

For background compaction, which is more difficult, it would be simple to 
implement a kcompactd to perform the memory compaction and actually be 
triggered by khugepaged to do the compaction on its behalf and wait to 
scan and allocate until it's complete.  The vehicle will probably end up as 
kcompactd doing the actual compaction in both cases.
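
The handoff described above, where khugepaged asks kcompactd to compact on its behalf and waits before scanning, can be modelled in userspace with a worker thread and a completion event. This is only a model with hypothetical names; a kernel implementation would use kthreads and wait queues rather than anything shown here.

```python
import queue
import threading


class ToyKcompactd:
    """Single worker thread that compacts nodes on request."""

    def __init__(self, compact_node):
        self._compact_node = compact_node
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            node, done = self._requests.get()
            try:
                self._compact_node(node)
            finally:
                # Always signal the waiter, even if compaction raised.
                done.set()

    def request(self, node):
        """Queue compaction of a node; returns an Event set on completion."""
        done = threading.Event()
        self._requests.put((node, done))
        return done


def khugepaged_pass(kcompactd, node, scan_and_collapse, timeout=5.0):
    """Wait for compaction of the node, then scan it.

    Returns False, without scanning, if compaction did not finish in time.
    """
    done = kcompactd.request(node)
    if not done.wait(timeout):
        return False
    scan_and_collapse(node)
    return True
```

The same request() entry point could be called without waiting, which is the fire-and-forget shape that purely background compaction would take.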

But until we have a background compaction implementation, it seems like 
there's no objection to doing and defining periodic compaction in 
khugepaged as the rfc proposes?  It seems like we can easily extend that 
in the future once background compaction is available.