Date:   Tue, 18 Apr 2023 22:08:14 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     linux-mm@...ck.org, Kaiyang Zhao <kaiyang2@...cmu.edu>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Rientjes <rientjes@...gle.com>,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [RFC PATCH 00/26] mm: reliable huge page allocator

Hi Kirill, thanks for taking a look so quickly.

On Wed, Apr 19, 2023 at 02:54:02AM +0300, Kirill A. Shutemov wrote:
> On Tue, Apr 18, 2023 at 03:12:47PM -0400, Johannes Weiner wrote:
> > This series proposes to make THP allocations reliable by enforcing
> > pageblock hygiene, and aligning the allocator, reclaim and compaction
> > on the pageblock as the base unit for managing free memory. All orders
> > up to and including the pageblock are made first-class requests that
> > (outside of OOM situations) are expected to succeed without
> > exceptional investment by the allocating thread.
> > 
> > A neutral pageblock type is introduced, MIGRATE_FREE. The first
> > allocation to be placed into such a block claims it exclusively for
> > the allocation's migratetype. Fallbacks from a different type are no
> > longer allowed, and the block is "kept open" for more allocations of
> > the same type to ensure tight grouping. A pageblock becomes neutral
> > again only once all its pages have been freed.
> 
> Sounds like this will cause earlier OOM, no?
> 
> I guess with 2M pageblock on 64G server it shouldn't matter much. But how
> about smaller machines?

Yes, it's a tradeoff.

It's not really possible to reduce external fragmentation and increase
contiguity, without also increasing the risk of internal fragmentation
to some extent. The tradeoff is slightly less, but overall faster, memory.
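
To make the tradeoff concrete, here is a toy model of the exclusive block claiming described in the cover letter. It is only a sketch, not code from the series: Block, alloc_page and free_page are made-up names, None stands in for MIGRATE_FREE, and 512 pages per block assumes 2M pageblocks with 4k base pages. Reclaim, compaction and any behavior under memory pressure are not modeled.

```python
PAGES_PER_BLOCK = 512  # assumption: 2M pageblock / 4k base pages


class Block:
    """One pageblock in the toy model."""

    def __init__(self):
        self.mtype = None  # None stands in for MIGRATE_FREE (neutral)
        self.used = 0      # pages currently allocated from this block


def alloc_page(blocks, mtype):
    """Allocate one page of the given migratetype, or return None."""
    # Prefer a block already claimed by this type that still has room,
    # so allocations of the same type stay tightly grouped.
    for b in blocks:
        if b.mtype == mtype and b.used < PAGES_PER_BLOCK:
            b.used += 1
            return b
    # Otherwise claim a neutral block exclusively for this type.
    for b in blocks:
        if b.mtype is None:
            b.mtype = mtype
            b.used = 1
            return b
    # No fallback into blocks owned by another type: the request fails
    # even though those blocks may still contain free pages.
    return None


def free_page(block):
    """Free one page; the block turns neutral once it is fully free."""
    block.used -= 1
    if block.used == 0:
        block.mtype = None
```

With only two blocks, a single page of each of two types is enough to make a request of a third type fail, although 1022 of the 1024 pages are free. That is the internal fragmentation cost in its worst case; freeing the last page of a block makes it neutral and available again.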

A 2M block size *seems* reasonable for most current setups. It's
actually still somewhat on the lower side, if you consider that we had
4k blocks when memory was a few megabytes. (4k pages for 4M RAM is the
same ratio as 2M pages for 2G RAM. My phone has 8G and my desktop 32G.
64G is unusually small for a datacenter server.)
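
The ratio argument is easy to check with a few lines of arithmetic. This is just a sketch of the numbers quoted above; blocks_in is a made-up helper, and the sizes are the ones mentioned in this mail.

```python
KiB, MiB, GiB = 2**10, 2**20, 2**30


def blocks_in(ram_bytes, block_bytes):
    """Number of blocks of the given size that fit into RAM."""
    return ram_bytes // block_bytes


# 4k blocks on a 4M machine vs. 2M blocks on a 2G machine
old_ratio = blocks_in(4 * MiB, 4 * KiB)   # 1024
new_ratio = blocks_in(2 * GiB, 2 * MiB)   # 1024

# 2M blocks on the machine sizes mentioned in the mail
phone = blocks_in(8 * GiB, 2 * MiB)       # 4096
desktop = blocks_in(32 * GiB, 2 * MiB)    # 16384
server = blocks_in(64 * GiB, 2 * MiB)     # 32768
```

So a 2G machine with 2M blocks has as many blocks to work with as a 4M machine had with 4k blocks, and anything larger has proportionally more.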

I wouldn't be opposed to sticking this behind a separate config option
if there are setups that WOULD want to keep the current best-effort
compaction without the block hygiene. But obviously, from a
maintenance POV life would be much easier if we didn't have to.

FWIW, I have been doing tests in an environment constrained to 2G and
haven't had any issues with premature OOMs. But I'm happy to test
other situations and workloads that might be of interest to people.

> > Reclaim and compaction are changed from partial block reclaim to
> > producing whole neutral page blocks.
> 
> How does it affect allocation latencies? I see direct compact stall grew
> substantially. Hm?

Good question.

There are 260 more compact stalls but also 1,734 more successful THP
allocations. And 1,433 fewer allocation stalls. There seems to be much
less direct work performed per successful allocation.
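
As a rough sanity check on those deltas, the sketch below nets the two stall counters against each other. Treating compact stalls and allocation stalls as simply additive is an assumption made for illustration; they are separate counters and a stall of one kind need not cost the same as a stall of the other. The variable names are made up.

```python
# Deltas quoted above (patched minus baseline)
delta_compact_stalls = 260
delta_alloc_stalls = -1433
delta_thp_success = 1734

# Assumption: the two stall counters can be summed
net_stall_delta = delta_compact_stalls + delta_alloc_stalls  # -1173

print(f"net stall delta: {net_stall_delta}")
print(f"additional successful THP allocations: {delta_thp_success}")
```

Under that assumption there are 1,173 fewer stalls overall while THP successes go up by 1,734. It says nothing about how long each stall takes, which is why the latencies still need tracing.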

But of course, that's not the whole story. Let me trace the actual
latencies.

Thanks for your thoughts!
Johannes
