Message-ID: <ZKcW/Zij1HZB0tmf@casper.infradead.org>
Date:   Thu, 6 Jul 2023 20:33:17 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Ryan Roberts <ryan.roberts@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Yin Fengwei <fengwei.yin@...el.com>,
        David Hildenbrand <david@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v2 3/5] mm: Default implementation of
 arch_wants_pte_order()

On Tue, Jul 04, 2023 at 08:07:19PM -0600, Yu Zhao wrote:
> >  - On arm64 when the process has marked the VMA for THP (or when
> > transparent_hugepage=always) but the VMA does not meet the requirements for a
> > PMD-sized mapping (or we failed to allocate, ...) then I'd like to map using
> > contpte. For 4K base pages this is 64K (order-4), for 16K this is 2M (order-7)
> > and for 64K this is 2M (order-5). The 64K base page case is very important since
> > the PMD size for that base page is 512MB which is almost impossible to allocate
> > in practice.
> 
> Which case (server or client) are you focusing on here? For our client
> devices, I can confidently say that 64KB has to be after 16KB, if it
> happens at all. For servers in general, I don't know of any major
> memory-intensive workloads that are not THP-aware, i.e., I don't think
> "VMA does not meet the requirements" is a concern.
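
For reference, the orders and sizes quoted above can be checked with a little arithmetic. This is only a sketch: the helper names (order_for, pmd_size) are made up for illustration, and the PMD calculation assumes 8-byte page table entries with one base page per table level, which is my understanding of arm64 rather than something stated in this thread.

```python
KiB = 1024
MiB = 1024 * KiB

def order_for(block_bytes, base_page_bytes):
    """Return n such that block_bytes == base_page_bytes << n."""
    pages = block_bytes // base_page_bytes
    # An order only makes sense for an exact power-of-two number of pages.
    if pages < 1 or pages * base_page_bytes != block_bytes or pages & (pages - 1):
        raise ValueError("block is not a power-of-two multiple of the base page")
    return pages.bit_length() - 1

def pmd_size(base_page_bytes, entry_bytes=8):
    """Bytes mapped by one PMD entry, assuming a full page of PTEs below it."""
    ptes_per_table = base_page_bytes // entry_bytes
    return ptes_per_table * base_page_bytes

# contpte block sizes as quoted: 64K for 4K pages, 2M for 16K and 64K pages
for base, contpte in ((4 * KiB, 64 * KiB), (16 * KiB, 2 * MiB), (64 * KiB, 2 * MiB)):
    print(f"{base // KiB}K base: contpte order {order_for(contpte, base)}, "
          f"PMD maps {pmd_size(base) // MiB} MiB")
```

Under those assumptions the 64K base page case gives a 512 MiB PMD mapping, which matches the figure in the quoted text.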

It sounds like you've done some measurements, and I'd like to understand
those a bit better.  There are a number of factors involved:

 - A larger page size shrinks the length of the LRU list, so systems
   which see heavy LRU lock contention benefit more
 - A larger page size has more internal fragmentation, so we run out of
   memory and have to do reclaim more often (and maybe workloads which
   used to fit in DRAM now do not)
(probably others; I'm not at 100% right now)
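
To make the two factors above concrete, here is a toy model, not a measurement and not how the kernel accounts for anything. The function names and the sample allocation sizes are invented for illustration; it assumes every allocation is rounded up to a whole number of equally sized folios and that each folio is one LRU entry.

```python
KiB = 1024
GiB = 1024 ** 3

def lru_entries(mem_bytes, folio_bytes):
    """Number of LRU entries if all memory is tracked in folios of one size."""
    return mem_bytes // folio_bytes

def wasted_bytes(alloc_sizes, folio_bytes):
    """Internal fragmentation: bytes lost rounding each allocation up."""
    total = 0
    for size in alloc_sizes:
        folios = -(-size // folio_bytes)  # ceiling division
        total += folios * folio_bytes - size
    return total

# Hypothetical allocation sizes, chosen only to show the trend.
allocs = [5 * KiB, 70 * KiB, 1 * KiB]

for folio in (4 * KiB, 16 * KiB, 64 * KiB):
    print(f"{folio // KiB:>2}K folios: "
          f"{lru_entries(16 * GiB, folio):>8} LRU entries for 16 GiB, "
          f"{wasted_bytes(allocs, folio) // KiB:>3} KiB wasted on sample allocs")
```

The model only shows the direction of each effect: the LRU shrinks in proportion to the folio size while the rounding waste grows. Which one dominates on a real system is exactly what measurements would have to show.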

I think concerns about "allocating lots of order-2 folios makes it harder
to allocate order-4 folios" are _probably_ not warranted (without data
to prove otherwise).  All anonymous memory is movable, so our compaction
code should be able to create larger order folios.
