lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <951a8d96-ecdf-7ca4-ec7a-e1c5eba8bce3@arm.com>
Date:   Wed, 2 Aug 2023 10:04:56 +0100
From:   Ryan Roberts <ryan.roberts@....com>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        David Hildenbrand <david@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v4 2/5] mm: LARGE_ANON_FOLIO for improved performance

On 02/08/2023 09:02, Ryan Roberts wrote:
...

>>>
>>> I've captured run time and peak memory usage, and taken the mean. The stdev for
>>> the peak memory usage is big-ish, but I'm confident this still captures the
>>> central tendancy well:
>>>
>>> | MAX_ORDER_UNHINTED |   real-time |   kern-time |   user-time | peak memory |
>>> |:-------------------|------------:|------------:|------------:|:------------|
>>> | 4k                 |        0.0% |        0.0% |        0.0% |        0.0% |
>>> | 16k                |       -3.6% |      -26.5% |       -0.5% |       -0.1% |
>>> | 32k                |       -4.8% |      -37.4% |       -0.6% |       -0.1% |
>>> | 64k                |       -5.7% |      -42.0% |       -0.6% |       -1.1% |
>>> | 128k               |       -5.6% |      -42.1% |       -0.7% |        1.4% |
>>> | 256k               |       -4.9% |      -41.9% |       -0.4% |        1.9% |
>>>
>>> 64K looks like the clear sweet spot to me.

I'm sorry about this; I've concluded that these tests are flawed. While I'm
correctly setting the MAX_ORDER_UNHINTED value in each case, this is run against
a 4K base page kernel, which means that it's arch_wants_pte_order() return value
is order-4. So for MAX_ORDER_UNHINTED = {64k, 128k, 256k}, the actual order used
is order-4 (=64K):

	order = max(arch_wants_pte_order(), PAGE_ALLOC_COSTLY_ORDER);

	if (!hugepage_vma_check(vma, vma->vm_flags, false, true, true))
		order = min(order, ANON_FOLIO_MAX_ORDER_UNHINTED);

So while I think we can conclude that the performance improves from 4k -> 64k,
and the peak memory is about the same, we can't conclude that 64k is definely
where performance gains peak or that peak memory increases after this.

The error bars on the memory consumption are fairly big.

I'll rework the tests so that I'm actually measuring what I was intending to
measure and repost in due course.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ