Message-ID: <075826b4-2df8-4460-a8f2-c0581d098cff@arm.com>
Date:   Tue, 5 Dec 2023 10:50:15 +0000
From:   Ryan Roberts <ryan.roberts@....com>
To:     David Hildenbrand <david@...hat.com>,
        Barry Song <21cnbao@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Alistair Popple <apopple@...dia.com>, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs
 interface

On 05/12/2023 09:57, David Hildenbrand wrote:
> On 05.12.23 10:50, Ryan Roberts wrote:
>> On 05/12/2023 04:21, Barry Song wrote:
>>> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts <ryan.roberts@....com> wrote:
>>>>
>>>> In preparation for adding support for anonymous multi-size THP,
>>>> introduce new sysfs structure that will be used to control the new
>>>> behaviours. A new directory is added under transparent_hugepage for each
>>>> supported THP size, and contains an `enabled` file, which can be set to
>>>> "inherit" (to inherit the global setting), "always", "madvise" or
>>>> "never". For now, the kernel still only supports PMD-sized anonymous
>>>> THP, so only 1 directory is populated.
>>>>
>>>> The first half of the change converts transhuge_vma_suitable() and
>>>> hugepage_vma_check() so that they take a bitfield of orders for which
>>>> the user wants to determine support, and the functions filter out all
>>>> the orders that can't be supported, given the current sysfs
>>>> configuration and the VMA dimensions. If there is only 1 order set in
>>>> the input then the output can continue to be treated like a boolean;
>>>> this is the case for most call sites. The resulting functions are
>>>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>>>> respectively.
>>>>
>>>> The second half of the change implements the new sysfs interface. It has
>>>> been done so that each supported THP size has a `struct thpsize`, which
>>>> describes the relevant metadata and is itself a kobject. This is pretty
>>>> minimal for now, but should make it easy to add new per-thpsize files to
>>>> the interface if needed in future (e.g. per-size defrag). Rather than
>>>> keep the `enabled` state directly in the struct thpsize, I've elected to
>>>> directly encode it into huge_anon_orders_[always|madvise|inherit]
>>>> bitfields since this reduces the amount of work required in
>>>> thp_vma_allowable_orders() which is called for every page fault.
>>>>
>>>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>>>> commit, for details of how the new sysfs interface works.
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>>>
>>> Reviewed-by: Barry Song <v-songbaohua@...o.com>
>>
>> Thanks!
>>
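
To illustrate the filtering described in the commit message above, here is a
minimal user-space sketch in C. It is not the kernel code: `struct thp_config`,
`enum global_mode` and `allowable_orders()` are hypothetical names, the three
bitfields only mirror huge_anon_orders_[always|madvise|inherit], and the
VMA-dimension checks are left out entirely.

```c
#include <stdbool.h>

#define BIT(n) (1UL << (n))

/* Hypothetical stand-in for the top-level transparent_hugepage/enabled value. */
enum global_mode { GLOBAL_NEVER, GLOBAL_MADVISE, GLOBAL_ALWAYS };

/* Hypothetical container; each bit N set means "order N is in this state". */
struct thp_config {
	unsigned long always;   /* sizes whose `enabled` is "always"  */
	unsigned long madvise;  /* sizes whose `enabled` is "madvise" */
	unsigned long inherit;  /* sizes whose `enabled` is "inherit" */
	enum global_mode global;
};

/*
 * Filter `orders` down to those permitted by the per-size settings.
 * `vma_madvised` models a VMA that has been marked with MADV_HUGEPAGE.
 */
static unsigned long allowable_orders(const struct thp_config *cfg,
				      bool vma_madvised, unsigned long orders)
{
	unsigned long mask = cfg->always;

	if (vma_madvised)
		mask |= cfg->madvise;

	/* "inherit" sizes follow the top-level setting. */
	if (cfg->global == GLOBAL_ALWAYS ||
	    (cfg->global == GLOBAL_MADVISE && vma_madvised))
		mask |= cfg->inherit;

	return orders & mask;
}
```

If the caller passes a bitfield with a single order set, the result is either
that same bit or 0, so it can still be tested like a boolean. Keeping the state
in three precomputed bitfields means the hot path is a handful of ORs and one
AND, which is the motivation the commit message gives for not storing `enabled`
in each struct thpsize.
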
>>>
>>>> -khugepaged will be automatically started when
>>>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>>>> -be automatically shutdown if it's set to "never".
>>>> +khugepaged will be automatically started when one or more hugepage
>>>> +sizes are enabled (either by directly setting "always" or "madvise",
>>>> +or by setting "inherit" while the top-level enabled is set to "always"
>>>> +or "madvise"), and it'll be automatically shutdown when the last
>>>> +hugepage size is disabled (either by directly setting "never", or by
>>>> +setting "inherit" while the top-level enabled is set to "never").
>>>>
>>>>   Khugepaged controls
>>>>   -------------------
>>>>
>>>> +.. note::
>>>> +   khugepaged currently only searches for opportunities to collapse to
>>>> +   PMD-sized THP and no attempt is made to collapse to other THP
>>>> +   sizes.
>>>
>>> For small-size THP, collapse is probably a bad idea. On Android we prefer
>>> a one-shot attempt, especially as we use large folio sizes of 64KB and
>>> below: if the page fault succeeds in getting large folios, we map them;
>>> otherwise we give up, since that memory can be swapped out, swapped in
>>> and madvised with MADV_DONTNEED quite unpredictably.
>>>
>>> Too many compactions will increase power consumption and degrade UI
>>> responsiveness.
>>
>> Understood; that's very useful information for the Android context. However,
>> multiple people have commented that the server context will eventually need
>> khugepaged (or something similar) to asynchronously collapse to contpte size.
>> One suggestion was a user space daemon that scans and collapses with
>> MADV_COLLAPSE. I suspect the key will be to ensure that whatever solution we
>> go for is flexible and can be enabled/disabled/configured for the different
>> environments.
> 
> There certainly is interest for 2 MiB THP on arm64 64k where the THP size would
> normally be 512 MiB. In that scenario, khugepaged makes perfect sense.

Indeed
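
For reference, the 512 MiB figure follows from the page table geometry, and the
arithmetic can be sketched in C as below. The helper names are hypothetical,
and the calculation assumes 8-byte page table entries, so one page-sized table
holds page_size / 8 entries.

```c
#include <stdint.h>

/* Size mapped by one PMD entry, assuming 8-byte page table entries. */
static uint64_t pmd_size_bytes(uint64_t page_size)
{
	uint64_t entries_per_table = page_size / 8;

	return page_size * entries_per_table;
}

/* Smallest order such that (page_size << order) covers `size` bytes. */
static unsigned int order_for_size(uint64_t size, uint64_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}
```

With 64 KiB pages, pmd_size_bytes() gives 64 KiB * 8192 = 512 MiB, whereas a
2 MiB THP is only 32 base pages, i.e. order 5. With 4 KiB pages the same 2 MiB
is the PMD size itself, at order 9.
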
