lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8adbde1c-970b-4a26-81b0-91b913c4850b@arm.com>
Date:   Tue, 5 Dec 2023 09:50:14 +0000
From:   Ryan Roberts <ryan.roberts@....com>
To:     Barry Song <21cnbao@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        David Hildenbrand <david@...hat.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Alistair Popple <apopple@...dia.com>, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs
 interface

On 05/12/2023 04:21, Barry Song wrote:
> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts <ryan.roberts@....com> wrote:
>>
>> In preparation for adding support for anonymous multi-size THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "inherit" (to inherit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites. The resulting functions are
>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>> respectively.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|inherit]
>> bitfields since this reduces the amount of work required in
>> thp_vma_allowable_orders() which is called for every page fault.
>>
>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>> commit, for details of how the new sysfs interface works.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
> 
> Reviewed-by: Barry Song <v-songbaohua@...o.com>

Thanks!

> 
>> -khugepaged will be automatically started when
>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>> -be automatically shutdown if it's set to "never".
>> +khugepaged will be automatically started when one or more hugepage
>> +sizes are enabled (either by directly setting "always" or "madvise",
>> +or by setting "inherit" while the top-level enabled is set to "always"
>> +or "madvise"), and it'll be automatically shutdown when the last
>> +hugepage size is disabled (either by directly setting "never", or by
>> +setting "inherit" while the top-level enabled is set to "never").
>>
>>  Khugepaged controls
>>  -------------------
>>
>> +.. note::
>> +   khugepaged currently only searches for opportunities to collapse to
>> +   PMD-sized THP and no attempt is made to collapse to other THP
>> +   sizes.
> 
> For small-size THP, collapse is probably a bad idea. we like a one-shot
> try in Android especially we are using a 64KB and less large folio size. if
> PF succeeds in getting large folios, we map large folios, otherwise we
> give up as those memories can be quite unstably swapped-out, swapped-in
> and madvised to be DONTNEED.
> 
> too many compactions will increase power consumption and decrease UI
> response.

Understood; that's very useful information for the Android context. Multiple
people have made comments about eventually needing khugepaged (or something
similar) support in the server context though to async collapse to contpte size.
Actually one suggestion was a user space daemon that scans and collapses with
MADV_COLLAPSE. I suspect the key will be to ensure whatever solution we go for
is flexible and can be enabled/disabled/configured for the different environments.

> 
> Thanks
> Barry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ