lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHbLzkpNrtCfzuWFXWO-3Rwa62YibKYGBFuj0Wek_gO+nd7jMg@mail.gmail.com>
Date:   Wed, 1 Nov 2023 11:11:24 -0700
From:   Yang Shi <shy828301@...il.com>
To:     Ryan Roberts <ryan.roberts@....com>
Cc:     David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory

On Wed, Nov 1, 2023 at 7:02 AM Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 31/10/2023 18:29, Yang Shi wrote:
> > On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts <ryan.roberts@....com> wrote:
> >>
> >> On 31/10/2023 11:50, Ryan Roberts wrote:
> >>> On 06/10/2023 21:06, David Hildenbrand wrote:
> >>> [...]
> >>>>
> >>>> Change 2: sysfs interface.
> >>>>
> >>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
> >>>> agree.
> >>>>
> >>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
> >>>> bitmaps at all. We can do better if we want to go down that path.
> >>>>
> >>>> Maybe we should take a look at hugetlb, and how they added support for multiple
> >>>> sizes. What *might* make sense could be (depending on which values we actually
> >>>> support!)
> >>>>
> >>>>
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
> >>>>
> >>>> Each one would contain an "enabled" and "defrag" file. We want something minimal
> >>>> first? Start with the "enabled" option.
> >>>>
> >>>>
> >>>> enabled: always [global] madvise never
> >>>>
> >>>> Initially, we would set it for PMD-sized THP to "global" and for everything else
> >>>> to "never".
> >>>
> >>> Hi David,
> >>>
> >>> I've just started coding this, and it occurs to me that I might need a small
> >>> clarification here; the existing global "enabled" control is used to drive
> >>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> >>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
> >>> now).
> >>>
> >>> 1) Is this potentially confusing for the user? Should we rename the per-size
> >>> controls to "anon_enabled"? Or is it preferable to jsut keep it vague for now so
> >>> we can reuse the same control for file-backed memory in future?
> >>>
> >>> 2) The global control will continue to drive the file-backed memory decision
> >>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
> >>>
> >>> Thanks,
> >>> Ryan
> >>>
> >>
> >> Also, an implementation question:
> >>
> >> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
> >> (although it does honour MADV_NOHUGEPAGE and the prctl); It will return true
> >> regardless. Is that by design? It couldn't fathom any reasoning from the commit log:
> >
> > The enabled="never" is for anonymous VMAs, DAX VMAs are typically file VMAs.
>
> That's not quite true; enabled="never" is honoured for non-DAX/non-shmem file
> VMAs (for collapse via CONFIG_READ_ONLY_THP_FOR_FS and more recently for

When implementing READ_ONLY_THP_FOR_FS the file THP just can be
collapsed by khugepaged, but khugepaged is started iff enabled !=
"never". So READ_ONLY_THP_FOR_FS has to honor it. Unfortunately there
are some confusing exceptions... But anyway DAX is not the same class.

> anything that implements huge_fault() - see
> 7a81751fcdeb833acc858e59082688e3020bfe12).

IIUC this commit just gives the vmas which implement huge_fault() a
chance to handle the fault. Currently just DAX vmas implement
huge_fault() in vanilla kernel AFAICT.

>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ