linux-kernel - Re: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill __transhuge_page

Message-ID: <CAHbLzkp6BW5sHiG5aFkOz0yq-5ZvbE0AhFYN4YEN4Mkx0koZeQ@mail.gmail.com>
Date:   Mon, 21 Aug 2023 15:59:13 -0700
From:   Yang Shi <shy828301@...il.com>
To:     "Zach O'Keefe" <zokeefe@...gle.com>
Cc:     Saurabh Singh Sengar <ssengar@...rosoft.com>,
        Matthew Wilcox <willy@...radead.org>,
        Dan Williams <dan.j.williams@...el.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

On Mon, Aug 21, 2023 at 8:09 AM Zach O'Keefe <zokeefe@...gle.com> wrote:
>
> On Fri, Aug 18, 2023 at 2:21 PM Yang Shi <shy828301@...il.com> wrote:
> >
> > On Thu, Aug 17, 2023 at 11:29 AM Zach O'Keefe <zokeefe@...gle.com> wrote:
> > >
> > > On Thu, Aug 17, 2023 at 10:47 AM Yang Shi <shy828301@...il.com> wrote:
> > > >
> > > > On Wed, Aug 16, 2023 at 2:48 PM Zach O'Keefe <zokeefe@...gle.com> wrote:
> > > > >
> > > > > > > We have an out-of-tree driver that maps huge pages through a file handle and
> > > > > > > relies on ->huge_fault. It used to work in 5.19 kernels but 6.1 changed this
> > > > > > > behaviour.
> > > > > > >
> > > > > > > I don’t think restoring the earlier behaviour of the fault path for huge pages
> > > > > > > would impact the kernel negatively.
> > > > > > >
> > > > > > > Do you think we can restore this earlier behaviour of the kernel to allow page
> > > > > > > faults for huge pages via ->huge_fault?
> > > > >
> > > > > > That seems reasonable to me. Using the existence of a
> > > > > > ->huge_fault() handler as a predicate to return "true" makes sense.
> > > > > > The "normal" flow for file-backed memory along the fault path still
> > > > > > needs to return "false", so that we correctly fall back to the ->fault()
> > > > > > handler. Unless there are objections, I can do that in a v2.
> > > >
> > > > Sorry for chiming in late. I'm just back from vacation and trying to catch up...
> > > >
> > > > > IIUC the out-of-tree driver tries to allocate a huge page and install
> > > > > a PMD mapping via its huge_fault() handler, but the cleanup of
> > > > > hugepage_vma_check() prevents this due to the check for
> > > > > VM_NO_KHUGEPAGED?
> > > > >
> > > > > So you would like to check whether a huge_fault() handler exists
> > > > > instead of vma_is_dax()?
> > >
> > > Sorry for the multiple threads here. There are two problems: (a) the
> > > VM_NO_KHUGEPAGED check along fault path, and (b) we don't give
> > > ->huge_fault() a fair shake, if it exists, along fault path. The
> > > current code assumes vma_is_dax() iff ->huge_fault() exists.
> > >
> > > (a) is easy enough to fix. For (b), I'm currently looking at the
> > > possibility of not worrying about ->huge_fault() in
> > > hugepage_vma_check(), and just letting create_huge_pud() /
> > > > create_huge_pmd() check and fall back as necessary. I think we'll need
> > > the explicit DAX check still, since we want to keep khugepaged and
> > > MADV_COLLAPSE away, and the presence / absence of ->huge_fault() isn't
> > > enough to know that (well.. today it kind of is, but we shouldn't
> > > depend on it).
> >
> > You meant something like:
> >
> > if (vma->vm_ops->huge_fault) {
> >     if (vma_is_dax(vma))
> >         return in_pf;
> >
> >     /* Fall through */
> > }
>
> I don't think this will work for Saurabh's case, since IIUC, they
> aren't using DAX, but are using VM_HUGEPAGE|VM_MIXEDMAP, faulted in
> using ->huge_fault().
>
> The old (v5.19) fault path looked like:
>
> static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>                                           unsigned long vm_flags)
> {
>         /* Explicitly disabled through madvise. */
>         if ((vm_flags & VM_NOHUGEPAGE) ||
>             test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>                 return false;
>         return true;
> }
>
> /*
>  * to be used on vmas which are known to support THP.
>  * Use transparent_hugepage_active otherwise
>  */
> static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> {
>
>         /*
>          * If the hardware/firmware marked hugepage support disabled.
>          */
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
>                 return false;
>
>         if (!transhuge_vma_enabled(vma, vma->vm_flags))
>                 return false;
>
>         if (vma_is_temporary_stack(vma))
>                 return false;
>
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>                 return true;
>
>         if (vma_is_dax(vma))
>                 return true;
>
>         if (transparent_hugepage_flags &
>                                 (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
>                 return !!(vma->vm_flags & VM_HUGEPAGE);
>
>         return false;
> }
>
> For non-anonymous VMAs, the next check (in create_huge_*) would be for the
> ->huge_fault handler, falling back as necessary if it didn't exist.

Yeah, you are right. I just replied to your v2 patch.
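
For reference, a minimal user-space sketch of the v5.19 ordering described
above: the eligibility check runs first, and only then does the
create_huge_* step look for a ->huge_fault handler, falling back if there
is none. Every type and function name below (mock_vma, mock_vm_ops,
mock_create_huge_pmd, ...) is a hypothetical stand-in for illustration, not
the kernel's definitions, and the eligibility check is reduced to a single
boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch; not kernel vm_fault_t values. */
enum mock_result {
	MOCK_FALLBACK,      /* use the normal ->fault() / small-page path */
	MOCK_HUGE_ANON,     /* anonymous huge page path */
	MOCK_HUGE_HANDLER   /* driver-provided ->huge_fault() ran */
};

struct mock_vma;

/* Stand-in for vm_operations_struct with only the field of interest. */
struct mock_vm_ops {
	enum mock_result (*huge_fault)(const struct mock_vma *vma);
};

/* Stand-in for vm_area_struct. */
struct mock_vma {
	bool anonymous;                    /* assumed: no backing file */
	bool thp_enabled;                  /* assumed: result of the eligibility check */
	const struct mock_vm_ops *vm_ops;  /* may be NULL */
};

/* Example handler a driver might install (hypothetical). */
enum mock_result mock_driver_huge_fault(const struct mock_vma *vma)
{
	(void)vma;
	return MOCK_HUGE_HANDLER;
}

/* Eligibility first, then anonymous vs. ->huge_fault, else fall back. */
enum mock_result mock_create_huge_pmd(const struct mock_vma *vma)
{
	assert(vma != NULL);

	if (!vma->thp_enabled)
		return MOCK_FALLBACK;
	if (vma->anonymous)
		return MOCK_HUGE_ANON;
	if (vma->vm_ops && vma->vm_ops->huge_fault)
		return vma->vm_ops->huge_fault(vma);
	return MOCK_FALLBACK;
}
```

In this sketch a file-backed VMA with a handler reaches the handler only
when the eligibility check passes; a file-backed VMA without one falls back.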

>
> The patch I sent out last week[1] somewhat restores this logic -- the
> only difference being we do the check for ->huge_fault in
> hugepage_vma_check() as well. This is so smaps can surface this
> possibility with some accuracy. I just realized it will erroneously
> return "true" for the collapse path, however..
>
> Maybe Matthew was right about unifying everything here :P That's 2
> mistakes I've made in trying to fix this issue (but maybe that's just
> me).

IMHO, no rush on fixing it.

>
> [1] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
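
To illustrate the collapse-path concern mentioned above, here is a minimal
user-space sketch of a predicate that treats a ->huge_fault handler as a
reason to return true only when called from the fault path. This is an
assumption about one possible shape of a fix, not the posted patch: the
struct, its fields, and sketch_hugepage_vma_check() are hypothetical, and
the remaining sysfs/madvise checks are collapsed into one boolean. Only the
in_pf flag name is taken from the snippet earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of the VMA properties discussed in the thread. */
struct sketch_vma {
	bool nohugepage;      /* assumed: THP explicitly disabled for this VMA */
	bool is_dax;          /* assumed: what vma_is_dax() would report */
	bool has_huge_fault;  /* assumed: vm_ops->huge_fault is non-NULL */
	bool thp_allowed;     /* assumed: stand-in for all remaining checks */
};

/*
 * in_pf is true on the page fault path and false on the collapse path
 * (khugepaged / MADV_COLLAPSE).
 */
bool sketch_hugepage_vma_check(const struct sketch_vma *vma, bool in_pf)
{
	assert(vma != NULL);

	if (vma->nohugepage)
		return false;

	/* Keep collapse away from DAX; only the fault path may proceed. */
	if (vma->is_dax)
		return in_pf;

	/*
	 * A handler's presence is honoured only on the fault path, so the
	 * collapse path does not see "true" for such VMAs.
	 */
	if (vma->has_huge_fault)
		return in_pf;

	return vma->thp_allowed;
}
```

With this shape, a non-DAX VMA that has a handler is eligible when in_pf is
true and ineligible when it is false.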