lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAA1CXcBUCZ+UGsE-9xHzgi0nmzcbzt_oKQWxP8=PJyp0W+iD1A@mail.gmail.com>
Date: Sat, 7 Jun 2025 06:55:57 -0600
From: Nico Pache <npache@...hat.com>
To: Dev Jain <dev.jain@....com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	david@...hat.com, ziy@...dia.com, baolin.wang@...ux.alibaba.com, 
	lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, ryan.roberts@....com, 
	corbet@....net, rostedt@...dmis.org, mhiramat@...nel.org, 
	mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org, baohua@...nel.org, 
	willy@...radead.org, peterx@...hat.com, wangkefeng.wang@...wei.com, 
	usamaarif642@...il.com, sunnanyong@...wei.com, vishal.moola@...il.com, 
	thomas.hellstrom@...ux.intel.com, yang@...amperecomputing.com, 
	kirill.shutemov@...ux.intel.com, aarcange@...hat.com, raquini@...hat.com, 
	anshuman.khandual@....com, catalin.marinas@....com, tiwai@...e.de, 
	will@...nel.org, dave.hansen@...ux.intel.com, jack@...e.cz, cl@...two.org, 
	jglisse@...gle.com, surenb@...gle.com, zokeefe@...gle.com, hannes@...xchg.org, 
	rientjes@...gle.com, mhocko@...e.com, rdunlap@...radead.org
Subject: Re: [PATCH v7 07/12] khugepaged: add mTHP support

On Sat, Jun 7, 2025 at 12:24 AM Dev Jain <dev.jain@....com> wrote:
>
>
> On 15/05/25 8:52 am, Nico Pache wrote:
> > Introduce the ability for khugepaged to collapse to different mTHP sizes.
> > While scanning PMD ranges for potential collapse candidates, keep track
> > of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
> > represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
> > mTHPs are enabled we remove the restriction of max_ptes_none during the
> > scan phase so we dont bailout early and miss potential mTHP candidates.
> >
> > After the scan is complete we will perform binary recursion on the
> > bitmap to determine which mTHP size would be most efficient to collapse
> > to. max_ptes_none will be scaled by the attempted collapse order to
> > determine how full a THP must be to be eligible.
> >
> > If a mTHP collapse is attempted, but contains swapped out, or shared
> > pages, we dont perform the collapse.
> >
> > For non PMD collapse we much leave the anon VMA write locked until after
> > we collapse the mTHP
>
> Why? I know that Hugh pointed out locking errors; I am yet to catch up
> on that thread, but you need to explain in the description why you do
> what you do.
I will add a better description in the next version. The reasoning is
that in the PMD case all the pages are isolated, but in the non-PMD
case this is not true, and we must keep the lock to prevent changes
from occurring after we unlock it.

Another potential solution is to isolate all the pages in the PMD,
then undo it after we collapse the mTHP.

-- Nico
>
> [--snip---]
>
> >
> > -
> > -     spin_lock(pmd_ptl);
> > -     BUG_ON(!pmd_none(*pmd));
> > -     folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> > -     folio_add_lru_vma(folio, vma);
> > -     pgtable_trans_huge_deposit(mm, pmd, pgtable);
> > -     set_pmd_at(mm, address, pmd, _pmd);
> > -     update_mmu_cache_pmd(vma, address, pmd);
> > -     deferred_split_folio(folio, false);
> > -     spin_unlock(pmd_ptl);
> > +     if (order == HPAGE_PMD_ORDER) {
> > +             pgtable = pmd_pgtable(_pmd);
> > +             _pmd = folio_mk_pmd(folio, vma->vm_page_prot);
> > +             _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
> > +
> > +             spin_lock(pmd_ptl);
> > +             BUG_ON(!pmd_none(*pmd));
> > +             folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE);
> > +             folio_add_lru_vma(folio, vma);
> > +             pgtable_trans_huge_deposit(mm, pmd, pgtable);
> > +             set_pmd_at(mm, address, pmd, _pmd);
> > +             update_mmu_cache_pmd(vma, address, pmd);
> > +             deferred_split_folio(folio, false);
> > +             spin_unlock(pmd_ptl);
> > +     } else { /* mTHP collapse */
> > +             mthp_pte = mk_pte(&folio->page, vma->vm_page_prot);
> > +             mthp_pte = maybe_mkwrite(pte_mkdirty(mthp_pte), vma);
> > +
> > +             spin_lock(pmd_ptl);
>
> Nico,
>
> I've noticed a few occasions where my review comments have not been acknowledged -
> for example, [1]. It makes it difficult to follow up and contributes to some
> frustration on my end. I'd appreciate if you could make sure to respond to
> feedback, even if you are disagreeing with my comments. Thanks!
>
>
> [1] https://lore.kernel.org/all/08d13445-5ed1-42ea-8aee-c1dbde24407e@arm.com/
>
>
> [---snip---]
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ