linux-kernel - Re: [PATCH hotfix v2 2/2] mm/thp: fix deferred split unqueue naming and locking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c47c355f-fc2a-d8f4-4c8d-4a7a1468f0b2@google.com>
Date: Mon, 28 Oct 2024 10:19:39 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: David Hildenbrand <david@...hat.com>
cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
    Usama Arif <usamaarif642@...il.com>, Yang Shi <shy828301@...il.com>, 
    Wei Yang <richard.weiyang@...il.com>, 
    "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, 
    Matthew Wilcox <willy@...radead.org>, Johannes Weiner <hannes@...xchg.org>, 
    Baolin Wang <baolin.wang@...ux.alibaba.com>, 
    Barry Song <baohua@...nel.org>, Kefeng Wang <wangkefeng.wang@...wei.com>, 
    Ryan Roberts <ryan.roberts@....com>, Nhat Pham <nphamcs@...il.com>, 
    Zi Yan <ziy@...dia.com>, Chris Li <chrisl@...nel.org>, 
    Shakeel Butt <shakeel.butt@...ux.dev>, linux-kernel@...r.kernel.org, 
    linux-mm@...ck.org
Subject: Re: [PATCH hotfix v2 2/2] mm/thp: fix deferred split unqueue naming
 and locking

On Mon, 28 Oct 2024, David Hildenbrand wrote:

> Hi Hugh,
> 
> mostly looks good to me, one comment:

Thanks...

> 
> > +++ b/mm/memcontrol-v1.c
> > @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struct folio *folio,
> >    css_get(&to->css);
> >    css_put(&from->css);
> >   +	/* Warning should never happen, so don't worry about refcount non-0 */
> > +	WARN_ON_ONCE(folio_unqueue_deferred_split(folio));
> >    folio->memcg_data = (unsigned long)to;
> >   
> >   	__folio_memcg_unlock(from);
> > @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t
> > *pmd,
> >    enum mc_target_type target_type;
> >    union mc_target target;
> >    struct folio *folio;
> > +	bool tried_split_before = false;
> >   +retry_pmd:
> >    ptl = pmd_trans_huge_lock(pmd, vma);
> >    if (ptl) {
> >   		if (mc.precharge < HPAGE_PMD_NR) {
> > @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_range(pmd_t
> > *pmd,
> >     target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> >     if (target_type == MC_TARGET_PAGE) {
> >   			folio = target.folio;
> > +			/*
> > +			 * Deferred split queue locking depends on memcg,
> > +			 * and unqueue is unsafe unless folio refcount is 0:
> > +			 * split or skip if on the queue? first try to split.
> > +			 */
> > +			if (!list_empty(&folio->_deferred_list)) {
> > +				spin_unlock(ptl);
> > +				if (!tried_split_before)
> > +					split_folio(folio);
> > +				folio_unlock(folio);
> > +				folio_put(folio);
> > +				if (tried_split_before)
> > +					return 0;
> > +				tried_split_before = true;
> > +				goto retry_pmd;
> > +			}
> > +			/*
> > +			 * So long as that pmd lock is held, the folio cannot
> > +			 * be racily added to the _deferred_list, because
> > +			 * __folio_remove_rmap() will find !partially_mapped.
> > +			 */
> 
> Fortunately that code is getting ripped out.

Yes, and even more fortunately, we're in time to fix its final incarnation!

> 
> https://lkml.kernel.org/r/20241025012304.2473312-3-shakeel.butt@linux.dev
> 
> So I wonder ... as a quick fix should we simply handle it like the code
> further down where we refuse PTE-mapped large folios completely?

(I went through the same anxiety attack as you did, wondering what
happens to the large-but-not-PMD-large folios: then noticed it's safe
as you did.  The v1 commit message had a paragraph pondering whether
the deprecated code will need a patch to extend it for the new feature:
but once Shakeel posted the ripout, I ripped out that paragraph -
no longer any need for an answer.)

> 
> "ignore such a partial THP and keep it in original memcg"
> 
> ...
> 
> and simply skip this folio similarly? I mean, it's a corner case either way.

I certainly considered that option: it's known to give up like that
for many reasons.  But my thinking (in the commit message) was "Not ideal,
but moving charge has been requested, and khugepaged should repair the THP
later" - if someone is still using move_charge_at_immigrate, I thought
this change would generate fewer surprises - that huge charge likely
to be moved as it used to be.

Hugh