linux-kernel - Re: [PATCH hotfix v2 2/2] mm/thp: fix deferred split unqueue naming and locking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <154430c4-7b17-443f-8628-ef3bb7738ae9@redhat.com>
Date: Mon, 28 Oct 2024 11:43:32 +0100
From: David Hildenbrand <david@...hat.com>
To: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>
Cc: Usama Arif <usamaarif642@...il.com>, Yang Shi <shy828301@...il.com>,
 Wei Yang <richard.weiyang@...il.com>,
 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
 Matthew Wilcox <willy@...radead.org>, Johannes Weiner <hannes@...xchg.org>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>, Barry Song <baohua@...nel.org>,
 Kefeng Wang <wangkefeng.wang@...wei.com>, Ryan Roberts
 <ryan.roberts@....com>, Nhat Pham <nphamcs@...il.com>,
 Zi Yan <ziy@...dia.com>, Chris Li <chrisl@...nel.org>,
 Shakeel Butt <shakeel.butt@...ux.dev>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: [PATCH hotfix v2 2/2] mm/thp: fix deferred split unqueue naming
 and locking

Hi Hugh,

mostly looks good to me, one comment:

> +++ b/mm/memcontrol-v1.c
> @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struct folio *folio,
>   	css_get(&to->css);
>   	css_put(&from->css);
>   
> +	/* Warning should never happen, so don't worry about refcount non-0 */
> +	WARN_ON_ONCE(folio_unqueue_deferred_split(folio));
>   	folio->memcg_data = (unsigned long)to;
>   
>   	__folio_memcg_unlock(from);
> @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>   	enum mc_target_type target_type;
>   	union mc_target target;
>   	struct folio *folio;
> +	bool tried_split_before = false;
>   
> +retry_pmd:
>   	ptl = pmd_trans_huge_lock(pmd, vma);
>   	if (ptl) {
>   		if (mc.precharge < HPAGE_PMD_NR) {
> @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>   		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
>   		if (target_type == MC_TARGET_PAGE) {
>   			folio = target.folio;
> +			/*
> +			 * Deferred split queue locking depends on memcg,
> +			 * and unqueue is unsafe unless folio refcount is 0:
> +			 * split or skip if on the queue? first try to split.
> +			 */
> +			if (!list_empty(&folio->_deferred_list)) {
> +				spin_unlock(ptl);
> +				if (!tried_split_before)
> +					split_folio(folio);
> +				folio_unlock(folio);
> +				folio_put(folio);
> +				if (tried_split_before)
> +					return 0;
> +				tried_split_before = true;
> +				goto retry_pmd;
> +			}
> +			/*
> +			 * So long as that pmd lock is held, the folio cannot
> +			 * be racily added to the _deferred_list, because
> +			 * __folio_remove_rmap() will find !partially_mapped.
> +			 */

Fortunately that code is getting ripped out.

https://lkml.kernel.org/r/20241025012304.2473312-3-shakeel.butt@linux.dev

So I wonder ... as a quick fix should we simply handle it like the code 
further down where we refuse PTE-mapped large folios completely?

"ignore such a partial THP and keep it in original memcg"

...

and simply skip this folio similarly? I mean, it's a corner case either way.

-- 
Cheers,

David / dhildenb