linux-kernel - [PATCH v3] mm: memory: move mem_cgroup_charge() into alloc_anon

Message-ID: <20240122011612.501029-1-wangkefeng.wang@huawei.com>
Date: Mon, 22 Jan 2024 09:16:12 +0800
From: Kefeng Wang <wangkefeng.wang@...wei.com>
To: Andrew Morton <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>
CC: <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>, David
 Hildenbrand <david@...hat.com>, Michal Hocko <mhocko@...e.com>, Roman
 Gushchin <roman.gushchin@...ux.dev>, Johannes Weiner <hannes@...xchg.org>,
	Shakeel Butt <shakeelb@...gle.com>, Muchun Song <songmuchun@...edance.com>,
	Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: [PATCH v3] mm: memory: move mem_cgroup_charge() into alloc_anon_folio()

At present, the GFP flags that vma_thp_gfp_mask() derives from the user
configuration are used only for large folio allocation, not for the
memory cgroup charge; the charge uses GFP_KERNEL for both order-0 and
large-order folios. However, mem_cgroup_charge() uses the GFP flags in
a fairly sophisticated way. In addition to checking
gfpflags_allow_blocking(), it pays attention to __GFP_NORETRY and
__GFP_RETRY_MAYFAIL to ensure that processes within this memcg do not
exceed their quotas.

So move mem_cgroup_charge() into alloc_anon_folio():
1) This lets us allocate the largest folio order possible, because we
can try the next order if mem_cgroup_charge() fails, even when the
memcg's memory usage is close to its limit.
2) Using the same GFP flags for allocation and charge is consistent
with PMD THP. In addition, depending on the GFP flags returned from
vma_thp_gfp_mask(), GFP_TRANSHUGE_LIGHT lets us skip direct reclaim and
__GFP_NORETRY lets us skip mem_cgroup_oom(), so charging a large-order
(order <= PAGE_ALLOC_COSTLY_ORDER) folio won't trigger a memory cgroup
OOM.

Reviewed-by: Ryan Roberts <ryan.roberts@....com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@...wei.com>
---
v3:
- update changelog as suggested by Michal Hocko
- add RB from Ryan
v2:
- fix build when !CONFIG_TRANSPARENT_HUGEPAGE
- update changelog as suggested by Matthew Wilcox
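
Illustration only, not part of the patch: a minimal userspace sketch of
the order fallback described in point 1) of the changelog. Every name
below (toy_memcg, toy_charge(), toy_highest_order(), toy_next_order(),
toy_alloc_anon()) is hypothetical and merely mimics the shape of the
kernel loop. It assumes allocation always succeeds and that a charge
fails outright once the limit would be exceeded, whereas the real
mem_cgroup_charge() may reclaim first, depending on the GFP flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; usage and limit are in base pages. */
struct toy_memcg {
	unsigned long usage;
	unsigned long limit;
};

/* Toy charge: 0 on success, -1 if the charge would exceed the limit. */
static int toy_charge(struct toy_memcg *cg, int order)
{
	unsigned long nr = 1UL << order;

	if (cg->usage + nr > cg->limit)
		return -1;
	cg->usage += nr;
	return 0;
}

/* Index of the highest set bit in the order bitmask, or -1 if empty. */
static int toy_highest_order(unsigned long orders)
{
	int order = -1;

	while (orders) {
		order++;
		orders >>= 1;
	}
	return order;
}

/* Drop the order that was just tried and return the next lower candidate. */
static int toy_next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return toy_highest_order(*orders);
}

/*
 * Try each candidate order from largest to smallest. A failed charge no
 * longer ends the attempt: the toy folio is dropped and the next order is
 * tried. Returns the order that was charged, 0 for the order-0 fallback,
 * or -1 if even a single page cannot be charged.
 */
static int toy_alloc_anon(struct toy_memcg *cg, unsigned long orders)
{
	int order = toy_highest_order(orders);

	while (orders) {
		if (toy_charge(cg, order) == 0)
			return order;
		order = toy_next_order(&orders, order);
	}

	/* Fallback path: a single base page. */
	return toy_charge(cg, 0) == 0 ? 0 : -1;
}
```

With a limit of 1024 pages and 1020 already charged, a candidate mask
of orders 4 and 2 fails the order-4 charge and succeeds at order 2,
instead of failing the fault as soon as the first charge is rejected.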

 mm/memory.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5e88d5379127..551f0b21bc42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4153,8 +4153,8 @@ static bool pte_range_none(pte_t *pte, int nr_pages)
 
 static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 {
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	struct vm_area_struct *vma = vmf->vma;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	unsigned long orders;
 	struct folio *folio;
 	unsigned long addr;
@@ -4206,15 +4206,21 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 		folio = vma_alloc_folio(gfp, order, vma, addr, true);
 		if (folio) {
+			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
+				folio_put(folio);
+				goto next;
+			}
+			folio_throttle_swaprate(folio, gfp);
 			clear_huge_page(&folio->page, vmf->address, 1 << order);
 			return folio;
 		}
+next:
 		order = next_order(&orders, order);
 	}
 
 fallback:
 #endif
-	return vma_alloc_zeroed_movable_folio(vmf->vma, vmf->address);
+	return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
 }
 
 /*
@@ -4281,10 +4287,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	nr_pages = folio_nr_pages(folio);
 	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
 
-	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
-		goto oom_free_page;
-	folio_throttle_swaprate(folio, GFP_KERNEL);
-
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
 	 * preceding stores to the page contents become visible before
@@ -4338,8 +4340,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 release:
 	folio_put(folio);
 	goto unlock;
-oom_free_page:
-	folio_put(folio);
 oom:
 	return VM_FAULT_OOM;
 }
-- 
2.27.0