Message-ID: <Z3gSXYQaDhcCOIcJ@x1n>
Date: Fri, 3 Jan 2025 11:37:49 -0500
From: Peter Xu <peterx@...hat.com>
To: Ackerley Tng <ackerleytng@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, riel@...riel.com,
leitao@...ian.org, akpm@...ux-foundation.org, muchun.song@...ux.dev,
osalvador@...e.de, roman.gushchin@...ux.dev,
nao.horiguchi@...il.com
Subject: Re: [PATCH 4/7] mm/hugetlb: Clean up map/global resv accounting when
allocate
On Sat, Dec 28, 2024 at 12:06:34AM +0000, Ackerley Tng wrote:
> >
> > -	/* If this allocation is not consuming a reservation, charge it now.
> > +	/*
> > +	 * If this allocation is not consuming a per-vma reservation,
> > +	 * charge the hugetlb cgroup now.
> >  	 */
> > -	deferred_reserve = map_chg || cow_from_owner;
> > -	if (deferred_reserve) {
> > +	if (map_chg) {
> >  		ret = hugetlb_cgroup_charge_cgroup_rsvd(
> >  			idx, pages_per_huge_page(h), &h_cg);
>
> Should hugetlb_cgroup_charge_cgroup_rsvd() be called when map_chg == MAP_CHG_ENFORCED?
This looks like a pretty niche use case, though I would say yes.
I didn't give this much consideration when drafting the patch, because the
change here is meant to keep the old behavior: map_chg grows into a
tristate so that we can drop deferred_reserve, while nothing should change
in how the cgroup gets charged.
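To spell out what I mean by the tristate (just a sketch to illustrate the
idea, not necessarily the exact code; besides MAP_CHG_ENFORCED that you
quoted, the other two names are made up for illustration):

    /*
     * Sketch only: the tristate replacing the old "deferred_reserve" bool.
     * MAP_CHG_ENFORCED is quoted from the patch; the other two names are
     * illustrative.
     */
    typedef enum {
            /* The vma has a reservation at this index; consume it */
            MAP_CHG_REUSE = 0,
            /* No vma reservation to consume; charge it now */
            MAP_CHG_NEEDED = 1,
            /* CoW from the owner: bypass the vma reservation on purpose */
            MAP_CHG_ENFORCED = 2,
    } map_chg_state;

Any non-zero value means no vma reservation is consumed, so the
"if (map_chg)" in the hunk above charges the rsvd cgroup just like the old
"deferred_reserve = map_chg || cow_from_owner" did.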
When that happens, it means the owner process CoWed a private hugetlb
folio, which enforces bypassing the vma reservation. Bypassing the vma
check makes sense to me here, because the new to-be-CoWed folio X will
replace another folio Y, which should already have consumed the private
vma resv at this specific index. So there's no way the to-be-CoWed folio X
can have anything to do with the vma reservation.
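For this path, I'd expect the tristate to be picked roughly like below
(again only a sketch using the illustrative names above, error handling
omitted; vma_needs_reservation() is the existing helper):

    /* Sketch: deciding map_chg for the allocation (error handling omitted) */
    if (cow_from_owner) {
            /*
             * The owner CoWs a private folio: folio Y at this index already
             * consumed the vma resv, so the new folio X must not touch it.
             */
            map_chg = MAP_CHG_ENFORCED;
    } else if (vma_needs_reservation(h, vma, addr) > 0) {
            map_chg = MAP_CHG_NEEDED;
    } else {
            map_chg = MAP_CHG_REUSE;
    }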
Besides the vma reservation, I don't see why this folio allocation needs to
be any more special. IOW, it should still go through all the remaining
checks and fail properly if any of them fails, and that should include any
form of cgroup charging (either hugetlb or memcg), IMHO.
Do you have any specific thoughts on this path?
Thanks,
--
Peter Xu