linux-kernel - Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller

Message-ID: <ZRvcOV0+wkYRuGEh@dhcp22.suse.cz>
Date:   Tue, 3 Oct 2023 11:17:45 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Nhat Pham <nphamcs@...il.com>, akpm@...ux-foundation.org,
        riel@...riel.com, roman.gushchin@...ux.dev, shakeelb@...gle.com,
        muchun.song@...ux.dev, tj@...nel.org, lizefan.x@...edance.com,
        shuah@...nel.org, mike.kravetz@...cle.com, yosryahmed@...gle.com,
        linux-mm@...ck.org, kernel-team@...a.com,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in
 memory controller

On Mon 02-10-23 11:25:55, Johannes Weiner wrote:
> On Mon, Oct 02, 2023 at 05:08:34PM +0200, Michal Hocko wrote:
> > On Mon 02-10-23 10:50:26, Johannes Weiner wrote:
> > > On Mon, Oct 02, 2023 at 03:43:19PM +0200, Michal Hocko wrote:
> > > > On Wed 27-09-23 17:57:22, Nhat Pham wrote:
> > [...]
> > > > - memcg limit reclaim doesn't assist hugetlb pages allocation when
> > > >   hugetlb overcommit is configured (i.e. pages are not consumed from the
> > > >   pool) which means that the page allocation might disrupt workloads
> > > >   from other memcgs.
> > > > - failure to charge a hugetlb page results in SIGBUS rather
> > > >   than memcg oom killer. That could be the case even if the
> > > >   hugetlb pool still has pages available and there is
> > > >   reclaimable memory in the memcg.
> > > 
> > > Are these actually true? AFAICS, regardless of whether the page comes
> > > from the pool or the buddy allocator, the memcg code will go through
> > > the regular charge path, attempt reclaim, and OOM if that fails.
> > 
> > OK, I should have been more explicit. Let me expand. Charges are
> > accounted only _after_ the actual allocation is done. So the actual
> > allocation is not constrained by the memcg context. It might reclaim
> > from the memcg at that time but the disruption could have already
> > happened. Not really any different from regular memory allocation
> > attempt but much more visible with GB pages and one could reasonably
> > expect that memcg should stop such a GB allocation if the local reclaim
> > would be hopeless to free up enough from its own consumption.
> > 
> > Makes more sense?
> 
> Yes, that makes sense.
> 
> This should be fairly easy to address by having hugetlb do the split
> transaction that charge_memcg() does in one go, similar to what we do
> for the hugetlb controller as well. IOW,
> 
> alloc_hugetlb_folio()
> {
> 	if (mem_cgroup_hugetlb_try_charge())
> 		return ERR_PTR(-ENOMEM);
> 
> 	folio = dequeue();
> 	if (!folio) {
> 		folio = alloc_buddy();
> 		if (!folio)
> 			goto uncharge;
> 	}
> 
> 	mem_cgroup_hugetlb_commit_charge();
> }

Yes, this makes sense. I still suspect we will need better charge
reclaim tuning for GB pages, as those are just too huge, and a simple
MAX_RECLAIM_RETRIES * GB worth of reclaim targets might be overly
aggressive.
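
For illustration, the try-charge / allocate / commit ordering from the
quoted pseudocode can be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions, not kernel code: every toy_*
name is hypothetical, malloc() stands in for both the hugetlb pool and
the buddy allocator, and reclaim is not modelled at all. It only shows
that a charge which cannot fit never reaches the allocator, and that a
failed allocation gives its reservation back.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_memcg {
	size_t limit;    /* bytes the group may hold in total */
	size_t usage;    /* bytes committed to the group */
	size_t reserved; /* bytes reserved by try_charge, not yet committed */
};

/* Counts how often the (simulated) allocator was actually entered. */
static int toy_alloc_calls;

/* Step 1: reserve room in the group before touching the allocator. */
static bool toy_try_charge(struct toy_memcg *cg, size_t size)
{
	/* Invariant: usage + reserved <= limit, so this cannot underflow. */
	if (size > cg->limit - cg->usage - cg->reserved)
		return false;
	cg->reserved += size;
	return true;
}

/* Step 3a: allocation succeeded, turn the reservation into usage. */
static void toy_commit_charge(struct toy_memcg *cg, size_t size)
{
	assert(cg->reserved >= size);
	cg->reserved -= size;
	cg->usage += size;
}

/* Step 3b: allocation failed, drop the reservation again. */
static void toy_cancel_charge(struct toy_memcg *cg, size_t size)
{
	assert(cg->reserved >= size);
	cg->reserved -= size;
}

/* Step 2: stand-in for dequeue-from-pool plus buddy fallback. */
static void *toy_backing_alloc(size_t size, bool simulate_failure)
{
	toy_alloc_calls++;
	return simulate_failure ? NULL : malloc(size);
}

static void *toy_alloc_hugetlb(struct toy_memcg *cg, size_t size,
			       bool simulate_failure)
{
	void *page;

	if (!toy_try_charge(cg, size))
		return NULL; /* over limit: the allocator never runs */

	page = toy_backing_alloc(size, simulate_failure);
	if (!page) {
		toy_cancel_charge(cg, size);
		return NULL;
	}

	toy_commit_charge(cg, size);
	return page;
}
```

With a group limited to two "pages", two allocations succeed and a
third is refused at toy_try_charge() without toy_alloc_calls being
incremented. How much reclaim a real try-charge step should attempt for
a GB-sized request is the open tuning question and is deliberately left
out of this model.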

-- 
Michal Hocko
SUSE Labs