Message-ID: <ik3yjjt6evxexkaculyiibgrgxtvimwx7rzalpbecb75gpmmck@pcsmy6kxzynb>
Date: Tue, 29 Apr 2025 21:37:26 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org, 
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Meta kernel team <kernel-team@...a.com>, bpf@...r.kernel.org
Subject: Re: [PATCH 1/4] memcg: simplify consume_stock

On Tue, Apr 29, 2025 at 04:51:03PM -0700, Alexei Starovoitov wrote:
> On Tue, Apr 29, 2025 at 04:04:25PM -0700, Shakeel Butt wrote:
> > consume_stock() does not need to check gfp_mask for spinning; it can
> > simply trylock the local lock to decide whether to proceed or fail.
> > There is no need to spin on the local lock at all.
> > 
> > Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
> > ---
> >  mm/memcontrol.c | 20 +++++++-------------
> >  1 file changed, 7 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 650fe4314c39..40d0838d88bc 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1804,16 +1804,14 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
> >   * consume_stock: Try to consume stocked charge on this cpu.
> >   * @memcg: memcg to consume from.
> >   * @nr_pages: how many pages to charge.
> > - * @gfp_mask: allocation mask.
> >   *
> > - * The charges will only happen if @memcg matches the current cpu's memcg
> > - * stock, and at least @nr_pages are available in that stock.  Failure to
> > - * service an allocation will refill the stock.
> > + * Consume the cached charge if enough nr_pages are present otherwise return
> > + * failure. Also return failure for charge request larger than
> > + * MEMCG_CHARGE_BATCH or if the local lock is already taken.
> >   *
> >   * returns true if successful, false otherwise.
> >   */
> > -static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > -			  gfp_t gfp_mask)
> > +static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  {
> >  	struct memcg_stock_pcp *stock;
> >  	uint8_t stock_pages;
> > @@ -1821,12 +1819,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> >  	bool ret = false;
> >  	int i;
> >  
> > -	if (nr_pages > MEMCG_CHARGE_BATCH)
> > -		return ret;
> > -
> > -	if (gfpflags_allow_spinning(gfp_mask))
> > -		local_lock_irqsave(&memcg_stock.stock_lock, flags);
> > -	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> > +	if (nr_pages > MEMCG_CHARGE_BATCH ||
> > +	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> 
> I don't think it's a good idea.
> spin_trylock() will fail often enough on PREEMPT_RT.
> Even during a normal boot I see preemption between tasks, and they
> contend on the same cpu for the same local_lock==spin_lock.
> Making them take the slow path is a significant behavior change
> that needs to be carefully considered.

I didn't really think too much about PREEMPT_RT kernels, as I assume
performance is not the top priority there, but I think I get your
point. Let me explain, and correct me if I am wrong. On a PREEMPT_RT
kernel, the local lock is a spinlock, which is actually a sleeping
mutex with priority inheritance. A task holding the local lock can
still get context switched (but will remain on the same CPU's run
queue), and the newer task can then try to acquire the memcg stock
local lock. If we just do a trylock, it will always fall back to the
slow path, but if we do local_lock(), the task will sleep, possibly
lend its priority to the task owning the lock, and possibly let that
task get the CPU. Later, the task that slept on the memcg stock lock
will wake up and go through the fast path.
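
To make the tradeoff concrete, here is a minimal sketch of the two
fast-path strategies being compared (the identifiers are the ones from
the mm/memcontrol.c hunk above; the comments about RT behavior reflect
my understanding and may need correcting):

	/*
	 * Strategy A (current code): wait when the gfp context allows
	 * it.  On PREEMPT_RT, local_lock_irqsave() takes a per-CPU
	 * spinlock_t, i.e. a sleeping rt_mutex with priority
	 * inheritance, so a preempted lock owner gets boosted and the
	 * caller retries the fast path after waking up.
	 */
	if (gfpflags_allow_spinning(gfp_mask))
		local_lock_irqsave(&memcg_stock.stock_lock, flags);
	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
		return false;	/* non-spinning context: use slow path */

	/*
	 * Strategy B (this patch): always trylock, never wait.  On
	 * PREEMPT_RT this fails whenever the lock owner was preempted
	 * on this CPU, pushing the charge to the slow path instead of
	 * boosting the owner.
	 */
	if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
		return false;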


Ok, I will drop the first patch. Please let me know your comments on the
remaining series.

> 
> Also please cc bpf@...r in the future for these kinds of changes.

Sounds good.
