Message-ID: <ZNrITZVTf2EILRJq@slm.duckdns.org>
Date: Mon, 14 Aug 2023 14:35:25 -1000
From: Tejun Heo <tj@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Michal Hocko <mhocko@...e.com>, Shakeel Butt <shakeelb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads
Hello,
On Mon, Aug 14, 2023 at 05:28:22PM -0700, Yosry Ahmed wrote:
> > So, the original design used a mutex to synchronize flushing, with the
> > idea being that updates are high-freq but reads are low-freq and can be
> > relatively slow. Using rstat for mm-internal operations changed this
> > assumption quite a bit, and we ended up switching that mutex to a
> > spinlock.
>
> Naive question: do mutexes handle thundering-herd problems better than
> spinlocks? I would assume so, but I am not sure.
I don't know. We can ask Waiman if that becomes a problem.
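For context, the mutex-based flush path the original design used looked
conceptually like this - purely an illustrative sketch, not the actual
memcg code; do_flush() is a hypothetical stand-in for the rstat walk:

/* do_flush() is a hypothetical stand-in for the full rstat walk */
static void do_flush(void);

static DEFINE_MUTEX(stats_flush_mutex);

static void flush_stats(void)
{
	/*
	 * Contending flushers sleep here instead of spinning; under a
	 * thundering herd they are queued and woken one at a time,
	 * which tends to degrade more gracefully than N CPUs spinning
	 * on the same cacheline.
	 */
	mutex_lock(&stats_flush_mutex);
	do_flush();
	mutex_unlock(&stats_flush_mutex);
}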
> > * Flush-side, maybe we can break flushing into per-cpu or whatnot, but
> > there's no avoiding the fact that flushing can take quite a while if
> > there is a lot to flush, whether locks are split or not. I wonder
> > whether it'd be possible to go back to a mutex for flushing and update
> > the users to either consume the cached values or operate in a sleepable
> > context if a synchronous read is necessary, which is the right thing to
> > do anyway given how long flushes can take.
>
> Unfortunately, flushing cannot be broken down per-cpu, as all flushers
> update the same per-cgroup counters, so we need a bigger locking scope.
> Switching to atomics really hurts performance. Breaking the lock down
> to be per-cgroup is doable, but since we need to lock both the parent
> and the cgroup, flushing top-level cgroups (which I assume is the most
> common case) will lock the root anyway.
Plus, there's not much point in flushing in parallel, so I don't feel too
enthusiastic about splitting flush locking.
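Just to spell out why splitting doesn't buy much: a per-cgroup version
would look roughly like the below (a sketch with made-up names - a
per-cgroup stat_lock field and propagate_deltas() don't exist in the
tree). Propagating a child's deltas needs both the child's and the
parent's lock, so flushing any top-level cgroup still serializes on the
root.

static void flush_one_level(struct cgroup *cgrp)
{
	struct cgroup *parent = cgroup_parent(cgrp);

	/*
	 * For a top-level cgroup, @parent is the root cgroup, so all
	 * top-level flushes end up contending on the root's lock
	 * anyway.
	 */
	spin_lock(&parent->stat_lock);		/* hypothetical field */
	spin_lock_nested(&cgrp->stat_lock, SINGLE_DEPTH_NESTING);
	propagate_deltas(cgrp, parent);		/* hypothetical helper */
	spin_unlock(&cgrp->stat_lock);
	spin_unlock(&parent->stat_lock);
}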
> All flushers right now operate in sleepable context, so we can go back
> to the mutex if you think this will make things better. The
Yes, I think that'd be more sane.
> slowness problem reported recently is in a sleepable context; it's
> just too slow for userspace, if I understand correctly.
I mean, there's a certain amount of work to do, and there's no way around
it if you wanna read the counters synchronously. The only solution there
would be using a cached value or having some sort of auto-flushing
mechanism so that the amount to flush doesn't build up too much - e.g.
keep a count of the number of pending updates and trigger a flush if it
goes over some threshold.
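Something along these lines, as a rough sketch (all names and the
threshold value are made up, not a real implementation - the point is
just the counter + threshold + async kick):

/* do_flush() is a hypothetical stand-in for the full rstat walk */
static void do_flush(void);

static atomic_t pending_updates = ATOMIC_INIT(0);
#define FLUSH_THRESHOLD		1024	/* illustrative, needs tuning */

static void flush_work_fn(struct work_struct *work);
static DECLARE_WORK(flush_work, flush_work_fn);

/* called from the stat update path */
static void note_stat_update(int nr)
{
	/*
	 * Kick an async flush once enough updates have accumulated so
	 * that a synchronous reader never faces a huge backlog.
	 */
	if (atomic_add_return(nr, &pending_updates) > FLUSH_THRESHOLD)
		queue_work(system_unbound_wq, &flush_work);
}

static void flush_work_fn(struct work_struct *work)
{
	atomic_set(&pending_updates, 0);
	do_flush();
}

That roughly bounds how much pending work any synchronous reader has to
eat at flush time.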
Thanks.
--
tejun