Date:   Mon, 14 Aug 2023 14:35:25 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     Michal Hocko <mhocko@...e.com>, Shakeel Butt <shakeelb@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Muchun Song <muchun.song@...ux.dev>, cgroups@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads

Hello,

On Mon, Aug 14, 2023 at 05:28:22PM -0700, Yosry Ahmed wrote:
> > So, the original design used a mutex to synchronize flushing, with the
> > idea being that updates are high-frequency but reads are low-frequency
> > and can be relatively slow. Using rstat for mm-internal operations
> > changed this assumption quite a bit, and we ended up replacing that
> > mutex with a spinlock.
> 
> Naive question: do mutexes handle thundering-herd problems better than
> spinlocks? I would assume so, but I am not sure.

I don't know. We can ask Waiman if that becomes a problem.
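
As a rough userspace analogue of the original design described above (this is
not the memcg/rstat code itself - the names, the fake per-CPU array and the
pthread mutex are made up purely for illustration): updates only touch a
per-CPU counter, while the rare, possibly slow reader takes a single flush
mutex and aggregates everything.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NR_FAKE_CPUS	4

static atomic_long percpu_count[NR_FAKE_CPUS];	/* hot, lock-free updates */
static long flushed_total;			/* aggregate, only under mutex */
static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;

/* High-frequency update path: touches only "this CPU's" counter. */
static void update_stat(int cpu, long delta)
{
	atomic_fetch_add(&percpu_count[cpu], delta);
}

/* Low-frequency read path: may sleep on the mutex, then flushes everything. */
static long read_stat(void)
{
	long total;

	pthread_mutex_lock(&flush_mutex);
	for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		flushed_total += atomic_exchange(&percpu_count[cpu], 0);
	total = flushed_total;
	pthread_mutex_unlock(&flush_mutex);

	return total;
}

int main(void)
{
	update_stat(0, 5);
	update_stat(1, 7);
	printf("total = %ld\n", read_stat());	/* prints 12 */
	return 0;
}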

> > * Flush-side, maybe we can break flushing into per-cpu or whatnot, but
> >   there's no avoiding the fact that flushing can take quite a while if
> >   there is a lot to flush, whether locks are split or not. I wonder
> >   whether it'd be possible to go back to a mutex for flushing and update
> >   the users to either consume the cached values or operate in a sleepable
> >   context if a synchronous read is necessary, which is the right thing to
> >   do anyway given how long flushes can take.
> 
> Unfortunately, it cannot be broken down per-cpu, as all flushers
> update the same per-cgroup counters, so we need a bigger locking
> scope. Switching to atomics really hurts performance. Breaking the
> lock down to be per-cgroup is doable, but since we need to lock both
> the parent and the cgroup, flushing top-level cgroups (which I assume
> is the most common case) will take the root's lock anyway.

Plus, there's not much point in flushing in parallel, so I don't feel too
enthusiastic about splitting flush locking.
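
To illustrate the point about per-cgroup locks, a minimal userspace sketch
(invented types and names, pthreads instead of kernel locks): flushing a node
also has to update its parent, so both locks are taken, and flushing any
top-level node therefore serializes on the root's lock anyway.

#include <pthread.h>
#include <stdio.h>

struct node {
	struct node *parent;
	pthread_mutex_t lock;
	long pending;	/* delta not yet propagated upward */
	long total;	/* aggregated value */
};

static void flush_node(struct node *n)
{
	/* Consistent parent-then-child order so concurrent flushers can't deadlock. */
	if (n->parent)
		pthread_mutex_lock(&n->parent->lock);
	pthread_mutex_lock(&n->lock);

	n->total += n->pending;
	if (n->parent)
		n->parent->pending += n->pending;	/* propagate to the parent */
	n->pending = 0;

	pthread_mutex_unlock(&n->lock);
	if (n->parent)
		pthread_mutex_unlock(&n->parent->lock);
}

int main(void)
{
	struct node root = { 0 };
	struct node child = { .parent = &root, .pending = 42 };

	pthread_mutex_init(&root.lock, NULL);
	pthread_mutex_init(&child.lock, NULL);

	flush_node(&child);	/* takes both child.lock and root.lock */
	flush_node(&root);	/* a top-level flush still takes the root's lock */
	printf("root total = %ld\n", root.total);	/* prints 42 */
	return 0;
}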

> All flushers right now operate in a sleepable context, so we can go
> back to the mutex if you think this will make things better. The

Yes, I think that'd be more sane.

> slowness problem reported recently is in a sleepable context; it's
> just too slow for userspace, if I understand correctly.

I mean, there's a certain amount of work to do. There's no way around it if
you want to read the counters synchronously. The only solution there would be
using a cached value or having some sort of auto-flushing mechanism so that
the amount to flush doesn't build up too much - e.g. keep a count of the
number of entries to flush and trigger a flush if it goes over some threshold.
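
A minimal userspace sketch of that threshold idea, with made-up names and an
arbitrary threshold (this is not the actual memcg implementation): every
update bumps a global pending count, and whoever pushes it past the threshold
performs a flush, so a synchronous reader never faces a large backlog.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define FLUSH_THRESHOLD	64	/* arbitrary value for the sketch */

static atomic_long counter_delta;	/* stand-in for the unflushed stats */
static atomic_long pending_updates;	/* updates since the last flush */
static long flushed_value;		/* only touched under flush_mutex */
static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;

static long do_flush(void)
{
	long val;

	pthread_mutex_lock(&flush_mutex);
	flushed_value += atomic_exchange(&counter_delta, 0);
	atomic_store(&pending_updates, 0);
	val = flushed_value;
	pthread_mutex_unlock(&flush_mutex);

	return val;
}

static void update_stat(long delta)
{
	atomic_fetch_add(&counter_delta, delta);

	/* Auto-flush once enough updates have piled up. */
	if (atomic_fetch_add(&pending_updates, 1) + 1 >= FLUSH_THRESHOLD)
		do_flush();
}

/* Synchronous read: sees at most ~FLUSH_THRESHOLD unflushed updates. */
static long read_stat(void)
{
	return do_flush();
}

int main(void)
{
	for (int i = 0; i < 1000; i++)
		update_stat(1);
	printf("value = %ld\n", read_stat());	/* prints 1000 */
	return 0;
}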

Thanks.

-- 
tejun
