Message-ID: <ZOR6eyYfJYlxdMet@dhcp22.suse.cz>
Date: Tue, 22 Aug 2023 11:06:03 +0200
From: Michal Hocko <mhocko@...e.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeelb@...gle.com>,
Muchun Song <muchun.song@...ux.dev>,
Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: use non-unified stats flushing for
userspace reads
On Mon 21-08-23 20:54:58, Yosry Ahmed wrote:
> Unified flushing allows for great concurrency for paths that attempt to
> flush the stats, at the expense of potential staleness and a single
> flusher paying the extra cost of flushing the full tree.
>
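For the record, my mental model of the unified flushing path is roughly
the following (only a sketch, not the exact memcg code; the names
stats_flush_ongoing and do_unified_flush are approximations):

	static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);

	static void do_unified_flush(void)
	{
		/*
		 * Only one flusher runs at a time; every concurrent
		 * caller returns immediately and accepts slightly
		 * stale stats.
		 */
		if (atomic_xchg(&stats_flush_ongoing, 1))
			return;

		/* the single flusher pays for flushing the whole tree */
		cgroup_rstat_flush(root_mem_cgroup->css.cgroup);

		atomic_set(&stats_flush_ongoing, 0);
	}
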
> This tradeoff makes sense for in-kernel flushers that may observe high
> concurrency (e.g. reclaim, refault). For userspace readers, stale stats
> may be unexpected and problematic, especially when such stats are used
> for critical paths such as userspace OOM handling. Additionally, a
> userspace reader will occasionally pay the cost of flushing the entire
> hierarchy, which also causes problems in some cases [1].
>
> Opt userspace reads out of unified flushing. This makes the cost of
> reading the stats more predictable (proportional to the size of the
> subtree), as well as the freshness of the stats. Since userspace readers
> are not expected to have similar concurrency to in-kernel flushers,
> serializing them among themselves and among in-kernel flushers should be
> okay.
>
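If I read the patch right, a userspace read would instead flush just
the subtree being read, along the lines of (again only a sketch;
do_stats_flush is a stand-in name for whatever the series ends up
calling it):

	static void do_stats_flush(struct mem_cgroup *memcg)
	{
		/*
		 * Flush only the subtree being read. Readers serialize
		 * on cgroup_rstat_lock inside cgroup_rstat_flush(),
		 * both among themselves and against in-kernel flushers.
		 */
		cgroup_rstat_flush(memcg->css.cgroup);
	}
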
> This was tested on a machine with 256 cpus by running a synthetic test
> script that creates 50 top-level cgroups, each with 5 children (250
> leaf cgroups). Each leaf cgroup has 10 processes running that allocate
> memory beyond the cgroup limit, invoking reclaim (which is an in-kernel
> unified flusher). Concurrently, one thread is spawned per-cgroup to read
> the stats every second (including root, top-level, and leaf cgroups --
> so total 251 threads). No regressions were observed in the total running
> time, which means that non-unified userspace readers are not slowing
> down in-kernel unified flushers:
I have to admit I am rather confused by cgroup_rstat_flush (and
cgroup_rstat_flush_locked). The former says it can block but the latter
never actually blocks; even when it drops the cgroup_rstat_lock it
merely cond_rescheds or busy loops. How much contention and yielding do
you see with this patch? What is the worst case? How bad can a random
user make the situation by going crazy and trying to flush from many
different contexts?
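To spell out the yielding I mean, as far as I can tell the per-cpu loop
in cgroup_rstat_flush_locked only ever does something like this
(paraphrased from memory of kernel/cgroup/rstat.c, so treat the details
as approximate):

	/* inside the per-cpu loop of cgroup_rstat_flush_locked() */
	if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
		/* drop the lock so waiters can make progress */
		spin_unlock_irq(&cgroup_rstat_lock);
		if (!cond_resched())
			cpu_relax();
		spin_lock_irq(&cgroup_rstat_lock);
	}

So a flusher never sleeps waiting for the flush to complete; it just
keeps reacquiring the lock, which is what makes me wonder about the
worst-case contention here.
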
--
Michal Hocko
SUSE Labs