Date:   Mon, 11 Sep 2023 10:21:24 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Wei Xu <weixugc@...gle.com>
Cc:     Michal Hocko <mhocko@...e.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>,
        Michal Koutný <mkoutny@...e.com>,
        Waiman Long <longman@...hat.com>, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
        Greg Thelen <gthelen@...gle.com>
Subject: Re: [PATCH v4 4/4] mm: memcg: use non-unified stats flushing for
 userspace reads

Hello,

On Mon, Sep 11, 2023 at 01:01:25PM -0700, Wei Xu wrote:
> Yes, it is the same test (10K contending readers). The kernel change
> is to remove stats_user_flush_mutex from mem_cgroup_user_flush_stats()
> so that the concurrent mem_cgroup_user_flush_stats() requests directly
> contend on cgroup_rstat_lock in cgroup_rstat_flush().

I don't think it'd be a good idea to twist rstat and other kernel internal
code to accommodate 10k parallel readers. If we want to support that, let's
explicitly support that by implementing better batching in the read path.
The only guarantee you need is that there has been at least one flush since
the read attempt started, so we can do something like the following in the
read path (a rough C sketch follows the steps):

1. Grab a waiter lock. Remember the current timestamp.

2. Try to lock the flush mutex. If obtained, drop the waiter lock, flush.
   Regrab the waiter lock, update the latest flush time to my start time,
   and wake up waiters on the waitqueue (maybe do custom wakeups based on
   start time?).

3. Release the waiter lock and sleep on the waitqueue.

4. When woken up, regrab the waiter lock and compare whether the latest
   flush timestamp is later than my start time. If so, return the latest
   result. If not, go back to #2.
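
In kernel-style C, a minimal sketch of the above might look like the
following. All the names here are made up (stats_waiter_lock,
stats_flush_mutex, stats_flush_waitq, stats_last_flush, stats_read_flush()
and do_flush() don't exist anywhere); do_flush() just stands in for
whatever does the actual work, e.g. cgroup_rstat_flush():

#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/wait.h>
#include <linux/ktime.h>

static DEFINE_SPINLOCK(stats_waiter_lock);
static DEFINE_MUTEX(stats_flush_mutex);
static DECLARE_WAIT_QUEUE_HEAD(stats_flush_waitq);
static u64 stats_last_flush;    /* start time of the last completed flush */

/* Hypothetical; stands in for e.g. cgroup_rstat_flush(). */
static void do_flush(void);

/* Guarantee at least one full flush since this function was entered. */
static void stats_read_flush(void)
{
        u64 start;

        /* 1. Grab the waiter lock and remember when we started. */
        spin_lock(&stats_waiter_lock);
        start = ktime_get_ns();

        while (READ_ONCE(stats_last_flush) < start) {
                /* 2. Try to become the flusher. */
                if (mutex_trylock(&stats_flush_mutex)) {
                        spin_unlock(&stats_waiter_lock);

                        do_flush();

                        /* Everything that started before @start is covered. */
                        spin_lock(&stats_waiter_lock);
                        WRITE_ONCE(stats_last_flush, start);
                        spin_unlock(&stats_waiter_lock);

                        mutex_unlock(&stats_flush_mutex);
                        wake_up_all(&stats_flush_waitq);
                        return;
                }

                /* 3. Someone else is flushing; sleep on the waitqueue. */
                spin_unlock(&stats_waiter_lock);
                wait_event(stats_flush_waitq,
                           READ_ONCE(stats_last_flush) >= start ||
                           !mutex_is_locked(&stats_flush_mutex));
                /*
                 * 4. Woken up: either the latest flush covers our start
                 * time (the loop condition ends the wait and the caller
                 * can use the cached result), or the flusher is gone and
                 * we go back to #2.
                 */
                spin_lock(&stats_waiter_lock);
        }

        spin_unlock(&stats_waiter_lock);
}

The wake_up_all() is the dumb version of the wakeup; the "custom wakeups
based on start time" mentioned in step #2 would instead wake only the
waiters whose start time the just-finished flush actually covers.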

Maybe the above isn't the best way to do it, but you get the general idea.
When you have that many concurrent readers, most of them won't need to
actually flush.

Thanks.

-- 
tejun
