Message-ID: <20230914225844.woz7mke6vnmwijh7@google.com>
Date: Thu, 14 Sep 2023 22:58:44 +0000
From: Shakeel Butt <shakeelb@...gle.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
"Michal Koutný" <mkoutny@...e.com>,
Waiman Long <longman@...hat.com>, kernel-team@...udflare.com,
Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
On Thu, Sep 14, 2023 at 10:56:52AM -0700, Yosry Ahmed wrote:
[...]
> >
> > 1. How much delayed/stale stats have you observed on real world workload?
>
> I am not really sure. We don't have a wide deployment of kernels with
> rstat yet. These are problems observed in testing and/or concerns
> expressed by our userspace team.
>
Why is sleep(2) not good enough for the tests?
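For illustration, something along these lines (a rough sketch, not taken
from the actual selftests; the cgroup path is made up) seems sufficient
for a test that just needs fresh numbers:

/* Hypothetical test sketch: let the periodic flush catch up before
 * asserting on memory.stat contents.
 */
#include <stdio.h>
#include <unistd.h>

static void dump_memory_stat(const char *path)
{
	char buf[4096];
	FILE *f = fopen(path, "r");

	if (!f)
		return;
	while (fgets(buf, sizeof(buf), f))
		fputs(buf, stdout);
	fclose(f);
}

int main(void)
{
	/* ... test does its allocations/frees here ... */

	/* The periodic flusher runs every ~2 seconds, so sleeping past
	 * it should make the stats read below reasonably fresh.
	 */
	sleep(2);
	dump_memory_stat("/sys/fs/cgroup/test/memory.stat");
	return 0;
}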
> I am trying to solve this now because any problems that result from
> this staleness will be very hard to debug and link back to stale
> stats.
>
I think you first need to show whether this (2 sec stale stats) is really
a problem.
> >
> > 2. What is acceptable staleness in the stats for your use-case?
>
> Again, unfortunately I am not sure, but right now it can be O(seconds)
> which is not acceptable as we have workloads querying the stats every
> 1s (and sometimes more frequently).
>
It is 2 seconds in most cases, and if it is higher, the system is already
in bad shape. Calling it O(seconds) makes it sound more dramatic than it
is. So, why is 2 seconds of staleness not acceptable? Is 1 second
acceptable? Or 500 msec? Let's look at the use-cases below.
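For context, that bound comes from the periodic flush worker. Roughly
the following, paraphrased from mm/memcontrol.c with approximate names,
so treat it as a sketch rather than the exact code:

#include <linux/workqueue.h>
#include <linux/jiffies.h>

/* Stats are force-flushed every FLUSH_TIME, so in the common case a
 * reader of memory.stat sees data that is at most ~2 seconds old.
 */
#define FLUSH_TIME (2UL * HZ)

static void flush_memcg_stats_dwork(struct work_struct *w);
static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);

static void flush_memcg_stats_dwork(struct work_struct *w)
{
	/* the actual rstat flush done by memcontrol internals */
	__mem_cgroup_flush_stats();
	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}

So the staleness window is bounded by FLUSH_TIME, not open-ended.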
> >
> > 3. What is your use-case?
>
> A few use cases we have that may be affected by this:
> - System overhead: calculations using memory.usage and some stats from
> memory.stat. If one of them is fresh and the other one isn't we have
> an inconsistent view of the system.
> - Userspace OOM killing: We use some stats in memory.stat to gauge the
> amount of memory that will be freed by killing a task as sometimes
> memory.usage includes shared resources that wouldn't be freed anyway.
> - Proactive reclaim: we read memory.stat in a proactive reclaim
> feedback loop, stale stats may cause us to mistakenly think reclaim is
> ineffective and prematurely stop.
>
I don't see why userspace OOM killing and proactive reclaim need
subsecond accuracy. Please explain. Same for system overhead, but I can
see the complication of two different sources for the stats. Can you
provide the formula for system overhead? I am wondering why you need to
read stats from memory.stat files at all. Wouldn't memory.current of the
top-level cgroups plus /proc/meminfo be enough? Something like:

Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)
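To make that concrete, here is a rough userspace sketch of that formula.
The cgroup2 mount point, the unit handling, and the helper names
(meminfo_kb, top_cgroups_bytes) are my assumptions for illustration:

/* Hypothetical sketch of:
 *   Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)
 * Assumes cgroup2 is mounted at /sys/fs/cgroup.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static long long meminfo_kb(const char *key)
{
	char line[256];
	long long val = -1;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, key, strlen(key))) {
			sscanf(line + strlen(key), " %lld", &val);
			break;
		}
	}
	fclose(f);
	return val;
}

static long long top_cgroups_bytes(void)
{
	char path[512];
	long long total = 0, cur;
	struct dirent *de;
	DIR *d = opendir("/sys/fs/cgroup");

	if (!d)
		return -1;
	while ((de = readdir(d))) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path),
			 "/sys/fs/cgroup/%s/memory.current", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;	/* not a top-level cgroup directory */
		if (fscanf(f, "%lld", &cur) == 1)
			total += cur;
		fclose(f);
	}
	closedir(d);
	return total;
}

int main(void)
{
	long long total_kb = meminfo_kb("MemTotal:");
	long long free_kb = meminfo_kb("MemFree:");
	long long cgroups = top_cgroups_bytes();

	if (total_kb < 0 || free_kb < 0 || cgroups < 0)
		return 1;
	printf("overhead: %lld kB\n", total_kb - free_kb - cgroups / 1024);
	return 0;
}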
> >
> > I know I am going back on some of the previous agreements but this
> > whole locking back and forth has made me question the original
> > motivation.
>
> That's okay. Taking a step back, having flushing be indeterministic
I would say at most 2 seconds stale rather than indeterministic.
> in this way is a time bomb in my opinion. Note that this also affects
> in-kernel flushers like reclaim or dirty isolation
Fix the in-kernel flushers separately. Also, the problem Cloudflare is
facing does not need to be tied to this.