linux-kernel - Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190412201004.GA27187@cmpxchg.org>
Date:   Fri, 12 Apr 2019 16:10:05 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        Cgroups <cgroups@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, kernel-team@...com,
        Roman Gushchin <guro@...com>
Subject: Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness
 & scalabilty

On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> We also faced this exact same issue as well and had the similar solution.
> 
> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> 
> Reviewed-by: Shakeel Butt <shakeelb@...gle.com>

Thanks for the review!

> (Unrelated to this patchset) I think there should also a way to get
> the exact memcg stats. As the machines are getting bigger (more cpus
> and larger basic page size) the accuracy of stats are getting worse.
> Internally we have an additional interface memory.stat_exact for that.
> However I am not sure in the upstream kernel will an additional
> interface is better or something like /proc/sys/vm/stat_refresh which
> sync all per-cpu stats.

I was talking to Roman about this earlier as well and he mentioned it
would be nice to have periodic flushing of the per-cpu caches. The
global vmstat has something similar. We might be able to hook into
those workers, but it would likely require some smarts so we don't
walk the entire cgroup tree every couple of seconds.

We haven't had any actual problems with the per-cpu fuzziness, mainly
because the cgroups of interest also grow in size as the machines get
bigger, and so the relative error doesn't increase.

Are your requirements that the error dissipates over time (waiting for
a threshold convergence somewhere?) or do you have automation that
gets decisions wrong due to the error at any given point in time?