lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YVtGHoboSix3rexr@slm.duckdns.org>
Date:   Mon, 4 Oct 2021 08:21:18 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Michal Koutný <mkoutny@...e.com>,
        Cgroups <cgroups@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cgroup: rstat: optimize flush through speculative test

On Mon, Oct 04, 2021 at 11:07:45AM -0700, Shakeel Butt wrote:
> > Sorry for being so slow but can you point to the exact call path which gets
> > slowed down so significantly?
> 
> This is the mem_cgroup_flush_stats() inside workingset_refault() in
> mm/workingset.c.

I see. Was looking at a repo which was too old.

> > I'm mostly wondering whether we need some sort
> > of time-batched flushes because even with lock avoidance the flush path
> > really isn't great when called frequently. We can mitigate it further if
> > necessary - e.g. by adding an "updated" bitmap so that the flusher doesn't
> > have to go around touching the cachelines for all the cpus.
> 
> For the memcg stats, I already proposed a batched flush at [1].
> 
> I actually did perform the same experiment with the proposed patch
> along with [1] and it improves around just 1%. More specifically for
> memcg stats [1] is good enough but that is memcg specific and this
> patch has merits on its own.

So, the current rstat code doesn't pay a lot of attention to optimizing the
read path - the reasoning being that as long as we avoid O(nr_cgroups), the
flush operations aren't frequent enough to be problematic. The use in
refault path seems to change that balance and it likely is worthwhile to
update rstat accordingly. As I mentioned above, a next step could be adding
a cpumask which tracks cpus with populated updated list, which should add
pretty small cost to the writers while making frequent flushes significantly
cheaper.

What do you think about that approach? While the proposed patch looks fine,
it kinda bothers me that it's a very partial optimization - ie. if flush
frequency is high enough for this to matter, that for_each_possible_cpu()
scanning loop really isn't appropriate.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ