linux-kernel - Re: [PATCH 2/2] mm/memcontrol: split local and nested atomic vmstats/vmevents counters


Message-ID: <e768596e-f012-b8f0-ee3c-773abb7a3692@yandex-team.ru>
Date:   Thu, 18 Jul 2019 18:08:04 +0300
From:   Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH 2/2] mm/memcontrol: split local and nested atomic
 vmstats/vmevents counters

On 17.07.2019 20:53, Johannes Weiner wrote:
> On Wed, Jul 17, 2019 at 03:29:19PM +0300, Konstantin Khlebnikov wrote:
>> This is an alternative solution to the problem addressed in commit 815744d75152
>> ("mm: memcontrol: don't batch updates of local VM stats and events").
>>
>> Instead of adding a second set of percpu counters, which wastes memory and
>> slows down showing statistics in cgroup-v1, this patch uses two arrays of
>> atomic counters: local and nested statistics.
>>
>> An update then needs the same number of atomic operations: one local update
>> and one nested update for each parent cgroup. Readers of hierarchical
>> statistics have to sum two atomics, which isn't a big deal.
>>
>> All updates are still batched using one set of percpu counters.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
> 
> Yeah that looks better. Note that it was never about the atomics,
> though, but rather the number of cachelines dirtied. Your patch should
> solve this problem as well, but it might be a good idea to run
> will-it-scale on it to make sure the struct layout is still fine.
> 
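
For reference, the scheme from the patch description can be sketched roughly
in userspace C11 as below. This is only an illustration, not the patch itself:
struct cgroup_stat, stat_add(), stat_flush(), NR_CPUS and the BATCH threshold
are all made-up names and values, and real kernel code would use percpu
variables rather than a plain array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for the sketch */
#define BATCH   32  /* hypothetical flush threshold */

struct cgroup_stat {
	struct cgroup_stat *parent;
	atomic_long local;    /* events charged directly to this group */
	atomic_long nested;   /* events charged to any descendant */
	long percpu[NR_CPUS]; /* pending batch, touched only by the owning CPU */
};

/* Push one CPU's pending delta: one local atomic, one nested per parent. */
static void stat_flush(struct cgroup_stat *cg, int cpu)
{
	long delta = cg->percpu[cpu];

	cg->percpu[cpu] = 0;
	if (!delta)
		return;
	atomic_fetch_add(&cg->local, delta);
	for (struct cgroup_stat *p = cg->parent; p; p = p->parent)
		atomic_fetch_add(&p->nested, delta);
}

/* Batch the update per CPU and flush once it exceeds the threshold. */
static void stat_add(struct cgroup_stat *cg, int cpu, long val)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cg->percpu[cpu] += val;
	if (cg->percpu[cpu] > BATCH || cg->percpu[cpu] < -BATCH)
		stat_flush(cg, cpu);
}

/* Local statistics: a single atomic read. */
static long stat_read_local(struct cgroup_stat *cg)
{
	return atomic_load(&cg->local);
}

/* Hierarchical statistics: the sum of two atomics. */
static long stat_read_hier(struct cgroup_stat *cg)
{
	return atomic_load(&cg->local) + atomic_load(&cg->nested);
}
```

With a child group under a root, adding 10 and then 30 on the same CPU leaves
the first update pending in the batch and flushes 40 on the second one; after
that the child's local value and the root's hierarchical value are both 40,
while the root's local value stays 0.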

It looks like this patch causes a 2% regression on the 24-core, 2-NUMA-node
machine I have. Completely removing these counters gives a 2% boost.
Also, I cannot reproduce the regression fixed by commit 815744d75152: reverting
it has no effect.

So feel free to ignore the second patch. I'll play with this a little more.

Maybe atomic per-NUMA-node counters could give a nice balance between scalability and overhead.
Ideally this memory could be mapped in a per-cpu manner to give atomic access via fs/gs.
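
To make the per-NUMA idea a bit more concrete, here is a rough userspace C11
sketch of what I have in mind. Nothing here is tested kernel code: struct
numa_stat, numa_stat_add(), numa_stat_read(), NR_NODES and the 64-byte
cacheline size are assumptions made up for illustration. Each node gets its
own cacheline-aligned atomic, so writers on different nodes do not dirty the
same line, and a reader sums over nodes instead of over all CPUs.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_NODES  2   /* hypothetical: a two-node machine */
#define CACHELINE 64  /* assumed cacheline size in bytes */

/* One counter per NUMA node, padded out to its own cacheline. */
struct node_counter {
	_Alignas(CACHELINE) atomic_long count;
};

struct numa_stat {
	struct node_counter node[NR_NODES];
};

/* Writers touch only the counter that belongs to their own node. */
static void numa_stat_add(struct numa_stat *s, int node, long val)
{
	assert(node >= 0 && node < NR_NODES);
	atomic_fetch_add_explicit(&s->node[node].count, val,
				  memory_order_relaxed);
}

/* Readers sum NR_NODES atomics instead of one value per CPU. */
static long numa_stat_read(struct numa_stat *s)
{
	long sum = 0;

	for (int n = 0; n < NR_NODES; n++)
		sum += atomic_load_explicit(&s->node[n].count,
					    memory_order_relaxed);
	return sum;
}
```

The trade-off is that CPUs within one node still contend on a shared atomic,
but the cost of reading the counter scales with the number of nodes rather
than the number of CPUs.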