linux-kernel - Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression

Message-ID: <YSzwWIeapkzNElwV@blackbook>
Date:   Mon, 30 Aug 2021 16:51:04 +0200
From:   Michal Koutný <mkoutny@...e.com>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        kernel test robot <oliver.sang@...el.com>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Balbir Singh <bsingharora@...il.com>,
        Tejun Heo <tj@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        kernel test robot <lkp@...el.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        andi.kleen@...el.com
Subject: Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression

Hello Feng.

On Wed, Aug 18, 2021 at 10:30:04AM +0800, Feng Tang <feng.tang@...el.com> wrote:
> As Shakeel also mentioned, this 0day's vm-scalability doesn't involve
> any explicit mem_cgroup configurations.

If it all happens inside the root memcg, there should be no accesses to the
0x10 offset, since the root memcg is excluded from refcounting. (Unless
the modified cacheline is a μarch artifact. Actually, for lack of
other ideas, I was thinking about a similar cause even for non-root memcgs,
since the percpu refcounting is implemented via a segment register.)

Is this still relevant? (You refer to it as 0day's vm-scalability
issue.)

By some rough estimates there could be ~10 cgroup_subsys_states per 10 MiB
of workload, so the 128B padding gives ~1e-4 relative overhead (but
presumably less in most cases). I also think it is acceptable (size-wise).
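
For reference, the arithmetic behind that estimate can be sketched as
below. The helper name padding_overhead() is made up for illustration,
and the inputs (10 objects, 128 bytes each, 10 MiB of workload) are just
the rough figures quoted above, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Relative memory overhead of per-object padding:
 *   (number of objects * padding bytes per object) / workload bytes
 * Hypothetical helper, not a kernel function.
 */
static double padding_overhead(size_t nr_objects, size_t pad_bytes,
                               size_t workload_bytes)
{
    assert(workload_bytes > 0);
    return (double)nr_objects * (double)pad_bytes / (double)workload_bytes;
}
```

Calling padding_overhead(10, 128, 10 * 1024 * 1024) gives 1280 / 10485760,
i.e. on the order of 1e-4.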

Out of curiosity, have you measured the impact of reshuffling the refcnt
member into the middle of the cgroup_subsys_state (keeping it distant
both from .cgroup and .parent)?
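
To make the question concrete, here is a minimal sketch of how such a
layout could be checked. struct css_sketch is a simplified stand-in and
NOT the real cgroup_subsys_state; the filler arrays, the member types
and the same_block() helper are assumptions for illustration only. It
also assumes the object itself starts on a block boundary.

```c
#include <stddef.h>

/* Simplified stand-in for the real structure (hypothetical layout). */
struct css_sketch {
    void *cgroup;               /* read-mostly */
    void *ss;
    char filler_a[64];          /* stands in for other members */
    unsigned long refcnt[2];    /* stands in for the refcount */
    char filler_b[64];          /* stands in for other members */
    void *parent;               /* read-mostly */
};

/*
 * Return 1 if two member offsets fall into the same block of 'gran'
 * bytes (e.g. 64 for one cacheline, 128 for an adjacent-line pair).
 */
static int same_block(size_t off_a, size_t off_b, size_t gran)
{
    return off_a / gran == off_b / gran;
}
```

With this layout, same_block(offsetof(struct css_sketch, refcnt),
offsetof(struct css_sketch, cgroup), 64) is 0, and likewise for .parent,
so refcnt sits on its own 64B line. At 128B granularity refcnt and
.cgroup would still land in the same block, so a real reshuffle would
need more distance than this sketch provides.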

Thanks,
Michal