Message-ID: <YSzwWIeapkzNElwV@blackbook>
Date:   Mon, 30 Aug 2021 16:51:04 +0200
From:   Michal Koutný <mkoutny@...e.com>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        kernel test robot <oliver.sang@...el.com>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Balbir Singh <bsingharora@...il.com>,
        Tejun Heo <tj@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        kernel test robot <lkp@...el.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        andi.kleen@...el.com
Subject: Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression

Hello Feng.

On Wed, Aug 18, 2021 at 10:30:04AM +0800, Feng Tang <feng.tang@...el.com> wrote:
> As Shakeel also mentioned, this 0day's vm-scalability doesn't involve
> any explicit mem_cgroup configurations.

If it all happens inside the root memcg, there should be no accesses to the
0x10 offset since the root memcg is excluded from refcounting. (Unless
the modified cacheline is a μarch artifact. Actually, for lack of
other ideas, I was thinking about a similar cause even for non-root memcgs,
since the percpu refcounting is implemented via a segment register.)
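
For reference, this is roughly why I would expect no refcount traffic for
the root memcg at all. The snippet below is paraphrased from memory (see
include/linux/cgroup.h in your tree for the exact code), so treat it as a
sketch rather than a quote:

	/*
	 * Root csses are created with CSS_NO_REF set, so css_get()/css_put()
	 * never touch css->refcnt -- i.e. the percpu_ref sitting at the
	 * contended 0x10 offset should stay untouched for the root memcg.
	 */
	static inline void css_get(struct cgroup_subsys_state *css)
	{
		if (!(css->flags & CSS_NO_REF))
			percpu_ref_get(&css->refcnt);
	}

(And percpu_ref_get() boils down to a this_cpu_inc(), i.e. a %gs-relative
increment on x86, which is what I meant by the segment register above.)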

Is this still relevant? (You refer to it as the 0day vm-scalability
issue.)

By some rough estimates there could be ~10 cgroup_subsys_states per 10 MiB
of workload, so the 128B padding gives ~1e-4 relative overhead (but
presumably less in most cases). I also think that is acceptable (size-wise).
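
(For the record, the back-of-the-envelope arithmetic behind that number;
the ~10 css per 10 MiB figure is only my rough guess:

	10 css * 128 B padding = 1280 B extra
	1280 B / 10 MiB        = 1280 / 10485760 ~= 1.2e-4

so on the order of 1e-4, and less whenever the memory per css is larger.)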

Out of curiosity, have you measured the impact of reshuffling the refcnt
member into the middle of the cgroup_subsys_state (keeping it distant
both from .cgroup and .parent)?
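
To make the question concrete, I mean something along these lines; the
field list is abridged and only illustrative, not the actual upstream
layout:

	struct cgroup_subsys_state {
		struct cgroup *cgroup;
		struct cgroup_subsys *ss;
		/* ... other read-mostly members ... */
		struct percpu_ref refcnt;	/* moved away from .cgroup */
		/* ... remaining members ... */
		struct cgroup_subsys_state *parent;
	};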

Thanks,
Michal
