linux-kernel - Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression


Message-ID: <20210902133924.GA72811@shbuild999.sh.intel.com>
Date:   Thu, 2 Sep 2021 21:39:24 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Andi Kleen <ak@...ux.intel.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        andi.kleen@...el.com, kernel test robot <oliver.sang@...el.com>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...e.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Balbir Singh <bsingharora@...il.com>,
        Tejun Heo <tj@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        kernel test robot <lkp@...el.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>
Subject: Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression

On Thu, Sep 02, 2021 at 12:53:06PM +0200, Michal Koutný wrote:
> Hi.
> 
> On Thu, Sep 02, 2021 at 11:46:28AM +0800, Feng Tang <feng.tang@...el.com> wrote:
> > > Narrowing it down to a single prefetcher seems good enough to me. The
> > > behavior of the prefetchers is fairly complicated and hard to predict, so I
> > > doubt you'll ever get a 100% step by step explanation.
>  
> My layman explanation with the available information is that the
> prefetcher somehow behaves as if it marked the offending cacheline as
> modified (even though reading only) therefore slowing down the remote reader.

But this can't explain the test result that adding 128 bytes of padding
before css->cgroup restores (or even improves) the performance.
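
To make the padding experiment concrete, here is a minimal userspace sketch
of what 128 bytes of padding does to a member's placement. The two structs
are hypothetical stand-ins, not the real struct cgroup_subsys_state layout,
and the 64-byte line size is an assumption. It only shows the layout
arithmetic: the member moves forward by exactly two cache lines and keeps
its offset within the line. It does not explain the prefetcher behavior.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Hypothetical stand-in, NOT the real cgroup_subsys_state layout. */
struct css_before {
	char head[56];		/* placeholder for members before 'cgroup' */
	void *cgroup;		/* read-mostly pointer */
	void *ss;
};

/* Same stand-in with 128 bytes (two cache lines) of debug padding. */
struct css_after {
	char head[56];
	char pad[128];
	void *cgroup;
	void *ss;
};

/* Index of the cache line that a given struct offset falls into. */
static inline size_t cacheline_index(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* Position of a given struct offset inside its cache line. */
static inline size_t offset_in_line(size_t offset)
{
	return offset % CACHELINE_BYTES;
}

/* Compile-time check: the padding shifts 'cgroup' by exactly 128 bytes. */
static_assert(offsetof(struct css_after, cgroup) ==
	      offsetof(struct css_before, cgroup) + 128,
	      "padding should shift cgroup by 128 bytes");
```

For example, cacheline_index(offsetof(struct css_after, cgroup)) gives the
line that the padded member lands in, assuming the struct itself starts on
a cache line boundary.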
 
> On Thu, Sep 02, 2021 at 09:35:58AM +0800, Feng Tang <feng.tang@...el.com> wrote:
> > @@ -139,10 +139,21 @@ struct cgroup_subsys_state {
> >       /* PI: the cgroup that this css is attached to */
> >       struct cgroup *cgroup;
> >
> > +     struct cgroup_subsys_state *parent;
> > +
> >       /* PI: the cgroup subsystem that this css is attached to */
> >       struct cgroup_subsys *ss;
> 
> Hm, an interesting move; be mindful of commit b8b1a2e5eca6 ("cgroup:
> move cgroup_subsys_state parent field for cache locality"). It might be
> a regression for systems with cpuacct root css present. (That is likely
> a big amount nowadays, that may be the reason why you don't see full
> recovery?  For future, we may at least guard cpuacct_charge() with
> cgroup_subsys_enabled() static branch.)

Good catch!

Actually I also tested moving only 'destroy_work' and 'destroy_rwork'
('parent' is not touched, at the cost of 8 more bytes of padding), which
has a similar effect: it brings the result back to about a 15% regression.

> > [snip]
> > Yes, I'm afraid so, given that the policy/algorithm used by the prefetcher
> > keeps changing from generation to generation.
> 
> Exactly. I'm afraid of relayouting the structure with each new
> generation. A robust solution is putting all frequently accessed members
> into individual cache-lines + separating them with one more cache line? :-/

Yes, this is hard. Even for my debug patch, we can only say that it
partly recovers the regression, without knowing the exact reason.
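
For illustration only, the "individual cache lines plus one separating
line" idea could be sketched in userspace C11 as below. The struct and its
member names are hypothetical, the 64-byte line size is an assumption, and
this is not a proposed layout for cgroup_subsys_state. It mainly shows the
cost: two small members end up occupying three full cache lines.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Hypothetical struct: each hot member gets its own cache line, with an
 * otherwise unused guard line in between. */
struct isolated_demo {
	alignas(CACHELINE_BYTES) void *read_mostly;		/* line 0 */
	alignas(CACHELINE_BYTES) char guard[CACHELINE_BYTES];	/* line 1 */
	alignas(CACHELINE_BYTES) long write_hot;		/* line 2 */
};

/* Compile-time checks on the resulting layout. */
static_assert(offsetof(struct isolated_demo, write_hot) ==
	      2 * CACHELINE_BYTES,
	      "write_hot should start two lines after read_mostly");
static_assert(sizeof(struct isolated_demo) == 3 * CACHELINE_BYTES,
	      "isolation costs three full cache lines");
```

The size check makes the trade-off visible: the layout becomes more
predictable, but the memory footprint grows for every instance of the
struct.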

Thanks,
Feng

> 
> Michal