linux-kernel - Re: [PATCH] arm64: smp: Skip MC domain for SoCs without shared cache

Message-ID: <YhWn7MmBvgZzP7CA@fedora>
Date:   Tue, 22 Feb 2022 19:20:12 -0800
From:   Darren Hart <darren@...amperecomputing.com>
To:     Barry Song <21cnbao@...il.com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Will Deacon <will@...nel.org>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Arm <linux-arm-kernel@...ts.infradead.org>,
        Catalin Marinas <Catalin.Marinas@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Valentin Schneider <Valentin.Schneider@....com>,
        "D . Scott Phillips" <scott@...amperecomputing.com>,
        Ilkka Koskinen <ilkka@...amperecomputing.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH] arm64: smp: Skip MC domain for SoCs without shared cache

On Thu, Feb 17, 2022 at 07:56:00AM +1300, Barry Song wrote:
...
> > > > Then, there is another point:
> > > > In your case, CLUSTER level still has the flag SD_SHARE_PKG_RESOURCES
> > > > which is used to define some scheduler internal variable like
> > > > sd_llc(sched domain last level of cache) which allows fast task
> > > > > migration between the CPUs in this level at wakeup. In your case the
> > > > sd_llc should not be the cluster but the MC with only one CPU. But I
> > > > would not be surprised that most of perf improvement comes from this
> > > > sd_llc wrongly set to cluster instead of the single CPU
> > >
> > > > I assume this "mistake" is actually what Ampere altra needs while it
> > > > is wrong but getting right result? Ampere altra has already got both:
> >
> > Hi Barry,
> >
> > > Generally yes - although I do think we're placing too much emphasis on
> > > the "right" or "wrong" of heuristics, which are more fluid in
> > > definition over time. (e.g. I expect this will look different in a year
> > > based on what we learn from this and other non-default topologies).
> >
> > > 1. Load Balance between clusters
> > > 2. wake_affine by select sibling cpu which is sharing SCU
> > >
> > > I am not sure how much 1 and 2 are helping Darren's workloads respectively.
> >
> > We definitely see improvements with load balancing between clusters.
> > We're running some tests with the wake_affine patchset you pointed me to
> > (thanks for that). My initial tbench runs resulted in higher average and
> > max latencies reported. I need to collect more results and see the
> > impact to other benchmarks of interest before I have more to share on
> > that.
> 
> Hi Darren,
> if you read Vincent's comments carefully, you will find it is pointless
> for you to test the wake_affine patchset as you have already got it. in
> your case, sd_llc_id is set to sd_cluster level due to PKG_RESOURCES
> sharing. So with my new patchset for wake_affine, it is completely
> redundant for your machine as it works with the assumption cluster-> llc.
> but for your case, llc=cluster, so it works in cluster->cluster.

Thanks Barry,

Makes sense as described. I did see degradation in the tests we ran with this
patch applied to 5.17-rc3. I'll have to follow up with you on that when I can
dig into it more. I'd be interested in the specifics of your testing to run
something similar. I think you said you were reporting on tbench?

-- 
Darren Hart
Ampere Computing / OS and Kernel