Message-ID: <CAKfTPtAgaK-EtQp_tzxM5Rcw=LORnrrZBbh24C8bqQ4m1u_-rQ@mail.gmail.com>
Date: Wed, 23 Dec 2020 14:23:41 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: "Li, Aubrey" <aubrey.li@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mgorman@...hsingularity.net>,
linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Qais Yousef <qais.yousef@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Jiang Biao <benbjiang@...il.com>
Subject: Re: [RFC][PATCH 0/5] select_idle_sibling() wreckage
On Wed, 16 Dec 2020 at 19:07, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Wed, 16 Dec 2020 at 14:00, Li, Aubrey <aubrey.li@...ux.intel.com> wrote:
> >
> > Hi Peter,
> >
> > On 2020/12/15 0:48, Peter Zijlstra wrote:
> > > Hai, here them patches Mel asked for. They've not (yet) been through the
> > > robots, so there might be some build fail for configs I've not used.
> > >
> > > Benchmark time :-)
> > >
> >
> > Here is the data on my side. Benchmarks were run on an x86 4-socket system
> > with 24 cores per socket and 2 hyperthreads per core, 192 CPUs in total.
> >
> > uperf throughput: netperf workload, tcp_nodelay, r/w size = 90
> >
> > threads   baseline-avg   %std    patch-avg   %std
> > 96        1              0.78    1.0072      1.09
> > 144       1              0.58    1.0204      0.83
> > 192       1              0.66    1.0151      0.52
> > 240       1              2.08    0.8990      0.75
> >
> > hackbench: process mode, 25600 loops, 40 file descriptors per group
> >
> > group     baseline-avg   %std    patch-avg   %std
> > 2(80)     1              10.02   1.0339      9.94
> > 3(120)    1              6.69    1.0049      6.92
> > 4(160)    1              6.76    0.8663      8.74
> > 5(200)    1              2.96    0.9651      4.28
> >
> > schbench: 99th percentile latency, 16 workers per message thread
> >
> > mthread   baseline-avg   %std    patch-avg   %std
> > 6(96)     1              0.88    1.0055      0.81
> > 9(144)    1              0.59    1.0007      0.37
> > 12(192)   1              0.61    0.9973      0.82
> > 15(240)   1              25.05   0.9251      18.36
> >
> > sysbench mysql throughput: read/write, table size = 10,000,000
> >
> > thread    baseline-avg   %std    patch-avg   %std
> > 96        1              6.62    0.9668      4.04
> > 144       1              9.29    0.9579      6.53
> > 192       1              9.52    0.9503      5.35
> > 240       1              8.55    0.9657      3.34
> >
> > It looks like
> > - hackbench has a significant improvement at 4 groups
> > - uperf has a significant regression at 240 threads
>
> Tests are still running on my side, but early results show a perf
> regression for hackbench
A few more results before I'm off:
On a small embedded system, the problem seems to be mainly a matter of
setting the right number of loops.
On a large SMT system: the system on which I usually run my tests is
off for now, so I haven't been able to finalize the tests yet, but the
problem might be that we no longer loop over all cores with this
patchset, compared to the current algorithm.
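
For reference, the normalized averages in the quoted tables can be turned into signed gains with a few lines of Python. This is only a sketch for reading the numbers: the names Row, gain_pct and exceeds_noise are made up for illustration, and comparing the delta against the larger of the two %std values is an assumed rule of thumb, not the criterion Aubrey used. It assumes uperf reports throughput (higher is better) and hackbench reports run time (lower is better).

```python
from dataclasses import dataclass


@dataclass
class Row:
    label: str
    patch_avg: float        # patched result, normalized to baseline = 1
    base_std_pct: float     # baseline %std from the table
    patch_std_pct: float    # patched %std from the table
    higher_is_better: bool  # True for throughput, False for time/latency


def gain_pct(row: Row) -> float:
    """Signed change in percent; positive means the patchset helped."""
    delta = row.patch_avg - 1.0
    if not row.higher_is_better:
        delta = -delta
    return round(delta * 100.0, 2)


def exceeds_noise(row: Row) -> bool:
    """Assumed heuristic: is the change larger than the worse of the two %std?"""
    change_pct = abs(row.patch_avg - 1.0) * 100.0
    return change_pct > max(row.base_std_pct, row.patch_std_pct)


# The two rows called out in the quoted summary.
ROWS = [
    Row("uperf, 240 threads", 0.8990, 2.08, 0.75, higher_is_better=True),
    Row("hackbench, 4 groups (160)", 0.8663, 6.76, 8.74, higher_is_better=False),
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.label}: {gain_pct(row):+.2f}% "
              f"(beyond noise: {exceeds_noise(row)})")
```

With the figures above, this gives a regression for uperf at 240 threads and an improvement for hackbench at 4 groups, which matches the quoted summary.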
>
> >
> > Please let me know if there are any cases of interest that I can run/rerun.
> >
> > Thanks,
> > -Aubrey