Message-ID: <20230630093500.GA579792@ziqianlu-dell>
Date: Fri, 30 Jun 2023 17:35:00 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Deng Pan <pan.deng@...el.com>, <tim.c.chen@...el.com>,
<vincent.guittot@...aro.org>, <linux-kernel@...r.kernel.org>,
<tianyou.li@...el.com>, <yu.ma@...el.com>, <lipeng.zhu@...el.com>,
<yu.c.chen@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH v2] sched/task_group: Re-layout structure to reduce false
sharing
On Wed, Jun 28, 2023 at 01:18:34PM +0800, Aaron Lu wrote:
> On Tue, Jun 27, 2023 at 12:14:37PM +0200, Peter Zijlstra wrote:
> > and can we still measure an improvement over this with that approach?
>
> Let me re-run those tests and see how things change.
>
> In my previous tests I didn't turn on CONFIG_RT_GROUP_SCHED. To test
> this, I suppose I'll turn CONFIG_RT_GROUP_SCHED on and apply this change
> here that made tg->load_avg in a dedicated cacheline, then see how
> performances change with the "Make tg->load_avg per node" patch. Will
> report back once done.

The test summary (per_node compared with aligned) is:
- On 2sockets/112cores/224threads SPR, it's still an overall win:
  postgres_sysbench transactions improved 47.7%, hackbench improved
  13.5% and netperf improved 39.5%;
- On 2sockets/64cores/128threads Icelake, hackbench and netperf
  improved while postgres_sysbench transactions dropped slightly:
  hackbench improved 6.2%, netperf improved 20.3% and
  postgres_sysbench transactions dropped 1.2%;
- On 2sockets/48cores/96threads CascadeLake, hackbench and netperf are
  roughly flat.

Below are detailed results:

SPR: 2sockets/112cores/224threads

postgres_sysbench/1instance/100%(nr_client=nr_cpu)
kernel      transactions(higher is better)
aligned      89623.85±0.35%
per_node    132401.37±0.83%

hackbench/pipe/threads
kernel      time(less is better)
aligned     47.43±0.48%
per_node    41.02±0.69%

netperf/UDP_RR/100%(nr_client=nr_cpu)
kernel      throughput(higher is better)
aligned      9415.97±3.81%
per_node    13131.24±2.67%

ICL: 2sockets/64cores/128threads

postgres_sysbench/1instance/75%
kernel      transactions
aligned     62291.58±0.64%
per_node    61561.40±0.39%

hackbench/pipe/threads
kernel      time
aligned     41.66±0.04%
per_node    39.07±0.36%

netperf/UDP_RR/100%
kernel      throughput
aligned     21365.01±3.32%
per_node    25692.05±2.03%

CSL: 2sockets/48cores/96threads

hackbench/pipe/threads
kernel      time
aligned     48.78±0.61%
per_node    48.95±1.06%

netperf/UDP_RR/100%
kernel      throughput
aligned     25853.82±11.46%
per_node    25264.38±0.85%
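
For anyone who wants to double-check the percentages in the summary, here is a minimal sketch that recomputes them from the means listed above. The names RESULTS and improvement_pct are made up for illustration, the ± run-to-run variation is ignored, and the direction for the ICL and CSL tables is assumed to match the SPR ones (higher is better for transactions and throughput, less is better for time).

```python
# Means copied from the tables above: (aligned, per_node, higher_is_better).
# The dictionary layout and names are illustrative, not from the test scripts.
RESULTS = {
    ("SPR", "postgres_sysbench"): (89623.85, 132401.37, True),
    ("SPR", "hackbench"): (47.43, 41.02, False),
    ("SPR", "netperf"): (9415.97, 13131.24, True),
    ("ICL", "postgres_sysbench"): (62291.58, 61561.40, True),
    ("ICL", "hackbench"): (41.66, 39.07, False),
    ("ICL", "netperf"): (21365.01, 25692.05, True),
    ("CSL", "hackbench"): (48.78, 48.95, False),
    ("CSL", "netperf"): (25853.82, 25264.38, True),
}


def improvement_pct(aligned, per_node, higher_is_better):
    """Relative change of per_node over aligned, in percent.

    Positive means per_node is better, negative means it is worse.
    """
    if higher_is_better:
        delta = per_node - aligned
    else:
        # For time-like metrics a smaller value is the improvement.
        delta = aligned - per_node
    return delta / aligned * 100.0


for (machine, bench), (aligned, per_node, higher) in RESULTS.items():
    pct = improvement_pct(aligned, per_node, higher)
    print(f"{machine:4} {bench:18} {pct:+.1f}%")
```

The sign convention is chosen so that a positive number always reads as "per_node is better", whichever way the metric points.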
I think I'll spin a new version for the "Make tg->load_avg per-node"
patch with all the information I collected.