linux-kernel - Re: [PATCH v2] sched/task_group: Re-layout structure to reduce false sharing

Message-ID: <20230630093500.GA579792@ziqianlu-dell>
Date:   Fri, 30 Jun 2023 17:35:00 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Deng Pan <pan.deng@...el.com>, <tim.c.chen@...el.com>,
        <vincent.guittot@...aro.org>, <linux-kernel@...r.kernel.org>,
        <tianyou.li@...el.com>, <yu.ma@...el.com>, <lipeng.zhu@...el.com>,
        <yu.c.chen@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH v2] sched/task_group: Re-layout structure to reduce false
 sharing

On Wed, Jun 28, 2023 at 01:18:34PM +0800, Aaron Lu wrote:
> On Tue, Jun 27, 2023 at 12:14:37PM +0200, Peter Zijlstra wrote:
> > and can we still measure an improvement over this with that approach?
> 
> Let me re-run those tests and see how things change.
> 
> In my previous tests I didn't turn on CONFIG_RT_GROUP_SCHED. To test
> this, I suppose I'll turn CONFIG_RT_GROUP_SCHED on and apply this change
> here that made tg->load_avg in a dedicated cacheline, then see how
> performances change with the "Make tg->load_avg per node" patch. Will
> report back once done.

The test summary ("aligned" is the kernel with tg->load_avg in a
dedicated cacheline, "per_node" is the kernel with the "Make
tg->load_avg per node" patch):
- On 2sockets/112cores/224threads SPR, per_node is still an overall win:
  postgres_sysbench transactions improved 47.7%, hackbench improved
  13.5% and netperf improved 39.5%;
- On 2sockets/64cores/128threads Icelake, hackbench and netperf improved
  while postgres_sysbench transactions dropped slightly: hackbench
  improved 6.2%, netperf improved 20.3% and postgres_sysbench
  transactions dropped 1.2%;
- On 2sockets/48cores/96threads CascadeLake, hackbench and netperf are
  roughly flat.
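
For reference, the percentages in the summary can be reproduced from the
mean values in the detailed results. The snippet below is only a minimal
sketch of that arithmetic; the helper name pct_change is made up for
illustration and is not part of any test tooling used here.

```python
def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (aligned, per_node) mean values copied from the SPR results.
spr = {
    "postgres_sysbench": (89623.85, 132401.37),  # transactions, higher is better
    "hackbench": (47.43, 41.02),                 # time, less is better
    "netperf": (9415.97, 13131.24),              # throughput, higher is better
}

for name, (aligned, per_node) in spr.items():
    print(f"{name}: {pct_change(aligned, per_node):+.1f}%")
```

For hackbench the sign is negative because the metric is run time, so a
13.5% reduction is the improvement quoted above.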

Below are detailed results:

SPR: 2sockets/112cores/224threads

postgres_sysbench/1instance/100%(nr_client=nr_cpu)
kernel          transactions(higher is better)
aligned         89623.85±0.35%
per_node       132401.37±0.83%

hackbench/pipe/threads
kernel          time(less is better)
aligned         47.43±0.48%
per_node        41.02±0.69%

netperf/UDP_RR/100%(nr_client=nr_cpu)
kernel          throughput(higher is better)
aligned          9415.97±3.81%
per_node        13131.24±2.67%

ICL: 2sockets/64cores/128threads

postgres_sysbench/1instance/75%
kernel         transactions(higher is better)
aligned        62291.58±0.64%
per_node       61561.40±0.39%

hackbench/pipe/threads
kernel         time(less is better)
aligned        41.66±0.04%
per_node       39.07±0.36%

netperf/UDP_RR/100%
kernel         throughput(higher is better)
aligned        21365.01±3.32%
per_node       25692.05±2.03%

CSL: 2sockets/48cores/96threads

hackbench/pipe/threads
kernel          time(less is better)
aligned         48.78±0.61%
per_node        48.95±1.06%

netperf/UDP_RR/100%
kernel         throughput(higher is better)
aligned        25853.82±11.46%
per_node       25264.38±0.85%
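
One rough way to read "roughly flat" is to compare the difference between
the two means with the ± figures. The sketch below assumes the ± values
are run-to-run variation expressed as a percentage of the mean (this mail
does not spell that out), and that the 1.06 for CSL hackbench per_node is
also a percentage. The helper within_noise is hypothetical, and this is a
crude heuristic, not a statistical test.

```python
def within_noise(base, base_pct, new, new_pct):
    """True if the means differ by less than the larger of the two ± percentages."""
    delta_pct = abs(new - base) / base * 100.0
    return delta_pct < max(base_pct, new_pct)


# CSL hackbench: aligned 48.78±0.61%, per_node 48.95±1.06 (assumed %)
print(within_noise(48.78, 0.61, 48.95, 1.06))
# CSL netperf: aligned 25853.82±11.46%, per_node 25264.38±0.85%
print(within_noise(25853.82, 11.46, 25264.38, 0.85))
# ICL postgres_sysbench: aligned 62291.58±0.64%, per_node 61561.40±0.39%
print(within_noise(62291.58, 0.64, 61561.40, 0.39))
```

Under those assumptions both CSL results fall inside the variation, while
the ICL postgres_sysbench drop does not.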

I think I'll spin a new version for the "Make tg->load_avg per-node"
patch with all the information I collected.