Message-ID: <23a1e55f-bf03-4c05-a954-28326b59b06b@intel.com>
Date: Fri, 19 Dec 2025 21:56:18 +0800
From: "Guo, Wangyang" <wangyang.guo@...el.com>
To: Shubhang Kaushik Prasanna Kumar <shkaushik@...erecomputing.com>
Cc: benjamin.lei@...el.com, bsegall@...gle.com, dietmar.eggemann@....com,
juri.lelli@...hat.com, linux-kernel@...r.kernel.org, mgorman@...e.de,
mingo@...hat.com, peterz@...radead.org, rostedt@...dmis.org,
tianyou.li@...el.com, tim.c.chen@...ux.intel.com,
vincent.guittot@...aro.org, vschneid@...hat.com
Subject: Re: [PATCH] sched/fair: Avoid false sharing in nohz struct
On 12/19/2025 9:38 AM, Shubhang Kaushik Prasanna Kumar wrote:
> While the intuition behind isolating the `nr_cpus` counter seems
> correct, could you please justify the added padding? As this is a
> high-contention path in the scheduler, we shouldn't be inflating
> global structures with padding on logic alone. I'd like to see some
> benchmarking, such as `perf c2c` results from a multi-core system,
> proving that the false sharing is a measurable bottleneck.
Here is the cache line view from the c2c report:
    Num  RmtHitm  LclHitm  Offset  records  Symbol
  6.25%    0.00%    0.00%     0x0        4  [k] _nohz_idle_balance.isra.0
 18.75%  100.00%    0.00%     0x8       14  [k] nohz_balance_exit_idle
  6.25%    0.00%    0.00%     0x8        8  [k] nohz_balance_enter_idle
  6.25%    0.00%    0.00%     0xc        8  [k] sched_balance_newidle
  6.25%    0.00%    0.00%    0x10       31  [k] nohz_balancer_kick
  6.25%    0.00%    0.00%    0x20       16  [k] sched_balance_newidle
 37.50%    0.00%    0.00%    0x38       50  [k] irqtime_account_irq
  6.25%    0.00%    0.00%    0x38       47  [k] account_process_tick
  6.25%    0.00%    0.00%    0x38       12  [k] account_idle_ticks
Offsets:
* 0x0  -- nohz.idle_cpus_mask (r)
* 0x8  -- nohz.nr_cpus (w)
* 0x38 -- sched_clock_irqtime (r); not part of nohz, but shares its
  cache line
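For reference, the relevant declaration (paraphrased from upstream
kernel/sched/fair.c; exact fields vary by kernel version). The offset
annotations are inferred from the c2c data above and assume a 64-bit
build where cpumask_var_t is a single pointer (CONFIG_CPUMASK_OFFSTACK=y):

	static struct {
		cpumask_var_t idle_cpus_mask;	/* 0x00: read-mostly */
		atomic_t nr_cpus;		/* 0x08: written on idle enter/exit */
		int has_blocked;		/* 0x0c */
		int needs_update;		/* 0x10 */
		unsigned long next_balance;	/* 0x18 */
		unsigned long next_blocked;	/* 0x20 */
	} nohz ____cacheline_aligned;

At 0x28 bytes the whole struct fits in one 64-byte line, so the hot
writes to nr_cpus bounce the same line that the idle_cpus_mask readers
use, and that sched_clock_irqtime (just past the end of the struct)
lands on as well.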
The layout in /proc/kallsyms also confirms this:
ffffffff88600d40 b nohz
ffffffff88600d68 B arch_needs_tick_broadcast
ffffffff88600d6c b __key.264
ffffffff88600d6c b __key.265
ffffffff88600d70 b dl_generation
ffffffff88600d78 b sched_clock_irqtime
nohz starts at 0x...d40, which is 64-byte aligned, so its cache line
spans up to 0x...d7f and also covers sched_clock_irqtime at 0x...d78
(offset 0x38 from the base of nohz).

In the perf cycle hotspots, we noticed that irqtime_account_irq has
about 3% overhead caused by this false sharing. With the patch applied,
that hotspot disappears.
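The table above is the cache line view of the perf c2c report. For
anyone who wants to reproduce it, something along these lines should
work (illustrative invocation, not necessarily the exact one used
here):

	perf c2c record -a -- <workload>
	perf c2c report --stdio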
> I am also concerned about the internal layout. By sandwiching the
> timer fields between two `__cacheline_aligned` boundaries, we might
> just be shifting the contention rather than fixing it. See to it
> that fields like `next_balance` aren't being squeezed into a new
> conflict zone. I would like to review the benchmark data and the
> struct layout before we move forward.
Before the patch, all of the nohz members stay in the same cache line.
After the layout fix, fields like `next_balance` still sit in the same
cache line, so no new conflict zone is created.

Since nohz is a single global instance, the size inflation is minimal:
less than 64 bytes in total.
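To make the layout concrete, here is one plausible shape of the fix.
This is an illustrative sketch only, not the actual patch; the patch
may place the boundaries differently:

	static struct {
		cpumask_var_t idle_cpus_mask;	/* read-mostly */
		int has_blocked;
		int needs_update;
		unsigned long next_balance;
		unsigned long next_blocked;
		/*
		 * Write-hot counter isolated on its own cache line; the
		 * struct's alignment also pads the tail, so unrelated
		 * globals such as sched_clock_irqtime can no longer
		 * share it.
		 */
		atomic_t nr_cpus ____cacheline_aligned;
	} nohz ____cacheline_aligned;

Whatever the final placement, the resulting offsets can be
double-checked against a vmlinux with debug info, e.g. with gdb's
"ptype/o nohz".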
BR
Wangyang