Message-ID: <DM6PR11MB3260465D9B5D636F6EE85C64962CA@DM6PR11MB3260.namprd11.prod.outlook.com>
Date: Thu, 6 Jul 2023 14:05:02 +0000
From: "Deng, Pan" <pan.deng@...el.com>
To: Peter Zijlstra <peterz@...radead.org>,
"Lu, Aaron" <aaron.lu@...el.com>
CC: "Chen, Tim C" <tim.c.chen@...el.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Li, Tianyou" <tianyou.li@...el.com>, "Ma, Yu" <yu.ma@...el.com>,
"Zhu, Lipeng" <lipeng.zhu@...el.com>,
"Chen, Yu C" <yu.c.chen@...el.com>,
Tim Chen <tim.c.chen@...ux.intel.com>
Subject: RE: [PATCH v2] sched/task_group: Re-layout structure to reduce false
sharing
Hi Peter,
> -----Original Message-----
> From: Deng, Pan
> Sent: Wednesday, June 28, 2023 12:13 AM
> To: Peter Zijlstra <peterz@...radead.org>; Lu, Aaron <aaron.lu@...el.com>
> Cc: Chen, Tim C <tim.c.chen@...el.com>; vincent.guittot@...aro.org;
> linux-kernel@...r.kernel.org; Li, Tianyou <tianyou.li@...el.com>;
> Ma, Yu <yu.ma@...el.com>; Zhu, Lipeng <lipeng.zhu@...el.com>;
> Chen, Yu C <yu.c.chen@...el.com>; Tim Chen <tim.c.chen@...ux.intel.com>
> Subject: RE: [PATCH v2] sched/task_group: Re-layout structure to reduce
> false sharing
>
>
>
> > -----Original Message-----
> > From: Peter Zijlstra <peterz@...radead.org>
> > Sent: Tuesday, June 27, 2023 6:15 PM
> > To: Lu, Aaron <aaron.lu@...el.com>
> > Cc: Deng, Pan <pan.deng@...el.com>; Chen, Tim C <tim.c.chen@...el.com>;
> > vincent.guittot@...aro.org; linux-kernel@...r.kernel.org;
> > Li, Tianyou <tianyou.li@...el.com>; Ma, Yu <yu.ma@...el.com>;
> > Zhu, Lipeng <lipeng.zhu@...el.com>; Chen, Yu C <yu.c.chen@...el.com>;
> > Tim Chen <tim.c.chen@...ux.intel.com>
> > Subject: Re: [PATCH v2] sched/task_group: Re-layout structure to
> > reduce false sharing
> >
> > On Mon, Jun 26, 2023 at 01:47:56PM +0800, Aaron Lu wrote:
> >
> > > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > > index ec7b3e0a2b20..4fbd4b3a4bdd 100644
> > > > --- a/kernel/sched/sched.h
> > > > +++ b/kernel/sched/sched.h
> > > > @@ -389,6 +389,19 @@ struct task_group {
> > > >  #endif
> > > >  #endif
> > > >
> > > > +	struct rcu_head rcu;
> > > > +	struct list_head list;
> > > > +
> > > > +	struct list_head siblings;
> > > > +	struct list_head children;
> > > > +
> > > > +	/*
> > > > +	 * To reduce false sharing, current layout is optimized to make
> > > > +	 * sure load_avg is in a different cacheline from parent, rt_se
> > > > +	 * and rt_rq.
> > > > +	 */
> >
> > That comment is misleading I think; you don't particularly care about
> > those fields more than any other active fields that would cause false
> > sharing.
> >
>
> How about this one:
> /*
>  * load_avg can also cause cacheline bouncing with parent, rt_se
>  * and rt_rq; the current layout is optimized to make sure they
>  * are in different cachelines.
>  */
>
Does this work for you? Please feel free to share any suggestions.
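FWIW, one way to double-check the final layout (assuming a vmlinux built
with debug info) is pahole, which annotates cacheline boundaries, e.g.:

    pahole -C task_group vmlinux

so we can confirm load_avg no longer shares a 64-byte line with parent,
rt_se and rt_rq.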
> > > > +	struct task_group *parent;
> > > > +
> > >
> > > I wonder if we can simply do:
> > >
> > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > index ec7b3e0a2b20..31b73e8d9568 100644
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -385,7 +385,9 @@ struct task_group {
> > >  	 * it in its own cacheline separated from the fields above which
> > >  	 * will also be accessed at each tick.
> > >  	 */
> > > -	atomic_long_t load_avg ____cacheline_aligned;
> > > +	struct {
> > > +		atomic_long_t load_avg;
> > > +	} ____cacheline_aligned_in_smp;
> > >  #endif
> > >  #endif
> > >
> > > This way it can make sure there is no false sharing with load_avg no
> > > matter how the layout of this structure changes in the future.
> >
> > This. Also, ISTR there was a series to split this atomic across nodes;
> > whatever happened to that, and can we still measure an improvement over
> > this with that approach?
>
> I just ran unixbench context-switching on 1 node with 40C/80T. Without
> this change, perf c2c data shows the cacheline bouncing is still there,
> and perf record shows set_task_cpu taking ~4.5% of overall cycles. With
> this change, that false sharing is resolved and set_task_cpu's cycles
> drop to 0.5%.
>
To clarify: this change is beneficial even on a single NUMA node.
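In case anyone wants to reproduce the numbers, a rough sketch of the
measurement flow (the unixbench invocation below is illustrative, not
the exact command line I used):

    # cacheline contention (HITM) while the benchmark runs
    perf c2c record -a -- ./Run context1 -c 80
    perf c2c report --stats

    # cycle profile, to check set_task_cpu's share
    perf record -a -- ./Run context1 -c 80
    perf report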
Aaron posted performance data for the "split atomic across nodes" series
applied on top of this patch at
https://lore.kernel.org/lkml/20230630093500.GA579792@ziqianlu-dell/;
the two look complementary, so would it be possible to merge this change
first?
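As a side note, the effect of the anonymous-struct trick can be seen in
a self-contained userspace sketch; the CACHELINE_ALIGNED macro, the
64-byte line size, and the tg_like type below are stand-ins for the
kernel's ____cacheline_aligned_in_smp and task_group, made up for
illustration:

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* userspace stand-in for ____cacheline_aligned_in_smp; assumes
 * 64-byte cachelines, as on x86 */
#define CACHELINE_ALIGNED __attribute__((aligned(64)))

struct tg_like {
	struct tg_like *parent;		/* hot: read at every wakeup */

	/*
	 * The aligned anonymous struct (standard since C11) gives
	 * load_avg a private cacheline no matter how the surrounding
	 * fields get re-ordered later: the alignment padding is
	 * inserted both before and after it.
	 */
	struct {
		atomic_long load_avg;
	} CACHELINE_ALIGNED;

	int other_hot_field;		/* starts on the next cacheline */
};

int main(void)
{
	printf("load_avg offset:        %zu\n",
	       offsetof(struct tg_like, load_avg));
	printf("other_hot_field offset: %zu\n",
	       offsetof(struct tg_like, other_hot_field));
	return 0;
}

With gcc on x86-64 this prints offsets 64 and 128, i.e. all three
fields end up on different cachelines.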
Thanks
Pan
> Thanks
> Pan