Date:   Tue, 11 Jul 2023 14:57:57 -0500
From:   David Vernet <void@...ifault.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        gautham.shenoy@....com, kprateek.nayak@....com, aaron.lu@...el.com,
        clm@...a.com, tj@...nel.org, roman.gushchin@...ux.dev,
        kernel-team@...a.com
Subject: Re: [PATCH v2 6/7] sched: Shard per-LLC shared runqueues

On Tue, Jul 11, 2023 at 12:49:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 10, 2023 at 03:03:41PM -0500, David Vernet wrote:
> 
> > +struct shared_runq_shard {
> >  	struct list_head list;
> >  	spinlock_t lock;
> >  } ____cacheline_aligned;
> >  
> > +struct shared_runq {
> > +	u32 num_shards;
> > +	struct shared_runq_shard shards[];
> > +} ____cacheline_aligned;
> > +
> > +/* This would likely work better as a configurable knob via debugfs */
> > +#define SHARED_RUNQ_SHARD_SZ 6
> > +
> >  #ifdef CONFIG_SMP
> >  static struct shared_runq *rq_shared_runq(struct rq *rq)
> >  {
> >  	return rq->cfs.shared_runq;
> >  }
> >  
> > -static struct task_struct *shared_runq_pop_task(struct rq *rq)
> > +static struct shared_runq_shard *rq_shared_runq_shard(struct rq *rq)
> > +{
> > +	return rq->cfs.shard;
> > +}
> > +
> > +static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
> > +{
> > +	return cpu % runq->num_shards;
> 
> I would suggest either:
> 
> 	(cpu >> 1) % num_shards
>
> or keeping num_shards even, to give SMT siblings a fighting chance to
> hit the same bucket.

Given that neither of these approaches guarantees that the SMT siblings
land in the same bucket, I'll just go with your suggestion, which is
simpler.

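For clarity, assuming I go with the (cpu >> 1) variant, the v3 helper
would presumably end up looking something like this (untested sketch):

static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
{
        /*
         * Fold adjacent CPU ids onto the same shard so that SMT siblings
         * (where they're enumerated adjacently) share a bucket.
         */
        return (cpu >> 1) % runq->num_shards;
}
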
Seems inevitable that we'll want to have another debugfs knob to adjust
the number of shards, but IMO it's preferable to just apply your
suggestion in v3 and hold off on adding that complexity until we know we
need it.

> (I've no idea how SMT4 (or worse SMT8) is typically enumerated, so
> someone from the Power/Sparc/MIPS world would have to go play with that
> if they so care)

Yeah, no idea either. If these things end up varying a lot across
different architectures, then we can look into making shard assignment
architecture-specific.

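FWIW, if we do end up needing something topology-aware, one sketch
(purely illustrative, not something I'm planning for this series) would
be to key the shard off the first CPU in the SMT sibling mask:

static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
{
        /*
         * All SMT siblings map to the same shard regardless of how the
         * architecture enumerates them.
         */
        return cpumask_first(topology_sibling_cpumask(cpu)) % runq->num_shards;
}
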
> 
> > +}
> 
> > +			num_shards = max(per_cpu(sd_llc_size, i) /
> > +					 SHARED_RUNQ_SHARD_SZ, 1);
> 
> > +			shared_runq->num_shards = num_shards;
> 
> 

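For reference on the sizing math above: with SHARED_RUNQ_SHARD_SZ == 6,
a 16-CPU LLC gets max(16 / 6, 1) == 2 shards, a 48-CPU LLC gets 8, and
any LLC with 6 or fewer CPUs collapses down to a single shard.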