linux-kernel - Re: [PATCH v2 6/7] sched: Shard per-LLC shared runqueues


Message-ID: <20230711104958.GG3062772@hirez.programming.kicks-ass.net>
Date:   Tue, 11 Jul 2023 12:49:58 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     David Vernet <void@...ifault.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        gautham.shenoy@....com, kprateek.nayak@....com, aaron.lu@...el.com,
        clm@...a.com, tj@...nel.org, roman.gushchin@...ux.dev,
        kernel-team@...a.com
Subject: Re: [PATCH v2 6/7] sched: Shard per-LLC shared runqueues

On Mon, Jul 10, 2023 at 03:03:41PM -0500, David Vernet wrote:

> +struct shared_runq_shard {
>  	struct list_head list;
>  	spinlock_t lock;
>  } ____cacheline_aligned;
>  
> +struct shared_runq {
> +	u32 num_shards;
> +	struct shared_runq_shard shards[];
> +} ____cacheline_aligned;
> +
> +/* This would likely work better as a configurable knob via debugfs */
> +#define SHARED_RUNQ_SHARD_SZ 6
> +
>  #ifdef CONFIG_SMP
>  static struct shared_runq *rq_shared_runq(struct rq *rq)
>  {
>  	return rq->cfs.shared_runq;
>  }
>  
> -static struct task_struct *shared_runq_pop_task(struct rq *rq)
> +static struct shared_runq_shard *rq_shared_runq_shard(struct rq *rq)
> +{
> +	return rq->cfs.shard;
> +}
> +
> +static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
> +{
> +	return cpu % runq->num_shards;

I would suggest either:

	(cpu >> 1) % num_shards

or keeping num_shards even, to give SMT siblings a fighting chance to
hit the same bucket.

(I've no idea how SMT4 (or worse, SMT8) is typically enumerated, so
someone from the Power/Sparc/MIPS world would have to experiment with
that if they care.)
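
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of the two mappings. The helper names shard_idx_mod() and
shard_idx_pair() are made up for illustration, and the example assumes
SMT2 siblings are enumerated adjacently (CPUs 2k and 2k+1), which is not
true on every machine.

```c
/*
 * Illustration only: these helpers are hypothetical and take num_shards
 * directly instead of a struct shared_runq.
 */

/* Mapping from the patch: consecutive CPU ids go to different shards. */
static int shard_idx_mod(int cpu, int num_shards)
{
	return cpu % num_shards;
}

/*
 * Suggested variant: drop the low bit first, so that CPUs 2k and 2k+1
 * (assumed here to be SMT siblings) always pick the same shard.
 */
static int shard_idx_pair(int cpu, int num_shards)
{
	return (cpu >> 1) % num_shards;
}
```

With num_shards == 3, shard_idx_mod() puts CPUs 0 and 1 in shards 0 and
1, while shard_idx_pair() puts both in shard 0. If siblings are instead
enumerated as cpu and cpu + nr_cores, plain cpu % num_shards puts them
in the same shard only when nr_cores is a multiple of num_shards.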

> +}

> +			num_shards = max(per_cpu(sd_llc_size, i) /
> +					 SHARED_RUNQ_SHARD_SZ, 1);

> +			shared_runq->num_shards = num_shards;
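
For reference, a standalone sketch of what the sizing above yields, and
of one possible way to do the "keep num_shards even" option. This is
userspace C, not kernel code: llc_size stands in for
per_cpu(sd_llc_size, i), and both helper names are hypothetical. The
round-up policy is an assumption; rounding down would work equally well
for this purpose.

```c
#define SHARED_RUNQ_SHARD_SZ 6

/* Same arithmetic as the quoted hunk: at least one shard per LLC. */
static int num_shards_for(int llc_size)
{
	int n = llc_size / SHARED_RUNQ_SHARD_SZ;

	return n > 1 ? n : 1;
}

/*
 * Hypothetical variant: round an odd count above 1 up to the next even
 * number. A single shard is left alone, since all CPUs share it anyway.
 */
static int num_shards_even(int llc_size)
{
	int n = num_shards_for(llc_size);

	if (n > 1 && (n & 1))
		n++;
	return n;
}
```

An 18-CPU LLC gets 3 shards from the original computation and 4 from
the even variant; a 4-CPU LLC gets 1 shard either way.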