linux-kernel - Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS

Message-ID: <20230614202625.GB2883716@maniforge>
Date:   Wed, 14 Jun 2023 15:26:25 -0500
From:   David Vernet <void@...ifault.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        rostedt@...dmis.org, dietmar.eggemann@....com, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        joshdon@...gle.com, roman.gushchin@...ux.dev, tj@...nel.org,
        kernel-team@...a.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS

On Tue, Jun 13, 2023 at 10:41:11AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 12:20:04AM -0500, David Vernet wrote:
> > +struct swqueue {
> > +	struct list_head list;
> > +	spinlock_t lock;
> > +} ____cacheline_aligned;
> > +
> >  #ifdef CONFIG_SMP
> > +static struct swqueue *rq_swqueue(struct rq *rq)
> > +{
> > +	return rq->cfs.swqueue;
> > +}
> > +
> > +static struct task_struct *swqueue_pull_task(struct swqueue *swqueue)
> > +{
> > +	unsigned long flags;
> > +
> > +	struct task_struct *p;
> > +
> > +	spin_lock_irqsave(&swqueue->lock, flags);
> > +	p = list_first_entry_or_null(&swqueue->list, struct task_struct,
> > +				     swqueue_node);
> > +	if (p)
> > +		list_del_init(&p->swqueue_node);
> > +	spin_unlock_irqrestore(&swqueue->lock, flags);
> > +
> > +	return p;
> > +}
> > +
> > +static void swqueue_enqueue(struct rq *rq, struct task_struct *p, int enq_flags)
> > +{
> > +	unsigned long flags;
> > +	struct swqueue *swqueue;
> > +	bool task_migrated = enq_flags & ENQUEUE_MIGRATED;
> > +	bool task_wakeup = enq_flags & ENQUEUE_WAKEUP;
> > +
> > +	/*
> > +	 * Only enqueue the task in the shared wakequeue if:
> > +	 *
> > +	 * - SWQUEUE is enabled
> > +	 * - The task is on the wakeup path
> > +	 * - The task wasn't purposefully migrated to the current rq by
> > +	 *   select_task_rq()
> > +	 * - The task isn't pinned to a specific CPU
> > +	 */
> > +	if (!task_wakeup || task_migrated || p->nr_cpus_allowed == 1)
> > +		return;
> > +
> > +	swqueue = rq_swqueue(rq);
> > +	spin_lock_irqsave(&swqueue->lock, flags);
> > +	list_add_tail(&p->swqueue_node, &swqueue->list);
> > +	spin_unlock_irqrestore(&swqueue->lock, flags);
> > +}
> > +
> >  static int swqueue_pick_next_task(struct rq *rq, struct rq_flags *rf)
> >  {
> > -	return 0;
> > +	struct swqueue *swqueue;
> > +	struct task_struct *p = NULL;
> > +	struct rq *src_rq;
> > +	struct rq_flags src_rf;
> > +	int ret;
> > +
> > +	swqueue = rq_swqueue(rq);
> > +	if (!list_empty(&swqueue->list))
> > +		p = swqueue_pull_task(swqueue);
> > +
> > +	if (!p)
> > +		return 0;
> > +
> > +	rq_unpin_lock(rq, rf);
> > +	raw_spin_rq_unlock(rq);
> > +
> > +	src_rq = task_rq_lock(p, &src_rf);
> > +
> > +	if (task_on_rq_queued(p) && !task_on_cpu(rq, p))
> > +		src_rq = migrate_task_to(src_rq, &src_rf, p, cpu_of(rq));
> > +
> > +	if (src_rq->cpu != rq->cpu)
> > +		ret = 1;
> > +	else
> > +		ret = -1;
> > +
> > +	task_rq_unlock(src_rq, p, &src_rf);
> > +
> > +	raw_spin_rq_lock(rq);
> > +	rq_repin_lock(rq, rf);
> > +
> > +	return ret;
> >  }
> >  
> >  static void swqueue_remove_task(struct task_struct *p)
> > -{}
> > +{
> > +	unsigned long flags;
> > +	struct swqueue *swqueue;
> > +
> > +	if (!list_empty(&p->swqueue_node)) {
> > +		swqueue = rq_swqueue(task_rq(p));
> > +		spin_lock_irqsave(&swqueue->lock, flags);
> > +		list_del_init(&p->swqueue_node);
> > +		spin_unlock_irqrestore(&swqueue->lock, flags);
> > +	}
> > +}
> >  
> >  /*
> >   * For asym packing, by default the lower numbered CPU has higher priority.
> 
> *sigh*... pretty much all, if not all of this is called with rq->lock
> held. So why the irqsave and big fat fail for using spinlock :-(

Hi Peter,

Thanks for the quick review. Yeah, good call about the IRQs: it looks
like we're holding an rq lock on all swqueue paths, so we can drop the
irqsave variants and just use a raw spinlock. I'll make that change for
v2.
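
As a rough illustration of the locking pattern under discussion (not the
actual v2 change), here is a small userspace model of the shared queue.
Everything named model_* is hypothetical, and the atomic_flag lock is
only a stand-in: in the kernel the lock would presumably be a
raw_spinlock_t taken with raw_spin_lock()/raw_spin_unlock(), with no IRQ
save/restore, on the assumption that the caller already holds rq->lock
with interrupts disabled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal intrusive circular list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_push_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink and re-initialize, so list_is_empty(node) is true afterwards. */
static void list_unlink(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}

/* Hypothetical stand-in for raw_spinlock_t: a plain test-and-set lock. */
struct model_lock {
	atomic_flag held;
};

static void model_lock_init(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}

static void model_lock_acquire(struct model_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin; no IRQ state is saved because the caller owns that */
}

static void model_lock_release(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct model_task {
	int pid;
	struct list_head swqueue_node;
};

struct model_swqueue {
	struct list_head list;
	struct model_lock lock;
};

static void model_swqueue_init(struct model_swqueue *q)
{
	list_init(&q->list);
	model_lock_init(&q->lock);
}

static void model_task_init(struct model_task *t)
{
	list_init(&t->swqueue_node);
}

/* Append a waking task to the tail of the shared queue. */
static void model_swqueue_enqueue(struct model_swqueue *q, struct model_task *t)
{
	model_lock_acquire(&q->lock);
	list_push_tail(&t->swqueue_node, &q->list);
	model_lock_release(&q->lock);
}

/* Pop the oldest task from the head, or return NULL if the queue is empty. */
static struct model_task *model_swqueue_pull(struct model_swqueue *q)
{
	struct model_task *t = NULL;

	model_lock_acquire(&q->lock);
	if (!list_is_empty(&q->list)) {
		struct list_head *first = q->list.next;

		t = (struct model_task *)((char *)first -
					  offsetof(struct model_task, swqueue_node));
		list_unlink(first);
	}
	model_lock_release(&q->lock);

	return t;
}

/* Drop a task from the queue if it is still queued; otherwise do nothing. */
static void model_swqueue_remove(struct model_swqueue *q, struct model_task *t)
{
	model_lock_acquire(&q->lock);
	if (!list_is_empty(&t->swqueue_node))
		list_unlink(&t->swqueue_node);
	model_lock_release(&q->lock);
}
```

The model only captures the FIFO ordering and the single lock per queue;
it says nothing about contention, which depends on the real workload.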

Regarding the per-swqueue spinlock being a potential bottleneck, I'll
reply to Aaron's thread [0] with some numbers I collected locally on a
26 core / 52 thread Cooper Lake host and a 2 x 20 core / 40 thread
Skylake host. The TL;DR is that I'm not seeing the spinlock contended on
either netperf or kernel compile workloads, and for netperf swqueue
actually performs ~1-2% better than non-swqueue on both hosts.

[0]: https://lore.kernel.org/all/20230614043529.GA1942@ziqianlu-dell/

Thanks,
David