Message-ID: <20150807105738.GF16853@twins.programming.kicks-ass.net>
Date:	Fri, 7 Aug 2015 12:57:38 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Paul Mackerras <paulus@...ba.org>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
	Luiz Capitulino <lcapitulino@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Steven Rostedt <srostedt@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>, kvm@...r.kernel.org,
	Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq

On Wed, Feb 25, 2015 at 10:02:50PM +0100, Sebastian Andrzej Siewior wrote:

> >+static inline int swait_active(struct swait_queue_head *q)
> >+{
> >+	return !list_empty(&q->task_list);
> 
> In RT there was a smp_mb() which you dropped and I assume you had
> reasons for it.

Yeah, RT didn't have a reason for the smp_mb() -- any barrier without a
comment is a bug :-)

Also waitqueue_active(), its counterpart, does not have a barrier there
either.

Nor did I see any reason for that mb to be there.
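
To illustrate what such a barrier would order (a sketch only, not from the
patch; "cond" and "q" are placeholder names), the waker side pairs a
condition store with the lockless list_empty() inside swait_active():

	cond = true;		/* make the wait condition true */
	smp_mb();		/* the barrier in question: order the store
				 * above against the lockless list_empty()
				 * in swait_active() below */
	if (swait_active(&q))	/* lockless peek at q->task_list */
		swake_up(&q);	/* takes q->lock and wakes one waiter */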

> I assumed that one can perform list_empty_careful()
> without a lock if the items were removed with list_del_init(). But since
> nothing in -RT has blown up so far, I guess this here is legal, too :)

Nobody will come and arrest us for software bugs -- yet ;-)
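
For reference, the waiter-side removal that makes this work looks roughly
like the following finish_swait() shape (a sketch; the details in the
actual patch may differ):

	void finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
	{
		__set_current_state(TASK_RUNNING);

		/*
		 * Lockless peek; entries are only ever removed with
		 * list_del_init() under q->lock.
		 */
		if (!list_empty_careful(&wait->task_list)) {
			unsigned long flags;

			raw_spin_lock_irqsave(&q->lock, flags);
			list_del_init(&wait->task_list);
			raw_spin_unlock_irqrestore(&q->lock, flags);
		}
	}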

> >+/*
> >+ * The thing about the wake_up_state() return value; I think we can ignore it.
> >+ *
> >+ * If for some reason it would return 0, that means the previously waiting
> >+ * task is already running, so it will observe condition true (or has already).
> >+ */
> >+void swake_up_locked(struct swait_queue_head *q)
> >+{
> >+	struct swait_queue *curr;
> >+
> >+	list_for_each_entry(curr, &q->task_list, task_list) {
> >+		wake_up_process(curr->task);
> 
> okay. So since we limit everything to TASK_NORMAL, which has to sleep
> while on the list, there is no need to check whether we actually woke
> up someone.

Partly that, also that I don't see how that return value is meaningful
in the first place.

If it were to return false, the task was/is already running and it will
observe whatever condition we just satisfied to allow waking things up.

So either way around, we'll get (at least) 1 task running.
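
Concretely, the waiter re-checks its condition every time it comes back
from schedule(), so a "was already running" result from wake_up_process()
loses nothing. A sketch of the usual waiter loop (helper names as in the
eventual swait.h, they may differ in this series; "condition" is a
placeholder):

	DECLARE_SWAITQUEUE(wait);

	for (;;) {
		prepare_to_swait(&q, &wait, TASK_INTERRUPTIBLE);
		if (condition)		/* re-checked on every wakeup */
			break;
		schedule();
	}
	finish_swait(&q, &wait);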

> >+		list_del_init(&curr->task_list);
> >+		break;
> >+	}
> >+}
> >+EXPORT_SYMBOL(swake_up_locked);



> >+void swake_up(struct swait_queue_head *q)
> >+{
> >+	unsigned long flags;
> >+
> >+	if (!swait_active(q))
> >+		return;
> >+
> >+	raw_spin_lock_irqsave(&q->lock, flags);
> >+	__swake_up_locked(q);
> 
> I think this should have been swake_up_locked() instead since
> __swake_up_locked() isn't part of this patch.
> 
> Just a nitpick: later there is __prepare_to_swait() and __finish_swait()
> which have the __ prefix instead of a _locked suffix. Not sure which is
> better for a public API, but settling on one convention would be good.

Yeah, I suppose that's true ;-)

> >+	raw_spin_unlock_irqrestore(&q->lock, flags);
> >+}
> >+EXPORT_SYMBOL(swake_up);
> >+
> >+/*
> >+ * Does not allow usage from IRQ disabled, since we must be able to
> >+ * release IRQs to guarantee bounded hold time.
> >+ */
> >+void swake_up_all(struct swait_queue_head *q)
> >+{
> >+	struct swait_queue *curr, *next;
> >+	LIST_HEAD(tmp);
> 
> WARN_ON(irqs_disabled()) ?

Lockdep should already catch that by virtue of using unconditional _irq
spinlock primitives.

> >+	if (!swait_active(q))
> >+		return;
> >+
> >+	raw_spin_lock_irq(&q->lock);
> >+	list_splice_init(&q->task_list, &tmp);
> >+	while (!list_empty(&tmp)) {
> >+		curr = list_first_entry(&tmp, typeof(curr), task_list);
> >+
> >+		wake_up_state(curr->task, state);
> >+		list_del_init(&curr->task_list);
> 
> So because the task may time out and remove itself from the list at
> any time, you need to hold the lock during the wakeup and the removal
> from the list.

Indeed.

> >+
> >+		if (list_empty(&tmp))
> >+			break;
> >+
> >+		raw_spin_unlock_irq(&q->lock);
> 
> and you drop the lock after each iteration in case there is an IRQ
> pending or the task that has just been woken up has a higher priority
> than the current task and needs to get on the CPU.

> Not sure if this case matters:
> - _this_ task (wake_all) prio 120
> - first task in queue prio 10, RR
> - second task in queue prio 9, RR

Why complicate things? Better to not assume anything and just do the
simple correct thing.

> the *old* behaviour would put the second task on the CPU before the
> first task. The *new* behaviour puts the first task on the CPU after
> dropping the lock. The second task (that has a higher priority, but
> nobody knows) has to wait until the first one is done (and anything
> else that might have been woken up in the meantime with a higher prio
> than 120).

Irrelevant, we _must_ drop the lock in order to maintain bounded
behaviour.

> >+		raw_spin_lock_irq(&q->lock);
> >+	}
> >+	raw_spin_unlock_irq(&q->lock);
> >+}
> >+EXPORT_SYMBOL(swake_up_all);

> >+void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
> this one has no users; the __ prefix suggests that it is the locked
> edition. Maybe it is for the completions…

Yeah, who knows, I certainly do not anymore ;-)