Date:	Mon, 03 Nov 2014 16:17:39 -0500
From:	Waiman Long <waiman.long@...com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, linux-arch@...r.kernel.org,
	x86@...nel.org, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org,
	xen-devel@...ts.xenproject.org, kvm@...r.kernel.org,
	Paolo Bonzini <paolo.bonzini@...il.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Rik van Riel <riel@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
	David Vrabel <david.vrabel@...rix.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Scott J Norton <scott.norton@...com>,
	Douglas Hatch <doug.hatch@...com>
Subject: Re: [PATCH v13 09/11] pvqspinlock, x86: Add para-virtualization support

On 11/03/2014 05:35 AM, Peter Zijlstra wrote:
> On Wed, Oct 29, 2014 at 04:19:09PM -0400, Waiman Long wrote:
>>   arch/x86/include/asm/pvqspinlock.h    |  411 +++++++++++++++++++++++++++++++++
> I do wonder why all this needs to live in x86..
>

I haven't looked into the para-virtualization code in other 
architectures to see if my PV code is equally applicable there. That is 
why I put it under the x86 directory. If other architectures decide to 
use qspinlock with paravirtualization, we may need to pull out some 
common code, if any, back to kernel/locking.

>>
>> +#ifdef CONFIG_QUEUE_SPINLOCK
>> +
>> +static __always_inline void pv_kick_cpu(int cpu)
>> +{
>> +	PVOP_VCALLEE1(pv_lock_ops.kick_cpu, cpu);
>> +}
>> +
>> +static __always_inline void pv_lockwait(u8 *lockbyte)
>> +{
>> +	PVOP_VCALLEE1(pv_lock_ops.lockwait, lockbyte);
>> +}
>> +
>> +static __always_inline void pv_lockstat(enum pv_lock_stats type)
>> +{
>> +	PVOP_VCALLEE1(pv_lock_ops.lockstat, type);
>> +}
> Why are any of these PV ops? they're only called from other pv_*()
> functions. What's the point of pv ops you only call from pv code?

It is the same reason that you made them PV ops in your patch. Even when 
PV is on, the code won't need to call any of the PV ops most of the 
time. So it is just a matter of optimizing the common case at the 
expense of the rare case where the CPU needs to be halted and woken up, 
which will be bad performance-wise anyway. However, if you think they 
should be regular function pointers, I am fine with that too.

>> +/*
>> + *	Queue Spinlock Para-Virtualization (PV) Support
>> + *
>> + * The PV support code for queue spinlock is roughly the same as that
>> + * of the ticket spinlock.
> Relative comments are bad, esp. since we'll make the ticket code go away
> if this works, at which point this is a reference into a black hole.

Thanks for the suggestion. I will remove that when I revise the patch.

>>                              Each CPU waiting for the lock will spin until it
>> + * reaches a threshold. When that happens, it will put itself to a halt state
>> + * so that the hypervisor can reuse the CPU cycles in some other guests as
>> + * well as returning other hold-up CPUs faster.
>
>
>
>> +/**
>> + * queue_spin_lock - acquire a queue spinlock
>> + * @lock: Pointer to queue spinlock structure
>> + *
>> + * N.B. INLINE_SPIN_LOCK should not be enabled when PARAVIRT_SPINLOCK is on.
> One should write a compile time fail for that, not a comment.

Will do that.

>> + */
>> +static __always_inline void queue_spin_lock(struct qspinlock *lock)
>> +{
>> +	u32 val;
>> +
>> +	val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
>> +	if (likely(val == 0))
>> +		return;
>> +	if (static_key_false(&paravirt_spinlocks_enabled))
>> +		pv_queue_spin_lock_slowpath(lock, val);
>> +	else
>> +		queue_spin_lock_slowpath(lock, val);
>> +}
> No, this is just vile.. _that_ is what we have PV ops for. And at that
> point its the same function it was before the PV stuff, so that whole
> inline thing is then gone.

I did that because on all architectures except s390, the lock 
functions are not inlined. They live in the _raw_spin_lock* functions 
defined in kernel/locking/spinlock.c. The unlock functions, on the 
other hand, are all inlined except when PV spinlock is enabled. So 
adding a check for the jump label won't change the status quo.

>> +extern void queue_spin_unlock_slowpath(struct qspinlock *lock);
>> +
>>   /**
>>    * queue_spin_unlock - release a queue spinlock
>>    * @lock : Pointer to queue spinlock structure
>>    *
>>    * An effective smp_store_release() on the least-significant byte.
>> + *
>> + * Inlining of the unlock function is disabled when CONFIG_PARAVIRT_SPINLOCKS
>> + * is defined. So _raw_spin_unlock() will be the only call site that will
>> + * have to be patched.
> again if you hard rely on the not inlining make a build fail not a
> comment.

Will do that.

>>    */
>>   static inline void queue_spin_unlock(struct qspinlock *lock)
>>   {
>>   	barrier();
>> +	if (!static_key_false(&paravirt_spinlocks_enabled)) {
>> +		native_spin_unlock(lock);
>> +		return;
>> +	}
>>
>> +	/*
>> +	 * Need to atomically clear the lock byte to avoid racing with
>> +	 * queue head waiter trying to set _QLOCK_LOCKED_SLOWPATH.
>> +	 */
>> +	if (unlikely(cmpxchg((u8 *)lock, _Q_LOCKED_VAL, 0) != _Q_LOCKED_VAL))
>> +		queue_spin_unlock_slowpath(lock);
>> +}
> Idem, that static key stuff is wrong, use PV ops to switch between
> unlock paths.

As said in my previous emails, the PV ops call-site patching code 
doesn't work well with locking. First of all, the native_patch() 
function was called even in a KVM PV guest. Some unlock calls happened 
before the paravirt_spinlocks_enabled jump label was set up. It appears 
that call-site patching is done the first time the call site is used. 
At least for those early unlock calls, there is no way to figure out 
whether they should use the native fast path or the PV slow path. The 
only workaround I can think of is to use a variable (if available) that 
signals the end of the bootup init phase; we could then defer the 
call-site patching until the init phase has passed.

This is a rather complicated solution which may not be worth the slight 
benefit of a faster unlock call in the native case.

>> @@ -354,7 +394,7 @@ queue:
>>   	 * if there was a previous node; link it and wait until reaching the
>>   	 * head of the waitqueue.
>>   	 */
>> -	if (old & _Q_TAIL_MASK) {
>> +	if (!pv_link_and_wait_node(old, node) && (old & _Q_TAIL_MASK)) {
>>   		prev = decode_tail(old);
>>   		ACCESS_ONCE(prev->next) = node;
>> @@ -369,9 +409,11 @@ queue:
>>   	 *
>>   	 * *,x,y ->  *,0,0
>>   	 */
>> -	while ((val = smp_load_acquire(&lock->val.counter)) &
>> -			_Q_LOCKED_PENDING_MASK)
>> +	val = pv_wait_head(lock, node);
>> +	while (val & _Q_LOCKED_PENDING_MASK) {
>>   		cpu_relax();
>> +		val = smp_load_acquire(&lock->val.counter);
>> +	}
>>
>>   	/*
>>   	 * claim the lock:
> Please make the pv_*() calls return void and reduce to NOPs. This keeps
> the logic invariant of the pv stuff.

In my patch, the two pv_*() calls above serve as replacements for the 
waiting code. Making them return void and reduce to NOPs would cause, 
as Konrad said, the same operation to be done twice, which is not ideal 
from a performance point of view for the PV version. Would putting the 
pre-PV code in a comment help clarify what the code looked like before 
the PV changes?

-Longman
