Message-ID: <51E4C011.4060803@linux.vnet.ibm.com>
Date:	Tue, 16 Jul 2013 09:07:53 +0530
From:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To:	Gleb Natapov <gleb@...hat.com>
CC:	mingo@...hat.com, jeremy@...p.org, x86@...nel.org,
	konrad.wilk@...cle.com, hpa@...or.com, pbonzini@...hat.com,
	linux-doc@...r.kernel.org, habanero@...ux.vnet.ibm.com,
	xen-devel@...ts.xensource.com, peterz@...radead.org,
	mtosatti@...hat.com, stefano.stabellini@...citrix.com,
	andi@...stfloor.org, attilio.rao@...rix.com, ouyang@...pitt.edu,
	gregkh@...e.de, agraf@...e.de, chegu_vinod@...com,
	torvalds@...ux-foundation.org, avi.kivity@...il.com,
	tglx@...utronix.de, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, stephan.diestelhorst@....com,
	riel@...hat.com, drjones@...hat.com,
	virtualization@...ts.linux-foundation.org,
	srivatsa.vaddagiri@...il.com
Subject: Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for
 linux guests running on KVM hypervisor

On 07/15/2013 04:06 PM, Gleb Natapov wrote:
> On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:
>> On 07/14/2013 06:42 PM, Gleb Natapov wrote:
>>> On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:
>>>> kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
>>>>
>>>> From: Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>
>>>>
>> trimming
>> [...]
>>>> +
>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>>>> +{
>>>> +	struct kvm_lock_waiting *w;
>>>> +	int cpu;
>>>> +	u64 start;
>>>> +	unsigned long flags;
>>>> +
>>>> +	w = &__get_cpu_var(lock_waiting);
>>>> +	cpu = smp_processor_id();
>>>> +	start = spin_time_start();
>>>> +
>>>> +	/*
>>>> +	 * Make sure an interrupt handler can't upset things in a
>>>> +	 * partially setup state.
>>>> +	 */
>>>> +	local_irq_save(flags);
>>>> +
>>>> +	/*
>>>> +	 * The ordering protocol on this is that the "lock" pointer
>>>> +	 * may only be set non-NULL if the "want" ticket is correct.
>>>> +	 * If we're updating "want", we must first clear "lock".
>>>> +	 */
>>>> +	w->lock = NULL;
>>>> +	smp_wmb();
>>>> +	w->want = want;
>>>> +	smp_wmb();
>>>> +	w->lock = lock;
>>>> +
>>>> +	add_stats(TAKEN_SLOW, 1);
>>>> +
>>>> +	/*
>>>> +	 * This uses set_bit, which is atomic but we should not rely on its
>>>> +	 * reordering guarantees, so a barrier is needed after this call.
>>>> +	 */
>>>> +	cpumask_set_cpu(cpu, &waiting_cpus);
>>>> +
>>>> +	barrier();
>>>> +
>>>> +	/*
>>>> +	 * Mark entry to slowpath before doing the pickup test to make
>>>> +	 * sure we don't deadlock with an unlocker.
>>>> +	 */
>>>> +	__ticket_enter_slowpath(lock);
>>>> +
>>>> +	/*
>>>> +	 * Check again to make sure it didn't become free while
>>>> +	 * we weren't looking.
>>>> +	 */
>>>> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
>>>> +		add_stats(TAKEN_SLOW_PICKUP, 1);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* Allow interrupts while blocked */
>>>> +	local_irq_restore(flags);
>>>> +
>>> So what happens if an interrupt comes here and the interrupt handler
>>> takes another spinlock that goes into the slow path? As far as I can
>>> see, lock_waiting will be overwritten and the cpu will be cleared
>>> from the waiting_cpus bitmap by the nested kvm_lock_spinning(), so
>>> when halt is called here after returning from the interrupt handler,
>>> nobody is going to wake this lock waiter. The next random interrupt
>>> will "fix" it, but it may be several milliseconds away, or never. We
>>> should probably check whether interrupts were enabled and call
>>> native_safe_halt() here.
>>>
>>
>> Okay, you mean something like the below should be done:
>>
>>    if (!irqs_disabled())
>>            native_safe_halt();
>>    else
>>            halt();
>>
>> This has been complex stuff for me to analyze.
>>
>> So in our discussion, the stack would look like this:
>>
>> spinlock()
>>    kvm_lock_spinning()
>>                    <------ interrupt here
>>            halt()
>>
>>
>> If we trace from the halt ...
>>
> It is too early to trace the halt, since it has not been executed yet.
> The guest stack trace will look something like this:
>
> spinlock(a)
>    kvm_lock_spinning(a)
>     lock_waiting = a
>     set bit in waiting_cpus
>                  <------ interrupt here
>                  spinlock(b)
>                    kvm_lock_spinning(b)
>                      lock_waiting = b
>                      set bit in waiting_cpus
>                      halt()
>                      unset bit in waiting_cpus
>                      lock_waiting = NULL
>       ----------> ret from interrupt
>     halt()
>
> Now, at the time of the last halt above, lock_waiting == NULL,
> waiting_cpus is empty, and no interrupt is pending, so who will unhalt
> the waiter?
>

Yes. If an interrupt occurs between local_irq_restore() and halt(),
this is possible. And since this is the rarest of rare cases (an irq
entering the slowpath and then no random irq arriving to provide the
spurious wakeup), we have never hit this problem in the past.
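
For context on "who will unhalt the waiter": the unlock side only kicks
a cpu whose waiting_cpus bit is set and whose (lock, want) pair still
matches. Here is a sketch of that kicker, reconstructed from this patch
series (details may differ from the actual patch):

        /* Kick the vcpu waiting on @lock to reach ticket @ticket. */
        static void kvm_unlock_kick(struct arch_spinlock *lock,
                                    __ticket_t ticket)
        {
                int cpu;

                add_stats(RELEASED_SLOW, 1);
                for_each_cpu(cpu, &waiting_cpus) {
                        const struct kvm_lock_waiting *w =
                                        &per_cpu(lock_waiting, cpu);

                        /*
                         * The nested kvm_lock_spinning() clears this
                         * cpu's bit and resets w->lock on its way out,
                         * so neither check below matches for the outer
                         * waiter any more - no kick is ever sent.
                         */
                        if (ACCESS_ONCE(w->lock) == lock &&
                            ACCESS_ONCE(w->want) == ticket) {
                                add_stats(RELEASED_SLOW_KICKED, 1);
                                kvm_kick_cpu(cpu);
                                break;
                        }
                }
        }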

So I am:

1. Trying to reproduce this artificially.

2. I replaced the halt with the code below and ran benchmarks:

        if (arch_irqs_disabled())
                halt();

But this results in degradation, because it means we go back and
spin again in the irq-enabled case.

3. Now I am analyzing the performance overhead of safe_halt in the
irq-enabled case:

        if (arch_irqs_disabled())
                halt();
        else
                safe_halt();
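
For reference, here is a minimal sketch of how variant 3 could sit in
the kvm_lock_spinning() wait path quoted above. Two assumptions in this
sketch (mine, not settled in this mail): the check uses the saved flags
via arch_irqs_disabled_flags(flags) rather than the live irq state, and
the early local_irq_restore() is dropped so that interrupts stay
disabled all the way to the halt:

        /* Halt until we are kicked (or any interrupt arrives). */
        if (arch_irqs_disabled_flags(flags)) {
                /*
                 * The caller already had interrupts off, so no
                 * interrupt handler can have entered the slowpath
                 * behind our back; a plain halt() is enough.
                 */
                halt();
        } else {
                /*
                 * safe_halt() is "sti; hlt": the one-instruction
                 * interrupt shadow after sti makes enabling interrupts
                 * and halting atomic, so an interrupt that would have
                 * overwritten lock_waiting wakes us from the halt
                 * instead of sneaking in ahead of it.
                 */
                safe_halt();
        }

With that, a wakeup cannot be lost between enabling interrupts and
executing hlt; whether safe_halt() costs more in the irq-enabled case
is exactly what these benchmarks should show.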
