Message-Id: <20120306234317.2817d2071038d11ab3831c82@gmail.com>
Date:	Tue, 6 Mar 2012 23:43:17 +0900
From:	Takuya Yoshikawa <takuya.yoshikawa@...il.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	Takuya Yoshikawa <yoshikawa.takuya@....ntt.co.jp>, avi@...hat.com,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4 changelog-v2] KVM: Switch to srcu-less get_dirty_log()

Marcelo Tosatti <mtosatti@...hat.com> wrote:

> > +	spin_lock(&kvm->mmu_lock);
> 
> It is not clear why mmu_lock is needed. Dropping it across the xchg loop
> should be similar to srcu implementation, in that concurrent updates
> will be visible only on the next get_dirty call? Well, it is necessary
> anyway for write protecting the sptes.

My implementation does the write protection inside the xchg loop and
then, after that loop, flushes the TLB.

mmu_lock must protect both of these steps together.

If we do not mind scanning the bitmap twice, we can decouple the xchg
loop from the write protection, but that will be a bit slower, and in
any case we still need to hold mmu_lock until the TLB is flushed.

As can be seen from the unit-test results, the majority of the time is
spent on write protecting sptes, so my guess is that decoupling the
xchg loop alone will not help much.
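
For illustration, a decoupled two-pass version might look roughly like
this (untested sketch, reusing the variable names from the patch hunk
quoted below, and assuming the xchg pass itself is safe without
mmu_lock):

	/* Pass 1: atomically grab and clear the dirty bits. */
	for (i = 0; i < n / sizeof(long); i++)
		dirty_bitmap_buffer[i] = xchg(&dirty_bitmap[i], 0);

	/* Pass 2: write protect the corresponding sptes under mmu_lock. */
	spin_lock(&kvm->mmu_lock);
	for (i = 0; i < n / sizeof(long); i++) {
		unsigned long mask = dirty_bitmap_buffer[i];

		if (!mask)
			continue;

		is_dirty = true;
		kvm_mmu_write_protect_pt_masked(kvm, memslot,
						i * BITS_PER_LONG, mask);
	}
	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);
	spin_unlock(&kvm->mmu_lock);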

> A cond_resched_lock() would alleviate the potentially long held 
> times for mmu_lock (can you measure it with large memslots?)

How to move the TLB flush out of the mmu_lock critical section was
discussed before, and there were some proposals.

Is anyone working on that?

Once that is possible, we can do many things.

One idea is to shrink the extra bitmap buffer to one page or so and run
the xchg and write protection loop over chunks of that limited size.

Because we can drop mmu_lock between chunks, it becomes possible to
copy_to_user() one part of the dirty bitmap and then move on to the
next part.

Once everything is write protected, we can do the TLB flush after
dropping mmu_lock.
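
A very rough sketch of that chunked scheme, purely for illustration
(CHUNK_BYTES and buffer, a one-page scratch buffer, are hypothetical
names, error handling is simplified, and it assumes flushing the TLB
outside mmu_lock turns out to be safe):

	for (pos = 0; pos < n; pos += CHUNK_BYTES) {
		long chunk = min_t(long, n - pos, CHUNK_BYTES);
		long base = pos / sizeof(long);

		spin_lock(&kvm->mmu_lock);
		for (i = 0; i < chunk / sizeof(long); i++) {
			unsigned long mask = xchg(&dirty_bitmap[base + i], 0);

			buffer[i] = mask;
			if (!mask)
				continue;

			is_dirty = true;
			kvm_mmu_write_protect_pt_masked(kvm, memslot,
					(base + i) * BITS_PER_LONG, mask);
		}
		spin_unlock(&kvm->mmu_lock);

		/* Copy this chunk out while mmu_lock is dropped. */
		if (copy_to_user((u8 __user *)log->dirty_bitmap + pos,
				 buffer, chunk))
			return -EFAULT;
	}

	/* Everything is write protected; flush outside mmu_lock. */
	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);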

> Otherwise looks nice.

Thanks,
	Takuya


> > -		r = -ENOMEM;
> > -		slots = kmemdup(kvm->memslots, sizeof(*kvm->memslots), GFP_KERNEL);
> > -		if (!slots)
> > -			goto out;
> > +	for (i = 0; i < n / sizeof(long); i++) {
> > +		unsigned long mask;
> > +		gfn_t offset;
> >  
> > -		memslot = id_to_memslot(slots, log->slot);
> > -		memslot->nr_dirty_pages = 0;
> > -		memslot->dirty_bitmap = dirty_bitmap_head;
> > -		update_memslots(slots, NULL);
> > +		if (!dirty_bitmap[i])
> > +			continue;
> >  
> > -		old_slots = kvm->memslots;
> > -		rcu_assign_pointer(kvm->memslots, slots);
> > -		synchronize_srcu_expedited(&kvm->srcu);
> > -		kfree(old_slots);
> > +		is_dirty = true;
> >  
> > -		write_protect_slot(kvm, memslot, dirty_bitmap, nr_dirty_pages);
> > +		mask = xchg(&dirty_bitmap[i], 0);
> > +		dirty_bitmap_buffer[i] = mask;
> >  
> > -		r = -EFAULT;
> > -		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
> > -			goto out;
> > -	} else {
> > -		r = -EFAULT;
> > -		if (clear_user(log->dirty_bitmap, n))
> > -			goto out;
> > +		offset = i * BITS_PER_LONG;
> > +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> >  	}
> > +	if (is_dirty)
> > +		kvm_flush_remote_tlbs(kvm);
> > +
> > +	spin_unlock(&kvm->mmu_lock);
