Message-Id: <20120306234317.2817d2071038d11ab3831c82@gmail.com>
Date:	Tue, 6 Mar 2012 23:43:17 +0900
From:	Takuya Yoshikawa <takuya.yoshikawa@...il.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	Takuya Yoshikawa <yoshikawa.takuya@....ntt.co.jp>, avi@...hat.com,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4 changelog-v2] KVM: Switch to srcu-less get_dirty_log()

Marcelo Tosatti <mtosatti@...hat.com> wrote:

> > +	spin_lock(&kvm->mmu_lock);
> 
> It is not clear why mmu_lock is needed. Dropping it across the xchg loop
> should be similar to srcu implementation, in that concurrent updates
> will be visible only on the next get_dirty call? Well, it is necessary
> anyway for write protecting the sptes.

My implementation does the write protection inside the xchg loop and
then, after that loop, flushes the TLB.

mmu_lock must protect both of these steps together.

If we do not mind scanning the bitmap twice, we can decouple the xchg
loop from the write protection, but that will be a bit slower, and in
any case we still need to hold mmu_lock until the TLB is flushed.

As can be seen from the unit-test results, the majority of the time is
spent on write protecting sptes, so my guess is that decoupling the
xchg loop alone will not help much.
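
For illustration, a decoupled two-pass version might look roughly like
this (untested sketch, reusing the variable names from the patch hunk
quoted below, and assuming the xchg pass itself is safe without
mmu_lock):

	/* Pass 1: atomically grab and clear the dirty bits. */
	for (i = 0; i < n / sizeof(long); i++)
		dirty_bitmap_buffer[i] = xchg(&dirty_bitmap[i], 0);

	/* Pass 2: write protect the corresponding sptes under mmu_lock. */
	spin_lock(&kvm->mmu_lock);
	for (i = 0; i < n / sizeof(long); i++) {
		unsigned long mask = dirty_bitmap_buffer[i];

		if (!mask)
			continue;

		is_dirty = true;
		kvm_mmu_write_protect_pt_masked(kvm, memslot,
						i * BITS_PER_LONG, mask);
	}
	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);
	spin_unlock(&kvm->mmu_lock);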

> A cond_resched_lock() would alleviate the potentially long held 
> times for mmu_lock (can you measure it with large memslots?)

How to move the TLB flush out of the mmu_lock critical section was
discussed before, and there were some proposals.

Is anyone working on that?

Once that is possible, we can do many things.

One idea is to shrink the extra bitmap buffer to one page or so and run
the xchg and write protection loop over chunks of that limited size.

Because we can drop mmu_lock between chunks, it becomes possible to
copy_to_user() one part of the dirty bitmap and then move on to the
next part.

Once everything is write protected, we can do the TLB flush after
dropping mmu_lock.
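
A very rough sketch of that chunked scheme, purely for illustration
(CHUNK_BYTES and buffer, a one-page scratch buffer, are hypothetical
names, error handling is simplified, and it assumes flushing the TLB
outside mmu_lock turns out to be safe):

	for (pos = 0; pos < n; pos += CHUNK_BYTES) {
		long chunk = min_t(long, n - pos, CHUNK_BYTES);
		long base = pos / sizeof(long);

		spin_lock(&kvm->mmu_lock);
		for (i = 0; i < chunk / sizeof(long); i++) {
			unsigned long mask = xchg(&dirty_bitmap[base + i], 0);

			buffer[i] = mask;
			if (!mask)
				continue;

			is_dirty = true;
			kvm_mmu_write_protect_pt_masked(kvm, memslot,
					(base + i) * BITS_PER_LONG, mask);
		}
		spin_unlock(&kvm->mmu_lock);

		/* Copy this chunk out while mmu_lock is dropped. */
		if (copy_to_user((u8 __user *)log->dirty_bitmap + pos,
				 buffer, chunk))
			return -EFAULT;
	}

	/* Everything is write protected; flush outside mmu_lock. */
	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);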

> Otherwise looks nice.

Thanks,
	Takuya


> > -		r = -ENOMEM;
> > -		slots = kmemdup(kvm->memslots, sizeof(*kvm->memslots), GFP_KERNEL);
> > -		if (!slots)
> > -			goto out;
> > +	for (i = 0; i < n / sizeof(long); i++) {
> > +		unsigned long mask;
> > +		gfn_t offset;
> >  
> > -		memslot = id_to_memslot(slots, log->slot);
> > -		memslot->nr_dirty_pages = 0;
> > -		memslot->dirty_bitmap = dirty_bitmap_head;
> > -		update_memslots(slots, NULL);
> > +		if (!dirty_bitmap[i])
> > +			continue;
> >  
> > -		old_slots = kvm->memslots;
> > -		rcu_assign_pointer(kvm->memslots, slots);
> > -		synchronize_srcu_expedited(&kvm->srcu);
> > -		kfree(old_slots);
> > +		is_dirty = true;
> >  
> > -		write_protect_slot(kvm, memslot, dirty_bitmap, nr_dirty_pages);
> > +		mask = xchg(&dirty_bitmap[i], 0);
> > +		dirty_bitmap_buffer[i] = mask;
> >  
> > -		r = -EFAULT;
> > -		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
> > -			goto out;
> > -	} else {
> > -		r = -EFAULT;
> > -		if (clear_user(log->dirty_bitmap, n))
> > -			goto out;
> > +		offset = i * BITS_PER_LONG;
> > +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> >  	}
> > +	if (is_dirty)
> > +		kvm_flush_remote_tlbs(kvm);
> > +
> > +	spin_unlock(&kvm->mmu_lock);
