Message-ID: <20131011053831.GG15954@redhat.com>
Date:	Fri, 11 Oct 2013 08:38:31 +0300
From:	Gleb Natapov <gleb@...hat.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	Xiao Guangrong <xiaoguangrong.eric@...il.com>,
	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>,
	avi.kivity@...il.com, pbonzini@...hat.com,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page
 table out of vcpu thread

On Thu, Oct 10, 2013 at 06:03:01PM -0300, Marcelo Tosatti wrote:
> On Thu, Oct 10, 2013 at 10:16:46PM +0300, Gleb Natapov wrote:
> > On Thu, Oct 10, 2013 at 01:42:22PM -0300, Marcelo Tosatti wrote:
> > > On Thu, Oct 10, 2013 at 03:08:45PM +0300, Gleb Natapov wrote:
> > > > On Wed, Oct 09, 2013 at 10:47:10PM -0300, Marcelo Tosatti wrote:
> > > > > > > >> Gleb has an idea that uses SLAB_DESTROY_BY_RCU to protect the shadow page table
> > > > > > >> and encodes the page-level into the spte (since we need to check if the spte
> > > > > > >> is the last-spte. ).  How about this?
> > > > > > > 
> > > > > > > > Pointer please? Why is SLAB_DESTROY_BY_RCU any safer than call_rcu with
> > > > > > > regards to limitation? (maybe it is).
> > > > > > 
> > > > > > > In my experience, freeing and allocating shadow pages are balanced;
> > > > > > > we can check it by (make -j12 on a guest with 4 vcpus and):
> > > > > > 
> > > > > > # echo > trace
> > > > > > [root@...c-desktop tracing]# cat trace > ~/log | sleep 3
> > > > > > [root@...c-desktop tracing]# cat ~/log | grep new | wc -l
> > > > > > 10816
> > > > > > [root@...c-desktop tracing]# cat ~/log | grep prepare | wc -l
> > > > > > 10656
> > > > > > [root@...c-desktop tracing]# cat set_event
> > > > > > kvmmmu:kvm_mmu_get_page
> > > > > > kvmmmu:kvm_mmu_prepare_zap_page
> > > > > > 
> > > > > > alloc VS. free = 10816 : 10656
> > > > > > 
> > > > > > > So almost all allocating and freeing is done in the slab's
> > > > > > > cache, and the slab frees shadow pages very slowly; there is no RCU issue.
> > > > > 
> > > > > A more detailed test case would be:
> > > > > 
> > > > > - cpu0-vcpu0 releasing pages as fast as possible
> > > > > - cpu1 executing get_dirty_log
> > > > > 
> > > > > Think of a very large guest.
> > > > > 
> > > > The number of shadow pages allocated from slab will be bounded by
> > > > n_max_mmu_pages, 
> > > 
> > > Correct, but that limit is not suitable (maximum number of mmu pages
> > > should be larger than number of mmu pages freeable in a rcu grace
> > > period).
> > > 
> > I am not sure I understand what you mean here. What I was saying is that if
> > we change the code to allocate sp->spt from slab, this slab will never have
> > more than n_max_mmu_pages objects in it.
> 
> n_max_mmu_pages is not a suitable limit to throttle freeing of pages via
> RCU (it's too large). If the free memory watermarks are smaller than
> n_max_mmu_pages for all guests, OOM is possible.
> 
Ah, yes. I am not saying n_max_mmu_pages will throttle RCU, just saying
that the slab size will be bounded, so hopefully the shrinker will touch it
rarely.

> > > > and, in addition, page released to slab is immediately
> > > > available for allocation, no need to wait for grace period. 
> > > 
> > > See SLAB_DESTROY_BY_RCU comment at include/linux/slab.h.
> > > 
> > This comment is exactly what I was referring to in the code you quoted. Do
> > you see anything problematic in what the comment describes?
> 
> "This delays freeing the SLAB page by a grace period, it does _NOT_
> delay object freeing." The page is not available for allocation.
By "page" I mean "spt page" which is a slab object. So "spt page"
AKA slab object will be available for allocation immediately.
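
The distinction above (object reuse is immediate, only the backing slab
page waits for a grace period) can be sketched with a toy userspace model.
This is not kernel code: the class TypeStableCache and its methods
(alloc, free, shrink, grace_period_elapsed) are hypothetical names made up
for illustration, and the "grace period" is just an explicit method call.

```python
class TypeStableCache:
    """Toy model of a SLAB_DESTROY_BY_RCU-style cache (illustrative only)."""

    def __init__(self, objs_per_page=4):
        self.objs_per_page = objs_per_page
        self.free_list = []        # (page, slot) objects ready for reuse
        self.in_use = {}           # page -> number of allocated slots
        self.pending_pages = []    # empty pages waiting for a grace period
        self.released_pages = []   # pages actually given back to the system
        self.next_page = 0

    def alloc(self):
        # Grow the cache by one backing page only when no object is free.
        if not self.free_list:
            page = self.next_page
            self.next_page += 1
            self.in_use[page] = 0
            self.free_list.extend((page, s) for s in range(self.objs_per_page))
        obj = self.free_list.pop()
        self.in_use[obj[0]] += 1
        return obj

    def free(self, obj):
        # The object goes straight back on the free list: no grace period.
        self.in_use[obj[0]] -= 1
        self.free_list.append(obj)

    def shrink(self):
        # Fully empty pages leave the cache but are only queued for release.
        for page in [p for p, n in self.in_use.items() if n == 0]:
            del self.in_use[page]
            self.free_list = [o for o in self.free_list if o[0] != page]
            self.pending_pages.append(page)

    def grace_period_elapsed(self):
        # Only now is the backing page memory really returned.
        self.released_pages.extend(self.pending_pages)
        self.pending_pages.clear()


cache = TypeStableCache(objs_per_page=2)
spt = cache.alloc()
cache.free(spt)
reused = cache.alloc()           # same object comes back immediately
print(reused == spt)
cache.free(reused)
cache.shrink()                   # page is queued, not yet released
print(cache.pending_pages, cache.released_pages)
cache.grace_period_elapsed()
print(cache.pending_pages, cache.released_pages)
```

In this model a lockless reader may see a freed and reallocated object,
but never memory that has stopped being an object of the same type, which
is why the reader must revalidate what it finds.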
 
--
			Gleb.