linux-kernel - Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

Message-ID: <522437EF.8040002@linux.vnet.ibm.com>
Date:	Mon, 02 Sep 2013 15:02:07 +0800
From:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
To:	Gleb Natapov <gleb@...hat.com>
CC:	avi.kivity@...il.com, mtosatti@...hat.com, pbonzini@...hat.com,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

On 08/30/2013 07:38 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 07:26:40PM +0800, Xiao Guangrong wrote:
>> On 08/29/2013 05:51 PM, Gleb Natapov wrote:
>>> On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote:
>>>>> As Documentation/RCU/whatisRCU.txt says:
>>>>>
>>>>>         As with rcu_assign_pointer(), an important function of
>>>>>         rcu_dereference() is to document which pointers are protected by
>>>>>         RCU, in particular, flagging a pointer that is subject to changing
>>>>>         at any time, including immediately after the rcu_dereference().
>>>>>         And, again like rcu_assign_pointer(), rcu_dereference() is
>>>>>         typically used indirectly, via the _rcu list-manipulation
>>>>>         primitives, such as list_for_each_entry_rcu().
>>>>>
>>>>> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
>>>>> important. The code is complicated, so self documentation will not hurt.
>>>>> I want to see what is actually protected by rcu here. Freeing shadow
>>>>> pages with call_rcu() further complicates matters: does it mean that
>>>>> shadow pages are also protected by rcu? 
>>>>
>>>> Yes, it stops the shadow page from being freed while we are
>>>> write-protecting it.
>>>>
>>> Yeah, I got the trick. What I am saying is that we have a data structure
>>> here protected by RCU, but we do not use RCU functions to access it...
>>
>> Yes, they are not used when inserting a spte into the rmap and getting the
>> rmap from the entry... but do we need these functions to guarantee the order?
>>
>> The worst case is that we fetch the spte from the desc before the spte has
>> been updated. We can safely skip this spte, since the dirty-bitmap will be
>> set later; this is guaranteed by the barrier between mmu_spte_update()
>> and mark_page_dirty(). The code is:
>>
>> set_spte():
>>
>> 	if (mmu_spte_update(sptep, spte))
>> 		kvm_flush_remote_tlbs(vcpu->kvm);
>>
>> 	if (!remap) {
>> 		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
>> 			rmap_recycle(vcpu, sptep, gfn);
>>
>> 		if (level > PT_PAGE_TABLE_LEVEL)
>> 			++vcpu->kvm->stat.lpages;
>> 	}
>>
>> 	smp_wmb();
>>
>> 	if (pte_access & ACC_WRITE_MASK)
>> 		mark_page_dirty(vcpu->kvm, gfn);
>>
>> So, I guess if we can guarantee the order ourselves, we do not need
>> to call the rcu functions explicitly...
>>
>> But the memory barriers in the rcu functions are really light on x86 (a
>> store cannot be reordered with another store), so I do not mind using them
>> explicitly if you think this way is safer. :)
>>
> I think the self-documentation aspect of using the rcu functions is also
> important.

Okay. I will use these rcu functions and measure them to see whether they
cause a performance issue.
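
As a rough userspace analogy for the change agreed on here (not kernel code):
rcu_assign_pointer() behaves like a release store and rcu_dereference() like a
dependency-ordered load. The sketch below uses C11 atomics. struct desc,
rmap_head, publish_desc() and read_first_spte() are hypothetical names chosen
for illustration, and memory_order_acquire stands in for the weaker dependency
ordering that rcu_dereference() actually relies on.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pte-list descriptor. */
struct desc {
	uint64_t spte;
};

/* Hypothetical rmap head: a pointer that readers follow without a lock. */
static _Atomic(struct desc *) rmap_head;

/*
 * Writer side: fill in the descriptor, then publish it.
 * The release store plays the role of rcu_assign_pointer(): the
 * initialisation of d->spte is ordered before the pointer becomes visible.
 */
static void publish_desc(struct desc *d, uint64_t spte)
{
	d->spte = spte;
	atomic_store_explicit(&rmap_head, d, memory_order_release);
}

/*
 * Reader side: the acquire load plays the role of rcu_dereference().
 * A reader that sees the new pointer also sees the initialised spte.
 * Returns 0 when nothing has been published yet.
 */
static uint64_t read_first_spte(void)
{
	struct desc *d = atomic_load_explicit(&rmap_head, memory_order_acquire);

	return d ? d->spte : 0;
}
```

Besides the ordering, the named accessors mark rmap_head as the pointer that
lockless readers depend on, which is the self-documentation point raised above.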

> 
>>> BTW why not allocate sp->spt from a SLAB_DESTROY_BY_RCU cache too? We may
>>> occasionally switch write protection on a random spt if a page is deleted
>>> and reused for another spt, though. For a last-level spt it should not be
>>> a problem, and for a non-last-level one we have the is_last_spte() check in
>>> __rmap_write_protect_lockless(). Can it work?
>>
>> Yes, I also considered this approach. It can work if we handle
>> is_last_spte() properly. Since sp->spt can be reused, we cannot get the
>> mapping level from sp. We need to encode the mapping level into the spte so
>> that cmpxchg can detect whether the page table has been moved to another
>> mapping level.
> Isn't one bit saying that the spte is the last one enough? IIRC we
> have one more ignored bit to spare in the spte.

Right. But I also want to use this approach in fast_page_fault, where the
mapping level is needed.
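
A minimal userspace sketch of the idea, assuming a made-up spte layout: the
mapping level lives in otherwise unused bits of the spte itself, so a cmpxchg
on the whole 64-bit value fails if the page table was reused at a different
level in the meantime. The bit positions, the macro names and
write_protect_if_level() are hypothetical and do not reflect the real KVM
encoding.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 1 is "writable", bits 52-54 hold the level. */
#define SPTE_WRITABLE		(1ULL << 1)
#define SPTE_LEVEL_SHIFT	52
#define SPTE_LEVEL_MASK		(7ULL << SPTE_LEVEL_SHIFT)

/* Return spte with its level field replaced by the given level. */
static uint64_t spte_set_level(uint64_t spte, unsigned level)
{
	return (spte & ~SPTE_LEVEL_MASK) |
	       (((uint64_t)level << SPTE_LEVEL_SHIFT) & SPTE_LEVEL_MASK);
}

/* Extract the level field from an spte value. */
static unsigned spte_level(uint64_t spte)
{
	return (unsigned)((spte & SPTE_LEVEL_MASK) >> SPTE_LEVEL_SHIFT);
}

/*
 * Clear the writable bit only if the entry still carries the expected
 * level. The compare-and-exchange covers the whole entry, level bits
 * included, so a concurrent reuse at another level makes it fail.
 * Returns true only when this call removed write access.
 */
static bool write_protect_if_level(_Atomic uint64_t *sptep, unsigned level)
{
	uint64_t old = atomic_load(sptep);

	if (spte_level(old) != level || !(old & SPTE_WRITABLE))
		return false;

	return atomic_compare_exchange_strong(sptep, &old,
					      old & ~SPTE_WRITABLE);
}
```

A single "last level" bit would be enough for the write-protection walker; the
full level field is kept in this sketch because fast_page_fault needs the
level too.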

> 
>> Could you allow me to make this optimization separately, after this
>> patchset is merged?
>>
> If you think it will complicate the initial version, I am fine with
> postponing it.

Thank you, Gleb!

