Message-ID: <521F14FE.3070900@linux.vnet.ibm.com>
Date:	Thu, 29 Aug 2013 17:31:42 +0800
From:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
To:	Gleb Natapov <gleb@...hat.com>
CC:	avi.kivity@...il.com, mtosatti@...hat.com, pbonzini@...hat.com,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

On 08/29/2013 05:08 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote:
>>>>> BTW I do not see
>>>>> rcu_assign_pointer()/rcu_dereference() in your patches, which hints at
>>>>
>>>> IIUC, we cannot use rcu_assign_pointer() directly: it is something like
>>>> p = v, assigning a pointer to a pointer. But in our case, we need:
>>>>    *pte_list = (unsigned long)desc | 1;
>>> From Documentation/RCU/whatisRCU.txt:
>>>
>>> The updater uses this function to assign a new value to an RCU-protected pointer.
>>>
>>> This is what we do, no? (assuming slot->arch.rmap[] is what rcu protects here)
>>> The fact that the value is not a valid pointer should not matter.
>>>
>>
>> Okay. Will change that code to:
>>
>> +
>> +#define rcu_assign_head_desc(pte_list_p, value)        \
>> +       rcu_assign_pointer(*(unsigned long __rcu **)(pte_list_p), (unsigned long *)(value))
>> +
>>  /*
>>   * Pte mapping structures:
>>   *
>> @@ -1006,14 +1010,7 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
>>                 desc->sptes[1] = spte;
>>                 desc_mark_nulls(pte_list, desc);
>>
>> -               /*
>> -                * Esure the old spte has been updated into desc, so
>> -                * that the another side can not get the desc from pte_list
>> -                * but miss the old spte.
>> -                */
>> -               smp_wmb();
>> -
>> -               *pte_list = (unsigned long)desc | 1;
>> +               rcu_assign_head_desc(pte_list, (unsigned long)desc | 1);
>>
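In userspace C11 terms, rcu_assign_pointer() behaves like a release store, which is why the tagged value `(unsigned long)desc | 1` is fine: the ordering applies to the whole store, whatever bits it carries. The sketch below is a hypothetical analogue, not KVM code: `struct pte_list_desc_sketch`, `publish_desc()`, `convert_to_desc()`, `PTE_LIST_DESC_TAG` and the odd "nulls" terminator value are all illustrative assumptions, and the updater is assumed to hold the update-side lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for KVM's pte_list_desc; not the real layout. */
#define SKETCH_DESC_SPTES 3
#define PTE_LIST_DESC_TAG ((uintptr_t)1) /* low bit set: head is a desc */

struct pte_list_desc_sketch {
	uint64_t *sptes[SKETCH_DESC_SPTES];
	uintptr_t more; /* written before publication, never changed after */
};

/*
 * Analogue of rcu_assign_pointer(*pte_list, desc | 1): the release
 * store orders every earlier write to *desc before the new head value
 * becomes visible, which is what the open-coded smp_wmb() did.
 */
static void publish_desc(_Atomic uintptr_t *pte_list,
			 struct pte_list_desc_sketch *desc)
{
	/* The tag trick only works if desc is at least 2-byte aligned. */
	assert(((uintptr_t)desc & PTE_LIST_DESC_TAG) == 0);
	atomic_store_explicit(pte_list, (uintptr_t)desc | PTE_LIST_DESC_TAG,
			      memory_order_release);
}

/* Move the single spte held in the head into a new desc, then publish. */
static struct pte_list_desc_sketch *
convert_to_desc(_Atomic uintptr_t *pte_list, uint64_t *new_spte)
{
	/* Updater side, assumed to hold the update lock: relaxed is enough. */
	uintptr_t old = atomic_load_explicit(pte_list, memory_order_relaxed);
	struct pte_list_desc_sketch *desc = calloc(1, sizeof(*desc));

	if (!desc)
		return NULL;
	desc->sptes[0] = (uint64_t *)old; /* the old spte must land first... */
	desc->sptes[1] = new_spte;
	desc->more = 1;                   /* hypothetical odd "nulls" marker */
	publish_desc(pte_list, desc);     /* ...before the head points here */
	return desc;
}
```

A reader that observes the tagged head with an acquire (or dependency-ordered) load is then guaranteed to see `sptes[0]`, which is exactly the "do not miss the old spte" property from the comment being replaced.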
>>>>
>>>> So I added the smp_wmb() myself:
>>>> 		/*
>>>> 		 * Ensure the old spte has been updated into desc, so
>>>> 		 * that the other side cannot get the desc from pte_list
>>>> 		 * but miss the old spte.
>>>> 		 */
>>>> 		smp_wmb();
>>>>
>>>> 		*pte_list = (unsigned long)desc | 1;
>>>>
>>>> But I missed it when inserting an empty desc. In that case we need the barrier
>>>> too, since desc->more must be visible before the desc is assigned to pte_list;
>>>> otherwise the lookup side may see an invalid "nulls".
>>>>
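The empty-desc case needs the same publication ordering: the terminator in `desc->more` has to be visible before the desc is linked, or a lockless reader could follow the link and read an uninitialized value where it expects the "nulls" marker. A minimal C11 sketch, assuming hypothetical names (`struct desc_sketch`, `append_empty_desc()`, `nulls_marker()`) and that any odd value terminates the chain:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DESC_SPTES 3

struct desc_sketch {
	uint64_t *sptes[SKETCH_DESC_SPTES];
	_Atomic uintptr_t more; /* next desc, or an odd "nulls" marker */
};

/* Hypothetical nulls marker: the head's address with the low bit set. */
static uintptr_t nulls_marker(const void *head)
{
	return (uintptr_t)head | 1;
}

static int is_nulls(uintptr_t v)
{
	return (int)(v & 1);
}

/* Append an empty desc after @tail; caller holds the update-side lock. */
static struct desc_sketch *append_empty_desc(struct desc_sketch *tail,
					     const void *head)
{
	struct desc_sketch *desc = calloc(1, sizeof(*desc));

	if (!desc)
		return NULL;
	assert(!is_nulls((uintptr_t)desc)); /* pointers must stay even */
	/* Not shared yet, so a relaxed store of the terminator is enough. */
	atomic_store_explicit(&desc->more, nulls_marker(head),
			      memory_order_relaxed);
	/*
	 * Release store: a reader that sees the new link also sees
	 * desc->more and the zeroed sptes. A plain store here is the
	 * missing barrier described above.
	 */
	atomic_store_explicit(&tail->more, (uintptr_t)desc,
			      memory_order_release);
	return desc;
}
```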
>>>> I also use my own code instead of rcu_dereference():
>>>> pte_list_walk_lockless():
>>>> 	pte_list_value = ACCESS_ONCE(*pte_list);
>>>> 	if (!pte_list_value)
>>>> 		return;
>>>>
>>>> 	if (!(pte_list_value & 1))
>>>> 		return fn((u64 *)pte_list_value);
>>>>
>>>> 	/*
>>>> 	 * fetch pte_list before reading sptes in the desc; see the comments
>>>> 	 * in pte_list_add().
>>>> 	 *
>>>> 	 * There is a data dependence since the desc is fetched from pte_list.
>>>> 	 */
>>>> 	smp_read_barrier_depends();
>>>>
>>>> That part can be replaced by rcu_dereference().
>>>>
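A rough C11 sketch of the walker with each ACCESS_ONCE() + smp_read_barrier_depends() pair replaced by one ordered load, the role rcu_dereference() plays in the kernel. Kernel rcu_dereference() only needs dependency ordering; `memory_order_acquire` is used here as the portable, at-least-as-strong substitute. Everything named here (`walk_lockless()`, `count_spte()`, `struct desc_sketch`) is hypothetical; the sketch assumes descs are never freed while a walk is in progress and omits the "nulls" check KVM would use to detect a desc that moved to another list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DESC_SPTES 3

struct desc_sketch {
	_Atomic(uint64_t *) sptes[SKETCH_DESC_SPTES];
	_Atomic uintptr_t more; /* next desc, or an odd "nulls" marker */
};

typedef void (*spte_fn)(uint64_t *spte, void *ctx);

/* Example callback: count the sptes visited. */
static void count_spte(uint64_t *spte, void *ctx)
{
	(void)spte;
	++*(size_t *)ctx;
}

static void walk_lockless(_Atomic uintptr_t *pte_list, spte_fn fn, void *ctx)
{
	/* rcu_dereference() analogue: orders later desc reads after this. */
	uintptr_t v = atomic_load_explicit(pte_list, memory_order_acquire);

	if (!v)
		return;
	if (!(v & 1)) {             /* single spte stored in the head */
		fn((uint64_t *)v, ctx);
		return;
	}
	v &= ~(uintptr_t)1;         /* strip the desc tag */
	while (!(v & 1)) {          /* an odd value is the nulls marker */
		struct desc_sketch *desc = (struct desc_sketch *)v;

		for (size_t i = 0; i < SKETCH_DESC_SPTES; i++) {
			uint64_t *spte = atomic_load_explicit(
				&desc->sptes[i], memory_order_relaxed);
			if (spte)
				fn(spte, ctx);
		}
		/* Same ordering for every hop along the chain. */
		v = atomic_load_explicit(&desc->more, memory_order_acquire);
	}
}
```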
>>> Yes please, also see commit c87a124a5d5e8cf8e21c4363c3372bcaf53ea190 for
>>> kind of scary bugs we can get here.
>>
>> Right, it is likely triggerable in our case; I will fix it.
>>
>>>
>>>>> incorrect usage of RCU. I think any access to slab pointers will need to
>>>>> use those.
>>>>
>>>> Removing a desc does not need it, I think, since we do not mind seeing the
>>>> old info. (hlist_nulls_del_rcu() does not use rcu_dereference() either.)
>>>>
>>> Maybe a bug. I also noticed that rculist_nulls uses rcu_dereference()
>>
>> But list_del_rcu() does not use rcu_assign_pointer() either.
>>
> This is also suspicious.
> 
>>> to access ->next, but it does not use rcu_assign_pointer() to
>>> assign it.
>>
>> You mean rcu_dereference() is used in hlist_nulls_for_each_entry_rcu()? I think
>> that is because the prefetched data must be validated before entry->next is
>> accessed; it pairs with the barrier in rcu_assign_pointer() when a new entry is
>> added to the list. rcu_assign_pointer() makes the other fields of the entry
>> visible before the entry is linked into the list. Otherwise, the lookup could
>> reach that entry but read invalid fields.
>>
>> After more thinking, I still think rcu_assign_pointer() is unneeded when an
>> entry is removed. The removal API does not care about the order between
>> unlinking the entry and changing its fields. That is the caller's
>> responsibility:
>> - In the case of ordinary RCU lists, the caller uses call_rcu(),
>>   synchronize_rcu(), etc. to ensure that all lookups have exited, so later
>>   changes to that entry are invisible to them.
>>
>> - In the case of rculist_nulls, a refcount seems to be used to guarantee the
>>   order (see the example in Documentation/RCU/rculist_nulls.txt).
>>
>> - In our case, we allow the lookup to see the deleted desc even if it is back
>>   in the slab cache, has been reinitialized, or has been re-added.
>>
>> Your thoughts?
>>
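The removal-side argument can be sketched in C11 as follows. The names (`struct node_sketch`, `unlink_after()`) are hypothetical, and the sketch uses a release store for the unlink as the conservative portable choice: under the C11 model it lets a reader that acquires the new link see the successor's contents. The point from the thread is what the store does not do: it orders nothing that happens to the removed entry afterwards, so keeping that entry valid for in-flight readers is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct node_sketch {
	int key;
	_Atomic uintptr_t next; /* next node, or an odd "nulls" marker */
};

/*
 * Unlink @victim, the current successor of @prev. Caller holds the
 * update-side lock. Readers either see the old link and walk through
 * @victim, whose ->next is left intact, or see the new link and skip
 * it. Reusing or freeing @victim safely (call_rcu(), synchronize_rcu(),
 * a refcount, or a nulls re-check on the reader side) is the caller's
 * job; nothing in this function orders later changes to @victim.
 */
static void unlink_after(struct node_sketch *prev, struct node_sketch *victim)
{
	uintptr_t succ = atomic_load_explicit(&victim->next,
					      memory_order_relaxed);

	assert(atomic_load_explicit(&prev->next, memory_order_relaxed) ==
	       (uintptr_t)victim);
	atomic_store_explicit(&prev->next, succ, memory_order_release);
	/* victim->next is deliberately not cleared: in-flight readers need it. */
}
```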
> 
> As Documentation/RCU/whatisRCU.txt says:
> 
>         As with rcu_assign_pointer(), an important function of
>         rcu_dereference() is to document which pointers are protected by
>         RCU, in particular, flagging a pointer that is subject to changing
>         at any time, including immediately after the rcu_dereference().
>         And, again like rcu_assign_pointer(), rcu_dereference() is
>         typically used indirectly, via the _rcu list-manipulation
>         primitives, such as list_for_each_entry_rcu().
> 
> The documentation aspect of rcu_assign_pointer()/rcu_dereference() is
> important. The code is complicated, so self-documentation will not hurt.
> I want to see what is actually protected by rcu here. Freeing shadow
> pages with call_rcu() further complicates matters: does it mean that
> shadow pages are also protected by rcu? 

Yes, it prevents the shadow page from being freed while we write-protect
it.


