Date:	Fri, 6 Nov 2015 13:17:17 +1100
From:	Alexey Kardashevskiy <aik@...abs.ru>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Paul Mackerras <paulus@...ba.org>,
	David Gibson <david@...son.dropbear.id.au>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH kernel] rcu: Define lockless version of
 list_for_each_entry_rcu

On 11/04/2015 01:39 AM, Steven Rostedt wrote:
> On Tue,  3 Nov 2015 17:57:05 +1100
> Alexey Kardashevskiy <aik@...abs.ru> wrote:
>
>> This defines list_for_each_entry_lockless. It allows safe list
>> traversal in cases where lockdep invocation is unwanted, such as
>> real mode (MMU is off).
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
>> ---
>>
>> This is for VFIO acceleration in POWERKVM for pSeries guests.
>> There is a KVM instance, and some VFIO (PCI passthrough) devices
>> can be attached to a KVM guest.
>>
>> To perform DMA, a pSeries guest registers DMA memory by explicitly
>> making hypercalls, at a rate close to one or two hcalls per
>> network packet, i.e. very often. When a guest makes a hypercall
>> (which is just an assembly instruction), the host kernel receives it
>> in real mode (MMU is off). If real mode fails to handle it,
>> the host enables the MMU and tries handling the hcall in virtual mode.
>>
>> A logical bus ID (LIOBN) is the target ID for these hypercalls.
>>
>> Each VFIO device belongs to an IOMMU group. Each group has an address
>> translation table. It is allowed to have multiple IOMMU groups (i.e.
>> multiple tables) under the same LIOBN.
>>
>> So effectively every DMA hcall has to update one or more TCE tables
>> attached to the same LIOBN. RCU is used to update/traverse this list
>> safely.
>>
>> Using RCU as is in virtual mode is fine: lockdep works, etc.
>> list_add_rcu() is used to populate the list;
>> list_del_rcu() + call_rcu() are used to remove groups from the list.
>> These operations can happen at runtime as a result of PCI hotplug/unplug
>> in guests.
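>>
>> As a rough illustration of that add/remove pattern (the struct, field and
>> function names below are made up for this sketch, not the actual code):
>>
>>         #include <linux/rculist.h>
>>         #include <linux/slab.h>
>>
>>         struct tce_iommu_group {
>>                 struct list_head next;
>>                 struct iommu_group *grp;
>>                 struct rcu_head rcu;
>>         };
>>
>>         static void tce_iommu_group_free(struct rcu_head *head)
>>         {
>>                 kfree(container_of(head, struct tce_iommu_group, rcu));
>>         }
>>
>>         /* Virtual mode, PCI hotplug: publish a new group on the list */
>>         static void tce_group_attach(struct list_head *groups,
>>                         struct tce_iommu_group *tcegrp)
>>         {
>>                 list_add_rcu(&tcegrp->next, groups);
>>         }
>>
>>         /* Virtual mode, PCI unplug: unlink now, free after a grace period */
>>         static void tce_group_detach(struct tce_iommu_group *tcegrp)
>>         {
>>                 list_del_rcu(&tcegrp->next);
>>                 call_rcu(&tcegrp->rcu, tce_iommu_group_free);
>>         }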
>>
>> Using RCU as is in real mode is not fine, as some RCU checks can lock up
>> the system, and in real mode we will not even get a chance to see any
>> debug output. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.
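>>
>> Again only as a sketch (reusing the made-up names from above), the real-mode
>> traversal is meant to look like this, with no RCU read-side markers around it:
>>
>>         void realmode_update_table(struct iommu_group *grp); /* hypothetical */
>>
>>         /* Real mode (MMU off): walk all groups behind one LIOBN locklessly */
>>         static void tce_rm_update_all(struct list_head *groups)
>>         {
>>                 struct tce_iommu_group *tcegrp;
>>
>>                 /* no rcu_read_lock()/rcu_read_unlock() here */
>>                 list_for_each_entry_lockless(tcegrp, groups, next)
>>                         realmode_update_table(tcegrp->grp);
>>         }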
>>
>> A previous version of this defined list_for_each_entry_rcu_notrace(),
>> but it was proposed to use list_entry_lockless() instead. However,
>> the comment for lockless_dereference() suggests it is a good idea only
>> if "lifetime is managed by something other than RCU", whereas in my case
>> the lifetime is managed by RCU.
>>
>> So what would be the correct approach here? Thanks.
>
> If the only use case for this so far is in POWERKVM, perhaps it should
> be defined specifically (and in arch/powerpc) and not confuse others
> about using this.
>
> Or, if you do imagine that this can be used in other scenarios, then a
> much deeper comment must be made in the code in the kerneldoc section.
> list_for_each_entry_rcu() should really be used 99.99% of the time
> in the kernel. This looks to be an extreme exception. I hate to add a
> generic helper for something that will only be used in one location.


No, I really cannot imagine another use, and I can move it somewhere in arch/powerpc.

Still, is my approach correct? What does the comment for 
lockless_dereference() actually mean - that it will not work together with 
RCU at all, or is it there to discourage people from using it, since 
"list_for_each_entry_rcu() should really be used 99.99% of the time"? :)
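
For reference, my rough understanding is that lockless_dereference() boils
down to something like this (paraphrased from include/linux/compiler.h, so
treat the exact form as approximate):

        #define lockless_dereference(p) \
        ({ \
                typeof(p) _p = READ_ONCE(p); \
                smp_read_barrier_depends(); /* dependency ordering vs. p */ \
                (_p); \
        })

i.e. READ_ONCE() plus a dependency barrier, but without the
rcu_read_lock_held()/lockdep checks that rcu_dereference() performs, which is
what makes it attractive in real mode.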



>
> -- Steve
>
>
>> ---
>>   include/linux/rculist.h | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index 17c6b1f..a83a924 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
>>   		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>
>>   /**
>> + * list_for_each_entry_lockless - iterate over rcu list of given type
>> + * @pos:	the type * to use as a loop cursor.
>> + * @head:	the head for your list.
>> + * @member:	the name of the list_struct within the struct.
>> + *
>> + * This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu().
>> + */
>> +#define list_entry_lockless(ptr, type, member) \
>> +	container_of((typeof(ptr))lockless_dereference(ptr), type, member)
>> +
>> +#define list_for_each_entry_lockless(pos, head, member) \
>> +	for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
>> +		&pos->member != (head); \
>> +		pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
>> +
>> +/**
>>    * list_for_each_entry_continue_rcu - continue iteration over list of given type
>>    * @pos:	the type * to use as a loop cursor.
>>    * @head:	the head for your list.
>


-- 
Alexey