[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <563C0DAD.8070100@ozlabs.ru>
Date: Fri, 6 Nov 2015 13:17:17 +1100
From: Alexey Kardashevskiy <aik@...abs.ru>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Paul Mackerras <paulus@...ba.org>,
David Gibson <david@...son.dropbear.id.au>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH kernel] rcu: Define lockless version of
list_for_each_entry_rcu
On 11/04/2015 01:39 AM, Steven Rostedt wrote:
> On Tue, 3 Nov 2015 17:57:05 +1100
> Alexey Kardashevskiy <aik@...abs.ru> wrote:
>
>> This defines list_for_each_entry_lockless. This allows safe list
>> traversing in cases when lockdep() invocation is unwanted like
>> real mode (MMU is off).
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
>> ---
>>
>> This is for VFIO acceleration in POWERKVM for pSeries guests.
>> There is a KVM instance. There also can be some VFIO (PCI passthru)
>> devices attached to a KVM guest.
>>
>> To perform DMA, a pSeries guest registers DMA memory by calling some
>> hypercalls explicitely at the rate close to one-two hcalls per
>> a network packet, i.e. very often. When a guest does a hypercall
>> (which is just an assembly instruction), the host kernel receives it
>> in the real mode (MMU is off). When real mode fails to handle it,
>> it enables MMU and tries handling a hcall in virtual mode.
>>
>> A logical bus ID (LIOBN) is a tagret id for these hypecalls.
>>
>> Each VFIO device belongs to an IOMMU group. Each group has an address
>> translation table. It is allowed to have multiple IOMMU groups (i.e.
>> multiple tables) under the same LIOBN.
>>
>> So effectively every DMA hcall has to update one or more TCE tables
>> attached to the same LIOBN. RCU is used to update/traverse this list
>> safely.
>>
>> Using RCU as is in virtual mode is fine. Lockdep works, etc.
>> list_add_rcu() is used to populate the list;
>> list_del_rcu() + call_rcu() used to remove groups from a list.
>> These operations can happen in runtim as a result of PCI hotplug/unplug
>> in guests.
>>
>> Using RCU as is in real mode is not fine as some RCU checks can lock up
>> the system and in real mode we won't even have a chance to see any
>> debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.
>>
>> Previous version of this used to define list_for_each_entry_rcu_notrace()
>> but it was proposed to use list_entry_lockless() instead. However
>> the comment for lockless_dereference() suggests this is a good idea
>> if "lifetime is managed by something other than RCU" but it is in my case.
>>
>> So what would be the correct approach here? Thanks.
>
> If the only use case for this so far is in POWERKVM, perhaps it should
> be defined specifically (and in arch/powerpc) and not confuse others
> about using this.
>
> Or, if you do imagine that this can be used in other scenarios, then a
> much deeper comment must be made in the code in the kerneldoc section.
> list_for_each_entry_rcu() should really be used in 99.99% of the time
> in the kernel. This looks to be an extreme exception. I hate to add a
> generic helper for something that will only be used in one location.
No, I cannot imagine this really and I can move it somewhere in arch/powerpc.
Still, is my approach correct? What does the comment for
lockless_dereference() actally mean - it won't work together with RCU at
all or this is to force people not to use it as "list_for_each_entry_rcu()
should really be used in 99.99% of the time"? :)
>
> -- Steve
>
>
>> ---
>> include/linux/rculist.h | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index 17c6b1f..a83a924 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
>> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>
>> /**
>> + * list_for_each_entry_lockless - iterate over rcu list of given type
>> + * @pos: the type * to use as a loop cursor.
>> + * @head: the head for your list.
>> + * @member: the name of the list_struct within the struct.
>> + *
>> + * This list-traversal primitive may safely run concurrently
>> + */
>> +#define list_entry_lockless(ptr, type, member) \
>> + container_of((typeof(ptr))lockless_dereference(ptr), type, member)
>> +
>> +#define list_for_each_entry_lockless(pos, head, member) \
>> + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
>> + &pos->member != (head); \
>> + pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
>> +
>> +/**
>> * list_for_each_entry_continue_rcu - continue iteration over list of given type
>> * @pos: the type * to use as a loop cursor.
>> * @head: the head for your list.
>
--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists