Message-ID: <20151103093913.346374e2@gandalf.local.home>
Date:	Tue, 3 Nov 2015 09:39:13 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Alexey Kardashevskiy <aik@...abs.ru>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Paul Mackerras <paulus@...ba.org>,
	David Gibson <david@...son.dropbear.id.au>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH kernel] rcu: Define lockless version of
 list_for_each_entry_rcu

On Tue,  3 Nov 2015 17:57:05 +1100
Alexey Kardashevskiy <aik@...abs.ru> wrote:

> This defines list_for_each_entry_lockless(). This allows safe list
> traversal in cases where lockdep invocation is unwanted, such as
> real mode (MMU is off).
> 
> Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
> ---
> 
> This is for VFIO acceleration in POWERKVM for pSeries guests.
> There is a KVM instance. There can also be some VFIO (PCI passthrough)
> devices attached to the KVM guest.
> 
> To perform DMA, a pSeries guest registers DMA memory by calling
> hypercalls explicitly, at a rate close to one or two hcalls per
> network packet, i.e. very often. When a guest makes a hypercall
> (which is just an assembly instruction), the host kernel receives it
> in real mode (MMU is off). When the real-mode handler fails, the host
> enables the MMU and tries handling the hcall in virtual mode.
> 
> A logical bus ID (LIOBN) is the target ID for these hypercalls.
> 
> Each VFIO device belongs to an IOMMU group. Each group has an address
> translation table (TCE table). Multiple IOMMU groups (i.e. multiple
> tables) are allowed under the same LIOBN.
> 
> So effectively every DMA hcall has to update one or more TCE tables
> attached to the same LIOBN. RCU is used to update/traverse this list
> safely.
> 
> Using RCU as is in virtual mode is fine. Lockdep works, etc.
> list_add_rcu() is used to populate the list;
> list_del_rcu() + call_rcu() are used to remove groups from the list.
> These operations can happen at runtime as a result of PCI hotplug/unplug
> in guests.
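
As a rough, hypothetical sketch of the updater side described above
(the struct and function names are made up for illustration, not the
actual POWERKVM data structures), the standard rculist pattern would
look roughly like this:

	/* needs <linux/rculist.h>, <linux/rcupdate.h>, <linux/slab.h> */

	struct tce_iommu_group {
		struct list_head next;	/* linked on the LIOBN's group list */
		struct iommu_group *grp;
		struct rcu_head rcu;
	};

	static void tce_iommu_group_free(struct rcu_head *head)
	{
		kfree(container_of(head, struct tce_iommu_group, rcu));
	}

	/* Called from the PCI hotplug paths, under the owner's mutex. */
	static void tce_attach_group(struct list_head *group_list,
				     struct tce_iommu_group *tcegrp)
	{
		list_add_rcu(&tcegrp->next, group_list);
	}

	static void tce_detach_group(struct tce_iommu_group *tcegrp)
	{
		list_del_rcu(&tcegrp->next);
		call_rcu(&tcegrp->rcu, tce_iommu_group_free);
	}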
> 
> Using RCU as is in real mode is not fine, as some RCU checks can lock up
> the system, and in real mode we would not even get a chance to see any
> debug output. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.
> 
> The previous version of this defined list_for_each_entry_rcu_notrace(),
> but it was proposed to use list_entry_lockless() instead. However,
> the comment for lockless_dereference() suggests it is only a good idea
> if the "lifetime is managed by something other than RCU", whereas in my
> case the lifetime is managed by RCU.
> 
> So what would be the correct approach here? Thanks.
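
For context, lockless_dereference() in mainline at around this time was
defined in include/linux/compiler.h roughly as follows (approximate;
check the tree for the exact form):

	#define lockless_dereference(p) \
	({ \
		typeof(p) _________p1 = READ_ONCE(p); \
		smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
		(_________p1); \
	})

It is essentially rcu_dereference() minus the RCU read-side lockdep
checking and sparse annotation, which is why it sidesteps the debug
machinery that is problematic in real mode.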

If the only use case for this so far is in POWERKVM, perhaps it should
be defined specifically for it (and in arch/powerpc), so as not to
confuse others into using this.

Or, if you do imagine that this can be used in other scenarios, then a
much more thorough comment must be added to the kerneldoc section.
list_for_each_entry_rcu() should really be used 99.99% of the time
in the kernel. This looks to be an extreme exception. I hate to add a
generic helper for something that will only be used in one location.

-- Steve


> ---
>  include/linux/rculist.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f..a83a924 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
>  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>  
>  /**
> + * list_for_each_entry_lockless - iterate over rcu list of given type
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with _rcu list-mutation primitives such as list_add_rcu().
> + */
> +#define list_entry_lockless(ptr, type, member) \
> +	container_of((typeof(ptr))lockless_dereference(ptr), type, member)
> +
> +#define list_for_each_entry_lockless(pos, head, member) \
> +	for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
> +		&pos->member != (head); \
> +		pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given type
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
