lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Tue,  3 Nov 2015 17:57:05 +1100
From:	Alexey Kardashevskiy <aik@...abs.ru>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Alexey Kardashevskiy <aik@...abs.ru>,
	Steven Rostedt <rostedt@...dmis.org>,
	Paul Mackerras <paulus@...ba.org>,
	David Gibson <david@...son.dropbear.id.au>,
	linux-kernel@...r.kernel.org
Subject: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu

This defines list_for_each_entry_lockless. This allows safe list
traversing in cases when lockdep() invocation is unwanted like
real mode (MMU is off).

Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
---

This is for VFIO acceleration in POWERKVM for pSeries guests.
There is a KVM instance. There also can be some VFIO (PCI passthru)
devices attached to a KVM guest.

To perform DMA, a pSeries guest registers DMA memory by calling some
hypercalls explicitely at the rate close to one-two hcalls per
a network packet, i.e. very often. When a guest does a hypercall
(which is just an assembly instruction), the host kernel receives it
in the real mode (MMU is off). When real mode fails to handle it,
it enables MMU and tries handling a hcall in virtual mode.

A logical bus ID (LIOBN) is a tagret id for these hypecalls.

Each VFIO device belongs to an IOMMU group. Each group has an address
translation table. It is allowed to have multiple IOMMU groups (i.e.
multiple tables) under the same LIOBN.

So effectively every DMA hcall has to update one or more TCE tables
attached to the same LIOBN. RCU is used to update/traverse this list
safely.

Using RCU as is in virtual mode is fine. Lockdep works, etc.
list_add_rcu() is used to populate the list;
list_del_rcu() + call_rcu() used to remove groups from a list.
These operations can happen in runtim as a result of PCI hotplug/unplug
in guests.

Using RCU as is in real mode is not fine as some RCU checks can lock up
the system and in real mode we won't even have a chance to see any
debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.

Previous version of this used to define list_for_each_entry_rcu_notrace()
but it was proposed to use list_entry_lockless() instead. However
the comment for lockless_dereference() suggests this is a good idea
if "lifetime is managed by something other than RCU" but it is in my case.

So what would be the correct approach here? Thanks.
---
 include/linux/rculist.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 17c6b1f..a83a924 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_lockless - iterate over rcu list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently
+ */
+#define list_entry_lockless(ptr, type, member) \
+	container_of((typeof(ptr))lockless_dereference(ptr), type, member)
+
+#define list_for_each_entry_lockless(pos, head, member) \
+	for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
-- 
2.5.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ