Date: Mon, 15 Sep 2014 14:17:59 +0200
From: Thomas Graf <tgraf@...g.ch>
To: davem@...emloft.net, eric.dumazet@...il.com, paulmck@...ux.vnet.ibm.com,
	john.r.fastabend@...el.com, kaber@...sh.net
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PATCH 0/5 RFC net-next] rhashtable: Parallel atomic insert/deletion & deferred expansion

This is still work in progress, but I wanted to share it now that the
direction it is heading in has become clear.

 * Patch 1: Remove gfp_t awareness
 * Patch 2: Enhance selftest to catch traversal miscount
 * Patch 3: Nulls marker conversion
 * Patch 4: Add spin_lock_bh_nested()
 * Patch 5: Per bucket spinlock, deferred expansion/shrinking

Paul, Eric, John, Dave: Let me know how far off this is from what you
had in mind.

Opens:

 * I'm particularly interested in the feasibility of the side effect
   regarding recent insertion visibility in bucket traversals (see
   below). I think current users of rhashtable can live with it, but it
   might be worth offering a synchronous expansion alternative at some
   point. If anybody has a better alternative, I'm definitely interested
   to hear about it.
 * Convert the inet hashtable over to this and see if it is missing
   anything.
 * Own worker thread, as we might eventually sleep for multiple grace
   periods?
 * Yes, some of the rcu_assign_pointer() calls can probably be replaced
   with RCU_INIT_POINTER().

Patch 3 converts the singly linked list to use a nulls marker. It
reserves 4 bits to distinguish list membership and stores a full 27 bit
hash as the default marker. This allows the marker to stay valid even
if the table expands or shrinks.

Patch 5 introduces an array of spinlocks to protect bucket mutations.
The number of spinlocks per CPU is configurable, and the lock guarding
a bucket is selected based on the hash of the bucket. This allows
parallel insertion and removal of entries which do not share a lock.

The patch also defers expansion and shrinking to a work queue, which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to a resize and are only held up
briefly while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer are protected by a new mutex
taken during expansion and shrinking; read access to that pointer for
insertion and deletion is RCU protected.

In the event of an expansion or shrinking, the newly allocated bucket
table is exposed right away as a so-called future table. Lookups,
deletions, and insertions will briefly use both tables. The future
table becomes the main table after an RCU grace period and after the
initial relinking has been performed, which guarantees that no new
insertions occur on the old table and that all entries can be found.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not
see insertions performed during that grace period, since they land in
the future table. A lookup will see them, as it searches both tables if
needed.
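For readers less familiar with nulls lists, the small userspace sketch
below illustrates the kind of end-of-list marker described above for
patch 3. It is purely illustrative and not taken from the patches: the
macro names, helper functions, and exact bit layout are assumptions
made for the example. The low bit distinguishes a marker from a real
node pointer, 4 bits carry a list membership id, and 27 bits carry the
hash:

#include <stdint.h>
#include <stdio.h>

#define NULLS_BIT	0x1UL	/* low bit set means "nulls marker", not a node pointer */
#define MEMBER_BITS	4	/* bits reserved to distinguish list membership */
#define HASH_BITS	27	/* full hash stored as the default marker */

static unsigned long make_nulls_marker(uint32_t hash, uint32_t member)
{
	unsigned long v;

	v  = (unsigned long)(hash & ((1U << HASH_BITS) - 1)) << (MEMBER_BITS + 1);
	v |= (unsigned long)(member & ((1U << MEMBER_BITS) - 1)) << 1;
	v |= NULLS_BIT;

	return v;
}

static int is_nulls_marker(unsigned long v)
{
	return v & NULLS_BIT;
}

static uint32_t marker_hash(unsigned long v)
{
	return (v >> (MEMBER_BITS + 1)) & ((1U << HASH_BITS) - 1);
}

int main(void)
{
	/* pack a 27 bit hash and a 4 bit membership id into one marker */
	unsigned long m = make_nulls_marker(0x1234567, 3);

	printf("is marker: %d  hash: 0x%x  member: %lu\n",
	       is_nulls_marker(m), marker_hash(m), (m >> 1) & 0xf);
	return 0;
}

Presumably, a reader walking a chain under RCU can inspect the marker it
lands on to notice that it ended up in a chain belonging to a different
bucket or table and restart, which is the usual purpose of nulls lists
and what keeps the marker valid across expansion and shrinking.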
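The per bucket locking of patch 5 can be pictured with a similar
userspace analogue. Again this is only a sketch: pthread mutexes stand
in for the kernel spinlocks, the structure layout and names are
invented, and the lock count is a fixed compile time constant rather
than the configurable per CPU count used by the patch. The point is
that a bucket's lock is picked by masking its index into a power-of-two
lock array, so buckets mapping to different locks can be mutated in
parallel:

#include <pthread.h>

#define NR_BUCKET_LOCKS 128	/* power of two so the mask below works */

struct toy_bucket_table {
	unsigned int	 nr_buckets;
	pthread_mutex_t	 locks[NR_BUCKET_LOCKS];	/* each initialised with pthread_mutex_init() */
	void		**buckets;			/* nulls terminated chain heads */
};

/* Pick the lock guarding a bucket; buckets mapping to different locks
 * can be inserted into or removed from concurrently. */
pthread_mutex_t *bucket_lock(struct toy_bucket_table *tbl, unsigned int bucket)
{
	return &tbl->locks[bucket & (NR_BUCKET_LOCKS - 1)];
}

/* Removal takes only that one bucket's lock, never a table wide lock. */
void toy_remove(struct toy_bucket_table *tbl, unsigned int bucket)
{
	pthread_mutex_t *lock = bucket_lock(tbl, bucket);

	pthread_mutex_lock(lock);
	/* ... unlink the entry from tbl->buckets[bucket] here ... */
	pthread_mutex_unlock(lock);
}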
Thomas Graf (5):
  rhashtable: Remove gfp_flags from insert and remove functions
  rhashtable: Check for count mismatch in selftest
  rhashtable: Convert to nulls list
  spinlock: Add spin_lock_bh_nested()
  rhashtable: Per bucket locks & expansion/shrinking in work queue

 include/linux/list_nulls.h       |   3 +-
 include/linux/rhashtable.h       | 241 +++++++++++------
 include/linux/spinlock.h         |   8 +
 include/linux/spinlock_api_smp.h |   2 +
 include/linux/spinlock_api_up.h  |   1 +
 kernel/locking/spinlock.c        |   8 +
 lib/rhashtable.c                 | 539 ++++++++++++++++++++++++++-------------
 net/netfilter/nft_hash.c         |  69 ++---
 net/netlink/af_netlink.c         |  18 +-
 net/netlink/diag.c               |   4 +-
 10 files changed, 604 insertions(+), 289 deletions(-)

--
1.9.3