netdev - Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Sat, 15 Nov 2014 11:12:20 -0800
From:	Josh Triplett <josh@...htriplett.org>
To:	Thomas Graf <tgraf@...g.ch>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>, netdev@...r.kernel.org,
	eric.dumazet@...il.com, paulmck@...ux.vnet.ibm.com
Subject: Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held

On Sat, Nov 15, 2014 at 11:16:26AM +0000, Thomas Graf wrote:
> On 11/15/14 at 11:25am, Herbert Xu wrote:
> > On Thu, Nov 13, 2014 at 06:43:43PM +0800, Herbert Xu wrote:
> > > On Thu, Nov 13, 2014 at 10:41:24AM +0000, Thomas Graf wrote:
> > > >
> > > > Never mind. You did fix it. I looked at the wrong patch.
> > > 
> > > OK.  Now comes the fun part of shoehorning the xfrm_policy bydst
> > > hash into rhashtable :)
> > 
> > Thomas, it appears that rhashtable as it stands cannot handle
> > changing the random seed because of the way it constructs the
> > new hash table without degrading into a linked list.  Is there
> > something I'm missing?
> > 
> > FWIW my hashtable in net/bridge/br_multicast.c handles rehashing
> > correctly.  Any objections to me converting rhashtable to use my
> > scheme instead?
> 
> Can you elaborate a bit?
> 
> The point of rhashtable is to not require two sets of linked list
> pointers as done by MDB or OVS flow tables to work around the
> increased cache footprint of that approach. The difference of the
> two algos is dicussed in this paper [0].
> 
> The disadvantage of rhashtable is that, AFAIK, the hash function
> cannot change while resizing as it would break the mutual linked
> lists.

You can handle hash function changes with rhashtable without needing a
second set of linked-list pointers, actually.  You'd have to add ~1
unlikely() conditional to the reader common case, and you'd make readers
*during* a hash algorithm change (which I'd hope happens as rarely as a
resize) somewhat less efficient, but you wouldn't use any more memory
than a resize currently does, and you wouldn't use the memory of extra
linked-list pointers in the common case.

Rather than the current approach of switching out the hash table pointer
and having readers only search the new table (which will have
valid-but-crosslinked buckets during a resize), instead keep two hash
table pointers (each with their own hash parameters) and a single
toggleable old/new indicator.  Readers check both hash table pointers,
and if valid, check both tables, first old then new (order important to
make sure they don't miss a node).  Then, the rehashing algorithm can
incrementally move nodes from the old buckets to the new buckets,
*without* disrupting concurrent readers.

Rough rehashing algorithm sketch:

- Set up the new empty table with the new set of hash parameters.
- synchronize_rcu().  Readers will now search both old and new tables.
- Peel nodes off the ends of the old hash table and add them to the new
  hash table (similar to the unzip step in the resize algorithm).
  Because readers search old-then-new, and writers make each node appear
  in the new table before making it disappear from the old, you don't
  need a synchronize_rcu() here, just an smp_wmb() after linking the
  node into the new table and before unlinking the node from the old
  table.  Also, because you remove nodes from the *ends* of old-table
  buckets, you don't have to worry about a reader's linked-list walk
  getting dragged over to the new table and missing nodes from the old
  one.
- Once all nodes have moved to the new table, mark the pointer to the
  old table as NULL, synchronize_rcu(), and free the old table.
- Toggle the old/new flag.

Ordering principles in this algorithm:
- You want readers to see the changes made by the rehasher in order.
- If the reader reads location A then B, and the rehasher writes
  location B then A, the rehasher just needs an smp_wmb() in between.
- If the reader reads location A then B, and the rehasher writes
  location A then B, the rehasher needs a synchronize_rcu() in between.


- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html