Message-ID: <20150223230619.GD15405@linux.vnet.ibm.com>
Date:	Mon, 23 Feb 2015 15:06:19 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	David Miller <davem@...emloft.net>
Cc:	tgraf@...g.ch, josh@...htriplett.org, alexei.starovoitov@...il.com,
	herbert@...dor.apana.org.au, kaber@...sh.net,
	ying.xue@...driver.com, netdev@...r.kernel.org,
	netfilter-devel@...r.kernel.org
Subject: Re: Ottawa and slow hash-table resize

On Mon, Feb 23, 2015 at 05:32:52PM -0500, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> Date: Mon, 23 Feb 2015 13:52:49 -0800
> 
> > On Mon, Feb 23, 2015 at 09:03:58PM +0000, Thomas Graf wrote:
> >> On 02/23/15 at 11:12am, josh@...htriplett.org wrote:
> >> > In theory, resizes should only take the locks for the buckets they're
> >> > currently unzipping, and adds should take those same locks.  Neither one
> >> > should take a whole-table lock, other than resize excluding concurrent
> >> > resizes.  Is that still insufficient?
> >> 
> >> Correct, this is what happens. The problem is basically that
> >> if we insert from atomic context we cannot slow down inserts
> >> and the table may not grow quickly enough.
> >> 
> >> > Yeah, the add/remove statistics used for tracking would need some
> >> > special handling to avoid being a table-wide bottleneck.
> >> 
> >> Daniel is working on a patch to do per-cpu element counting
> >> with a batched update cycle.
> > 
> > One approach is simply to count only when a resize operation is in
> > flight.  Another is to keep a per-bucket count, which can be summed
> > at the beginning of the next resize operation.
> 
> I think we should think carefully about what to do when someone
> loops non-stop adding 1 million entries to the hash table and the
> initial table size is very small.
> 
> This is a common use case for at least one of the current rhashtable
> users (nft_hash).  When you load an nftables rule with a large set
> of IP addresses attached, this is what happens.
> 
> Yes, I understand that nftables could give a hint and start with a
> larger hash size when it knows this is going to happen, but I still
> believe we should behave reasonably when starting from a small table.
> 
> I'd say that with the way things work right now, in this situation it
> actually hurts to allow asynchronous inserts during a resize.  Because
> we end up with extremely long hash table chains, and thus make the
> resize work and the lookups both take an excruciatingly long amount of
> time to complete.
> 
> I just did a quick scan of all code paths that do inserts into an
> rhashtable, and it seems like all of them can easily block.  So why
> don't we do that?  Make inserts sleep on an rhashtable expansion
> waitq.
> 
> There could even be a counter of pending inserts, so the expander can
> decide to expand further before waking the inserting threads up.

Should be reasonably simple, and certainly seems worth a try!

							Thanx, Paul

