Message-ID: <20150119125813.GA7672@casper.infradead.org>
Date: Mon, 19 Jan 2015 12:58:13 +0000
From: Thomas Graf <tgraf@...g.ch>
To: Patrick McHardy <kaber@...sh.net>
Cc: David Laight <David.Laight@...LAB.COM>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"john.r.fastabend@...el.com" <john.r.fastabend@...el.com>,
"josh@...htriplett.org" <josh@...htriplett.org>,
"netfilter-devel@...r.kernel.org" <netfilter-devel@...r.kernel.org>
Subject: Re: [PATCH 7/9] rhashtable: Per bucket locks & deferred
expansion/shrinking
On 01/17/15 at 08:02am, Patrick McHardy wrote:
> On 16.01, Thomas Graf wrote:
> > Resize operations should be *really* rare as well unless you start
> > with really small hash table sizes and constantly add/remove at the
> > watermark.
>
> Which are far enough from each other that this should only happen
> in really unlucky cases.
>
> > Re-dumping on insert/remove is a different story of course. Do you
> > care about missed insert/removals for dumps? If not we can do the
> > sequence number consistency checking for resizing only.
>
> No, that has always been nondeterministic with netlink. We want to
> dump everything that was present when the dump was started and is
> still present when it finishes. Anything else can be handled using
> notifications.

It looks like we want to provide two ways to resolve this:

1) Walker holds ht->mutex the entire time to block out resizes.
   Optionally the walker can acquire all bucket locks. Such
   scenarios would seem to benefit from either a single or a very
   small number of bucket locks.

2) Walker holds ht->mutex during individual Netlink message
   construction periods and releases it while user space reads the
   message. rhashtable provides a hook which is called when a
   resize operation is scheduled, allowing the walker code to
   bump a sequence number and notify user space that the dump is
   inconsistent, causing it to request a new dump.

I'll provide an API to achieve (2). (1) is already achievable with
the current API.
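
A minimal userspace sketch of the idea behind (2), not the rhashtable
API: every name below (struct table, table_resize_scheduled(),
walk_start(), walk_next()) is hypothetical, a pthread mutex stands in
for ht->mutex, and a flat array stands in for the hash table. The
walker takes the mutex only while filling one message, and the resize
hook bumps a sequence number so that the next chunk can report the
dump as inconsistent.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the table; not the kernel's struct rhashtable. */
struct table {
	pthread_mutex_t mutex;	/* plays the role of ht->mutex */
	unsigned int seq;	/* bumped whenever a resize is scheduled */
	const int *items;	/* flat array standing in for the buckets */
	size_t len;
};

/* Per-dump state kept between messages. */
struct walker {
	unsigned int seq;	/* table->seq observed when the dump started */
	size_t pos;		/* next item to dump */
};

enum walk_status { WALK_MORE, WALK_DONE, WALK_RESTART };

/* Hypothetical hook: called when a resize operation is scheduled. */
static void table_resize_scheduled(struct table *t)
{
	pthread_mutex_lock(&t->mutex);
	t->seq++;
	pthread_mutex_unlock(&t->mutex);
}

/* Begin (or restart) a dump by recording the current sequence number. */
static void walk_start(struct table *t, struct walker *w)
{
	pthread_mutex_lock(&t->mutex);
	w->seq = t->seq;
	w->pos = 0;
	pthread_mutex_unlock(&t->mutex);
}

/*
 * Fill one "message" with up to max items. The mutex is held only while
 * the message is built and is dropped before returning, i.e. while the
 * reader consumes the message. Returns WALK_RESTART, with *n == 0, if a
 * resize was scheduled since walk_start().
 */
static enum walk_status walk_next(struct table *t, struct walker *w,
				  int *out, size_t max, size_t *n)
{
	enum walk_status st;

	assert(max > 0);
	*n = 0;

	pthread_mutex_lock(&t->mutex);
	if (w->seq != t->seq) {
		st = WALK_RESTART;	/* dump is inconsistent */
	} else {
		while (w->pos < t->len && *n < max)
			out[(*n)++] = t->items[w->pos++];
		st = (w->pos == t->len) ? WALK_DONE : WALK_MORE;
	}
	pthread_mutex_unlock(&t->mutex);

	return st;
}
```

A caller would loop on walk_next() until it returns WALK_DONE and call
walk_start() again whenever it returns WALK_RESTART. In this sketch
inserts and removals do not bump the sequence number, matching the
suggestion above to do the consistency check for resizes only.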