Message-ID: <20090412090603.556ba4fa@nehalam>
Date: Sun, 12 Apr 2009 09:06:03 -0700
From: Stephen Hemminger <shemminger@...tta.com>
To: paulmck@...ux.vnet.ibm.com
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
David Miller <davem@...emloft.net>,
Ingo Molnar <mingo@...e.hu>,
Lai Jiangshan <laijs@...fujitsu.com>,
jeff.chua.linux@...il.com, dada1@...mosbay.com, jengelh@...ozas.de,
kaber@...sh.net, r000n@...0n.net,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
netfilter-devel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: iptables very slow after commit
784544739a25c30637397ace5489eeb6e15d7d49

On Sat, 11 Apr 2009 17:34:45 -0700
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> On Sat, Apr 11, 2009 at 11:57:16AM -0700, Linus Torvalds wrote:
> >
> >
> > On Fri, 10 Apr 2009, Paul E. McKenney wrote:
> > >
> > > 1. Assuming that the synchronize_net() is intended to guarantee
> > > that the new rules will be in effect before returning to
> > > user space:
> >
> > Btw, I think that's a bad assumption.
>
> It does indeed appear to be!
>
> > The thing is, nobody can really care if the new rules are in effect or
> > not, because the thing you race with is not the "return to user space"
> > part, but the incoming packets.
> >
> > And those incoming packets might have been incoming before the rules were
> > set up too.
> >
> > So I seriously doubt you need to synchronize with any returning to user
> > space. What you want to synchronize with is then later actions that do
> > things like turning on the interface that the rules are attached to etc!
> >
> > So I would suggest:
> >
> > - remove the synchronize_net() entirely. Replace it with just freeing the
> > old rules using RCU.
> >
> > - new packets will always end up seeing the new rules. That includes the
> > case of somebody doing "ifconfig eth0 up" that enables a new source of
> > packets, so there are no real security issues.
> >
> > - if you enabled your network interfaces before you updated your packet
> > filtering rules, you already had a window where packets would come in
> > with the old rules, so doing a "synchronize_net()" in no way protects
> > against any race conditions anyway.
> >
> > Am I missing something?
>
> The issue at this point seems to be the need to get accurate snapshots
> of various counters -- there are a number of Linux networking users who
> need to account for every byte flowing through their systems. However,
> it is also necessary to update these counters very efficiently, given
> that they are updated on a per-packet basis. The current approach is
> as follows:
>
> 1. Install a new set of counters.
>
> 2. Wait for a grace period to elapse.
>
> 3. At this point, we know that all subsequent counting will happen
> on the new set of counters.
>
> 4. Add the value of the old set of counters to the new set of
> counters.
>
> 5. Copy the old set of counters up to user space.
>
> So we get a good snapshot in #5, while #4 ensures that we don't lose
> any counts when taking future snapshots. Unfortunately, #2 hits us
> with grace-period latencies on the critical path.
>
> We are going through the following possibilities:
>
> o Stick with the current approach, and ask people to move to
> new batch-oriented interfaces. However, a 30x decrease in
> performance is pretty grim, even for an old-style interface.
>
> o Use various atomic tricks to get an immediate snapshot of the
> old counters after step 1. Make step 3 use call_rcu() instead
> of synchronize_rcu(), and then step 4 happens off the
> critical path.
>
> This approach moves the RCU grace period off of the critical
> path, but the atomic tricks are extremely ugly on 32-bit SMP
> machines. 32-bit UP machines and 64-bit machines are not
> too bad, though the 32-bit UP case does add preemption-disable
> overhead on the counter-update fastpath.
>
> o Provide some sort of expedited synchronize_rcu(). This might
> be able to decrease the hit from 30x down to maybe 5x.
> But I might need to do this for the fast-boot folks anyway,
> though I am first trying to get away with just speeding
> 	up synchronize_rcu().  Though I was not thinking in terms
> of 6x, let alone 30x.
>
> Please note that this would not be a drop-in replacement for
> synchronize_rcu(). One would use synchronize_rcu_expedited()
> (or whatever) only when the system really could not get any
> useful work done while the grace period was in progress.
> The general approach would be to keep the whole machine busy
> trying to get the grace period done as soon as possible.
>
> Thanx, Paul
We could also try:
 * Per-cpu spinlock on counters (instead of synchronize_net). When doing
   an update, just acquire that cpu's lock and update the counters under
   it. Overhead should still be less than the global rwlock of 2.6.29
   and earlier.
 * synchronize_rcu()/synchronize_net() is a stronger guarantee than needed?
 * Use on_each_cpu() somehow to drive the grace period?
 * Add a cond_resched() into net_rx_action(), which might let rx processing
   get out of its RCU read-side section sooner? Likewise in the transmit
   packet scheduler.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html