Message-ID: <49E7384B.5020707@trash.net>
Date: Thu, 16 Apr 2009 15:53:15 +0200
From: Patrick McHardy <kaber@...sh.net>
To: Eric Dumazet <dada1@...mosbay.com>
CC: Stephen Hemminger <shemminger@...tta.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Miller <davem@...emloft.net>, jeff.chua.linux@...il.com,
paulmck@...ux.vnet.ibm.com, paulus@...ba.org, mingo@...e.hu,
laijs@...fujitsu.com, jengelh@...ozas.de, r000n@...0n.net,
linux-kernel@...r.kernel.org, netfilter-devel@...r.kernel.org,
netdev@...r.kernel.org, benh@...nel.crashing.org
Subject: Re: [PATCH] netfilter: use per-cpu spinlock and RCU (v5)
Eric Dumazet wrote:
> Stephen Hemminger wrote:
>> This is an alternative version of ip/ip6/arp tables locking using
>> per-cpu locks. This avoids the overhead of synchronize_net() during
>> updates but still removes the expensive rwlock of earlier versions.
>>
>> The idea for this came from an earlier version done by Eric Dumazet.
>> Locking is done per cpu: the fast path takes the lock on the current
>> cpu and updates counters. The slow case involves acquiring the locks
>> on all cpus. This version uses RCU for the table->base reference
>> but a per-cpu lock for the counters.
> This version is a regression relative to 2.6.2[0-9], for two reasons:
>
> 1) Many more atomic ops:
>
> Because of the additional
>
>> + spin_lock(&__get_cpu_var(ip_tables_lock));
>> ADD_COUNTER(e->counters, ntohs(ip->tot_len), 1);
>> + spin_unlock(&__get_cpu_var(ip_tables_lock));
>
> added around each counter update.
>
> On many setups, each packet coming in or out of the machine has
> to update between 2 and 20 rule counters. So to avoid *one* atomic op
> for read_unlock(), this v4 version adds 2 to 20 atomic ops...
I agree, this seems to be a step backwards.
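For readers following along, the scheme under discussion boils down to
something like the sketch below. This is illustrative only: the helper
names are mine, not the patch's, and irq handling and lockdep
annotations are omitted.

/* Per-cpu lock protecting this CPU's view of the table. */
static DEFINE_PER_CPU(spinlock_t, ip_tables_lock);

/*
 * Fast path, run in softirq context for each matched rule: take only
 * this CPU's lock and bump the rule's counters. This is where the
 * extra lock/unlock pair per rule (and per packet) comes from.
 */
static void count_rule(struct ipt_entry *e, const struct iphdr *ip)
{
	spin_lock(&__get_cpu_var(ip_tables_lock));
	ADD_COUNTER(e->counters, ntohs(ip->tot_len), 1);
	spin_unlock(&__get_cpu_var(ip_tables_lock));
}

/*
 * Slow path, run on table replacement or counter readout: take every
 * CPU's lock so no fast path can be in the middle of an update.
 */
static void lock_all_cpus(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_lock(&per_cpu(ip_tables_lock, cpu));
}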
> I still don't see the problem with going from the previous version
> (2.6.2[0-8]), which had a central rwlock that hurt performance on SMP
> because of cache line ping-pong, to a solution with one rwlock per cpu.
>
> We wanted to reduce the cache line ping-pong first. This *is* the
> real pain point, by an order of magnitude.
Dave doesn't seem to like the rwlock approach. I don't see a way to
do anything asynchronously like call_rcu() to improve this, so to
bring up one of Stephen's suggestions again:
>> > * use on_each_cpu() somehow to do grace period?
We could use this to replace the counters, presuming it is
indeed faster than waiting for an RCU grace period.
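Roughly, the idea might look like the hypothetical sketch below.
collect_counters_on_cpu() and the totals argument are illustrative
names, and there is a real caveat: the IPI callback could land while
this CPU's softirq fast path holds the lock, so a plain spin_lock()
here could deadlock, and a real implementation would need different
synchronization.

/* Runs via IPI on each CPU; folds that CPU's counters into totals. */
static void collect_counters_on_cpu(void *arg)
{
	struct xt_counters *totals = arg;

	spin_lock(&__get_cpu_var(ip_tables_lock));
	/* ... walk this CPU's entries, fold each e->counters into
	 * totals ... */
	spin_unlock(&__get_cpu_var(ip_tables_lock));
}

static void replace_counters(struct xt_counters *totals)
{
	/* wait=1: return only after the callback has run on every
	 * CPU, i.e. all per-cpu counters have been collected. */
	on_each_cpu(collect_counters_on_cpu, totals, 1);
}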
> 2) Second problem: potential OOM
>
> About freeing old rules with call_rcu() and/or schedule_work(): this
> is going to OOM pretty fast on small appliances with basic firewall
> setups that load rules one by one, as done by the original topic reporter.
>
> We had reports from people running Linux with 4MB of available RAM (the French
> provider free.fr on their appliance boxes), and we had to use the SLAB_DESTROY_BY_RCU
> scheme in conntrack to avoid OOM on their setups. We don't want to use call_rcu()
> and queue 100 or 200 vfree() calls.
Agreed.
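For clarity, the rejected pattern would look roughly like this
hypothetical sketch (it assumes an rcu_head field added to struct
xt_table_info, which mainline does not have): every table replacement
queues the old, potentially large, table copy until a grace period
elapses, and nothing bounds how many copies pile up.

/* Deferred free of a replaced table via call_rcu(). Loading rules
 * one by one replaces the whole table each time, so N loads queue
 * N old table copies; on a 4MB box these can exhaust memory before
 * grace periods catch up. (vfree() also cannot run from softirq,
 * so a real version would additionally bounce through
 * schedule_work(), as noted above.) */
static void free_old_table_rcu(struct rcu_head *head)
{
	struct xt_table_info *old =
		container_of(head, struct xt_table_info, rcu);

	xt_free_table_info(old);
}

static void retire_table(struct xt_table_info *old)
{
	/* Memory stays allocated until the next grace period ends. */
	call_rcu(&old->rcu, free_old_table_rcu);
}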