[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49848A11.7020908@cosmosbay.com>
Date: Sat, 31 Jan 2009 18:27:45 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Stephen Hemminger <shemminger@...tta.com>
CC: David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH 6/6] netfilter: convert x_tables to use RCU
Stephen Hemminger a écrit :
> Replace existing reader/writer lock with Read-Copy-Update to
> elminate the overhead of a read lock on each incoming packet.
> This should reduce the overhead of iptables especially on SMP
> systems.
>
> The previous code used a reader-writer lock for two purposes.
> The first was to ensure that the xt_table_info reference was not in
> process of being changed. Since xt_table_info is only freed via one
> routine, it was a direct conversion to RCU.
>
> The other use of the reader-writer lock was to to block changes
> to counters while they were being read. This synchronization was
> fixed by the previous patch. But still need to make sure table info
> isn't going away.
>
> Signed-off-by: Stephen Hemminger <shemminger@...tta.com>
>
>
> ---
> include/linux/netfilter/x_tables.h | 10 ++++++--
> net/ipv4/netfilter/arp_tables.c | 16 ++++++-------
> net/ipv4/netfilter/ip_tables.c | 27 +++++++++++-----------
> net/ipv6/netfilter/ip6_tables.c | 16 ++++++-------
> net/netfilter/x_tables.c | 45 ++++++++++++++++++++++++++-----------
> 5 files changed, 70 insertions(+), 44 deletions(-)
OK, I have a last comment Stephen
(I tested your patches on my dev machine and got nice results)
In net/ipv4/netfilter/ip_tables.c,
get_counters() can be quite slow on large firewalls, and it is
called from alloc_counters() with BH disabled, but also from __do_replace()
without BH disabled.
So get_counters() first iterates on set_entry_to_counter() for current cpu
but might be interrupted and we get bad results...
Solution is to always use seqcount_t protection, and no BH disabling at all.
Then, I wonder if all this is valid, since alloc_counters() might be supposed
to perform an atomic snapshots of all counters. This was possible because
of a write_lock_bh(), but does it make any sense to block all packets for
such a long time ??? If this is a strong requirement, I am afraid we have
to make other changes...
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
net/ipv4/netfilter/ip_tables.c | 25 +++----------------------
1 files changed, 3 insertions(+), 22 deletions(-)
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 7c50e2e..d208b25 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -900,25 +900,8 @@ get_counters(const struct xt_table_info *t,
{
unsigned int cpu;
unsigned int i;
- unsigned int curcpu;
-
- /* Instead of clearing (by a previous call to memset())
- * the counters and using adds, we set the counters
- * with data used by 'current' CPU
- * We dont care about preemption here.
- */
- curcpu = raw_smp_processor_id();
-
- i = 0;
- IPT_ENTRY_ITERATE(t->entries[curcpu],
- t->size,
- set_entry_to_counter,
- counters,
- &i);
for_each_possible_cpu(cpu) {
- if (cpu == curcpu)
- continue;
i = 0;
IPT_ENTRY_ITERATE(t->entries[cpu],
t->size,
@@ -939,15 +922,12 @@ static struct xt_counters * alloc_counters(struct xt_table *table)
(other than comefrom, which userspace doesn't care
about). */
countersize = sizeof(struct xt_counters) * private->number;
- counters = vmalloc_node(countersize, numa_node_id());
+ counters = __vmalloc(countersize, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
if (counters == NULL)
return ERR_PTR(-ENOMEM);
- /* First, sum counters... */
- local_bh_disable();
get_counters(private, counters);
- local_bh_enable();
return counters;
}
@@ -1209,7 +1189,8 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
void *loc_cpu_old_entry;
ret = 0;
- counters = vmalloc(num_counters * sizeof(struct xt_counters));
+ counters = __vmalloc(num_counters * sizeof(struct xt_counters),
+ GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
if (!counters) {
ret = -ENOMEM;
goto out;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists