Date: Tue, 3 Feb 2009 13:22:20 -0800
From: Stephen Hemminger <shemminger@...tta.com>
To: paulmck@...ux.vnet.ibm.com
Cc: Eric Dumazet <dada1@...mosbay.com>,
	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH 3/3] iptables: lock free counters (alternate version)

On Tue, 3 Feb 2009 13:10:00 -0800
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:

> On Tue, Feb 03, 2009 at 09:20:04PM +0100, Eric Dumazet wrote:
> > Paul E. McKenney wrote:
> > > On Tue, Feb 03, 2009 at 08:00:16PM +0100, Eric Dumazet wrote:
> > >> Stephen Hemminger wrote:
> > >>> This is an alternative to the earlier RCU/seqcount_t version of the
> > >>> counters. The counters operate as usual without locking, but when the
> > >>> counters are read they are rotated around the CPUs' entries. RCU is
> > >>> used in two ways: first to handle the counter rotation, second for
> > >>> replace.
> > >>
> > >> Is it a working patch or just a prototype?
> > >>
> > >>> Signed-off-by: Stephen Hemminger <shemminger@...tta.com>
> > >>>
> > >>> ---
> > >>>  include/linux/netfilter/x_tables.h |   10 +++-
> > >>>  net/ipv4/netfilter/arp_tables.c    |   73 +++++++++++++++++++++++++++---------
> > >>>  net/ipv4/netfilter/ip_tables.c     |   68 ++++++++++++++++++++++++---------
> > >>>  net/ipv6/netfilter/ip6_tables.c    |   75 ++++++++++++++++++++++++-----------
> > >>>  net/netfilter/x_tables.c           |   43 +++++++++++++------
> > >>>  5 files changed, 197 insertions(+), 72 deletions(-)
> > >>>
> > >>> --- a/include/linux/netfilter/x_tables.h	2009-02-02 15:06:39.893751845 -0800
> > >>> +++ b/include/linux/netfilter/x_tables.h	2009-02-02 15:28:10.022574005 -0800
> > >>> @@ -353,7 +353,7 @@ struct xt_table
> > >>>  	unsigned int valid_hooks;
> > >>>
> > >>>  	/* Lock for the curtain */
> > >>> -	rwlock_t lock;
> > >>> +	struct mutex lock;
> > >>>
> > >>>  	/* Man behind the curtain... */
> > >>>  	struct xt_table_info *private;
> > >>> @@ -383,9 +383,15 @@ struct xt_table_info
> > >>>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
> > >>>  	unsigned int underflow[NF_INET_NUMHOOKS];
> > >>>
> > >>> +	/* For the dustman... */
> > >>> +	union {
> > >>> +		struct rcu_head rcu;
> > >>> +		struct work_struct work;
> > >>> +	};
> > >>> +
> > >>>  	/* ipt_entry tables: one per CPU */
> > >>>  	/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
> > >>> -	char *entries[1];
> > >>> +	void *entries[1];
> > >>>  };
> > >>>
> > >>>  #define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
> > >>> --- a/net/ipv4/netfilter/ip_tables.c	2009-02-02 15:06:29.684249364 -0800
> > >>> +++ b/net/ipv4/netfilter/ip_tables.c	2009-02-02 15:14:13.256499021 -0800
> > >>> @@ -347,10 +347,12 @@ ipt_do_table(struct sk_buff *skb,
> > >>>  	mtpar.family = tgpar.family = NFPROTO_IPV4;
> > >>>  	tgpar.hooknum = hook;
> > >>>
> > >>> -	read_lock_bh(&table->lock);
> > >>>  	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
> > >>> -	private = table->private;
> > >>> -	table_base = (void *)private->entries[smp_processor_id()];
> > >>> +
> > >>> +	rcu_read_lock_bh();
> > >>> +	private = rcu_dereference(table->private);
> > >>> +	table_base = rcu_dereference(private->entries[smp_processor_id()]);
> > >>> +
> > >>>  	e = get_entry(table_base, private->hook_entry[hook]);
> > >>>
> > >>>  	/* For return from builtin chain */
> > >>> @@ -445,7 +447,7 @@ ipt_do_table(struct sk_buff *skb,
> > >>>  		}
> > >>>  	} while (!hotdrop);
> > >>>
> > >>> -	read_unlock_bh(&table->lock);
> > >>> +	rcu_read_unlock_bh();
> > >>>
> > >>>  #ifdef DEBUG_ALLOW_ALL
> > >>>  	return NF_ACCEPT;
> > >>> @@ -892,45 +894,73 @@ set_entry_to_counter(const struct ipt_en
> > >>>  	return 0;
> > >>>  }
> > >>>
> > >>> +static inline int
> > >>> +set_counter_to_entry(struct ipt_entry *e,
> > >>> +		     const struct ipt_counters total[],
> > >>> +		     unsigned int *i)
> > >>> +{
> > >>> +	SET_COUNTER(e->counters, total[*i].bcnt, total[*i].pcnt);
> > >>> +
> > >>> +	(*i)++;
> > >>> +	return 0;
> > >>> +}
> > >>> +
> > >>> +
> > >>>  static void
> > >>> -get_counters(const struct xt_table_info *t,
> > >>> +get_counters(struct xt_table_info *t,
> > >>>  	     struct xt_counters counters[])
> > >>>  {
> > >>>  	unsigned int cpu;
> > >>>  	unsigned int i;
> > >>>  	unsigned int curcpu;
> > >>> +	struct ipt_entry *e;
> > >>>
> > >>> -	/* Instead of clearing (by a previous call to memset())
> > >>> -	 * the counters and using adds, we set the counters
> > >>> -	 * with data used by 'current' CPU
> > >>> -	 * We dont care about preemption here.
> > >>> -	 */
> > >>> +	preempt_disable();
> > >>>  	curcpu = raw_smp_processor_id();
> > >>> -
> > >>> +	e = t->entries[curcpu];
> > >>>  	i = 0;
> > >>> -	IPT_ENTRY_ITERATE(t->entries[curcpu],
> > >>> +	IPT_ENTRY_ITERATE(e,
> > >>>  			  t->size,
> > >>>  			  set_entry_to_counter,
> > >>
> > >> Hum, the current cpu might be interrupted by the NIC, since you only
> > >> disabled preemption. set_entry_to_counter() might get garbage.
> > >> I suppose I already mentioned it :)
> > >>
> > >>>  			  counters,
> > >>>  			  &i);
> > >>>
> > >>>  	for_each_possible_cpu(cpu) {
> > >>> +		void *p;
> > >>> +
> > >>>  		if (cpu == curcpu)
> > >>>  			continue;
> > >>> +
> > >>> +		/* Swizzle the values and wait */
> > >>> +		e->counters = ((struct xt_counters) { 0, 0 });
> > >>
> > >> I don't see what you want to do here...
> > >>
> > >> e->counters is the counter associated with rule #0
> > >>
> > >>> +		p = t->entries[cpu];
> > >>> +		rcu_assign_pointer(t->entries[cpu], e);
> > >>> +		synchronize_net();
> > >>
> > >> Oh well, not this synchronize_net() :)
> > >>
> > >> This won't provide atomic sampling of the counters across all CPUs, and
> > >> it introduces large delays on big machines (NR_CPUS >= 64).
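To make Eric's earlier point about the NIC concrete: preempt_disable() keeps
this task on the CPU, but a NIC softirq can still fire locally, run
ipt_do_table(), and bump e->counters in the middle of the copy, so the 64-bit
snapshot can tear. A minimal sketch of the usual fix for the local-CPU pass
follows -- illustrative only, not the patch's code:

        /* Sketch (not from the patch): local_bh_disable() also disables
         * preemption, and keeps softirqs from updating this CPU's
         * counters while we copy them out. */
        local_bh_disable();
        curcpu = smp_processor_id();
        i = 0;
        IPT_ENTRY_ITERATE(t->entries[curcpu], t->size,
                          set_entry_to_counter, counters, &i);
        local_bh_enable();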
> > > Why would this not provide the moral equivalent of atomic sampling?
> > > The code above switches to another counter set, and waits for a grace
> > > period.  Shouldn't this mean that all CPUs that were incrementing the
> > > old set of counters have finished doing so, so that the aggregate count
> > > covers all CPUs that started their increments before the pointer switch?
> > > Same as acquiring a write lock, which would wait for all CPUs that
> > > started their increments before starting the write-lock acquisition.
> > > CPUs that started their increments after starting the write acquisition
> > > would not be accounted for in the total, same as the RCU approach.
> > >
> > > Steve's approach does delay reading out the counters, but it avoids
> > > delaying any CPU trying to increment the counters.
> >
> > I see your point, but this is not what Stephen implemented.
> >
> > So... which counters will a CPU increment, if not delayed?
>
> The new set installed by the rcu_assign_pointer().
>
> > How will the counters be synced again after our 'iptables -L' has
> > finished?
>
> The usual approach would be to have three sets of counters: one currently
> being incremented, one just removed from service, and the last one holding
> the cumulative value.  After a synchronize_net() following the removal of
> a set from service, you add in the values from the previous set removed
> from service.  Then you keep the new set for the next 'iptables -L'.
>
> > "iptables -L" is not supposed to miss counter updates (only some packets
> > might be dropped at NIC level because we spend time in the collection).
> > If packets match some rules, we really want up-to-date counters.
>
> No counter updates would be lost using the above method.
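A minimal sketch of the three-set rotation Paul describes, under assumed
names -- cur/retired/cumulative and alloc_zeroed_set() are hypothetical, not
netfilter code, and the writer side is assumed serialized by a mutex:

        static struct xt_counters *cur;        /* updated by the packet path */
        static struct xt_counters *retired;    /* just pulled out of service */
        static struct xt_counters *cumulative; /* totals from earlier reads */

        static void snapshot(struct xt_counters *out, unsigned int nrules)
        {
                unsigned int i;

                retired = cur;                  /* writer side, serialized */
                rcu_assign_pointer(cur, alloc_zeroed_set(nrules));
                synchronize_net();  /* increments to 'retired' are finished */

                for (i = 0; i < nrules; i++) {
                        cumulative[i].pcnt += retired[i].pcnt;
                        cumulative[i].bcnt += retired[i].bcnt;
                        out[i] = cumulative[i]; /* no update is ever lost */
                }
                /* 'retired' storage becomes the fresh set next time */
        }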
>
> > Maybe we need an extra "cpu" for this collection, to collect all the
> > increments that were done while the CPUs were directed to a
> > "secondary table/counters".
>
> It should be easier to maintain a third set of counters that holds the
> accumulated counts from the earlier instances of 'iptables -L'.
>
> > > So what am I missing here?
> >
> > Well, I saw one synchronize_net() inside the for_each_possible_cpu(cpu)
> > loop. Say we have NR_CPUS=4096: how long will it take to perform
> > "iptables -L" ?
>
> Good point, the for_each_possible_cpu() was cut out -- I should have
> gone back and looked at the original patch.
>
> Seems like it should be possible to do a single synchronize_net()
> after swizzling all the counters...
>
> > The general/intuitive idea would be:
> >
> > switch pointers to a newly allocated table (with zeroed counters)
> > wait one RCU grace period
> > collect/sum all counters of the "old" table (all cpus) into the
> > user-provided table
> > restore the previous table
> > wait one RCU grace period
> > disable_bh()
> > collect/sum all counters of the "new and temporary" table (all cpus) and
> > reinject them into the local cpu table (this cpu should not be interrupted)
> > enable_bh()
> >
> > This way, "iptables -L" is not too expensive and doesn't block packet
> > processing at all.
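Roughly, Eric's sequence in code form -- a sketch under assumed names
(alloc_zeroed_table(), sum_table(), fold_into() and free_table() are
hypothetical helpers, and the caller is assumed to hold the table mutex):

        static void collect_counters(struct xt_table *table,
                                     struct xt_counters *user_buf)
        {
                struct xt_table_info *old;
                struct xt_table_info *tmp = alloc_zeroed_table();

                old = table->private;                    /* writer side */
                rcu_assign_pointer(table->private, tmp); /* cpus now hit
                                                            zeroed counters */
                synchronize_net();
                sum_table(user_buf, old);                /* old is quiescent */

                rcu_assign_pointer(table->private, old); /* restore table */
                synchronize_net();                       /* nobody uses tmp */

                local_bh_disable();        /* this cpu must not race itself */
                fold_into(old, tmp, smp_processor_id()); /* reinject tmp's
                                                            increments */
                local_bh_enable();
                free_table(tmp);
        }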
>
> My thought would be:
>
> o	acquire some sort of mutex.
>
> o	switch counters to a newly allocated (and zeroed) table (T1).
> 	The old table being switched out is T2.
>
> o	wait one RCU grace period.
>
> o	sum T2 into a single global counter (G).
>
> o	free T2.
>
> o	copy G to a local variable.
>
> o	release the mutex.
>
> o	return the value of the local variable.
>
> Then you can repeat, allocating a new table again and using the new
> value of G.
>
> Which may well be what you are saying above.

I was using the current CPU counter as the global counter G.

> 							Thanx, Paul
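A compact sketch of the mutex-serialized T1/T2/G scheme Paul lists above.
Names are hypothetical (active, G, alloc_zeroed_set()), and the per-rule
counters are reduced to one flat array for brevity; the packet path is
assumed to read 'active' under rcu_read_lock_bh() via rcu_dereference():

        static DEFINE_MUTEX(snapshot_mutex);
        static struct xt_counters *active;  /* incremented by packet path */
        static struct xt_counters *G;       /* cumulative totals */

        static void read_counters(struct xt_counters *out, unsigned int nrules)
        {
                struct xt_counters *t1 = alloc_zeroed_set(nrules);
                struct xt_counters *t2;
                unsigned int i;

                mutex_lock(&snapshot_mutex);
                t2 = active;                    /* table being switched out */
                rcu_assign_pointer(active, t1); /* switch to T1 */
                synchronize_net();              /* updates to T2 now done */

                for (i = 0; i < nrules; i++) {  /* sum T2 into G ... */
                        G[i].pcnt += t2[i].pcnt;
                        G[i].bcnt += t2[i].bcnt;
                        out[i] = G[i];          /* ... and copy G out */
                }
                kfree(t2);                      /* free T2 */
                mutex_unlock(&snapshot_mutex);
        }

One synchronize_net() per read, no per-CPU loop of grace periods, and the
packet path is never delayed -- which is the property both proposals are after.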