Message-ID: <1471666072.29842.123.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Fri, 19 Aug 2016 21:07:52 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: cascardo@...hat.com, dev@...nvswitch.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH] openvswitch: use percpu flow stats
On Fri, 2016-08-19 at 18:09 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Fri, 19 Aug 2016 12:56:56 -0700
>
> > On Fri, 2016-08-19 at 16:47 -0300, Thadeu Lima de Souza Cascardo wrote:
> >> Instead of using flow stats per NUMA node, use them per CPU. When using
> >> megaflows, the stats lock can be a scalability bottleneck.
> >
> > ...
> >
> >>
> >>  	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> >> -				       + (nr_node_ids
> >> +				       + (num_possible_cpus()
> >>  					  * sizeof(struct flow_stats *)),
> >>  				       0, 0, NULL);
> >>  	if (flow_cache == NULL)
> >
> > This is bogus.
> >
> > Please use nr_cpu_ids instead of num_possible_cpus().
>
> Then how would hot-plugged cpus be handled properly?
>
> The only other way is you'd have to free all objects in the current
> flow_cache, destroy it, then make a new kmem_cache and reallocate
> all of the existing objects and hook them back up every time a cpu
> up event occurs.
>
> That doesn't make any sense at all.
>
> Therefore, the size of the objects coming out of "sw_flow" has
> to accommodate all possible cpu indexes.
Here is an example of a system I had in the past:
8 cpus (num_possible_cpus() returns 8),
numbered 0, 1, 2, 3, 8, 9, 10, 11 (so nr_cpu_ids = 12).
I am pretty sure such systems are still alive.
Since you want an array indexed by cpu number, you need to size it by
nr_cpu_ids; otherwise array[11] will crash or access garbage.
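
To make it concrete, here is an untested sketch (alloc_stats_array() is a
made-up helper for illustration, not code from the patch; struct flow_stats
is the one from net/openvswitch/flow.h):

#include <linux/cpumask.h>
#include <linux/slab.h>

struct flow_stats;

/* On the box above, possible cpus = {0,1,2,3,8,9,10,11}:
 *   num_possible_cpus() == 8   (weight of cpu_possible_mask)
 *   nr_cpu_ids          == 12  (highest possible cpu id + 1)
 */
static struct flow_stats **alloc_stats_array(void)
{
	struct flow_stats **stats;
	int cpu;

	/* Size by nr_cpu_ids so stats[cpu] is in bounds for every
	 * possible cpu number, including the sparse ones.
	 */
	stats = kcalloc(nr_cpu_ids, sizeof(*stats), GFP_KERNEL);
	if (!stats)
		return NULL;

	/* With only num_possible_cpus() (== 8) entries,
	 * stats[8]..stats[11] would be out of bounds here.
	 */
	for_each_possible_cpu(cpu)
		stats[cpu] = NULL;	/* filled lazily, per cpu */

	return stats;
}
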
This is why you find many uses of nr_cpu_ids, not num_possible_cpus(),
to size arrays in the Linux tree:
# git grep -n nr_cpu_ids -- net
net/bridge/netfilter/ebtables.c:900: vmalloc(nr_cpu_ids * sizeof(*(newinfo->chainstack)));
net/bridge/netfilter/ebtables.c:1132: countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:1185: countersize = COUNTER_OFFSET(repl->nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:2200: countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/core/dev.c:3483: if (next_cpu < nr_cpu_ids) {
net/core/dev.c:3588: * - Current CPU is unset (>= nr_cpu_ids).
net/core/dev.c:3596: (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
net/core/dev.c:3603: if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {
net/core/dev.c:3651: if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
net/core/neighbour.c:2737: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/core/neighbour.c:2751: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/core/net-procfs.c:122: while (*pos < nr_cpu_ids)
net/core/sysctl_net_core.c:70: rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1;
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:360: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:375: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:254: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:267: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv6/icmp.c:918: kzalloc(nr_cpu_ids * sizeof(struct sock *), GFP_KERNEL);
net/netfilter/nf_conntrack_netlink.c:1265: for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:2003: if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:2006: for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:3211: if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:3214: for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_standalone.c:309: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_conntrack_standalone.c:324: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_synproxy_core.c:255: for (cpu = *pos - 1; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_synproxy_core.c:270: for (cpu = *pos; cpu < nr_cpu_ids; cpu++) {
net/netfilter/x_tables.c:1064: size = sizeof(void **) * nr_cpu_ids;
net/sunrpc/svc.c:167: unsigned int maxpools = nr_cpu_ids;
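
So the hunk quoted above would become something like this (untested, just
showing the sizing change):

	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
				       + (nr_cpu_ids
					  * sizeof(struct flow_stats *)),
				       0, 0, NULL);
	if (flow_cache == NULL)
		return -ENOMEM;

Objects stay correctly sized across cpu hotplug, since nr_cpu_ids already
covers every possible cpu index.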