Message-ID: <1471666072.29842.123.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Fri, 19 Aug 2016 21:07:52 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: cascardo@...hat.com, dev@...nvswitch.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH] openvswitch: use percpu flow stats
On Fri, 2016-08-19 at 18:09 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Fri, 19 Aug 2016 12:56:56 -0700
>
> > On Fri, 2016-08-19 at 16:47 -0300, Thadeu Lima de Souza Cascardo wrote:
> >> Instead of using flow stats per NUMA node, use them per CPU. When using
> >> megaflows, the stats lock can be a scalability bottleneck.
> >
> > ...
> >
> >>
> >>  	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> >> -				       + (nr_node_ids
> >> +				       + (num_possible_cpus()
> >>  					  * sizeof(struct flow_stats *)),
> >>  				       0, 0, NULL);
> >>  	if (flow_cache == NULL)
> >
> > This is bogus.
> >
> > Please use nr_cpu_ids instead of num_possible_cpus().
>
> Then how would hot-plugged cpus be handled properly?
>
> The only other way is you'd have to free all objects in the current
> flow_cache, destroy it, then make a new kmem_cache and reallocate
> all of the existing objects and hook them back up every time a cpu
> up event occurs.
>
> That doesn't make any sense at all.
>
> Therefore, the size of the objects coming out of "sw_flow" has
> to accommodate all possible cpu indexes.
Here is an example of a system I had in the past:
8 cpus (num_possible_cpus() returns 8),
numbered 0, 1, 2, 3, 8, 9, 10, 11 (so nr_cpu_ids = 12).
I am pretty sure such systems are still alive.
Since you want an array indexed by cpu number, you need to size it by
nr_cpu_ids; otherwise array[11] will crash or access garbage.
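
To make it concrete, here is an untested sketch (alloc_stats_array() is a
made-up helper for illustration, not code from the patch; struct flow_stats
is the one from net/openvswitch/flow.h):

#include <linux/cpumask.h>
#include <linux/slab.h>

struct flow_stats;

/* On the box above, possible cpus = {0,1,2,3,8,9,10,11}:
 *   num_possible_cpus() == 8   (weight of cpu_possible_mask)
 *   nr_cpu_ids          == 12  (highest possible cpu id + 1)
 */
static struct flow_stats **alloc_stats_array(void)
{
	struct flow_stats **stats;
	int cpu;

	/* Size by nr_cpu_ids so stats[cpu] is in bounds for every
	 * possible cpu number, including the sparse ones.
	 */
	stats = kcalloc(nr_cpu_ids, sizeof(*stats), GFP_KERNEL);
	if (!stats)
		return NULL;

	/* With only num_possible_cpus() (== 8) entries,
	 * stats[8]..stats[11] would be out of bounds here.
	 */
	for_each_possible_cpu(cpu)
		stats[cpu] = NULL;	/* filled lazily, per cpu */

	return stats;
}
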
This is why you find many uses of nr_cpu_ids, not num_possible_cpus(),
to size arrays in the Linux tree:
# git grep -n nr_cpu_ids -- net
net/bridge/netfilter/ebtables.c:900: vmalloc(nr_cpu_ids * sizeof(*(newinfo->chainstack)));
net/bridge/netfilter/ebtables.c:1132: countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:1185: countersize = COUNTER_OFFSET(repl->nentries) * nr_cpu_ids;
net/bridge/netfilter/ebtables.c:2200: countersize = COUNTER_OFFSET(tmp.nentries) * nr_cpu_ids;
net/core/dev.c:3483: if (next_cpu < nr_cpu_ids) {
net/core/dev.c:3588: * - Current CPU is unset (>= nr_cpu_ids).
net/core/dev.c:3596: (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
net/core/dev.c:3603: if (tcpu < nr_cpu_ids && cpu_online(tcpu)) {
net/core/dev.c:3651: if (rflow->filter == filter_id && cpu < nr_cpu_ids &&
net/core/neighbour.c:2737: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/core/neighbour.c:2751: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/core/net-procfs.c:122: while (*pos < nr_cpu_ids)
net/core/sysctl_net_core.c:70: rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1;
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:360: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c:375: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:254: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/ipv4/route.c:267: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/ipv6/icmp.c:918: kzalloc(nr_cpu_ids * sizeof(struct sock *), GFP_KERNEL);
net/netfilter/nf_conntrack_netlink.c:1265: for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:2003: if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:2006: for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_netlink.c:3211: if (cb->args[0] == nr_cpu_ids)
net/netfilter/nf_conntrack_netlink.c:3214: for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_conntrack_standalone.c:309: for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_conntrack_standalone.c:324: for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) {
net/netfilter/nf_synproxy_core.c:255: for (cpu = *pos - 1; cpu < nr_cpu_ids; cpu++) {
net/netfilter/nf_synproxy_core.c:270: for (cpu = *pos; cpu < nr_cpu_ids; cpu++) {
net/netfilter/x_tables.c:1064: size = sizeof(void **) * nr_cpu_ids;
net/sunrpc/svc.c:167: unsigned int maxpools = nr_cpu_ids;
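
So the hunk quoted above would become something like this (untested, just
showing the sizing change):

	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
				       + (nr_cpu_ids
					  * sizeof(struct flow_stats *)),
				       0, 0, NULL);
	if (flow_cache == NULL)
		return -ENOMEM;

Objects stay correctly sized across cpu hotplug, since nr_cpu_ids already
covers every possible cpu index.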