[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEA6p_DZxGesjymubWoWx7hQjaZ7Vdchh=cdOW9tAr4F7QvzXg@mail.gmail.com>
Date: Wed, 27 Sep 2017 10:44:35 -0700
From: Wei Wang <weiwan@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>,
Martin KaFai Lau <kafai@...com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst
is available
On Wed, Sep 27, 2017 at 10:14 AM, Paolo Abeni <pabeni@...hat.com> wrote:
> When a host is under high ipv6 load, the updates of the ingress
> route '__use' field are a source of relevant contention: such
> field is updated for each packet and several cores can access
> concurrently the dst, even if percpu dst entries are available
> and used.
>
> The __use value is just a rough indication of the dst usage: is
> already updated concurrently from multiple CPUs without any lock,
> so we can decrease the contention leveraging the percpu dst to perform
> __use bulk updates: if a per cpu dst entry is found, we account on
> such entry and we flush the percpu counter once per jiffy.
>
> Performace gain under UDP flood is as follows:
>
> nr RX queues before after delta
> kpps kpps (%)
> 2 2316 2688 16
> 3 3033 3605 18
> 4 3963 4328 9
> 5 4379 5253 19
> 6 5137 6000 16
>
> Performance gain under TCP syn flood should be measurable as well.
>
> Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> ---
> net/ipv6/route.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 26cc9f483b6d..e69f304de950 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>
> struct rt6_info *pcpu_rt;
>
> - rt->dst.lastuse = jiffies;
> - rt->dst.__use++;
> pcpu_rt = rt6_get_pcpu_route(rt);
>
> if (pcpu_rt) {
> + unsigned long ts;
> +
> read_unlock_bh(&table->tb6_lock);
> +
> + /* do lazy updates of rt->dst->__use, at most once
> + * per jiffy, to avoid contention on such cacheline.
> + */
> + ts = jiffies;
> + pcpu_rt->dst.__use++;
> + if (pcpu_rt->dst.lastuse != ts) {
> + rt->dst.__use += pcpu_rt->dst.__use;
> + rt->dst.lastuse = ts;
> + pcpu_rt->dst.__use = 0;
> + pcpu_rt->dst.lastuse = ts;
> + }
> } else {
> /* We have to do the read_unlock first
> * because rt6_make_pcpu_route() may trigger
> @@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
> read_unlock_bh(&table->tb6_lock);
> pcpu_rt = rt6_make_pcpu_route(rt);
> dst_release(&rt->dst);
> + rt->dst.lastuse = jiffies;
> + rt->dst.__use++;
> }
>
> trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
> --
> 2.13.5
>
Hi Paolo,
Eric and I discussed about this issue recently as well :).
What about the following change:
diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..33e1d86bcef6 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
static inline void dst_use(struct dst_entry *dst, unsigned long time)
{
dst_hold(dst);
- dst->__use++;
- dst->lastuse = time;
+ if (dst->lastuse != time) {
+ dst->__use++;
+ dst->lastuse = time;
+ }
}
static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
{
- dst->__use++;
- dst->lastuse = time;
+ if (dst->lastuse != time) {
+ dst->__use++;
+ dst->lastuse = time;
+ }
}
static inline struct dst_entry *dst_clone(struct dst_entry *dst)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..e195f093add3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
struct fib6_table *table,
struct rt6_info *pcpu_rt;
- rt->dst.lastuse = jiffies;
- rt->dst.__use++;
+ dst_use_noref(rt, jiffies);
pcpu_rt = rt6_get_pcpu_route(rt);
if (pcpu_rt) {
This way we always only update dst->__use and dst->lastuse at most
once per jiffy. And we don't really need to update pcpu and then do
the copy over from pcpu_rt to rt operation.
Another thing is that I don't really see any places making use of
dst->__use. So maybe we can also get rid of this dst->__use field?
Thanks.
Wei
Powered by blists - more mailing lists