[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cb69bcc246e06a1a53287db571df1b98f82807d2.camel@redhat.com>
Date: Mon, 18 Nov 2019 22:58:30 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Edward Cree <ecree@...arflare.com>
Subject: Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look
hints for list input
On Mon, 2019-11-18 at 15:29 -0500, Willem de Bruijn wrote:
> On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > When doing RX batch packet processing, we currently always repeat
> > the route lookup for each ingress packet. If policy routing is
> > configured, and IPV6_SUBTREES is disabled at build time, we
> > know that packets with the same destination address will use
> > the same dst.
> >
> > This change tries to avoid per packet route lookup caching
> > the destination address of the latest successful lookup, and
> > reusing it for the next packet when the above conditions are
> > in place. Ingress traffic for most servers should fit.
> >
> > The measured performance delta under UDP flood vs a recvmmsg
> > receiver is as follow:
> >
> > vanilla patched delta
> > Kpps Kpps %
> > 1431 1664 +14
>
> Since IPv4 speed-up is almost half and code considerably more complex,
> maybe only do IPv6?
uhmm... I would avoid that kind of assimmetry, and I would not look
down on a 8% speedup, if possible.
> > In the worst-case scenario - each packet has a different
> > destination address - the performance delta is within noise
> > range.
> >
> > v1 -> v2:
> > - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
> > - fix potential race when fib6_has_custom_rules is set
> > while processing a packet batch
> >
> > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > ---
> > net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> > index ef7f707d9ae3..f559ad6b09ef 100644
> > --- a/net/ipv6/ip6_input.c
> > +++ b/net/ipv6/ip6_input.c
> > @@ -44,10 +44,16 @@
> > #include <net/inet_ecn.h>
> > #include <net/dst_metadata.h>
> >
> > +struct ip6_route_input_hint {
> > + unsigned long refdst;
> > + struct in6_addr daddr;
> > +};
> > +
> > INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
> > INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
> > static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> > - struct sk_buff *skb)
> > + struct sk_buff *skb,
> > + struct ip6_route_input_hint *hint)
> > {
> > void (*edemux)(struct sk_buff *skb);
> >
> > @@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> > INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
> > udp_v6_early_demux, skb);
> > }
> > - if (!skb_valid_dst(skb))
> > +
> > + if (skb_valid_dst(skb))
> > + return;
> > +
> > + if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
> > + __skb_dst_copy(skb, hint->refdst);
> > + else
> > ip6_route_input(skb);
>
> Is it possible to do the address comparison in ip6_list_rcv_finish
> itself and pass a pointer to refdst if safe? To avoid new struct
> definition, memcpy and to have all logic in one place. Need to
> keep a pointer to the prev skb, then, instead.
I haven't tought about that. Sounds promising. I'll try, thanks.
> > static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> > struct list_head *head)
> > {
> > + struct ip6_route_input_hint _hint, *hint = NULL;
> > struct dst_entry *curr_dst = NULL;
> > struct sk_buff *skb, *next;
> > struct list_head sublist;
> > @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> > skb = l3mdev_ip6_rcv(skb);
> > if (!skb)
> > continue;
> > - ip6_rcv_finish_core(net, sk, skb);
> > + ip6_rcv_finish_core(net, sk, skb, hint);
> > dst = skb_dst(skb);
> > if (curr_dst != dst) {
> > + if (ip6_can_cache_route_hint(net)) {
> > + _hint.refdst = skb->_skb_refdst;
> > + memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> > + sizeof(_hint.daddr));
> > + hint = &_hint;
> > + } else {
> > + hint = NULL;
> > + }
>
> not needed. ip6_can_cache_route_hit is the same for all iterations of
> the loop (indeed, compile time static), so if false, hint is never
> set.
I think this is needed, instead: if CONFIG_MULTIPLE_TABLES=y,
fib6_has_custom_rules can change at runtime - from 'false' to 'true'.
If we don't reset 'hint', we could end-up with use-after-free.
Cheers,
Paolo
Powered by blists - more mailing lists