[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKBzOOMSQv5U8vpRcNtEYmPtOzqOWLxNgyjAnGOC=Bx+A@mail.gmail.com>
Date: Tue, 8 Oct 2024 16:15:37 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Breno Leitao <leitao@...ian.org>
Cc: Paolo Abeni <pabeni@...hat.com>, David Ahern <dsahern@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, rmikey@...a.com,
kernel-team@...a.com, horms@...nel.org,
"open list:NETWORKING [IPv4/IPv6]" <netdev@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()
On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@...ian.org> wrote:
>
> Hello Paolo,
>
> On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> > On 10/4/24 19:37, Breno Leitao wrote:
> > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > > this commit adjusts the function to favor the IPv6 path.
> > > > >
> > > > > Swap the order of the conditional statements and move the 'likely'
> > > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > >
> > > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > > version in many deployments, and should benefit modern network
> > > > > configurations.
> > > > >
> > > > > Signed-off-by: Breno Leitao <leitao@...ian.org>
> > > > > ---
> > > > > include/net/route.h | 6 +++---
> > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > > --- a/include/net/route.h
> > > > > +++ b/include/net/route.h
> > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > > struct net_device *dev = rt->dst.dev;
> > > > > struct neighbour *neigh;
> > > > > - if (likely(rt->rt_gw_family == AF_INET)) {
> > > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > - } else if (rt->rt_gw_family == AF_INET6) {
> > > > > + if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > > *is_v6gw = true;
> > > > > + } else if (rt->rt_gw_family == AF_INET) {
> > > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > } else {
> > > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > > }
> > > >
> > > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > > nexthop. It is appropriate for IPv4 family checks to be first.
> > >
> > > Right. In which case is this called on IPv6 only systems?
> > >
> > > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > > it is mispredicted 100% of the time.
> >
> > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> > perf script
> >
> > should give you an hint.
>
> Thanks. That proved to be very useful.
>
> As I said above, all the hosts I have a webserver running, I see this
> that likely mispredicted. Same for this server:
>
> # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
> correct incorrect % Function File Line
> 0 17127 100 ip_neigh_for_gw route.h 393
>
> It is mostly coming from ip_finish_output2() and tcp_v4. Important to
> say that these machine has no IPv4 configured, except 127.0.0.1
> (localhost).
Now run the experiment on a typical server using IPv4 ?
I would advise removing the likely() if it really bothers you.
(I doubt this has any impact)
But assuming everything is IPv6 is too soon.
There are more obvious changes like :
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea
100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
*skb, struct net *net)
/* When the interface is in promisc. mode, drop all the crap
* that it receives, do not try to analyse it.
*/
- if (skb->pkt_type == PACKET_OTHERHOST) {
+ if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
drop_reason = SKB_DROP_REASON_OTHERHOST;
goto drop;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60
100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff
*skb, struct net_device *dev,
u32 pkt_len;
struct inet6_dev *idev;
- if (skb->pkt_type == PACKET_OTHERHOST) {
+ if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST);
return NULL;
Powered by blists - more mailing lists