lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZwVGW7FexXBNm_8F@gmail.com>
Date: Tue, 8 Oct 2024 15:48:59 +0100
From: Breno Leitao <leitao@...ian.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Paolo Abeni <pabeni@...hat.com>, David Ahern <dsahern@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, rmikey@...a.com,
	kernel-team@...a.com, horms@...nel.org,
	"open list:NETWORKING [IPv4/IPv6]" <netdev@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] net: Optimize IPv6 path in ip_neigh_for_gw()

Hello Eric,

On Tue, Oct 08, 2024 at 04:15:37PM +0200, Eric Dumazet wrote:
> On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@...ian.org> wrote:
> >
> > Hello Paolo,
> >
> > On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote:
> > > On 10/4/24 19:37, Breno Leitao wrote:
> > > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote:
> > > > > On 10/4/24 10:27 AM, Breno Leitao wrote:
> > > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts
> > > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
> > > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks,
> > > > > > this commit adjusts the function to favor the IPv6 path.
> > > > > >
> > > > > > Swap the order of the conditional statements and move the 'likely'
> > > > > > annotation to the IPv6 case. This change aims to improve performance in
> > > > > > IPv6-dominant environments by reducing branch mispredictions.
> > > > > >
> > > > > > This optimization aligns with the trend of IPv6 becoming the default IP
> > > > > > version in many deployments, and should benefit modern network
> > > > > > configurations.
> > > > > >
> > > > > > Signed-off-by: Breno Leitao <leitao@...ian.org>
> > > > > > ---
> > > > > >   include/net/route.h | 6 +++---
> > > > > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/include/net/route.h b/include/net/route.h
> > > > > > index 1789f1e6640b..b90b7b1effb8 100644
> > > > > > --- a/include/net/route.h
> > > > > > +++ b/include/net/route.h
> > > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
> > > > > >         struct net_device *dev = rt->dst.dev;
> > > > > >         struct neighbour *neigh;
> > > > > > -       if (likely(rt->rt_gw_family == AF_INET)) {
> > > > > > -               neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > > -       } else if (rt->rt_gw_family == AF_INET6) {
> > > > > > +       if (likely(rt->rt_gw_family == AF_INET6)) {
> > > > > >                 neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
> > > > > >                 *is_v6gw = true;
> > > > > > +       } else if (rt->rt_gw_family == AF_INET) {
> > > > > > +               neigh = ip_neigh_gw4(dev, rt->rt_gw4);
> > > > > >         } else {
> > > > > >                 neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
> > > > > >         }
> > > > >
> > > > > This is an IPv4 function allowing support for IPv6 addresses as a
> > > > > nexthop. It is appropriate for IPv4 family checks to be first.
> > > >
> > > > Right. In which case is this called on IPv6 only systems?
> > > >
> > > > On my IPv6-only 200 systems, the annotated branch predictor is showing
> > > > it is mispredicted 100% of the time.
> > >
> > > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag;
> > > perf script
> > >
> > > should give you an hint.
> >
> > Thanks. That proved to be very useful.
> >
> > As I said above, all the hosts I have a webserver running, I see this
> > that likely mispredicted. Same for this server:
> >
> >         # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw
> >          correct incorrect  %        Function                  File              Line
> >                0    17127 100 ip_neigh_for_gw                route.h              393
> >
> > It is mostly coming from ip_finish_output2() and tcp_v4. Important to
> > say that these machine has no IPv4 configured, except 127.0.0.1
> > (localhost).
> 
> Now run the experiment on a typical server using IPv4 ?
> 
> I would advise removing the likely() if it really bothers you.
> (I doubt this has any impact)

Thanks. I am mostly concerned about likely/unlikely() that are wrong
100% the time when running production workloads in modern hardware.

I just got a few hundreds host to able to run annotated
branches enabled, and I am looking on how it can help us to understand
our code flow better.

Regarding performance impact, I agree with you that performance is
minimal (at least on x86). On other architectures, such as powerpc
things can be more evident, given that the branch hint is encoded in the
instruction itself, so, the hardware knows where to predict.

That said, I think the best approach is just to remove the likely() from
that code path.

> But assuming everything is IPv6 is too soon.
> 
> There are more obvious changes like :
> 
> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea
> 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
> *skb, struct net *net)
>         /* When the interface is in promisc. mode, drop all the crap
>          * that it receives, do not try to analyse it.
>          */
> -       if (skb->pkt_type == PACKET_OTHERHOST) {
> +       if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
>                 dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
>                 drop_reason = SKB_DROP_REASON_OTHERHOST;
>                 goto drop;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60
> 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff
> *skb, struct net_device *dev,
>         u32 pkt_len;
>         struct inet6_dev *idev;
> 
> -       if (skb->pkt_type == PACKET_OTHERHOST) {
> +       if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) {
>                 dev_core_stats_rx_otherhost_dropped_inc(skb->dev);
>                 kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST);
>                 return NULL;

Agree, that would be obvious changes also.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ