[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Yjnrz7vL9HqE5UBz@shredder>
Date: Tue, 22 Mar 2022 17:31:27 +0200
From: Ido Schimmel <idosch@...sch.org>
To: David Ahern <dsahern@...nel.org>
Cc: netdev@...r.kernel.org, kuba@...nel.org, davem@...emloft.net,
greearb@...delatech.com
Subject: Re: [PATCH net-next] net: Add l3mdev index to flow struct and avoid
oif reset for port devices
On Tue, Mar 22, 2022 at 08:26:48AM -0600, David Ahern wrote:
> On 3/22/22 3:22 AM, Ido Schimmel wrote:
> > On Mon, Mar 14, 2022 at 02:45:51PM -0600, David Ahern wrote:
> >> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> >> index 2af2b99e0bea..fb0e49c36c2e 100644
> >> --- a/net/ipv4/fib_trie.c
> >> +++ b/net/ipv4/fib_trie.c
> >> @@ -1429,11 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> >> !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> >> return false;
> >>
> >> - if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
> >> - if (flp->flowi4_oif &&
> >> - flp->flowi4_oif != nhc->nhc_oif)
> >> - return false;
> >> - }
> >> + if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> >> + return false;
> >
> > David, we have several test cases that are failing which I have tracked
> > down to this patch.
> >
> > Before the patch, if the original output interface was enslaved to a
> > VRF, the output interface in the flow struct would be updated to the VRF
> > and the 'FLOWI_FLAG_SKIP_NH_OIF' flag would be set, causing the above
> > check to be skipped.
> >
> > After the patch, the check is no longer skipped, as original output
> > interface is retained and the flag was removed.
> >
> > This breaks scenarios where a GRE tunnel specifies a dummy device
> > enslaved to a VRF as its physical device. The purpose of this
> > configuration is to redirect the underlay lookup to the table associated
> > with the VRF to which the dummy device is enslaved to. The check fails
> > because 'flp->flowi4_oif' points to the dummy device, whereas
> > 'nhc->nhc_oif' points to the interface via which the encapsulated packet
> > should egress.
> >
> > Skipping the check when an l3mdev was set seems to solve the problem:
> >
> > diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> > index fb0e49c36c2e..cf1164e05d92 100644
> > --- a/net/ipv4/fib_trie.c
> > +++ b/net/ipv4/fib_trie.c
> > @@ -1429,7 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> > !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> > return false;
> >
> > - if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> > + if (!flp->flowi4_l3mdev &&
> > + flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> > return false;
> >
> > return true;
> >
> > AFAICT, this scenario does not break with ip6gre/ip6gretap tunnels
> > because 'RT6_LOOKUP_F_IFACE' is not set in
> > ip6_route_output_flags_noref() in this case.
> >
> > WDYT? I plan to test this patch in our regression, but I'm not sure if I
> > missed other cases that might remain broken.
>
> one of the requests with VRF has been to bind a socket to a port device
> and expect the lookup to enforce use of that egress port (e.g.,
> multipath). Switching the oif to the VRF device and then ignoring the
> oif check was making that check too flexible for that use case.
I see
>
> What's the callchain for this failure? Perhaps the
> FLOWI_FLAG_SKIP_NH_OIF needs to be kept for this particular use case.
This is the stack trace for the failure:
fib_lookup_good_nhc+5
fib_table_lookup+3281
fib4_rule_action+501
fib_rules_lookup+858
__fib_lookup+233
fib_lookup.constprop.0+926
ip_route_output_key_hash_rcu+3707
ip_route_output_key_hash+392
ip_route_output_flow+33
ip_tunnel_xmit+1794
gre_tap_xmit+1312
dev_hard_start_xmit+448
sch_direct_xmit+615
__dev_queue_xmit+4841
The GRE tap is using a dummy device enslaved to a VRF as its physical
device.
Powered by blists - more mailing lists