Message-ID: <20200923235002.GA25818@ICIPI.localdomain>
Date: Wed, 23 Sep 2020 19:50:02 -0400
From: Stephen Suryaputra <ssuryaextr@...il.com>
To: David Ahern <dsahern@...il.com>
Cc: netdev@...r.kernel.org
Subject: Re: ip rule iif oif and vrf
On Tue, Sep 22, 2020 at 09:39:36AM -0600, David Ahern wrote:
> >
> > We have a use case where there are multiple user VRFs being leak routed
> > to and from tunnels that are on the core VRF. Traffic from user VRF to a
> > tunnel can be done the normal way by specifying the netdev directly on
> > the route entry on the user VRF route table:
> >
> > ip route add <prefix> via <tunnel_end_point_addr> dev <tunnel_netdev>
> >
> > But traffic received on the tunnel must be leak routed directly to a
> > specific user VRF, because multiple user VRFs can have
> > duplicate address spaces. I am thinking of using ip rule but when the
> > iif is an enslaved device, the rule doesn't get matched because the
> > ifindex in the skb is the master.
> >
> > My question is: is this a bug, or is there anything else that can be
> > done to make sure that traffic from a tunnel is routed directly to a
> > user VRF? If it is the latter, I can work on a patch.
> >
Is there a better way to implement this use case? It seems like a
common one for VRFs.
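For concreteness, the kind of leak rule I have in mind is along these
lines (placeholders as in the route example above):

ip rule add iif <tunnel_netdev> lookup <user_vrf_table>

but, as described above, it never matches because the skb's iif has
already been rewritten to the master VRF device by the time the rules
are evaluated.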
>
> Might be a side effect of the skb dev change. I would like to remove
> that but it is going to be a challenge at this point.
>
> take a look at:
> perf record -a -e fib:* -g
> <packets through the tunnel>
> <Ctrl-C>
> perf script
>
> What does it say for the lookups - input arguments, table, etc?
>
> Any chance you can recreate this using namespaces as the different nodes?
I have attached a reproducer using namespaces to this email (gre_setup.sh).
A ping is initiated from h0:
ip netns exec h0 ping -c 1 11.0.0.2
As I have seen on our target platform, the iif is the VRF device, in
this case the core VRF. Thus, the ip rule with iif set to the GRE
tunnel doesn't get hit.
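For anyone trying this without running the attached script, the
mismatch can be seen by listing the rules and VRF tables in r1 and
comparing them against the iif reported by the fib tracepoints (the
exact rule set and addresses are in gre_setup.sh):

ip netns exec r1 ip rule show
ip netns exec r1 ip -d link show type vrf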
This is, I think, the relevant perf script output; the full output is
in the attached perf_script.txt:
ksoftirqd/0 9 [000] 2933.555444: fib:fib_table_lookup: table 100 oif 0 iif 6 proto 0 10.0.0.2/0 -> 11.0.0.2/0 tos 0 scope 0 flags 4 ==> dev - gw 0.0.0.0/:: err -113
ffffffffbda04f2e fib_table_lookup+0x4ce ([kernel.kallsyms])
ffffffffbda04f2e fib_table_lookup+0x4ce ([kernel.kallsyms])
ffffffffbda0fbd6 fib4_rule_action+0x66 ([kernel.kallsyms])
ffffffffbd96cde3 fib_rules_lookup+0x133 ([kernel.kallsyms])
ffffffffbda0f6ea __fib_lookup+0x6a ([kernel.kallsyms])
ffffffffbd9ac4de ip_route_input_slow+0x98e ([kernel.kallsyms])
ffffffffbd9ac81a ip_route_input_rcu+0x15a ([kernel.kallsyms])
ffffffffbd9ac978 ip_route_input_noref+0x28 ([kernel.kallsyms])
ffffffffbd9aec0b ip_rcv_finish_core.isra.0+0x6b ([kernel.kallsyms])
ffffffffbd9aefcb ip_rcv_finish+0x6b ([kernel.kallsyms])
ffffffffbd9af9fc ip_rcv+0xbc ([kernel.kallsyms])
ffffffffbd935d78 __netif_receive_skb_one_core+0x88 ([kernel.kallsyms])
ffffffffbd935dc8 __netif_receive_skb+0x18 ([kernel.kallsyms])
ffffffffbd935e55 netif_receive_skb_internal+0x45 ([kernel.kallsyms])
ffffffffbd9378af napi_gro_receive+0xff ([kernel.kallsyms])
ffffffffbd989bce gro_cell_poll+0x5e ([kernel.kallsyms])
ffffffffbd936eda net_rx_action+0x13a ([kernel.kallsyms])
ffffffffbde000e1 __softirqentry_text_start+0xe1 ([kernel.kallsyms])
ffffffffbd0a809b run_ksoftirqd+0x2b ([kernel.kallsyms])
ffffffffbd0cf420 smpboot_thread_fn+0xd0 ([kernel.kallsyms])
ffffffffbd0c84a4 kthread+0x104 ([kernel.kallsyms])
ffffffffbdc00202 ret_from_fork+0x22 ([kernel.kallsyms])
The r1 namespace has these netdevs:
sudo ip netns exec r1 ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: gre0@...E: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/gre 0.0.0.0 brd 0.0.0.0
3: gretap0@...E: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
4: erspan0@...E: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
5: vrf_r1t: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether ba:76:0d:e4:3b:93 brd ff:ff:ff:ff:ff:ff
6: vrf_r1c: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 82:90:68:d3:e1:ff brd ff:ff:ff:ff:ff:ff
7: gre10@..._r1c: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 65511 qdisc noqueue master vrf_r1c state UNKNOWN mode DEFAULT group default qlen 1000
link/gre 1.1.1.2 peer 1.1.1.1
26: r1_v10@...7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_r1c state UP mode DEFAULT group default qlen 1000
link/ether 12:00:73:33:bc:f2 brd ff:ff:ff:ff:ff:ff link-netns r0
29: r1_v11@...8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_r1t state UP mode DEFAULT group default qlen 1000
link/ether 8a:ff:65:ff:ba:58 brd ff:ff:ff:ff:ff:ff link-netns h1
The iif when the FIB is looked up is vrf_r1c (6). There is another
error in perf_script.txt where the iif is the lo device and the oif is
vrf_r1c:
ping 4343 [000] 2933.554428: fib:fib_table_lookup: table 100 oif 6 iif 1 proto 1 10.0.0.2/0 -> 11.0.0.2/0 tos 0 scope 0 flags 4 ==> dev - gw 0.0.0.0/:: err -113
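(The iif/oif indices in these tracepoint lines can be mapped back to
the netdevs listed above by ifindex, e.g.:)

ip netns exec r1 ip link show | grep -E '^(1|6):'

which shows that 1 is lo and 6 is vrf_r1c in this run.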
Thanks.
Download attachment "gre_setup.sh" of type "application/x-sh" (4054 bytes)
View attachment "perf_script.txt" of type "text/plain" (43085 bytes)