Message-ID: <df2bda3f-11ac-4c13-9d92-b44ea0f81da6@uliege.be>
Date: Thu, 6 Mar 2025 19:14:29 +0100
From: Justin Iurman <justin.iurman@...ege.be>
To: Ido Schimmel <idosch@...sch.org>
Cc: netdev@...r.kernel.org, davem@...emloft.net, dsahern@...nel.org,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
Alexander Aring <alex.aring@...il.com>, David Lebrun <dlebrun@...gle.com>
Subject: Re: [PATCH net v2 2/3] net: ipv6: fix lwtunnel loops in ioam6, rpl and seg6
On 2/17/25 15:40, Ido Schimmel wrote:
> On Sun, Feb 16, 2025 at 06:31:06PM +0200, Ido Schimmel wrote:
>> On Thu, Feb 13, 2025 at 11:51:49PM +0100, Justin Iurman wrote:
>>> On 2/13/25 14:28, Ido Schimmel wrote:
>>>> On Tue, Feb 11, 2025 at 11:16:23PM +0100, Justin Iurman wrote:
>>>>> When the destination is the same post-transformation, we enter a
>>>>> lwtunnel loop. This is true for ioam6_iptunnel, rpl_iptunnel, and
>>>>> seg6_iptunnel, in both input() and output() handlers respectively, where
>>>>> either dst_input() or dst_output() is called at the end. It happens for
>>>>> instance with the ioam6 inline mode, but can also happen for any of them
>>>>> as long as the post-transformation destination still matches the fib
>>>>> entry. Note that ioam6_iptunnel was already comparing the old and new
>>>>> destination address to prevent the loop, but it is not enough (e.g.,
>>>>> other addresses can still match the same subnet).
>>>>>
>>>>> Here is an example for rpl_input():
>>>>>
>>>>> dump_stack_lvl+0x60/0x80
>>>>> rpl_input+0x9d/0x320
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> [...]
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> lwtunnel_input+0x64/0xa0
>>>>> ip6_sublist_rcv_finish+0x85/0x90
>>>>> ip6_sublist_rcv+0x236/0x2f0
>>>>>
>>>>> ... until rpl_do_srh() fails, which means skb_cow_head() failed.
>>>>>
>>>>> This patch prevents that kind of loop by redirecting to the origin
>>>>> input() or output() when the destination is the same
>>>>> post-transformation.
>>>>
>>>> A loop was reported a few months ago with a similar stack trace:
>>>> https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
Ido,

That loop is a different beast and is out of scope for the series I'm
about to send. What I'm doing right now is preventing reentry loops
within lwtunnel_{input|output}(), which, by the way, also applies to
seg6_local no matter what. The loop reported above is an infinite
ping-pong between two fib rules (as opposed to an infinite loop within
the same fib rule, which is what I'm fixing). If we want to fix that
issue as well, we could reuse something like dev_xmit_recursion() in
lwtunnel_{input|output|xmit}() to catch these buggy cases. Thoughts?
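To make the dev_xmit_recursion() idea a bit more concrete, here is a
minimal user-space sketch, not kernel code. Every name in it
(mock_lwt_xmit(), mock_lwt_depth, MOCK_LWT_RECURSION_LIMIT, MOCK_DROP)
is hypothetical, the limit of 8 is an arbitrary assumption, and a plain
static counter stands in for what would be per-CPU state in the kernel:

```c
#include <stdbool.h>

/* Hypothetical limit, chosen arbitrarily for this sketch. */
#define MOCK_LWT_RECURSION_LIMIT 8
/* Hypothetical error code meaning "packet dropped because of a loop". */
#define MOCK_DROP (-1)

/* Stand-in for a per-CPU recursion counter. */
static unsigned int mock_lwt_depth;

struct mock_pkt {
	bool loops;          /* true: each pass resolves the same route again */
	unsigned int passes; /* number of times the handler processed the packet */
};

/* Mock handler: re-enters itself while the packet keeps looping and
 * gives up once the nesting depth reaches the limit.
 */
static int mock_lwt_xmit(struct mock_pkt *pkt)
{
	int ret = 0;

	if (mock_lwt_depth >= MOCK_LWT_RECURSION_LIMIT)
		return MOCK_DROP;

	mock_lwt_depth++;
	pkt->passes++;
	if (pkt->loops)
		ret = mock_lwt_xmit(pkt); /* models the reentry */
	mock_lwt_depth--;

	return ret;
}
```

With such a guard, a packet that keeps re-entering is dropped after a
bounded number of passes instead of recursing until something else
fails, while a packet that does not loop is unaffected.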
>> [...]
>>
>> BTW, I noticed that bpf implements the xmit() hook in addition to
>> input()/output(). I wonder if a loop is possible in the following case:
>>
>> ip_finish_output2() <----+
>> lwtunnel_xmit() |
>> bpf_xmit() |
>> // bpf program does not change |
>> // the packet and returns |
>> // BPF_LWT_REROUTE |
>> bpf_lwt_xmit_reroute() |
>> // unmodified packet resolves |
>> // the same dst entry |
>> dst_output() |
>> ip_output() -------------+
>
> FWIW, verified that this is indeed the case. Reproducer:
>
> $ cat lwt_xmit_repo.bpf.c
> // SPDX-License-Identifier: GPL-2.0
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
>
> SEC("lwt_xmit")
> int repo(struct __sk_buff *skb)
> {
> return BPF_LWT_REROUTE;
> }
> $ clang -O2 -target bpf -c lwt_xmit_repo.bpf.c -o lwt_xmit_repo.o
> # ip link add name dummy1 up type dummy
> # ip route add 192.0.2.0/24 nexthop encap bpf xmit obj ./lwt_xmit_repo.o sec lwt_xmit dev dummy1
> # ping 192.0.2.1
This one's also something special because it's neither input nor output,
it's xmit. In that case, we cannot apply the same fix as for the others
(ioam6, rpl, seg6, ila). Here, what I suggest is simply to disallow
BPF_LWT_REROUTE when the dst_entry remains unchanged (which is, IMO, a
buggy case), as follows:
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index ae74634310a3..ee3546d78903 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -180,6 +180,7 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
struct net_device *l3mdev = l3mdev_master_dev_rcu(skb_dst(skb)->dev);
int oif = l3mdev ? l3mdev->ifindex : 0;
struct dst_entry *dst = NULL;
+ struct dst_entry *orig_dst;
int err = -EAFNOSUPPORT;
struct sock *sk;
struct net *net;
@@ -201,6 +202,8 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
net = dev_net(skb_dst(skb)->dev);
}
+ orig_dst = skb_dst(skb);
+
if (ipv4) {
struct iphdr *iph = ip_hdr(skb);
struct flowi4 fl4 = {};
@@ -254,6 +257,16 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
if (unlikely(err))
goto err;
+ /* avoid lwtunnel_xmit() reentry loop when destination is the same
+ * after transformation (i.e., disallow BPF_LWT_REROUTE when dst_entry
+ * remains the same).
+ */
+ if (orig_dst->lwtstate == dst->lwtstate) {
+ dst_release(dst);
+ err = -EINVAL;
+ goto err;
+ }
+
skb_dst_drop(skb);
skb_dst_set(skb, dst);
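
For illustration only, the check added above can be modeled in user
space as follows. The mock_* types and mock_check_reroute() are
hypothetical stand-ins, not kernel structures or functions, and the
sketch assumes the original dst always carries a lwtstate, since this
code is reached through the lwtunnel xmit hook:

```c
#include <stddef.h>

/* Hypothetical stand-in for the -EINVAL returned in the diff. */
#define MOCK_EINVAL (-22)

/* Minimal stand-ins for the lwtunnel state and the dst entry. */
struct mock_lwtstate {
	int type;
};

struct mock_dst {
	struct mock_lwtstate *lwtstate;
};

/* Returns 0 when the reroute may proceed, MOCK_EINVAL when the new dst
 * points at the same lwtunnel state as the original one, which would
 * send the packet back through the same xmit handler.
 */
static int mock_check_reroute(const struct mock_dst *orig_dst,
			      const struct mock_dst *new_dst)
{
	if (orig_dst->lwtstate == new_dst->lwtstate)
		return MOCK_EINVAL;

	return 0;
}
```

The comparison is on the lwtstate pointer, so two distinct dst entries
that share the same tunnel state are still rejected, and a new dst
without any lwtstate is accepted.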