Message-ID: <5d001e22-c9fe-60d2-a775-40e1c44a1c56@huawei.com>
Date: Fri, 24 May 2024 18:40:43 +0800
From: Yue Haibing <yuehaibing@...wei.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
<jhs@...atatu.com>, <xiyou.wangcong@...il.com>, <jiri@...nulli.us>,
<hannes@...essinduktion.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net] net/sched: Add xmit_recursion level in
sch_direct_xmit()
On 2024/5/24 17:24, Eric Dumazet wrote:
> On Fri, May 24, 2024 at 10:49 AM Yue Haibing <yuehaibing@...wei.com> wrote:
>>
>> A packet sent from a PF_PACKET socket on top of an IPv6-backed ipvlan device will hit
>> WARN_ON_ONCE() in sk_mc_loop() through the sch_direct_xmit() path when the ipvlan
>> device has a qdisc queue.
>>
>> WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70
>> Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper
>> CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>> RIP: 0010:sk_mc_loop+0x2d/0x70
>> Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c
>> RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212
>> RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001
>> RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000
>> RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00
>> R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000
>> R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000
>> FS: 0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> <IRQ>
>> ? __warn+0x83/0x130
>> ? sk_mc_loop+0x2d/0x70
>> ? report_bug+0x18e/0x1a0
>> ? handle_bug+0x3c/0x70
>> ? exc_invalid_op+0x18/0x70
>> ? asm_exc_invalid_op+0x1a/0x20
>> ? sk_mc_loop+0x2d/0x70
>> ip6_finish_output2+0x31e/0x590
>> ? nf_hook_slow+0x43/0xf0
>> ip6_finish_output+0x1f8/0x320
>> ? __pfx_ip6_finish_output+0x10/0x10
>> ipvlan_xmit_mode_l3+0x22a/0x2a0 [ipvlan]
>> ipvlan_start_xmit+0x17/0x50 [ipvlan]
>> dev_hard_start_xmit+0x8c/0x1d0
>> sch_direct_xmit+0xa2/0x390
>> __qdisc_run+0x66/0xd0
>> net_tx_action+0x1ca/0x270
>> handle_softirqs+0xd6/0x2b0
>> __irq_exit_rcu+0x9b/0xc0
>> sysvec_apic_timer_interrupt+0x75/0x90
>
> Please provide full symbols in stack traces.
Call Trace:
<IRQ>
? __warn (kernel/panic.c:693)
? sk_mc_loop (net/core/sock.c:775 net/core/sock.c:760)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:239)
? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? sk_mc_loop (net/core/sock.c:775 net/core/sock.c:760)
ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1))
? nf_hook_slow (./include/linux/netfilter.h:154 net/netfilter/core.c:626)
ip6_finish_output (net/ipv6/ip6_output.c:211 net/ipv6/ip6_output.c:222)
? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215)
ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:498 drivers/net/ipvlan/ipvlan_core.c:538 drivers/net/ipvlan/ipvlan_core.c:602) ipvlan
ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan
dev_hard_start_xmit (./include/linux/netdevice.h:4882 ./include/linux/netdevice.h:4896 net/core/dev.c:3578 net/core/dev.c:3594)
sch_direct_xmit (net/sched/sch_generic.c:343)
__qdisc_run (net/sched/sch_generic.c:416)
net_tx_action (./include/net/sch_generic.h:219 ./include/net/pkt_sched.h:128 ./include/net/pkt_sched.h:124 net/core/dev.c:5286)
handle_softirqs (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:555)
__irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043 arch/x86/kernel/apic/apic.c:1043)
>
>> </IRQ>
>>
>> Fixes: f60e5990d9c1 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
>> Signed-off-by: Yue Haibing <yuehaibing@...wei.com>
>> ---
>> include/linux/netdevice.h | 17 +++++++++++++++++
>> net/core/dev.h | 17 -----------------
>> net/sched/sch_generic.c | 8 +++++---
>> 3 files changed, 22 insertions(+), 20 deletions(-)
>
> This patch seems unrelated to the WARN_ON_ONCE(1) met in sk_mc_loop()
>
> If sk_mc_loop() is called with a socket which is not inet, we are in trouble.
>
> Please fix the root cause instead of trying to shortcut sk_mc_loop() as you did.
First, set up the environment like this:
ip netns add ns0
ip netns add ns1
ip link add ip0 link eth0 type ipvlan mode l3 vepa
ip link add ip1 link eth0 type ipvlan mode l3 vepa
ip link set ip0 netns ns0
ip netns exec ns0 ip link set ip0 up
ip link set ip1 netns ns1
ip netns exec ns1 ip link set ip1 up
ip netns exec ns1 tc qdisc add dev ip1 root netem delay 10ms
Second, build and send a raw IPv6 multicast packet in ns1, as in the attached repro. The resulting call path is:
packet_sendmsg
packet_snd //skb->sk is packet sk
__dev_queue_xmit
__dev_xmit_skb //q->enqueue is not NULL
__qdisc_run
qdisc_restart
sch_direct_xmit
dev_hard_start_xmit
netdev_start_xmit
ipvlan_start_xmit
ipvlan_xmit_mode_l3 //l3 mode
ipvlan_process_outbound //vepa flag
ipvlan_process_v6_outbound //skb->protocol is ETH_P_IPV6
ip6_local_out
...
__ip6_finish_output
ip6_finish_output2 //multicast packet
sk_mc_loop //dev_recursion_level is 0
WARN_ON_ONCE //sk->sk_family is AF_PACKET
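The last three steps of the call path can be modelled in userspace. The following is a minimal sketch, not the kernel source: `struct model_sock`, `struct mc_loop_result` and `model_sk_mc_loop()` are hypothetical names introduced here for illustration, and the order of the checks is an assumption based on the trace and the discussion above (recursion level first, then the socket family). It builds on Linux only, since it uses AF_PACKET.

```c
/* Userspace model of the decision made in sk_mc_loop(); not kernel code. */
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6, AF_PACKET (Linux) */

/* Hypothetical stand-in for struct sock: only what the check looks at. */
struct model_sock {
	int family;	/* models sk->sk_family */
	bool mc_loop;	/* models the per-socket multicast loop setting */
};

/* Result of the check, with the warning made explicit. */
struct mc_loop_result {
	bool loop;	/* value the check would return */
	bool warned;	/* true where WARN_ON_ONCE() would fire */
};

static struct mc_loop_result
model_sk_mc_loop(const struct model_sock *sk, int recursion_level)
{
	struct mc_loop_result r = { .loop = true, .warned = false };

	/* Nested transmit: return early, the socket is never inspected. */
	if (recursion_level) {
		r.loop = false;
		return r;
	}

	/* No socket attached to the skb: loop the packet back. */
	if (!sk)
		return r;

	switch (sk->family) {
	case AF_INET:
	case AF_INET6:
		r.loop = sk->mc_loop;
		return r;
	}

	/* Any other family, e.g. AF_PACKET: this is the reported warning. */
	r.warned = true;
	return r;
}
```

With a socket whose family is AF_PACKET and a recursion level of 0, as in the trace, the model sets `warned`. With a non-zero recursion level it returns before looking at the socket at all, which is why raising the level in sch_direct_xmit() silences the warning without changing which socket reaches the IPv6 output path.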
View attachment "repro.c" of type "text/plain" (12818 bytes)