[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <eb5601d0-2c95-4485-31b2-e5cb5ff3037c@redhat.com>
Date: Thu, 19 Jan 2023 14:18:12 +0100
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: brouer@...hat.com, Jakub Kicinski <kuba@...nel.org>,
netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
pabeni@...hat.com
Subject: Re: [PATCH net-next V2 2/2] net: kfree_skb_list use
kmem_cache_free_bulk
On 19/01/2023 12.46, Eric Dumazet wrote:
> On Thu, Jan 19, 2023 at 12:22 PM Jesper Dangaard Brouer
> <jbrouer@...hat.com> wrote:
>>
>>
>>
>> On 19/01/2023 11.28, Eric Dumazet wrote:
>>> On Thu, Jan 19, 2023 at 11:18 AM Jesper Dangaard Brouer
>>> <jbrouer@...hat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 19/01/2023 03.26, Jakub Kicinski wrote:
>>>>> On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote:
>>>>>>> + skb_mark_not_on_list(segs);
>>>>>>
>>>>>> The syzbot[1] bug goes way if I remove this skb_mark_not_on_list().
>>>>>>
>>>>>> I don't understand why I cannot clear skb->next here?
>>>>>
>>>>> Some of the skbs on the list are not private?
>>>>> IOW we should only unlink them if skb_unref().
>>>>
>>>> Yes, you are right.
>>>>
>>>> The skb_mark_not_on_list() should only be called if __kfree_skb_reason()
>>>> returns true, meaning the SKB is ready to be free'ed (as it calls/check
>>>> skb_unref()).
>>>
>>>
>>> This was the case already before your changes.
>>>
>>> skb->next/prev can not be shared by multiple users.
>>>
>>> One skb can be put on a single list by definition.
>>>
>>> Whoever calls kfree_skb_list(list) owns all the skbs->next|prev found
>>> in the list
>>>
>>> So you can mangle skb->next as you like, even if the unref() is
>>> telling that someone
>>> else has a reference on skb.
>>
>> Then why does the bug go way if I remove the skb_mark_not_on_list() call
>> then?
>>
>
> Some side effects.
>
> This _particular_ repro uses a specific pattern that might be defeated
> by a small change.
> (just working around another bug)
>
> Instead of setting skb->next to NULL, try to set it to
>
> skb->next = (struct sk_buff *)0x800;
>
> This might show a different pattern.
Nice trick, I'll use this next time.
I modified to code and added a kfree_skb tracepoint with a known reason
(PROTO_MEM) to capture the callstack. See end of email. Which shows
multicast code path is involved.
trace_kfree_skb(segs, __builtin_return_address(0),
SKB_DROP_REASON_PROTO_MEM);
>>>>
>>>> I will send a proper fix patch shortly... after syzbot do a test on it.
>>>>
>>
>> I've send a patch for syzbot that only calls skb_mark_not_on_list() when
>> unref() and __kfree_skb_reason() "permits" this.
>> I tested it locally with reproducer and it also fixes/"removes" the bug.
>
> This does not mean we will accept a patch with no clear explanation
> other than "this removes a syzbot bug, so this must be good"
>
> Make sure to give precise details on _why_ this is needed or not.
>
> Again, the user of kfree_skb_list(list) _owns_ skb->next for sure.
> If you think this assertion is not true, we are in big trouble.
>
I think I have found an explanation, why/when refcnt can be elevated on
an SKB-list. The skb_shinfo(skb)->flag_list can increase refcnt.
See code:
static void skb_clone_fraglist(struct sk_buff *skb)
{
struct sk_buff *list;
skb_walk_frags(skb, list)
skb_get(list);
}
This walks the SKBs on the shinfo->frag_list and increase the refcnt
(skb->users).
Notice that kfree_skb_list is also called when freeing the SKBs
"frag_list" in skb_release_data().
IMHO this explains why we can only remove the SKB from the list, when
"permitted" by skb_unref(), e.g. if __kfree_skb_reason() returns true.
--Jesper
Call-stack of case with elevated refcnt when walking SKB-list:
--------------------------------------------------------------
repro 3048 [003] 101.689670: skb:kfree_skb:
skbaddr=0xffff888104086600 protocol=0
location=skb_release_data.cold+0x25 reason: PROTO_MEM
ffffffff81bb6448 kfree_skb_list_reason.cold+0x3b
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81bb6448 kfree_skb_list_reason.cold+0x3b
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81bb6484 skb_release_data.cold+0x25
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819137e1 kfree_skb_reason+0x41
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81a1e246 igmp_rcv+0xf6
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819c5e85 ip_protocol_deliver_rcu+0x165
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819c5f12 ip_local_deliver_finish+0x72
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81930b89 __netif_receive_skb_one_core+0x69
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81930e31 process_backlog+0x91
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff8193197b __napi_poll+0x2b
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81931e86 net_rx_action+0x276
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81bcf083 __do_softirq+0xd3
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81087243 do_softirq+0x63
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff810872e4 __local_bh_enable_ip+0x64
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff8192ba8f netif_rx+0xdf
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff8192bb27 dev_loopback_xmit+0x77
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819c9195 ip_mc_finish_output+0x65
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819cc597 ip_mc_output+0x137
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819cd2b8 ip_push_pending_frames+0xa8
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81a04f67 raw_sendmsg+0x607
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff8190153b sock_sendmsg+0x8b
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff8190323b __sys_sendto+0xeb
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff819032b0 __x64_sys_sendto+0x20
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81bb9ffa do_syscall_64+0x3a
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
ffffffff81c000aa entry_SYSCALL_64+0xaa
(/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
10af3d syscall+0x1d (/usr/lib64/libc.so.6)
46d3 do_sandbox_none+0x81
(/home/jbrouer/syzbot-kfree_skb_list/repro)
48ce main+0xa5
(/home/jbrouer/syzbot-kfree_skb_list/repro)
29550 __libc_start_call_main+0x80 (/usr/lib64/libc.so.6)
Resolving some symbols:
ip_mc_finish_output+0x65
------------------------
[net-next]$ ./scripts/faddr2line net/ipv4/ip_output.o
ip_mc_finish_output+0x65
ip_mc_finish_output+0x65/0x190:
ip_mc_finish_output at net/ipv4/ip_output.c:356
ip_mc_output+0x137
------------------
[net-next]$ ./scripts/faddr2line vmlinux ip_mc_output+0x137
ip_mc_output+0x137/0x2a0:
skb_network_header at include/linux/skbuff.h:2829
(inlined by) ip_hdr at include/linux/ip.h:21
(inlined by) ip_mc_output at net/ipv4/ip_output.c:401
igmp_rcv+0xf6
-------------
[net-next]$ ./scripts/faddr2line vmlinux igmp_rcv+0xf6
igmp_rcv+0xf6/0x2e0:
igmp_rcv at net/ipv4/igmp.c:1130
Powered by blists - more mailing lists