netdev - Re: [Bug reporting] kernel panic during handle the dst unreach icmp msg.

Message-ID: <da4182b2-61ab-3ac8-b8a7-1b2b90dcbd54@gmail.com>
Date:   Thu, 14 Feb 2019 09:15:22 -0800
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     soukjin.bae@...sung.com,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc:     박종언 <jongeon.park@...sung.com>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: [Bug reporting] kernel panic during handle the dst unreach icmp
 msg.



On 02/13/2019 11:46 PM, 배석진 wrote:
> Dear all,
> 
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg256527.html
> 
> As we raised earlier in the mail thread above,
> we hit a problem caused by a socket that was not removed.
> 
> (from here on, 'the socket' means the socket allocated at 0xFFFFFFC0051E5E00)
> 
> #1. The socket is in state FIN_WAIT1; its process has probably closed the socket.
>     Below is the memory dump information from Trace32.
> 
>   (struct sock *)0xFFFFFFC0051E5E00 = 0xFFFFFFC0051E5E00 = end+0x3FF9E4CE00 -> (
>     __sk_common = (
>        ...
>        skc_rcv_saddr = 0x0200A8C0,   ==> 192.168.0.2
>        ...
>        skc_state = 4,                ==> FIN_WAIT1
>        ...
>        skc_flags = 0x4301,           ==> SOCK_DEAD(0x01) set
> 
> 
> #2. The user switched to another Wi-Fi AP, so the previous netdevice was deleted and its sockets were destroyed.
> 
> [60392.948657][4:            netd] 02-13 00:39:32.095  5249  5323 I NetdDestroyed 30 sockets on 192.168.0.2 in 2.7 ms
> [60392.948705][4:            netd] 02-13 00:39:32.095  5249  5323 D Netdnotify() code: 614, msg: Address removed 192.168.0.2/24 wlan0 128 0
> 
>   --> The socket continues to exist for a while,
>       because 'sock_diag_destroy() -> tcp_abort()' cannot call tcp_done() for the socket.
>       It does, however, clear the socket's sk_write_queue by calling tcp_write_queue_purge(sk).
> 
> 
> #3. An ICMP message (dst unreach) arrived for a packet sent by the socket.
>     To retransmit, the kernel looks up the sk and finds it (because the socket still exists),
>     but its sk_write_queue has already been cleared, so there is no skb to send.
>     This triggers the kernel BUG.
> 
> <4>[60392.948306] I[1:    ksoftirqd/1:   19] ------------[ cut here ]------------
> <0>[60392.948334] I[1:    ksoftirqd/1:   19] kernel BUG at net/ipv4/tcp_ipv4.c:519!
> <2>[60392.948344] I[1:    ksoftirqd/1:   19] sec_debug_set_extra_info_fault = BUG / 0xffffff80090351d0
> <0>[60392.948386] I[1:    ksoftirqd/1:   19] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> ...
> <4>[60392.950676] I[1:    ksoftirqd/1:   19] PC is at tcp_v4_err+0x4b0/0x4bc
> <4>[60392.950684] I[1:    ksoftirqd/1:   19] LR is at tcp_v4_err+0x3ac/0x4bc
> 
> 
> 370 void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
> 371 {
>         ...
> 516		icsk->icsk_rto = inet_csk_rto_backoff(icsk, TCP_RTO_MAX);
> 517
> 518		skb = tcp_write_queue_head(sk);
> 519		BUG_ON(!skb);
> 520
> 521		tcp_mstamp_refresh(tp);
> 
> 
> We know that line 519 has been removed in the latest tree, but then this would show up as a kernel panic instead.
> How about the change below? It skips retransmission when the socket has already been closed.
> 
> best regards,
> 
> 
> 
> From: soukjin bae <soukjin.bae@...sung.com>
> Date: Wen, 14 Jan 2019 14:26:35 +0900
> Subject: net: Don't retransmit packets when socket was already closed
>  
> Signed-off-by: soukjin bae <soukjin.bae@...sung.com>
> Signed-off-by: jongeon park <jongeon.park@...sung.com>
> ---
>  net/ipv4/tcp_ipv4 | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/tcp_ipv4 b/net/ipv4/tcp_ipv4
> index fe4daf6..654bd19 100755
> --- a/net/ipv4/tcp_ipv4
> +++ b/net/ipv4/tcp_ipv4
> 
> @@ -442,6 +465,10 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
>  		err = EPROTO;
>  		break;
>  	case ICMP_DEST_UNREACH:
> +		/* Don't retransmit packets when socket was already closed */
> +		if (sock_flag(sk, SOCK_DEAD))
> +			goto out;
> +
>  		if (code > NR_ICMP_UNREACH)
>  			goto out;
> 

I do not believe this patch is needed.

You probably hit another, more serious bug, but since you did not post the full stack trace,
it is hard to help.

Are you using a vti tunnel?

I just got a syzbot report that might give us a clue:

(I suspect commit 61220ab349485d911083d0b7990ccd3db6c63297 ("vti6: Enable namespace changing")
was wrong, since vti tunnels have t->net assigned to a struct net without holding a reference.)

So we end up freeing a struct net (and associated resources) too soon.


BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: slab-out-of-bounds in queued_spin_trylock include/asm-generic/qspinlock.h:69 [inline]
BUG: KASAN: slab-out-of-bounds in do_raw_spin_trylock+0x6a/0x180 kernel/locking/spinlock_debug.c:119
Read of size 4 at addr ffff888066405d9c by task syz-executor.4/10575

CPU: 0 PID: 10575 Comm: syz-executor.4 Not tainted 5.0.0-rc6+ #70
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 kasan_check_read+0x11/0x20 mm/kasan/common.c:100
 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
 queued_spin_trylock include/asm-generic/qspinlock.h:69 [inline]
 do_raw_spin_trylock+0x6a/0x180 kernel/locking/spinlock_debug.c:119
 __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
 _raw_spin_trylock+0x1c/0x80 kernel/locking/spinlock.c:128
 spin_trylock include/linux/spinlock.h:339 [inline]
 icmp_xmit_lock net/ipv4/icmp.c:219 [inline]
 icmp_send+0x54c/0x1400 net/ipv4/icmp.c:665
 ipv4_link_failure+0x2c/0x210 net/ipv4/route.c:1187
 dst_link_failure include/net/dst.h:427 [inline]
 vti6_xmit net/ipv6/ip6_vti.c:514 [inline]
 vti6_tnl_xmit+0x10db/0x1c6e net/ipv6/ip6_vti.c:553
 __netdev_start_xmit include/linux/netdevice.h:4385 [inline]
 netdev_start_xmit include/linux/netdevice.h:4394 [inline]
 xmit_one net/core/dev.c:3278 [inline]
 dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3294
 __dev_queue_xmit+0x26e5/0x2fe0 net/core/dev.c:3864
 dev_queue_xmit+0x18/0x20 net/core/dev.c:3897
 neigh_direct_output+0x16/0x20 net/core/neighbour.c:1516
 neigh_output include/net/neighbour.h:508 [inline]
 ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229
 ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip_output+0x21f/0x670 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0xc4/0x1b0 net/ipv4/ip_output.c:124
 __ip_queue_xmit+0x86f/0x1bf0 net/ipv4/ip_output.c:505
 ip_queue_xmit+0x5a/0x70 include/net/ip.h:198
 __tcp_transmit_skb+0x1a5f/0x3680 net/ipv4/tcp_output.c:1160
 tcp_transmit_skb net/ipv4/tcp_output.c:1176 [inline]
 tcp_write_xmit+0xe89/0x5160 net/ipv4/tcp_output.c:2401
 __tcp_push_pending_frames+0xb4/0x350 net/ipv4/tcp_output.c:2577
 tcp_send_fin+0x149/0xbb0 net/ipv4/tcp_output.c:3122
 tcp_close+0xddf/0x10c0 net/ipv4/tcp.c:2405
 inet_release+0x105/0x1f0 net/ipv4/af_inet.c:428
 __sock_release+0xd3/0x250 net/socket.c:579
 sock_close+0x1b/0x30 net/socket.c:1139
 __fput+0x2df/0x8d0 fs/file_table.c:278
 ____fput+0x16/0x20 fs/file_table.c:309
 task_work_run+0x14a/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_32_irqs_on arch/x86/entry/common.c:341 [inline]
 do_fast_syscall_32+0xa9d/0xc98 arch/x86/entry/common.c:397
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fe8869
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:000000000845fdac EFLAGS: 00000216 ORIG_RAX: 0000000000000006
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 9609:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc mm/kasan/common.c:504 [inline]
 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:411
 kmem_cache_alloc_node+0x144/0x710 mm/slab.c:3633
 alloc_task_struct_node kernel/fork.c:158 [inline]
 dup_task_struct kernel/fork.c:845 [inline]
 copy_process.part.0+0x1d08/0x79a0 kernel/fork.c:1753
 copy_process kernel/fork.c:1710 [inline]
 _do_fork+0x257/0xfe0 kernel/fork.c:2227
 __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline]
 __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline]
 __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236
 do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
 do_int80_syscall_32+0x14d/0x670 arch/x86/entry/common.c:349
 entry_INT80_compat+0x76/0x80 arch/x86/entry/entry_64_compat.S:413

Freed by task 9627:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kmem_cache_free+0x86/0x260 mm/slab.c:3749
 free_task_struct kernel/fork.c:163 [inline]
 free_task+0xdd/0x120 kernel/fork.c:458
 __put_task_struct+0x20a/0x4e0 kernel/fork.c:731
 put_task_struct include/linux/sched/task.h:98 [inline]
 delayed_put_task_struct+0x1fd/0x350 kernel/exit.c:181
 __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
 rcu_do_batch kernel/rcu/tree.c:2452 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2773 [inline]
 rcu_process_callbacks+0x928/0x1390 kernel/rcu/tree.c:2754
 __do_softirq+0x266/0x95a kernel/softirq.c:292

The buggy address belongs to the object at ffff888066404540
 which belongs to the cache task_struct(81:syz5) of size 6080
The buggy address is located 156 bytes to the right of
 6080-byte region [ffff888066404540, ffff888066405d00)
The buggy address belongs to the page:
page:ffffea0001990100 count:1 mapcount:0 mapping:ffff888092e85080 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00026efe08 ffffea0002554f08 ffff888092e85080
raw: 0000000000000000 ffff888066404540 0000000100000001 ffff8880602fe480
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8880602fe480

Memory state around the buggy address:
 ffff888066405c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888066405d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888066405d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                            ^
 ffff888066405e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888066405e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc