netdev - Re: INFO: rcu detected stall in skb_free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACT4Y+YnjQ0Pie9roWzb_1=Dfgv=Uws6QhhcVKsLo2YY==0ahA@mail.gmail.com>
Date:   Sat, 26 May 2018 17:38:37 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     syzbot <syzbot+cac7c17ec0aca89d3c45@...kaller.appspotmail.com>
Cc:     Andrei Vagin <avagin@...tuozzo.com>,
        David Miller <davem@...emloft.net>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        LKML <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: INFO: rcu detected stall in skb_free_head

On Sun, Apr 29, 2018 at 6:33 PM, syzbot
<syzbot+cac7c17ec0aca89d3c45@...kaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +0000)
> Merge branch 'parisc-4.17-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=cac7c17ec0aca89d3c45
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6517400396627968
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5914490758943236750
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+cac7c17ec0aca89d3c45@...kaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: rcu_sched self-detected stall on CPU
>         1-...!: (117917 ticks this GP) idle=036/1/4611686018427387906
> softirq=114416/114416 fqs=32
>          (t=125000 jiffies g=60712 c=60711 q=345938)
> rcu_sched kthread starved for 124847 jiffies! g60712 c60711 f0x2
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> RCU grace-period kthread stack dump:
> rcu_sched       R  running task    23592     9      2 0x80000000
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> NMI backtrace for cpu 1
> CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.17.0-rc1+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events rht_deferred_worker
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
>  __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
>  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:kfree+0x124/0x260 mm/slab.c:3814
> RSP: 0018:ffff8801db105450 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: ffff88006c118040 RCX: 1ffff1003b3059e7
> RDX: 0000000000000000 RSI: ffff8801d982cf90 RDI: 0000000000000286
> RBP: ffff8801db105470 R08: ffff8801d982ce78 R09: 0000000000000002
> R10: ffff8801d982c640 R11: 0000000000000000 R12: 0000000000000286
> R13: ffff8801dac00ac0 R14: ffffffff85bd7b69 R15: ffff88006c0f8180
>  skb_free_head+0x99/0xc0 net/core/skbuff.c:550
>  skb_release_data+0x690/0x860 net/core/skbuff.c:570
>  skb_release_all+0x4a/0x60 net/core/skbuff.c:627
>  __kfree_skb net/core/skbuff.c:641 [inline]
>  kfree_skb+0x195/0x560 net/core/skbuff.c:659
>  enqueue_to_backlog+0x2fc/0xc90 net/core/dev.c:3968
>  netif_rx_internal+0x14d/0xae0 net/core/dev.c:4181
>  netif_rx+0xba/0x400 net/core/dev.c:4206
>  loopback_xmit+0x283/0x741 drivers/net/loopback.c:91
>  __netdev_start_xmit include/linux/netdevice.h:4087 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4096 [inline]
>  xmit_one net/core/dev.c:3053 [inline]
>  dev_hard_start_xmit+0x264/0xc10 net/core/dev.c:3069
>  __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
>  neigh_hh_output include/net/neighbour.h:472 [inline]
>  neigh_output include/net/neighbour.h:480 [inline]
>  ip_finish_output2+0x1046/0x1840 net/ipv4/ip_output.c:229
>  ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
>  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
>  ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
>  dst_output include/net/dst.h:444 [inline]
>  ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
>  ip_queue_xmit+0x9d7/0x1f70 net/ipv4/ip_output.c:504
>  sctp_v4_xmit+0x108/0x140 net/sctp/protocol.c:983
>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
>  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406

#syz fix: sctp: not allow to set rto_min with a value below 200 msecs


>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1046
>  </IRQ>
>  do_softirq.part.17+0x14d/0x190 kernel/softirq.c:329
>  do_softirq arch/x86/include/asm/preempt.h:23 [inline]
>  __local_bh_enable_ip+0x1ec/0x230 kernel/softirq.c:182
>  __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:176 [inline]
>  _raw_spin_unlock_bh+0x30/0x40 kernel/locking/spinlock.c:200
>  spin_unlock_bh include/linux/spinlock.h:355 [inline]
>  rhashtable_rehash_chain lib/rhashtable.c:292 [inline]
>  rhashtable_rehash_table lib/rhashtable.c:333 [inline]
>  rht_deferred_worker+0x1058/0x1fb0 lib/rhashtable.c:432
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 126s!
> BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 126s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=8/256
>     in-flight: 24:rht_deferred_worker
>     pending: rht_deferred_worker, defense_work_handler,
> defense_work_handler, perf_sched_delayed, defense_work_handler,
> switchdev_deferred_process_work, cache_reap
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=11/256
>     pending: jump_label_update_timeout, defense_work_handler,
> defense_work_handler, defense_work_handler, defense_work_handler,
> defense_work_handler, defense_work_handler, check_corruption,
> key_garbage_collector, vmstat_shepherd, cache_reap
> workqueue events_long: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
>     pending: br_fdb_cleanup, br_fdb_cleanup, br_fdb_cleanup
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=5/256
>     pending: br_fdb_cleanup, br_fdb_cleanup, br_fdb_cleanup, br_fdb_cleanup,
> br_fdb_cleanup
> workqueue events_unbound: flags=0x2
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512
>     in-flight: 10434:fsnotify_connector_destroy_workfn
> workqueue events_power_efficient: flags=0x80
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: neigh_periodic_work
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
>     pending: check_lifetime, gc_worker, do_cache_clean, neigh_periodic_work
> workqueue rcu_gp: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: process_srcu
> workqueue mm_percpu_wq: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
>     pending: vmstat_update
> workqueue writeback: flags=0x4e
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256
>     in-flight: 13940:wb_workfn
> workqueue kblockd: flags=0x18
>   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=2/256
>     pending: blk_mq_timeout_work, blk_mq_timeout_work
> workqueue dm_bufio_cache: flags=0x8
>   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
>     pending: work_fn
> workqueue ipv6_addrconf: flags=0x40008
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
>     pending: addrconf_verify_work
> workqueue krdsd: flags=0xe000a
>   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1
>     in-flight: 301:rds_connect_worker
> pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=127s workers=3 idle: 4831 4860
> pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=8 idle: 301 22 6 7437 6882
> 50
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@...glegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@...glegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/000000000000799e6c056aff4a6f%40google.com.
> For more options, visit https://groups.google.com/d/optout.