[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140708021149.GA12956@localhost>
Date: Tue, 8 Jul 2014 10:11:49 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Tejun Heo <tj@...nel.org>
Cc: Jet Chen <jet.chen@...el.com>, Su Tao <tao.su@...el.com>,
Yuanhan Liu <yuanhan.liu@...el.com>, LKP <lkp@...org>,
linux-kernel@...r.kernel.org
Subject: [workqueue] WARNING: CPU: 0 PID: 76 at drivers/block/floppy.c:968
schedule_bh()
Greetings,
0day kernel testing robot got the below dmesg and the first bad commit is
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit b22ce2785d97423846206cceec4efee0c4afd980
Author: Tejun Heo <tj@...nel.org>
AuthorDate: Wed Aug 28 17:33:37 2013 -0400
Commit: Tejun Heo <tj@...nel.org>
CommitDate: Thu Aug 29 09:19:28 2013 -0400
workqueue: cond_resched() after processing each work item
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine. Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs. The two would deadlock.
Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports. With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.
Fix it by invoking cond_resched() after executing each work item.
Signed-off-by: Tejun Heo <tj@...nel.org>
Reported-by: Jamie Liu <jamieliu@...gle.com>
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@...r.kernel.org
--
kernel/workqueue.c | 9 +++++++++
1 file changed, 9 insertions(+)
+---------------------------------------------------------+------------+------------+------------------+
| | c95389b4cd | b22ce2785d | v3.16-rc4_070708 |
+---------------------------------------------------------+------------+------------+------------------+
| boot_successes | 900 | 249 | 79 |
| boot_failures | 0 | 51 | 2 |
| WARNING:CPU:PID:at_drivers/block/floppy.c:schedule_bh() | 0 | 51 | 2 |
| backtrace:user_reset_fdc | 0 | 51 | 2 |
| backtrace:floppy_async_init | 0 | 51 | 2 |
| backtrace:async_run_entry_fn | 0 | 51 | 2 |
+---------------------------------------------------------+------------+------------+------------------+
[ 45.121121] PCI: setting IRQ 11 as level-triggered
[ 45.407538] e1000 0000:00:03.0: setting latency timer to 64
[ 45.686040] ------------[ cut here ]------------
[ 45.975060] WARNING: CPU: 0 PID: 76 at drivers/block/floppy.c:968 schedule_bh+0x47/0x80()
[ 47.312537] CPU: 0 PID: 76 Comm: kworker/u2:1 Not tainted 3.11.0-rc7-00025-gb22ce27 #8
[ 47.848171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 48.433222] Workqueue: events_unbound async_run_entry_fn
[ 48.725491] 00000000 d25d7e0c c2745b10 d25d7e3c c105472d c2ff0b10 00000000 0000004c
[ 49.494639] c30df57c 000003c8 c1a2fb57 c1a2fb57 00000001 c1a33aa0 c39a0ca0 d25d7e4c
[ 51.548252] c1054812 00000009 00000000 d25d7e5c c1a2fb57 00000000 00000000 d25d7e84
[ 56.319716] Call Trace:
[ 58.164457] [<c2745b10>] dump_stack+0x16/0x18
[ 61.355641] [<c105472d>] warn_slowpath_common+0x8d/0xb0
[ 61.822789] [<c1a2fb57>] ? schedule_bh+0x47/0x80
[ 62.103589] [<c1a2fb57>] ? schedule_bh+0x47/0x80
[ 62.538170] [<c1a33aa0>] ? floppy_hardint+0x1a0/0x1a0
[ 62.920432] [<c1054812>] warn_slowpath_null+0x22/0x30
[ 63.411821] [<c1a2fb57>] schedule_bh+0x47/0x80
[ 66.271423] [<c1a33ed5>] wait_til_done+0x15/0x230
[ 71.033237] [<c1063260>] ? cascade+0xa0/0xa0
[ 75.990887] [<c1a348d7>] user_reset_fdc+0xa7/0x110
[ 78.921153] [<c384c939>] floppy_async_init+0x9ee/0x1263
[ 83.512228] [<c108a0e4>] async_run_entry_fn+0x44/0x1e0
[ 85.519262] [<c16901b2>] ? __list_del_entry+0x202/0x220
[ 87.120223] [<c1075687>] ? pwq_dec_nr_in_flight+0x137/0x140
[ 89.499077] [<c1075f5b>] process_one_work+0x33b/0x840
[ 90.274017] [<c108f542>] ? ttwu_do_wakeup+0x12/0x90
[ 91.294072] [<c16901b2>] ? __list_del_entry+0x202/0x220
[ 91.634105] [<c1076874>] worker_thread+0x414/0x890
[ 91.994817] [<c1076460>] ? process_one_work+0x840/0x840
[ 92.265965] [<c1080ba6>] kthread+0xa6/0xb0
[ 92.658144] [<c2763f57>] ret_from_kernel_thread+0x1b/0x28
[ 92.971880] [<c1080b00>] ? __kthread_unpark+0x70/0x70
[ 93.254541] ---[ end trace 857f8343e4eb716b ]---
[ 93.508601] floppy0: no floppy controllers found
git bisect start v3.11 v3.10 --
git bisect good 89d0abe3d695103505c025dde6e07b9c3dd772f4 # 13:01 300+ 0 Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
git bisect good 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8 # 13:18 300+ 0 userns: limit the maximum depth of user_namespace->parent chain
git bisect good 94fc5d9de5bd757ad46f0d94bc4ebf617c4487f6 # 13:35 300+ 0 proc: return on proc_readdir error
git bisect good 5e25e4f304ce53004dcd01725938c0bc45c75d35 # 13:48 300+ 0 Merge tag 'usb-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good 06a557f7a68e126181f09831cb2fac6f7a7f43c2 # 14:22 300+ 1 Merge tag 'nfs-for-3.11-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
git bisect bad 936dbcc3f2fc7314f3c1645fc39ea51117f6d73b # 14:36 15- 2 Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 155e3a353943a3e713cd33357cfd58e995900a81 # 14:50 13- 1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad ff497452636f4687e517964817b7e2bd99f4b44b # 15:06 48- 2 Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
git bisect bad b22ce2785d97423846206cceec4efee0c4afd980 # 15:51 13- 2 workqueue: cond_resched() after processing each work item
# first bad commit: [b22ce2785d97423846206cceec4efee0c4afd980] workqueue: cond_resched() after processing each work item
git bisect good c95389b4cd6a4b52af78bea706a274453e886251 # 16:51 900+ 0 Merge branch 'akpm' (patches from Andrew Morton)
git bisect bad 98610962bbb418f6d847c402637ca14e78c38344 # 16:51 0- 2 0day head guard for 'devel-hourly-2014070708'
git bisect bad cd3de83f147601356395b57a8673e9c5ff1e59d1 # 16:51 0- 34 Linux 3.16-rc4
git bisect bad d584f2acf5cc6c4a14c330930222d8e70fe024a6 # 18:02 62- 1 Add linux-next specific files for 20140707
This script may reproduce the error.
-----------------------------------------------------------------------------
#!/bin/bash
kernel=$1
initrd=yocto-minimal-i386.cgz
wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/$initrd
kvm=(
qemu-system-x86_64 -cpu kvm64 -enable-kvm
-kernel $kernel
-initrd $initrd
-smp 2
-m 256M
-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
-net user,vlan=0
-net nic,vlan=1,model=e1000
-net user,vlan=1
-boot order=nc
-no-reboot
-watchdog i6300esb
-serial stdio
-display none
-monitor null
)
append=(
debug
sched_debug
apic=debug
ignore_loglevel
sysrq_always_enabled
panic=10
prompt_ramdisk=0
earlyprintk=ttyS0,115200
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
)
"${kvm[@]}" --append "${append[*]}"
-----------------------------------------------------------------------------
Thanks,
Fengguang
View attachment "dmesg-yocto-ivb41-99:20140707155130:i386-randconfig-c1-07070935::" of type "text/plain" (48547 bytes)
Download attachment "i386-randconfig-c1-07070935-98610962bbb418f6d847c402637ca14e78c38344-WARNING:---at----schedule_bh+-x-71922.log" of type "application/octet-stream" (62212 bytes)
View attachment "config-3.11.0-rc7-00025-gb22ce27" of type "text/plain" (84480 bytes)
_______________________________________________
LKP mailing list
LKP@...ux.intel.com
Powered by blists - more mailing lists