Message-ID: <20150614085259.GA1911@wfg-t540p.sh.intel.com>
Date: Sun, 14 Jun 2015 16:52:59 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: fengguang.wu@...el.com, LKP <lkp@...org>, LKML <linux-kernel@...r.kernel.org>
Subject: [sched] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1188 do_set_cpus_allowed()
Hi Peter,
The 0day kernel testing robot got the dmesg below, and the first bad commit is
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core
commit 2ea0fdfb65e03094bba3a705b6e00c9a500d5101
Author: Peter Zijlstra <peterz@...radead.org>
AuthorDate: Fri May 15 17:43:34 2015 +0200
Commit: Peter Zijlstra <peterz@...radead.org>
CommitDate: Tue Jun 2 15:46:15 2015 +0200
sched: Fix a race between __kthread_bind() and sched_setaffinity()
Because sched_setaffinity() checks p->flags & PF_NO_SETAFFINITY
without locks, a caller might observe an old value and race with the
set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo
it.
  __kthread_bind()
    do_set_cpus_allowed()
                                    <SYSCALL>
                                      sched_setaffinity()
                                        if (p->flags & PF_NO_SETAFFINITY)
                                        set_cpus_allowed_ptr()
    p->flags |= PF_NO_SETAFFINITY
Fix the issue by putting everything under the regular scheduler locks.
This also closes a hole in the serialization of
task_struct::{nr_,}cpus_allowed.
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: mingo@...nel.org
Cc: riel@...hat.com
Cc: dedekind1@...il.com
Cc: mgorman@...e.de
Cc: rostedt@...dmis.org
Cc: juri.lelli@....com
Acked-by: Tejun Heo <tj@...nel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org
+-------------------------------------------------------+------------+------------+------------+
| | 94309ae387 | 2ea0fdfb65 | a28f5721ea |
+-------------------------------------------------------+------------+------------+------------+
| boot_successes | 900 | 289 | 209 |
| boot_failures | 0 | 11 | 3 |
| WARNING:at_kernel/sched/core.c:#do_set_cpus_allowed() | 0 | 11 | 3 |
| backtrace:smpboot_thread_fn | 0 | 11 | 3 |
+-------------------------------------------------------+------------+------------+------------+
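For scale, the counts in the table work out to roughly 3.7% of boots failing on 2ea0fdfb65 and 1.4% on a28f5721ea, so a single clean boot says little about whether a kernel is affected. A minimal sketch of that arithmetic follows; the helper name failure_rate is made up for illustration and is not part of the 0day tooling.

```bash
# failure_rate FAILURES SUCCESSES
# Prints failures as a percentage of all boots, two decimals.
# (Hypothetical helper, only meant to redo the arithmetic from the table.)
failure_rate() {
    LC_ALL=C awk -v f="$1" -v s="$2" \
        'BEGIN { printf "%.2f\n", (f + s) ? 100 * f / (f + s) : 0 }'
}

failure_rate 0 900    # 94309ae387 -> 0.00
failure_rate 11 289   # 2ea0fdfb65 -> 3.67
failure_rate 3 209    # a28f5721ea -> 1.42
```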
[ 25.671242] EDD information not available.
[ 25.672183] Unregister pv shared memory for cpu 0
[ 25.986014] ------------[ cut here ]------------
[ 25.986014] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1188 do_set_cpus_allowed+0x8e/0xf0()
[ 25.986014] CPU: 0 PID: 11 Comm: migration/0 Not tainted 4.1.0-rc6-00115-g2ea0fdf #1
[ 25.986014] 00000000 00000000 50a4bd48 41a09dc0 00000000 50a4bd7c 4105710c 41ce1bec
[ 25.986014] 00000000 0000000b 41ce30cc 000004a4 4108324e 000004a4 4108324e 4af38190
[ 25.986014] 00000002 4af382d8 50a4bd8c 41057200 00000009 00000000 50a4bda4 4108324e
[ 25.986014] Call Trace:
[ 25.986014] [<41a09dc0>] dump_stack+0x48/0x60
[ 25.986014] [<4105710c>] warn_slowpath_common+0x8c/0xc0
[ 25.986014] [<4108324e>] ? do_set_cpus_allowed+0x8e/0xf0
[ 25.986014] [<4108324e>] ? do_set_cpus_allowed+0x8e/0xf0
[ 25.986014] [<41057200>] warn_slowpath_null+0x20/0x30
[ 25.986014] [<4108324e>] do_set_cpus_allowed+0x8e/0xf0
[ 25.986014] [<4108359b>] select_fallback_rq+0x2eb/0x300
[ 25.986014] [<41084d1c>] migration_call+0xec/0x290
[ 25.986014] [<41562ece>] ? __delay+0xe/0x10
[ 25.986014] [<41079c89>] notifier_call_chain+0x59/0x70
[ 25.986014] [<4107a19e>] __raw_notifier_call_chain+0x1e/0x30
[ 25.986014] [<41057256>] cpu_notify+0x26/0x50
[ 25.986014] [<41a01462>] take_cpu_down+0x22/0x40
[ 25.986014] [<410f152d>] multi_cpu_stop+0xcd/0x140
[ 25.986014] [<410f1460>] ? cpu_stop_park+0x60/0x60
[ 25.986014] [<410f17d5>] cpu_stopper_thread+0x85/0x120
[ 25.986014] [<410a3fec>] ? trace_hardirqs_on_caller+0x13c/0x1e0
[ 25.986014] [<410a409b>] ? trace_hardirqs_on+0xb/0x10
[ 25.986014] [<410aa65e>] ? do_raw_spin_lock+0xe/0x1c0
[ 25.986014] [<410a3fec>] ? trace_hardirqs_on_caller+0x13c/0x1e0
[ 25.986014] [<410a409b>] ? trace_hardirqs_on+0xb/0x10
[ 25.986014] [<4107c5f4>] smpboot_thread_fn+0x144/0x250
[ 25.986014] [<4107c4b0>] ? sort_range+0x30/0x30
[ 25.986014] [<410789aa>] kthread+0xba/0xd0
[ 25.986014] [<41080000>] ? ftrace_raw_output_sched_wake_idle_without_ipi+0x20/0x70
[ 25.986014] [<41a13461>] ret_from_kernel_thread+0x21/0x30
[ 25.986014] [<410788f0>] ? insert_kthread_work+0x90/0x90
[ 25.986014] ---[ end trace 39f17a9725435b83 ]---
[ 26.079472] CPU 0 is now offline
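To look at the check that fired, the file and line can be pulled out of the WARNING line and then viewed at the bad commit. The sketch below assumes the dmesg text is fed on stdin; warn_site is a made-up helper name, and the git command in the comment assumes a tree that actually contains commit 2ea0fdfb65e0.

```bash
# warn_site: read dmesg text on stdin and print "FILE LINE" for every
# "WARNING: ... at FILE:LINE ..." line. (Hypothetical helper.)
warn_site() {
    sed -n 's/.*WARNING: .* at \([^ :]*\):\([0-9][0-9]*\) .*/\1 \2/p'
}

# Possible follow-up, assuming a kernel tree with the bad commit fetched:
#   git show 2ea0fdfb65e03094bba3a705b6e00c9a500d5101:kernel/sched/core.c \
#       | sed -n '1180,1195p'
```

The line window 1180 to 1195 is only a guess at enough context around line 1188.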
git bisect start a28f5721ea3e387cabb9bdd69a56fa0519b827bf c65b99f046843d2455aa231747b5a07a999a9f3d --
git bisect bad 9d7edaa508475a065be5db19af006a06e999c64f # 17:53 0- 6 Merge 'bluetooth/master' into devel-spot-201506030813
git bisect bad 2ec19bfaf5097b0d51f637127ec322b8a07aad53 # 17:53 0- 19 Merge 'x86-mpx/mpx-v24' into devel-spot-201506030813
git bisect bad 75b43e7865e3d49fe5d26439295affa5c7dbfa13 # 17:53 0- 18 Merge 'dledford/k.o/for-4.2' into devel-spot-201506030813
git bisect good f6ed40e1cf411383f199b0c5444296b3ac19f547 # 17:53 300+ 0 Merge 'peterz-queue/perf/core' into devel-spot-201506030813
git bisect bad 2a8e405a6d774f98ca0a74b0c4a8e6e177c4a79c # 17:53 0- 17 Merge 'amirv/for-upstream' into devel-spot-201506030813
git bisect bad 4dda3866e84d40a66817a61a4cd34939aa5fd691 # 17:53 0- 20 Merge 'peterz-queue/master' into devel-spot-201506030813
git bisect good 2689c71810a6c5de51b8745921d242daf3f71005 # 18:01 300+ 224 kvm tools: Stop init if check_extensions failed
git bisect good 6ec97151b4c1f95bb7ca383d6b91b3e990471259 # 18:01 300+ 0 Merge branch 'x86/fpu'
git bisect good 915111910c2ffcefe5fa876d64727fdb647286fc # 18:13 300+ 376 kvm tools: use iovec functions in uip_rx
git bisect good 43fad805ca8dd7c7f46b7e52e819c4e0752de7dc # 18:13 300+ 0 Merge branch 'tools/kvm'
git bisect good 0868aa22167d93dd974c253d259c3e6fd47a16c8 # 18:23 300+ 3 Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', 'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD
git bisect bad 59a906d08da336582829285caeb654d48677dab3 # 18:23 0- 20 sched: Change sched_class::set_cpus_allowed calling context
git bisect bad e22b8a5970d1a2c544a319105e715e7f266a62b8 # 18:24 0- 17 sched: Make sched_class::set_cpus_allowed() unconditional
git bisect good 30f9ab57b5415c7a92d870f800e14c6c4ad46252 # 18:26 300+ 0 sched,rt: Convert switched_{from,to}_rt() / prio_changed_rt() to balance callbacks
git bisect good dbd10e5378726f2e3e17b7e932d63133da4fba55 # 07:09 300+ 0 Merge branch 'sched/urgent'
git bisect bad 2ea0fdfb65e03094bba3a705b6e00c9a500d5101 # 07:13 14- 1 sched: Fix a race between __kthread_bind() and sched_setaffinity()
git bisect good 94309ae3871464d3e4105234ba66cb92fa63c1f8 # 19:09 300+ 0 sched: Move code around
# first bad commit: [2ea0fdfb65e03094bba3a705b6e00c9a500d5101] sched: Fix a race between __kthread_bind() and sched_setaffinity()
git bisect good 94309ae3871464d3e4105234ba66cb92fa63c1f8 # 19:26 900+ 0 sched: Move code around
# extra tests with DEBUG_INFO
git bisect bad 2ea0fdfb65e03094bba3a705b6e00c9a500d5101 # 19:36 11- 1 sched: Fix a race between __kthread_bind() and sched_setaffinity()
# extra tests on HEAD of linux-devel/devel-spot-201506030813
git bisect bad a28f5721ea3e387cabb9bdd69a56fa0519b827bf # 19:36 0- 3 0day head guard for 'devel-spot-201506030813'
# extra tests on tree/branch peterz-queue/sched/core
git bisect bad d4374648628fdefd467393b3bf797f348ecad42d # 19:49 68- 2 sched: prevent throttle in early pick_next_task_fair
# extra tests on tree/branch linus/master
git bisect good df5f4158415b6fc4a2d683c6fdc806be6da176bc # 09:09 900+ 1 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
# extra tests on tree/branch next/master
git bisect good d9b5ec5b1b4d4055e256674de4a5337f6a66d284 # 22:12 900+ 591 Add linux-next specific files for 20150612
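The log above is long, so a condensed view of just the verdicts can help when skimming it. This is a rough sketch that assumes the log has been saved to a file or is piped in; summarize_bisect is a made-up name, and it deliberately ignores the trailing comment fields, whose exact format is not described here.

```bash
# summarize_bisect [FILE...]
# Prints "VERDICT SHORTHASH" for each "git bisect good|bad <sha>" line and
# skips comment lines. Reads stdin when no file is given.
# (Hypothetical helper, not part of git or the 0day robot.)
summarize_bisect() {
    awk '$1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") {
        print $3, substr($4, 1, 10)
    }' "$@"
}

# Example, with bisect.log standing in for a saved copy of the log:
#   summarize_bisect bisect.log | grep '^bad'
```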
This script may reproduce the error.
----------------------------------------------------------------------------
#!/bin/bash
kernel=$1
initrd=quantal-core-i386.cgz
wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
kvm=(
qemu-system-x86_64
-enable-kvm
-cpu kvm64
-kernel "$kernel"
-initrd "$initrd"
-m 300
-smp 2
-device e1000,netdev=net0
-netdev user,id=net0
-boot order=nc
-no-reboot
-watchdog i6300esb
-rtc base=localtime
-serial stdio
-display none
-monitor null
)
append=(
hung_task_panic=1
earlyprintk=ttyS0,115200
systemd.log_level=err
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
drbd.minor_count=8
)
"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------
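Since only a few percent of boots hit the warning, one run of the script will usually come back clean. A minimal sketch of a retry loop is below. It assumes the script above was saved as ./reproduce.sh (a made-up name) and that each run terminates on its own; repeat_until_warning is likewise hypothetical.

```bash
# repeat_until_warning MAX_BOOTS CMD [ARGS...]
# Runs CMD up to MAX_BOOTS times, capturing its output, and stops at the
# first run whose output contains the do_set_cpus_allowed warning.
# Returns 0 if the warning was seen, 1 if not, 2 if no temp file could be made.
repeat_until_warning() {
    local max=$1 i=1 log
    shift
    log=$(mktemp) || return 2
    while [ "$i" -le "$max" ]; do
        "$@" >"$log" 2>&1
        if grep -q 'WARNING: .*do_set_cpus_allowed' "$log"; then
            echo "warning reproduced on boot $i"
            rm -f "$log"
            return 0
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    echo "no warning in $max boots"
    return 1
}

# Example (path and script name are placeholders):
#   repeat_until_warning 100 ./reproduce.sh /path/to/bzImage
```

A clean result after 100 boots is still not proof of a good kernel, only weak evidence.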
Thanks,
Fengguang
View attachment "dmesg-quantal-vp-27:20150612071229:i386-randconfig-b0-06030907:4.1.0-rc6-00115-g2ea0fdf:1" of type "text/plain" (199464 bytes)