Message-ID: <20150125043428.GA6109@wfg-t540p.sh.intel.com>
Date: Sat, 24 Jan 2015 20:34:28 -0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Peter Zijlstra <peterz@...radead.org>, LKP <lkp@...org>,
linux-kernel@...r.kernel.org
Subject: [perf] WARNING: CPU: 0 PID: 1457 at kernel/events/core.c:890
add_event_to_ctx()
Greetings,
The 0day kernel testing robot got the dmesg below, and the first bad commit is
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
commit d26bb7f73a2881f2412c340a27438b185f0cc3dc
Author: Mark Rutland <mark.rutland@....com>
AuthorDate: Wed Jan 7 15:01:54 2015 +0000
Commit: Peter Zijlstra <peterz@...radead.org>
CommitDate: Fri Jan 23 15:17:56 2015 +0100
perf: decouple unthrottling and rotating
Currently the adjustments made as part of perf_event_task_tick use the
percpu rotation lists to iterate over any active PMU contexts, but these
are not used by the context rotation code, having been replaced by
separate (per-context) hrtimer callbacks. However, some manipulation of
the rotation lists (i.e. removal of contexts) has remained in
perf_rotate_context. This leads to the following issues:
* Contexts are not always removed from the rotation lists. Removal of
PMUs which have been placed in rotation lists, but have not been
removed by a hrtimer callback can result in corruption of the rotation
lists (when memory backing the context is freed).
This has been observed to result in hangs when PMU drivers built as
modules are inserted and removed around the creation of events for
said PMUs.
* Contexts which do not require rotation may be removed from the
rotation lists as a result of a hrtimer, and will not be considered by
the unthrottling code in perf_event_task_tick.
This patch solves these issues by moving any and all removal of contexts
from rotation lists to only occur when the final event is removed from a
context, mirroring the addition which only occurs when the first event
is added to a context. The vestigial manipulation of the rotation lists
is removed from perf_event_rotate_context.
As the rotation_list variables are not used for rotation, these are
renamed to active_ctx_list, which better matches their current function.
perf_pmu_rotate_{start,stop} are renamed to
perf_pmu_ctx_{activate,deactivate}.
Cc: Will Deacon <will.deacon@....com>
Cc: Paul Mackerras <paulus@...ba.org>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>
Signed-off-by: Mark Rutland <mark.rutland@....com>
Reported-by: Johannes Jensen <johannes.jensen@....com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Link: http://lkml.kernel.org/r/1420642914-22760-1-git-send-email-mark.rutland@arm.com
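For readers who want the invariant from the commit message in concrete form, here is a minimal userspace sketch of it: a context joins the active list when its first event is added and leaves it when its last event is removed. This is not the kernel code. All names below (list_node, sketch_ctx, sketch_add_event, sketch_del_event) are hypothetical, and the assert() calls only stand in for the kind of sanity check a kernel WARN_ON might perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_insert_after(struct list_node *head, struct list_node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink the node and reinitialise it so list_is_empty() holds again. */
static void list_remove_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for a PMU context. */
struct sketch_ctx {
	int nr_events;                 /* events currently attached */
	struct list_node active_entry; /* linkage into the active list */
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
	ctx->nr_events = 0;
	list_init(&ctx->active_entry);
}

/* Adding the first event activates the context. */
static void sketch_add_event(struct list_node *active_list,
			     struct sketch_ctx *ctx)
{
	if (ctx->nr_events++ == 0) {
		/* The context must not already be on the active list. */
		assert(list_is_empty(&ctx->active_entry));
		list_insert_after(active_list, &ctx->active_entry);
	}
}

/* Removing the final event deactivates the context. */
static void sketch_del_event(struct sketch_ctx *ctx)
{
	assert(ctx->nr_events > 0);
	if (--ctx->nr_events == 0) {
		/* The context must still be on the active list. */
		assert(!list_is_empty(&ctx->active_entry));
		list_remove_init(&ctx->active_entry);
	}
}
```

In this sketch the list membership and the event count can only change together, so a context whose entry is already linked when its first event arrives trips the first assertion. Whether that is the exact condition behind the warning reported below is not something this sketch establishes.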
===================================================
PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
===================================================
The dmesg for the parent commit is attached as well, to help confirm whether this is a noise error.
Fengguang: the old OOM errors look like independent noise.
+------------------------------------------------------------------+------------+------------+------------+
| | 2e67200461 | d26bb7f73a | b0f9997908 |
+------------------------------------------------------------------+------------+------------+------------+
| boot_successes | 0 | 0 | 0 |
| boot_failures | 1900 | 900 | 22 |
| page_allocation_failure:order:#,mode | 1040 | 567 | 10 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 1040 | 567 | 10 |
| backtrace:ring_buffer_consumer_thread | 1040 | 567 | 10 |
| backtrace:lock_torture_stats | 1040 | 567 | 10 |
| WARNING:at_net/netlink/genetlink.c:#genl_unbind() | 860 | 54 | |
| backtrace:do_group_exit | 860 | 5 | |
| backtrace:SyS_exit_group | 860 | 5 | |
| backtrace:netlink_setsockopt | 236 | 49 | |
| backtrace:SyS_setsockopt | 236 | 49 | |
| backtrace:SyS_socketcall | 236 | 49 | |
| WARNING:at_kernel/events/core.c:#add_event_to_ctx() | 0 | 333 | 12 |
| BUG:kernel_test_hang | 0 | 333 | 12 |
| backtrace:inherit_group | 0 | 328 | 12 |
| backtrace:perf_event_init_task | 0 | 328 | 12 |
| backtrace:do_fork | 0 | 328 | 12 |
| backtrace:SyS_clone | 0 | 328 | 12 |
| backtrace:perf_install_in_context | 0 | 5 | |
| backtrace:SyS_perf_event_open | 0 | 5 | |
+------------------------------------------------------------------+------------+------------+------------+
[main] Setsockopt(1 8 80d1000 4) on fd 86 [1:1:1]
[main] Setsockopt(1 2a 80d1000 4) on fd 87 [1:1:1]
[ 34.700861] ------------[ cut here ]------------
[ 34.701372] WARNING: CPU: 0 PID: 1457 at kernel/events/core.c:890 add_event_to_ctx+0x253/0x270()
[ 34.702515] CPU: 0 PID: 1457 Comm: trinity-main Not tainted 3.19.0-rc4-gd26bb7f #2
[ 34.702931] 00000000 00000000 c0911e2c cd8a61df c0911e48 cd052cfa 0000037a cd0d32f3
[ 34.702931] c0c206d0 c0c20590 d3c9e0a0 c0911e58 cd052dd4 00000009 00000000 c0911e78
[ 34.702931] cd0d32f3 d3c9e214 00000000 00000000 c0c20598 00000246 c0c50990 c0911e90
[ 34.702931] Call Trace:
[ 34.702931] [<cd8a61df>] dump_stack+0x16/0x18
[ 34.702931] [<cd052cfa>] warn_slowpath_common+0x6a/0xa0
[ 34.702931] [<cd0d32f3>] ? add_event_to_ctx+0x253/0x270
[ 34.702931] [<cd052dd4>] warn_slowpath_null+0x14/0x20
[ 34.702931] [<cd0d32f3>] add_event_to_ctx+0x253/0x270
[ 34.702931] [<cd0da60f>] inherit_event+0xef/0x240
[ 34.702931] [<cd0da778>] inherit_group+0x18/0x70
[ 34.702931] [<cd0d2884>] ? alloc_perf_context+0x24/0x50
[ 34.702931] [<cd0db927>] perf_event_init_task+0x117/0x310
[ 34.702931] [<cd050c67>] copy_process+0x477/0x14f0
[ 34.702931] [<cd052063>] do_fork+0xb3/0x430
[ 34.702931] [<cd0923fd>] ? do_setitimer+0x13d/0x220
[ 34.702931] [<cd09251a>] ? alarm_setitimer+0x3a/0x60
[ 34.702931] [<cd05246b>] SyS_clone+0x1b/0x20
[ 34.702931] [<cd8ad3bd>] syscall_call+0x7/0x7
[ 34.702931] [<cd8a0000>] ? xen_chk_extra_mem+0x10/0x70
[ 34.702931] ---[ end trace 19d6cac21f26a758 ]---
git bisect start b0f99979082f6aafe6f2d4342e44907a4bb6b710 ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc --
git bisect bad 7c4e3ef2ae4f008776d1d2d13c862179146bbb07 # 08:44 0- 28 Merge 'arm-platforms/irq/die-gic-arch-extn-die-die-die' into devel-roam-rand-201501240027
git bisect bad 5dcbd81bc253c6fb786a3c4d0c2304d00353cc83 # 08:45 0- 928 Merge 'peterz-queue/perf/urgent' into devel-roam-rand-201501240027
git bisect good bd0e15d797d00b7115e1950ee13fec7ce001f064 # 09:09 900+ 900 Merge 'peterz-queue/locking/core' into devel-roam-rand-201501240027
git bisect bad d8c008a82490f75ca16101567e167213486288aa # 10:14 351- 358 Merge 'peterz-queue/perf/core' into devel-roam-rand-201501240027
git bisect bad 18966e0b34261132be50b8624be368db80b529cf # 11:19 291- 293 perf, x86: use context switch callback to flush LBR stack
git bisect good 44b4c3b252ffefe36900df247d528e9550ee20c4 # 12:31 900+ 900 perf: Add pmu callbacks to track event mapping and unmapping
git bisect bad d26bb7f73a2881f2412c340a27438b185f0cc3dc # 13:34 509- 510 perf: decouple unthrottling and rotating
git bisect good e8923a02fab8e3a2e74cebace2ae73cbf1f0dd09 # 14:00 900+ 900 x86, perf: Only allow rdpmc if a perf_event is mapped
git bisect good 2e67200461d1eec17062de4947d07f3e6afd0848 # 14:26 900+ 900 x86, perf: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
# first bad commit: [d26bb7f73a2881f2412c340a27438b185f0cc3dc] perf: decouple unthrottling and rotating
git bisect good 2e67200461d1eec17062de4947d07f3e6afd0848 # 14:44 1000+ 1900 x86, perf: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
# extra tests with DEBUG_INFO
# extra tests on HEAD of linux-devel/devel-roam-rand-201501240027
git bisect bad b0f99979082f6aafe6f2d4342e44907a4bb6b710 # 14:48 0- 22 0day head guard for 'devel-roam-rand-201501240027'
# extra tests on tree/branch peterz-queue/perf/core
git bisect bad 6f637dfc22bc3e963c6936cdf1bb6550a9d3e955 # 16:02 274- 278 perf,powerpc: Fix up flush_branch_stack users
# extra tests on tree/branch linus/master
git bisect good c4e00f1d31c4c83d15162782491689229bd92527 # 17:12 1000+ 644 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
# extra tests on tree/branch next/master
git bisect good de3d2c5b941c632685ab58613f981bf14a42676f # 17:23 1000+ 528 Add linux-next specific files for 20150123
This script may reproduce the error.
----------------------------------------------------------------------------
#!/bin/bash
# Usage: pass the kernel image to boot as the first argument.

kernel=$1
initrd=yocto-minimal-i386.cgz

# Fetch the initrd; --no-clobber skips the download if the file already exists.
wget --no-clobber "https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd"

# QEMU options: one CPU, 320 MB of RAM, serial console on stdio, no display.
kvm=(
	qemu-system-x86_64
	-cpu kvm64
	-enable-kvm
	-kernel "$kernel"
	-initrd "$initrd"
	-m 320
	-smp 1
	-net nic,vlan=1,model=e1000
	-net user,vlan=1
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

# Kernel command-line parameters, joined into one string for --append.
append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------
Thanks,
Fengguang
View attachment "dmesg-yocto-ivb41-39:20150124134027:i386-randconfig-r2-0121:3.19.0-rc4-gd26bb7f:2" of type "text/plain" (316200 bytes)
View attachment "dmesg-quantal-client1-10:20150124141037:i386-randconfig-r2-0121:3.19.0-rc4-g2e67200:4" of type "text/plain" (147475 bytes)
View attachment "config-3.19.0-rc4-gd26bb7f" of type "text/plain" (88469 bytes)
_______________________________________________
LKP mailing list
LKP@...ux.intel.com