[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171122121849.ybg4loxtctlu7jkk@wfg-t540p.sh.intel.com>
Date: Wed, 22 Nov 2017 20:18:49 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Jiri Olsa <jolsa@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Luca Abeni <luca.abeni@...tannapisa.it>,
Tejun Heo <tj@...nel.org>,
Matt Fleming <matt@...eblueprint.co.uk>,
Nicolas Pitre <nicolas.pitre@...aro.org>,
linux-kernel@...r.kernel.org, lkp@...org,
"Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: [migration_cpu_stop] WARNING: CPU: 0 PID: 11 at
kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6
Hi Paul,
On Tue, Nov 21, 2017 at 08:13:35AM -0800, Paul E. McKenney wrote:
>On Tue, Nov 21, 2017 at 09:51:01PM +0800, Fengguang Wu wrote:
>> On Tue, Nov 21, 2017 at 02:34:03PM +0100, Peter Zijlstra wrote:
>> >On Tue, Nov 21, 2017 at 08:04:20PM +0800, Fengguang Wu wrote:
>> >>Hello,
>> >>
>> >>FYI this happens in mainline kernel 4.14.0-00902-g43ff2f4.
>> >>It happens since 4.13.
>> >>
>> >>It occurs in 7 out of 36 boots.
>> >>
>> >>[ 58.171538] no ifx modem active;
>> >>[ 58.211860] ACPI: Preparing to enter system sleep state S5
>> >>[ 58.216823] reboot: Power down
>> >>[ 58.244526] acpi_power_off called
>> >>[ 61.460467] ------------[ cut here ]------------
>> >>[ 61.465481] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6:
>> >> set_task_cpu at kernel/sched/core.c:1187
>> >
>> >Never seen that before.. but I can't even figure out how CPUs end up
>> >offline in that reboot path :/
>> >
>> >So far I just get lost in the callback maze... Rafael any clues?
>>
>> I noticed that it's doing rcu-perf testing before the warning.
>> And it's immediately followed by another warning:
>>
>> [ 61.653178] sched: Unexpected reschedule of offline CPU#1!
>> [ 61.658501] ------------[ cut here ]------------
>> [ 61.663130] WARNING: CPU: 0 PID: 11 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x59/0xbf
>> [ 61.673690] Modules linked in:
>> [ 61.676737] CPU: 0 PID: 11 Comm: migration/0 Tainted: G W 4.14.0-00902-g43ff2f4 #1
>> [ 61.684975] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> [ 61.692977] task: ffff880016c00000 task.stack: ffff880016c08000
>> [ 61.698610] RIP: 0010:native_smp_send_reschedule+0x59/0xbf
>> [ 61.703900] RSP: 0018:ffff880016c0fc68 EFLAGS: 00010086
>> [ 61.710817] RAX: 000000000000002e RBX: 0000000000000001 RCX: 0000000000000000
>> [ 61.721649] RDX: 000000000000002e RSI: 0000000000000003 RDI: ffffed0002d81f83
>> [ 61.732727] RBP: ffff88000ac94640 R08: 0000000000000001 R09: 0000000000000001
>> [ 61.743330] R10: ffff880016c0fbc8 R11: 0000000000000001 R12: ffff88000e54c800
>> [ 61.753861] R13: dffffc0000000000 R14: ffff8800177e0700 R15: 0000000000000000
>> [ 61.764513] FS: 0000000000000000(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
>> [ 61.776370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 61.785022] CR2: 00007f3eca6152c0 CR3: 0000000007a15003 CR4: 00000000000206b0
>> [ 61.797309] Call Trace:
>> [ 61.801165] check_preempt_wakeup+0x386/0x5aa
>> [ 61.807806] check_preempt_curr+0xe3/0x22e
>> [ 61.814220] move_queued_task+0x39b/0x3ab
>> [ 61.820415] migration_cpu_stop+0x216/0x2f2
>> [ 61.826797] ? sched_ttwu_pending+0x1fd/0x1fd
>> [ 61.833542] ? lock_acquire+0xed/0x14b
>> [ 61.838654] ? sched_ttwu_pending+0x1fd/0x1fd
>> [ 61.844336] cpu_stopper_thread+0x1c9/0x25e
>> [ 61.849812] ? cpu_stop_create+0x6a/0x6a
>> [ 61.855049] smpboot_thread_fn+0x615/0x64b
>> [ 61.860459] ? sort_range+0x17/0x17
>> [ 61.863922] ? schedule+0x16e/0x190
>> [ 61.867626] kthread+0x365/0x374
>> [ 61.871029] ? sort_range+0x17/0x17
>> [ 61.874491] ? kthread_stop+0xc8/0xc8
>> [ 61.878189] ret_from_fork+0x1f/0x30
>> [ 61.881912] Code: 74 05 e8 db c5 29 00 3b 1d 87 54 08 02 72 02 0f ff 89 d8 48 0f a3 05 15 53 08 02 72 12 89 de 48 c7 c7 00 ac 42 a6 e8 07 00 11 00 <0f> ff eb 5f 48 c7 c7 c0 56 a2 a6 48 b8 00 00 00 00 00 fc ff df
>> [ 61.900994] ---[ end trace 764cf6ead543b299 ]---
>>
>> which we have bisect record for:
>
>Which, by design, shuts down the system at the end of the test.
>
>Don't get me wrong, I am quite happy for you guys to test rcuperf, but if
>you do test it, the scaffolding either needs to suppress the end-of-test
>shutdown or must recognize the end-of-test shutdown as normal behavior.
>The people using rcuperf for its intended purpose are well-served by the
>end-of-test shutdown, so I am not going to revert the commit shown below.
Do you mean to explicitly disable auto shutdown with this parameter
for CONFIG_RCU_PERF_TEST=y boots? Yes I can do this.
torture_param(bool, shutdown, !IS_ENABLED(MODULE),
"Shutdown at end of performance tests.");
>I need other testing to either stop testing rcuperf on the one hand or to
>have their rcuperf tests account for expected rcuperf behavior on the other.
>
>So how can we best resolve this?
To me the problem is not about auto shutdown, but how come the
shutdown will trigger the warnings here. I noticed you recommend
trying these 2 commits in the other thread:
a54daabfde1a ("sched: Stop resched_cpu() from sending IPIs to offline CPUs")
e600277461ed ("sched: Stop switched_to_rt() from sending IPIs to offline CPUs")
I'll try them out. Sorry for the delay!
Regards,
Fengguang
>> commit 492b95e59735998312f678d77a2d5fe20af6b0b9
>> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>> AuthorDate: Fri Apr 21 16:09:15 2017 -0700
>> Commit: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>> CommitDate: Thu Jun 8 08:25:31 2017 -0700
>>
>> rcuperf: Set more user-friendly defaults
>>
>> Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
>> as a module, rcuperf.shutdown. This commit therefore sets the default
>> for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
>> to zero if rcuperf is built as a module and to one otherwise.
>>
>> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>>
>> 3ddf20c953 srcu: Shrink Tiny SRCU a bit more
>> 492b95e597 rcuperf: Set more user-friendly defaults
>> bb176f6709 Linux 4.14-rc6
>> 36ef71cae3 Add linux-next specific files for 20171018
>> +--------------------------------------------------------------+------------+------------+-----------+---------------+
>> | | 3ddf20c953 | 492b95e597 | v4.14-rc6 | next-20171018 |
>> +--------------------------------------------------------------+------------+------------+-----------+---------------+
>> | boot_successes | 910 | 21 | 199 | 60 |
>> | boot_failures | 0 | 9 | 111 | 21 |
>> | WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0 | 9 | 111 | 21 |
>> +--------------------------------------------------------------+------------+------------+-----------+---------------+
>>
>> [ 29.173654] augmented rbtree testing
>> [ 32.263469] -> 83223 cycles
>> [ 32.540332] gpio_it87: no device
>> [ 32.541422] sched: Unexpected reschedule of offline CPU#1!
>> [ 32.542355] ------------[ cut here ]------------
>> [ 32.543057] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x83/0xbb
>> [ 32.544646] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc4-00028-g492b95e #1
>> [ 32.545397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> [ 32.546381] task: ffff88001e2e0000 task.stack: ffff88001e2e8000
>> [ 32.547031] RIP: 0010:native_smp_send_reschedule+0x83/0xbb
>> [ 32.547642] RSP: 0000:ffff88001e2ebca0 EFLAGS: 00010086
>> [ 32.548225] RAX: 000000000000002e RBX: 0000000000000003 RCX: 0000000000000000
>> [ 32.548989] RDX: 0000002e00000001 RSI: 0000000000000000 RDI: 0000000000000096
>> [ 32.550034] RBP: ffff88001e2ebcb8 R08: 0000000000000003 R09: 0000000000000000
>> [ 32.551039] R10: ffff88001e2e0758 R11: 0000000000021001 R12: 0000000000000001
>> [ 32.551996] R13: 0000000000000000 R14: ffff88001e380000 R15: ffff88001e500000
>> [ 32.552727] FS: 0000000000000000(0000) GS:ffff88001e400000(0000) knlGS:0000000000000000
>> [ 32.553585] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 32.554227] CR2: 0000000000000000 CR3: 000000000260f000 CR4: 00000000000006b0
>> [ 32.555288] Call Trace:
>> [ 32.555687] smp_send_reschedule+0xa/0xc
>> [ 32.556313] try_to_wake_up+0x27e/0x2e4
>> [ 32.556870] wake_up_process+0x15/0x17
>> [ 32.557374] wake_up_worker+0x5f/0x6a
>> [ 32.557982] pwq_adjust_max_active+0x154/0x191
>> [ 32.558444] link_pwq+0xc3/0x10a
>> [ 32.558828] __alloc_workqueue_key+0x325/0x695
>> [ 32.559288] ? ftrace_likely_update+0x78/0x81
>> [ 32.559810] ? pci_epf_init+0x52/0x52
>> [ 32.560222] ? do_early_param+0xfd/0xfd
>> [ 32.560643] pci_epf_test_init+0x30/0x7c
>> [ 32.561063] ? pci_epf_init+0x52/0x52
>> [ 32.561431] do_one_initcall+0xbc/0x1f7
>> [ 32.562099] ? do_early_param+0xfd/0xfd
>> [ 32.562760] kernel_init_freeable+0x13f/0x233
>> [ 32.563497] ? rest_init+0xd6/0xd6
>> [ 32.564046] kernel_init+0x14/0x1c5
>> [ 32.564565] ? rest_init+0xd6/0xd6
>> [ 32.564987] ret_from_fork+0x2a/0x40
>> [ 32.565345] Code: 00 00 31 d2 be 01 00 00 00 48 c7 c7 88 40 7d 82 e8 8c 48 18 00 44 89 e6 48 c7 c7 5a 68 3d 82 48 ff 05 70 78 87 01 e8 45 a5 1b 00 <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 58 40 7d 82
>> [ 32.568020] ---[ end trace 6590e27884081b3a ]---
>> [ 32.570056] VIA Graphics Integration Chipset framebuffer 2.4 initializing
>>
>> Thanks,
>> Fengguang
>>
>
Powered by blists - more mailing lists