lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 21 Nov 2017 08:13:35 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Fengguang Wu <fengguang.wu@...el.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jiri Olsa <jolsa@...hat.com>, Ingo Molnar <mingo@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Tejun Heo <tj@...nel.org>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Nicolas Pitre <nicolas.pitre@...aro.org>,
        linux-kernel@...r.kernel.org, lkp@...org,
        "Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: [migration_cpu_stop] WARNING: CPU: 0 PID: 11 at
 kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6

On Tue, Nov 21, 2017 at 09:51:01PM +0800, Fengguang Wu wrote:
> On Tue, Nov 21, 2017 at 02:34:03PM +0100, Peter Zijlstra wrote:
> >On Tue, Nov 21, 2017 at 08:04:20PM +0800, Fengguang Wu wrote:
> >>Hello,
> >>
> >>FYI this happens in mainline kernel 4.14.0-00902-g43ff2f4.
> >>It happens since 4.13.
> >>
> >>It occurs in 7 out of 36 boots.
> >>
> >>[   58.171538] no ifx modem active;
> >>[   58.211860] ACPI: Preparing to enter system sleep state S5
> >>[   58.216823] reboot: Power down
> >>[   58.244526] acpi_power_off called
> >>[   61.460467] ------------[ cut here ]------------
> >>[   61.465481] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6:
> >>						set_task_cpu at kernel/sched/core.c:1187
> >
> >Never seen that before.. but I can't even figure out how CPUs end up
> >offline in that reboot path :/
> >
> >So far I just get lost in the callback maze... Rafael any clues?
> 
> I noticed that it's doing rcu-perf testing before the warning.
> And it's immediately followed by another warning:
> 
> [   61.653178] sched: Unexpected reschedule of offline CPU#1!
> [   61.658501] ------------[ cut here ]------------
> [   61.663130] WARNING: CPU: 0 PID: 11 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x59/0xbf
> [   61.673690] Modules linked in:
> [   61.676737] CPU: 0 PID: 11 Comm: migration/0 Tainted: G        W       4.14.0-00902-g43ff2f4 #1
> [   61.684975] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [   61.692977] task: ffff880016c00000 task.stack: ffff880016c08000
> [   61.698610] RIP: 0010:native_smp_send_reschedule+0x59/0xbf
> [   61.703900] RSP: 0018:ffff880016c0fc68 EFLAGS: 00010086
> [   61.710817] RAX: 000000000000002e RBX: 0000000000000001 RCX: 0000000000000000
> [   61.721649] RDX: 000000000000002e RSI: 0000000000000003 RDI: ffffed0002d81f83
> [   61.732727] RBP: ffff88000ac94640 R08: 0000000000000001 R09: 0000000000000001
> [   61.743330] R10: ffff880016c0fbc8 R11: 0000000000000001 R12: ffff88000e54c800
> [   61.753861] R13: dffffc0000000000 R14: ffff8800177e0700 R15: 0000000000000000
> [   61.764513] FS:  0000000000000000(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
> [   61.776370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   61.785022] CR2: 00007f3eca6152c0 CR3: 0000000007a15003 CR4: 00000000000206b0
> [   61.797309] Call Trace:
> [   61.801165]  check_preempt_wakeup+0x386/0x5aa
> [   61.807806]  check_preempt_curr+0xe3/0x22e
> [   61.814220]  move_queued_task+0x39b/0x3ab
> [   61.820415]  migration_cpu_stop+0x216/0x2f2
> [   61.826797]  ? sched_ttwu_pending+0x1fd/0x1fd
> [   61.833542]  ? lock_acquire+0xed/0x14b
> [   61.838654]  ? sched_ttwu_pending+0x1fd/0x1fd
> [   61.844336]  cpu_stopper_thread+0x1c9/0x25e
> [   61.849812]  ? cpu_stop_create+0x6a/0x6a
> [   61.855049]  smpboot_thread_fn+0x615/0x64b
> [   61.860459]  ? sort_range+0x17/0x17
> [   61.863922]  ? schedule+0x16e/0x190
> [   61.867626]  kthread+0x365/0x374
> [   61.871029]  ? sort_range+0x17/0x17
> [   61.874491]  ? kthread_stop+0xc8/0xc8
> [   61.878189]  ret_from_fork+0x1f/0x30
> [   61.881912] Code: 74 05 e8 db c5 29 00 3b 1d 87 54 08 02 72 02 0f ff 89 d8 48 0f a3 05 15 53 08 02 72 12 89 de 48 c7 c7 00 ac 42 a6 e8 07 00 11 00 <0f> ff eb 5f 48 c7 c7 c0 56 a2 a6 48 b8 00 00 00 00 00 fc ff df
> [   61.900994] ---[ end trace 764cf6ead543b299 ]---
> 
> which we have bisect record for:

Which, by design, shuts down the system at the end of the test.

Don't get me wrong, I am quite happy for you guys to test rcuperf, but if
you do test it, the scaffolding either needs to suppress the end-of-test
shutdown or must recognize the end-of-test shutdown as normal behavior.
The people using rcuperf for its intended purpose are well-served by the
end-of-test shutdown, so I am not going to revert the commit shown below.
I need other testing to either stop testing rcuperf on the one hand or to
have their rcuperf tests account for expected rcuperf behavior on the other.

So how can we best resolve this?  

							Thanx, Paul

> commit 492b95e59735998312f678d77a2d5fe20af6b0b9
> Author:     Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> AuthorDate: Fri Apr 21 16:09:15 2017 -0700
> Commit:     Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> CommitDate: Thu Jun 8 08:25:31 2017 -0700
> 
>    rcuperf: Set more user-friendly defaults
> 
>    Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
>    as a module, rcuperf.shutdown.  This commit therefore sets the default
>    for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
>    to zero if rcuperf is built as a module and to one otherwise.
> 
>    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> 
> 3ddf20c953  srcu: Shrink Tiny SRCU a bit more
> 492b95e597  rcuperf: Set more user-friendly defaults
> bb176f6709  Linux 4.14-rc6
> 36ef71cae3  Add linux-next specific files for 20171018
> +--------------------------------------------------------------+------------+------------+-----------+---------------+
> |                                                              | 3ddf20c953 | 492b95e597 | v4.14-rc6 | next-20171018 |
> +--------------------------------------------------------------+------------+------------+-----------+---------------+
> | boot_successes                                               | 910        | 21         | 199       | 60            |
> | boot_failures                                                | 0          | 9          | 111       | 21            |
> | WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0          | 9          | 111       | 21            |
> +--------------------------------------------------------------+------------+------------+-----------+---------------+
> 
> [   29.173654] augmented rbtree testing
> [   32.263469]  -> 83223 cycles
> [   32.540332] gpio_it87: no device
> [   32.541422] sched: Unexpected reschedule of offline CPU#1!
> [   32.542355] ------------[ cut here ]------------
> [   32.543057] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x83/0xbb
> [   32.544646] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc4-00028-g492b95e #1
> [   32.545397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [   32.546381] task: ffff88001e2e0000 task.stack: ffff88001e2e8000
> [   32.547031] RIP: 0010:native_smp_send_reschedule+0x83/0xbb
> [   32.547642] RSP: 0000:ffff88001e2ebca0 EFLAGS: 00010086
> [   32.548225] RAX: 000000000000002e RBX: 0000000000000003 RCX: 0000000000000000
> [   32.548989] RDX: 0000002e00000001 RSI: 0000000000000000 RDI: 0000000000000096
> [   32.550034] RBP: ffff88001e2ebcb8 R08: 0000000000000003 R09: 0000000000000000
> [   32.551039] R10: ffff88001e2e0758 R11: 0000000000021001 R12: 0000000000000001
> [   32.551996] R13: 0000000000000000 R14: ffff88001e380000 R15: ffff88001e500000
> [   32.552727] FS:  0000000000000000(0000) GS:ffff88001e400000(0000) knlGS:0000000000000000
> [   32.553585] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   32.554227] CR2: 0000000000000000 CR3: 000000000260f000 CR4: 00000000000006b0
> [   32.555288] Call Trace:
> [   32.555687]  smp_send_reschedule+0xa/0xc
> [   32.556313]  try_to_wake_up+0x27e/0x2e4
> [   32.556870]  wake_up_process+0x15/0x17
> [   32.557374]  wake_up_worker+0x5f/0x6a
> [   32.557982]  pwq_adjust_max_active+0x154/0x191
> [   32.558444]  link_pwq+0xc3/0x10a
> [   32.558828]  __alloc_workqueue_key+0x325/0x695
> [   32.559288]  ? ftrace_likely_update+0x78/0x81
> [   32.559810]  ? pci_epf_init+0x52/0x52
> [   32.560222]  ? do_early_param+0xfd/0xfd
> [   32.560643]  pci_epf_test_init+0x30/0x7c
> [   32.561063]  ? pci_epf_init+0x52/0x52
> [   32.561431]  do_one_initcall+0xbc/0x1f7
> [   32.562099]  ? do_early_param+0xfd/0xfd
> [   32.562760]  kernel_init_freeable+0x13f/0x233
> [   32.563497]  ? rest_init+0xd6/0xd6
> [   32.564046]  kernel_init+0x14/0x1c5
> [   32.564565]  ? rest_init+0xd6/0xd6
> [   32.564987]  ret_from_fork+0x2a/0x40
> [   32.565345] Code: 00 00 31 d2 be 01 00 00 00 48 c7 c7 88 40 7d 82 e8 8c 48 18 00 44 89 e6 48 c7 c7 5a 68 3d 82 48 ff 05 70 78 87 01 e8 45 a5 1b 00 <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 58 40 7d 82
> [   32.568020] ---[ end trace 6590e27884081b3a ]---
> [   32.570056] VIA Graphics Integration Chipset framebuffer 2.4 initializing
> 
> Thanks,
> Fengguang
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ