Message-ID: <CANRm+Cx-BdnfcaD9Eb6q05SakSTrF3W8+GgYvDZNxcjHeoSVUw@mail.gmail.com>
Date:   Tue, 23 Aug 2016 08:47:41 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>, dipankar@...ibm.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Josh Triplett <josh@...htriplett.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>, dhowells@...hat.com,
        Eric Dumazet <edumazet@...gle.com>, dvhart@...ux.intel.com,
        Frédéric Weisbecker <fweisbec@...il.com>,
        oleg@...hat.com, pranith kumar <bobby.prani@...il.com>
Subject: Re: [PATCH tip/core/rcu 3/5] sched: Make wake_up_nohz_cpu() handle
 CPUs going offline

2016-08-23 8:45 GMT+08:00 Paul E. McKenney <paulmck@...ux.vnet.ibm.com>:
> On Tue, Aug 23, 2016 at 06:57:20AM +0800, Wanpeng Li wrote:
>> 2016-08-22 23:30 GMT+08:00 Paul E. McKenney <paulmck@...ux.vnet.ibm.com>:
>> > Both timers and hrtimers are maintained on the outgoing CPU until
>> > CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
>> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
>> > will splat in native_smp_send_reschedule() when attempting to wake up
>> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
>> >
>> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
>> > [ 7976.741595] Modules linked in:
>> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
>> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> > [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
>> > [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
>> > [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
>> > [ 7976.741595] Call Trace:
>> > [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
>> > [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
>> > [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
>> > [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
>> > [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
>> > [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
>> > [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
>> > [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
>> > [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
>> > [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
>> > [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
>> > [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
>> > [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
>> >
>> > However, in this case, the wakeup is redundant, because the timer
>> > migration will reprogram timer hardware as needed.  Note that the fact
>> > that preemption is disabled does not avoid the splat, as the offline
>> > operation has already passed both the synchronize_sched() and the
>> > stop_machine() that would be blocked by disabled preemption.
>> >
>> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
>> > to wake up offline CPUs.  It also adds a comment stating that the
>> > caller must tolerate lost wakeups when the target CPU is going offline,
>> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
>>
>> Interesting. Several weeks ago I posted a patch that fixes another,
>> similar issue (https://lkml.org/lkml/2016/8/4/143). Does my patch
>> also fix your bug?
>
> I will see your several weeks and raise you more than a month:
>
> http://lkml.kernel.org/g/20160630175845.GA10269@linux.vnet.ibm.com
>
> So you try mine and then I will try yours.  ;-)
>
> Especially given that I am not seeing how the code path in my trace
> above reaches your change in sched_can_stop_tick()...

Agreed, they are different bugs.

Regards,
Wanpeng Li
