Message-Id: <0D14734F-0402-434E-936C-C9213ABA7CCC@linux.vnet.ibm.com>
Date:   Mon, 24 Jun 2019 22:39:36 +0530
From:   Sachin Sant <sachinp@...ux.vnet.ibm.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     linuxppc-dev@...ts.ozlabs.org,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        linux-next@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [next][PowerPC] RCU stalls while booting linux-next on PowerVM
 LPAR



> On 24-Jun-2019, at 8:12 PM, David Hildenbrand <david@...hat.com> wrote:
> 
> On 24.06.19 16:09, Sachin Sant wrote:
>> Latest -next fails to boot on POWER9 PowerVM LPAR due to RCU stalls.
>> 
>> This problem was introduced with next-20190620 (dc636f5d78).
>> next-20190619 was the last good kernel.
>> 
>> Reverting the following commit allows the kernel to boot.
>> 2fd4aeea6b603 : mm/memory_hotplug: move and simplify walk_memory_blocks()
>> 
>> 
>> [    0.014409] Using shared cache scheduler topology
>> [    0.016302] devtmpfs: initialized
>> [    0.031022] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
>> [    0.031034] futex hash table entries: 16384 (order: 5, 2097152 bytes, linear)
>> [    0.031575] NET: Registered protocol family 16
>> [    0.031724] audit: initializing netlink subsys (disabled)
>> [    0.031796] audit: type=2000 audit(1561344029.030:1): state=initialized audit_enabled=0 res=1
>> [    0.032249] cpuidle: using governor menu
>> [    0.032403] pstore: Registered nvram as persistent store backend
>> [   60.061246] rcu: INFO: rcu_sched self-detected stall on CPU
>> [   60.061254] rcu: 	0-....: (5999 ticks this GP) idle=1ea/1/0x4000000000000002 softirq=5/5 fqs=2999 
>> [   60.061261] 	(t=6000 jiffies g=-1187 q=0)
>> [   60.061265] NMI backtrace for cpu 0
>> [   60.061269] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190621-autotest-autotest #1
>> [   60.061275] Call Trace:
>> [   60.061280] [c0000018ee85f380] [c000000000b624ec] dump_stack+0xb0/0xf4 (unreliable)
>> [   60.061287] [c0000018ee85f3c0] [c000000000b6d464] nmi_cpu_backtrace+0x144/0x150
>> [   60.061293] [c0000018ee85f450] [c000000000b6d61c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
>> [   60.061300] [c0000018ee85f4f0] [c0000000000692c8] arch_trigger_cpumask_backtrace+0x28/0x40
>> [   60.061306] [c0000018ee85f510] [c0000000001c5f90] rcu_dump_cpu_stacks+0x10c/0x16c
>> [   60.061313] [c0000018ee85f560] [c0000000001c4fe4] rcu_sched_clock_irq+0x744/0x990
>> [   60.061318] [c0000018ee85f630] [c0000000001d5b58] update_process_times+0x48/0x90
>> [   60.061325] [c0000018ee85f660] [c0000000001ea03c] tick_periodic+0x4c/0x120
>> [   60.061330] [c0000018ee85f690] [c0000000001ea150] tick_handle_periodic+0x40/0xe0
>> [   60.061336] [c0000018ee85f6d0] [c00000000002b5cc] timer_interrupt+0x10c/0x2e0
>> [   60.061342] [c0000018ee85f730] [c000000000009204] decrementer_common+0x134/0x140
>> [   60.061350] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
>> [   60.061350]     LR = arch_local_irq_restore+0x84/0x90
>> [   60.061357] [c0000018ee85fa30] [c0000018ee85fbac] 0xc0000018ee85fbac (unreliable)
>> [   60.061364] [c0000018ee85fa50] [c000000000b88300] _raw_spin_unlock_irqrestore+0x50/0x80
>> [   60.061369] [c0000018ee85fa70] [c000000000b69da4] klist_next+0xb4/0x150
>> [   60.061376] [c0000018ee85fac0] [c000000000766ea0] subsys_find_device_by_id+0xf0/0x1a0
>> [   60.061382] [c0000018ee85fb20] [c000000000797a94] walk_memory_blocks+0x84/0x100
>> [   60.061388] [c0000018ee85fb80] [c000000000795ea0] link_mem_sections+0x40/0x60
>> [   60.061395] [c0000018ee85fbb0] [c000000000f28c28] topology_init+0xa0/0x268
>> [   60.061400] [c0000018ee85fc10] [c000000000010448] do_one_initcall+0x68/0x2c0
>> [   60.061406] [c0000018ee85fce0] [c000000000f247dc] kernel_init_freeable+0x318/0x47c
>> [   60.061411] [c0000018ee85fdb0] [c0000000000107c4] kernel_init+0x24/0x150
>> [   60.061417] [c0000018ee85fe20] [c00000000000ba54] ret_from_kernel_thread+0x5c/0x68
>> [   88.016563] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
>> [   88.016569] Modules linked in:
>> 
> 
> Hi, thanks! Please see
> 
> https://lkml.org/lkml/2019/6/21/600
> 
> and especially
> 
> https://lkml.org/lkml/2019/6/21/908
> 
> Does this fix your problem? The fix is on its way to next.

Yes, this patch fixes the problem for me.

Thanks
-Sachin
