Message-ID: <4F391BBA.5020506@linux.intel.com>
Date:	Mon, 13 Feb 2012 06:18:34 -0800
From:	Arjan van de Ven <arjan@...ux.intel.com>
To:	Michael Neuling <mikey@...ling.org>
CC:	Stephen Rothwell <sfr@...b.auug.org.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, gregkh@...uxfoundation.org,
	linux-next@...r.kernel.org,
	ppc-dev <linuxppc-dev@...ts.ozlabs.org>,
	Milton Miller <miltonm@....com>
Subject: Re: Boot failure with next-20120208

On 2/12/2012 7:04 PM, Michael Neuling wrote:
>> Just a quick note to say I got a boot oops with next-20120208 and 9 on a
>> Power7 blade (my other PowerPC boot tests are ok).  I'll investigate this
>> further on Monday.
>>
>> The line referenced below is:
>>
>> BUG_ON(!kobj || !kobj->sd || !attr);
>>
>> in sysfs_create_file().
>>
>> calling  .topology_init+0x0/0x1ac @ 1
>> initcall 7_.async_cpu_up+0x0/0x40 returned 0 after 9765 usecs
>> async_continuing @ 20 after 9765 usec
>> ------------[ cut here ]------------
>> kernel BUG at fs/sysfs/file.c:573!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> SMP NR_CPUS=32 NUMA pSeries
>> Modules linked in:
>> NIP: c00000000024a35c LR: c0000000004ee050 CTR: c00000000083ca24
>> REGS: c0000003fd9e7560 TRAP: 0700   Not tainted  (3.3.0-rc2-autokern1)
>> MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 88002082  XER: 0000000f
>> CFAR: c00000000024a370
>> TASK = c0000003fd9e8000[20] 'kworker/u:6' THREAD: c0000003fd9e4000 CPU: 0
>> GPR00: 0000000000000001 c0000003fd9e77e0 c000000000d19bb8 0000000000000000 
>> GPR04: c000000000bf37a8 0000000000000008 8000000002096400 0000000000000000 
>> GPR08: 0000000000000000 c000000000f80028 c000000000d52bd8 0000000000000000 
>> GPR12: 0000000048002088 c00000000f33b000 0000000001affa78 00000000009aa000 
>> GPR16: 0000000000e1f3c8 0000000002d517f0 0000000001aff984 0000000000000060 
>> GPR20: 0000000000000000 ffffffffffffffff 0000000000000000 c000000000c45128 
>> GPR24: 0000000000000000 0000000000000008 0000000000000000 c000000000c44200 
>> GPR28: c000000000f80028 0000000000000008 c000000000c85038 0000000000000002 
>> NIP [c00000000024a35c] .sysfs_create_file+0x1c/0x40
>> LR [c0000000004ee050] .device_create_file+0x20/0x40
>> Call Trace:
>> [c0000003fd9e77e0] [c0000003fd9e78a0] 0xc0000003fd9e78a0 (unreliable)
>> [c0000003fd9e7850] [c00000000083c9a4] .register_cpu_online+0x1d0/0x250
>> [c0000003fd9e7900] [c00000000083ca8c] .sysfs_cpu_notify+0x68/0x28c
>> [c0000003fd9e79b0] [c00000000083769c] .notifier_call_chain+0x9c/0x100
>> [c0000003fd9e7a50] [c0000000000a5878] .__cpu_notify+0x38/0x80
>> [c0000003fd9e7ad0] [c00000000083e124] ._cpu_up+0x10c/0x178
>> [c0000003fd9e7b90] [c00000000083e2c8] .cpu_up+0x138/0x164
>> [c0000003fd9e7c20] [c000000000ba46d0] .async_cpu_up+0x28/0x40
>> [c0000003fd9e7ca0] [c0000000000d81ec] .async_run_entry_fn+0xbc/0x1f0
>> [c0000003fd9e7d50] [c0000000000c7cbc] .process_one_work+0x19c/0x590
>> [c0000003fd9e7e10] [c0000000000c8618] .worker_thread+0x188/0x4b0
>> [c0000003fd9e7ed0] [c0000000000ce57c] .kthread+0xbc/0xd0
>> [c0000003fd9e7f90] [c000000000021448] .kernel_thread+0x54/0x70
>> Instruction dump:
>> 7fa3eb78 ebe1fff8 eba1ffe8 7c0803a6 4e800020 2c230000 41820024 e8630030 
>> 7c800074 7800d182 2fa30000 419e0014 <0b000000> 38a00002 4bfffebc e8630030 
>> ---[ end trace 31fd0ba7d8756001 ]---
>> initcall .topology_init+0x0/0x1ac returned 0 after 0 usecs
>> calling  .pcibios_init+0x0/0xe8 @ 1
>> PCI: Probing PCI hardware
>> PCI: Probing PCI hardware done
>> initcall .pcibios_init+0x0/0xe8 returned 0 after 0 usecs
>> calling  .add_system_ram_resources+0x0/0x140 @ 1
>> initcall .add_system_ram_resources+0x0/0x140 returned 0 after 0 usecs
>> calling  .__machine_initcall_powermac_pmac_i2c_create_platform_devices+0x0/0xc8 @ 1
>> initcall .__machine_initcall_powermac_pmac_i2c_create_platform_devices+0x0/0xc8 returned 0 after 0 usecs
>> calling  .opal_init+0x0/0x1cc @ 1
>> opal: Node not found
>> initcall .opal_init+0x0/0x1cc returned -19 after 0 usecs
>> calling  .__machine_initcall_pseries_ioei_init+0x0/0xa0 @ 1
> 
> Reverting "smp: start up non-boot CPUs asynchronously" (8de7a96405 from
> next-20120208) fixes this problem for me.  
> 
If that fixes it, then PPC has a race somewhere in its CPU hotplug
code: all the patch does is hotplug the CPUs one by one (which the
normal kernel also does, just not in parallel with other work).
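
To make the ordering argument concrete, here is a minimal userspace sketch
of the kind of race the trace hints at. It is an illustration under
assumptions, not the kernel code and not a confirmed diagnosis: the
`*_model` types and every function below are hypothetical stand-ins, and
the idea that the CPU device is simply not registered yet when the async
bringup path runs is an assumption. Only the checked condition comes from
the quoted BUG_ON in sysfs_create_file().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real definitions). */
struct sysfs_dirent_model { int unused; };
struct kobject_model { struct sysfs_dirent_model *sd; };
struct attribute_model { const char *name; };

/*
 * Same condition as the quoted BUG_ON(!kobj || !kobj->sd || !attr),
 * but returning an error instead of trapping.
 */
int create_file_checked(const struct kobject_model *kobj,
                        const struct attribute_model *attr)
{
    if (!kobj || !kobj->sd || !attr)
        return -1;
    return 0;
}

/* Models device registration: only after this does kobj->sd exist. */
void register_device_model(struct kobject_model *kobj,
                           struct sysfs_dirent_model *sd)
{
    kobj->sd = sd;
}

/*
 * Replays the assumed bad interleaving in a single thread: the "CPU online"
 * step tries to create its sysfs file before the device is registered,
 * then tries again afterwards. Returns 0 if the early attempt fails and
 * the late attempt succeeds.
 */
int ordering_demo(void)
{
    struct sysfs_dirent_model sd = { 0 };
    struct kobject_model cpu_dev = { NULL };
    struct attribute_model attr = { "example" };

    int early = create_file_checked(&cpu_dev, &attr); /* async path wins the race */
    register_device_model(&cpu_dev, &sd);             /* init path catches up */
    int late = create_file_checked(&cpu_dev, &attr);  /* same call, now valid */

    assert(early == -1);
    assert(late == 0);
    return (early == -1 && late == 0) ? 0 : 1;
}
```

With serial bringup the "late" ordering is the only one that can occur;
running the bringup in parallel with other init work is what makes the
"early" ordering possible, if this assumption about the cause holds.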


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
