Date:	Sat, 13 Nov 2010 11:12:20 -0800
From:	Yinghai Lu <yinghai@...nel.org>
To:	Wu Fengguang <fengguang.wu@...el.com>
CC:	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Nikanth Karthikesan <knikanth@...e.de>,
	David Rientjes <rientjes@...gle.com>,
	"Zheng, Shaohui" <shaohui.zheng@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-hotplug@...r.kernel.org" <linux-hotplug@...r.kernel.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Bjorn Helgaas <bjorn.helgaas@...com>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Nikhil Rao <ncrao@...gle.com>,
	Takuya Yoshikawa <yoshikawa.takuya@....ntt.co.jp>
Subject: Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP

On 11/13/2010 05:10 AM, Wu Fengguang wrote:
> On Sat, Nov 13, 2010 at 08:57:58PM +0800, Peter Zijlstra wrote:
>> On Sat, 2010-11-13 at 20:00 +0800, Wu Fengguang wrote:
>>> On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
>>>> On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
>>>>>> Will try and figure out how the heck that's happening, Ingo any clue?
>>>>>
>>>>> It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
>>>>> ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
>>>>>
>>>>> The interesting part is, the commit was introduced in 
>>>>> 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
>>>>
>>>> Argh, that commit again..
>>>>
>>>> Does this fix it: http://lkml.org/lkml/2010/11/12/8
>>>
>>> No it still panics. Here is the dmesg.
>>
>> OK, I'll let Nikanth have a look, if all else fails we can always
>> revert that patch.
> 
> It's the same bug.
> 
> Just tried another machine, I get the same divide error.  The patch
> posted in lkml/2010/11/12/8 does not fix it. But after reverting
> commit 50f2d7f682f9, it boots OK.
> 
> Thanks,
> Fengguang
> ---
> PS. dmesg with divide error
> 
> [    0.000000] console [ttyS0] enabled, bootconsole disabled
> [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
> [    0.000000] ... MAX_LOCK_DEPTH:          48
> [    0.000000] ... MAX_LOCKDEP_KEYS:        8191
> [    0.000000] ... CLASSHASH_SIZE:          4096
> [    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
> [    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
> [    0.000000] ... CHAINHASH_SIZE:          16384
> [    0.000000]  memory used by lock dependency info: 6367 kB
> [    0.000000]  per task-struct memory footprint: 2688 bytes
> [    0.000000] allocated 167772160 bytes of page_cgroup
> [    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
> [    0.000000] ODEBUG: 15 of 15 active objects replaced
> [    0.000000] hpet clockevent registered
> [    0.001000] Fast TSC calibration using PIT
> [    0.002000] Detected 2800.469 MHz processor.
> [    0.000010] Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.93 BogoMIPS (lpj=2800469)
> [    0.010818] pid_max: default: 32768 minimum: 301
> [    0.021745] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
> [    0.035657] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
> [    0.044553] Mount-cache hash table entries: 256
> [    0.049469] Initializing cgroup subsys debug
> [    0.053834] Initializing cgroup subsys ns
> [    0.057940] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
> [    0.066968] Initializing cgroup subsys cpuacct
> [    0.071511] Initializing cgroup subsys memory
> [    0.075988] Initializing cgroup subsys devices
> [    0.080527] Initializing cgroup subsys freezer
> [    0.085107] CPU: Physical Processor ID: 0
> [    0.089209] CPU: Processor Core ID: 0
> [    0.092974] mce: CPU supports 9 MCE banks
> [    0.097095] CPU0: Thermal monitoring enabled (TM1)
> [    0.101990] using mwait in idle threads.
> [    0.106006] Performance Events: PEBS fmt1+, Westmere events, Intel PMU driver.
> [    0.113535] ... version:                3
> [    0.117641] ... bit width:              48
> [    0.121828] ... generic registers:      4
> [    0.125926] ... value mask:             0000ffffffffffff
> [    0.131328] ... max period:             000000007fffffff
> [    0.136734] ... fixed-purpose events:   3
> [    0.140839] ... event mask:             000000070000000f
> [    0.147297] ACPI: Core revision 20101013
> [    0.175646] ftrace: allocating 24175 entries in 95 pages
> [    0.190912] Setting APIC routing to flat
> [    0.195562] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.211643] CPU0: Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz stepping 01
> [    0.325243] lockdep: fixing up alternatives.
> [    0.330242] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
> [    0.430140]  #2lockdep: fixing up alternatives.
> [    0.526962]  #3lockdep: fixing up alternatives.
> [    0.623755]  #4lockdep: fixing up alternatives.
> [    0.720588]  Ok.
> [    0.722525] Booting Node   1, Processors  #5lockdep: fixing up alternatives.
> [    0.822389]  Ok.
> [    0.824327] Booting Node   0, Processors  #6
> [    0.919089] TSC synchronization [CPU#0 -> CPU#6]:
> [    0.924155] Measured 296 cycles TSC warp between CPUs, turning off TSC clock.
> [    0.003999] Marking TSC unstable due to check_tsc_sync_source failed
> [    0.557048] lockdep: fixing up alternatives.
> [    0.558041]  Ok.
> [    0.559004] Booting Node   1, Processors  #7 Ok.
> [    0.632157] Brought up 8 CPUs
> [    0.633006] Total of 8 processors activated (44799.46 BogoMIPS).

I assume that if you build with
CONFIG_NR_CPUS=16
instead of
CONFIG_NR_CPUS=8

it will boot OK?

Thanks

	Yinghai