Message-ID: <20101114173208.GA23017@localhost>
Date:	Mon, 15 Nov 2010 01:32:08 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Nikanth Karthikesan <knikanth@...e.de>,
	David Rientjes <rientjes@...gle.com>,
	"Zheng, Shaohui" <shaohui.zheng@...el.com>,
	"linux-hotplug@...r.kernel.org" <linux-hotplug@...r.kernel.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Bjorn Helgaas <bjorn.helgaas@...com>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Nikhil Rao <ncrao@...gle.com>,
	Takuya Yoshikawa <yoshikawa.takuya@....ntt.co.jp>
Subject: Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
 num limitation

Hi,

I just found another problem. When passing "mem=256" to 2.6.37-rc1,
it dies hard early (not able to print any boot log). With this patch
applied, it's a bit better: it shows a kernel panic, but still dies
hard (not able to reboot with "panic=10").

Attached is the screenshot in kvm (it's not specific to kvm, it dies
hard on two more physical boxes). The screenshot shows that it panics
inside reserve_trampoline_memory().

Thanks,
Fengguang

On Sun, Nov 14, 2010 at 09:38:41AM +0800, Yinghai Lu wrote:
> 
> Recent Intel systems use a different order in the MADT: they list all thread0
> entries first, then all thread1 entries.
> But the SRAT still uses the old order and lists all cpus of one socket together.
> 
> If the user has compiled with a limited NR_CPUS or boots with nr_cpus=, we could
> miss putting some cpus' apic id to node mappings into apicid_to_node[].
> 
> For example, a 4 socket system with 64 cpus booted with nr_cpus=32 will crash:
> 
> [    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
> [    9.235021] divide error: 0000 [#1] SMP 
> [    9.235315] last sysfs file: 
> [    9.235481] CPU 1 
> [    9.235592] Modules linked in:
> [    9.245398] 
> [    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
> [    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [    9.265835] RSP: 0000:ffff88103f8d1c40  EFLAGS: 00010046
> [    9.285550] RAX: 0000000000000000 RBX: ffff88103f887de0 RCX: 0000000000000000
> [    9.305356] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
> [    9.305711] RBP: ffff88103f8d1d10 R08: 0000000000000200 R09: ffff88103f887e38
> [    9.325709] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> [    9.326038] R13: ffff88107e80dfb0 R14: 0000000000000001 R15: ffff88103f887e40
> [    9.345655] FS:  0000000000000000(0000) GS:ffff88107e800000(0000) knlGS:0000000000000000
> [    9.365503] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    9.365776] CR2: 0000000000000000 CR3: 0000000002417000 CR4: 00000000000006e0
> [    9.385583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    9.405368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    9.405713] Process kthreadd (pid: 2, threadinfo ffff88103f8d0000, task ffff88305c8aa2d0)
> [    9.425563] Stack:
> [    9.425668]  ffff88103f8d1cb0 0000000000000046 0000000000000000 0000000200000000
> [    9.445509]  0000000000000000 0000000100000000 0000000000000046 ffffffff82bd1ce0
> [    9.465350]  000000015c8aa2d0 00000000001d2540 00000000001d2540 0000007d3f8d1d28
> [    9.465763] Call Trace:
> [    9.465875]  [<ffffffff810747c3>] wake_up_new_task+0x3c/0x10e
> [    9.485486]  [<ffffffff8107b2e3>] do_fork+0x28c/0x35f
> [    9.485753]  [<ffffffff810ab832>] ? __lock_acquire+0x1801/0x1813
> [    9.505474]  [<ffffffff8106f2bd>] ? finish_task_switch+0x80/0xf4
> [    9.525264]  [<ffffffff8106f286>] ? finish_task_switch+0x49/0xf4
> [    9.525575]  [<ffffffff8109da72>] ? local_clock+0x2b/0x3c
> [    9.545281]  [<ffffffff8103da76>] kernel_thread+0x70/0x72
> [    9.545544]  [<ffffffff81097c83>] ? kthread+0x0/0xa8
> [    9.545797]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
> [    9.565519]  [<ffffffff81098099>] kthreadd+0xe8/0x12b
> [    9.585185]  [<ffffffff81037994>] kernel_thread_helper+0x4/0x10
> [    9.585485]  [<ffffffff81cd793c>] ? restore_args+0x0/0x30
> [    9.605192]  [<ffffffff81097fb1>] ? kthreadd+0x0/0x12b
> [    9.605479]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
> [    9.625295] Code: a0 be 00 02 00 00 ff c2 48 63 d2 e8 f8 67 3b 00 3b 05 86 8e 52 01 48 89 c7 89 45 c8 7c c1 48 8b 45 b0 8b 4b 08 31 d2 48 c1 e0 0a <48> f7 f1 45 85 e4 75 08 48 3b 45 b8 72 08 eb 0d 48 89 45 a8 eb 
> [    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [    9.665356]  RSP <ffff88103f8d1c40>
> [    9.665568] ---[ end trace 2296156d35fdfc87 ]---
> 
> So let's just parse all cpu entries in SRAT.
> 
> Also add apicid checking against MAX_LOCAL_APIC, in case we go out of the bounds of
> apicid_to_node[].
> 
> It should fix the following bug too.
> https://bugzilla.kernel.org/show_bug.cgi?id=22662
> 
> Reported-and-Tested-by: Wu Fengguang <fengguang.wu@...el.com>
> Reported-by: Bjorn Helgaas <bjorn.helgaas@...com>
> Signed-off-by: Yinghai Lu <yinghai@...nel.org>
> 
> ---
>  arch/x86/kernel/acpi/boot.c |    7 +++++++
>  arch/x86/mm/srat_64.c       |    8 ++++++++
>  drivers/acpi/numa.c         |   14 ++++++++++++--
>  3 files changed, 27 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>  {
>  	unsigned int ver = 0;
>  
> +#ifdef CONFIG_X86_64
> +	if (id >= (MAX_APICS-1)) {
> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
> +		return;
> +	}
> +#endif
> +
>  	if (!enabled) {
>  		++disabled_cpus;
>  		return;
> Index: linux-2.6/arch/x86/mm/srat_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_64.c
> +++ linux-2.6/arch/x86/mm/srat_64.c
> @@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
>  	}
>  
>  	apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped that apicid too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> @@ -168,6 +172,10 @@ acpi_numa_processor_affinity_init(struct
>  		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
>  	else
>  		apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> Index: linux-2.6/drivers/acpi/numa.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/numa.c
> +++ linux-2.6/drivers/acpi/numa.c
> @@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
>  int __init acpi_numa_init(void)
>  {
>  	int ret = 0;
> +	int nr_cpu_entries = nr_cpu_ids;
> +
> +#ifdef CONFIG_X86_64
> +	/*
> +	 * Should not limit number with cpu num that will handle,
> +	 * SRAT cpu entries could have different order with that in MADT.
> +	 * So go over all cpu entries in SRAT to get apicid to node mapping.
> +	 */
> +	nr_cpu_entries = MAX_LOCAL_APIC;
> +#endif
>  
>  	/* SRAT: Static Resource Affinity Table */
>  	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
> -				     acpi_parse_x2apic_affinity, nr_cpu_ids);
> +				     acpi_parse_x2apic_affinity, nr_cpu_entries);
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
> -				     acpi_parse_processor_affinity, nr_cpu_ids);
> +				     acpi_parse_processor_affinity, nr_cpu_entries);
>  		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>  					    acpi_parse_memory_affinity,
>  					    NR_NODE_MEMBLKS);

Download attachment "panic-reserve_trampoline_memory.png" of type "image/png" (18830 bytes)
