Date:	Thu, 24 Apr 2014 17:41:20 +0000
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Jiang Liu <jiang.liu@...ux.intel.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>, Ingo Molnar <mingo@...hat.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [Bugfix] sched: fix possible invalid memory access caused by
 CPU hot-addition

>> 	The BIOS always sends CPU hot-addition events before memory
>> hot-addition events, so it's hard to change the order.
>> 	And we couldn't completely solve this performance penalty because the
>> affected code tries to allocate memory for all possible
>> CPUs instead of onlined CPUs.
>
> So the BIOS is fucked, news at 11, one would have hoped Intel would have
> _some_ say in it, but alas. So how about instead you force memory online
> when you online the first CPU, screw whatever the BIOS does or does not?

Certainly an interesting implementation choice by the BIOS. The only logical
order in which to bring the components of a modern CPU online is:

1) Memory - so we have a place to allocate structures needed for the following steps
2) Cores - so we have a place to direct interrupts from the next step
3) I/O
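
As a rough illustration of that ordering (a userspace sketch only, not kernel
code): if hot-add events could be collected into a batch, they could be sorted
so that each node's memory comes first, then its cores, then I/O. The names
hp_kind, hp_event and hp_sort_events are made up for this example and do not
exist in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource classes, listed in the order they should come up. */
enum hp_kind { HP_MEMORY = 0, HP_CPU = 1, HP_IO = 2 };

/* Hypothetical hot-add event; not a kernel structure. */
struct hp_event {
    enum hp_kind kind;
    int node;               /* NUMA node the resource belongs to */
};

/* Order by node first, then memory < cores < I/O within a node. */
static int hp_event_cmp(const void *a, const void *b)
{
    const struct hp_event *x = a, *y = b;

    if (x->node != y->node)
        return (x->node > y->node) - (x->node < y->node);
    return (x->kind > y->kind) - (x->kind < y->kind);
}

/* Reorder a batch of pending events into the "logical" bring-up order. */
static void hp_sort_events(struct hp_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    qsort(ev, n, sizeof(*ev), hp_event_cmp);
}
```

In practice the events arrive one at a time from the BIOS, so there is no
batch to sort; that is the problem being discussed here.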

We should log a bug against the BIOS ... but systems are already shipping so we will
have to deal with this.

Either we use your existing patch - and systems with a silly BIOS will work, but with a
small NUMA penalty for objects allocated remotely

or ... we implement some crazy queuing scheme ... where we delay bringing cores
online for a while to see whether more things like memory and I/O start showing
up too.  We can't wait forever - people sometimes do configure systems with
memory-less nodes.

I think your existing solution is the better choice ... the penalties probably aren't
all that big ... so extensive workarounds for BIOS bugs seem like the wrong direction.

Maybe a one-time printk() telling the user they have a buggy BIOS would help
put pressure back on BIOS teams to do this right in the future? It must not
fire in the memory-less node case, though, since that isn't a bug.
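
One way to keep the memory-less case quiet, sketched in userspace C under my
own assumptions: issue the warning from the memory hot-add path, and only if
that node's CPUs are already online. A node that never gets memory never
reaches this path. In the kernel the once-only part would presumably be
printk_once() or pr_warn_once(); node_state and note_memory_added below are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-node state; not a kernel structure. */
struct node_state {
    bool cpus_online;   /* cores of this node were onlined already */
};

/* Set after the first warning so the message is printed only once. */
static bool warned_bad_order;

/*
 * Call when memory for a node is hot-added.
 * Returns true if this call printed the warning.
 */
static bool note_memory_added(const struct node_state *n, int node_id)
{
    assert(n != NULL);
    if (!n->cpus_online || warned_bad_order)
        return false;

    warned_bad_order = true;
    fprintf(stderr,
            "node %d: CPUs were hot-added before memory, "
            "BIOS event ordering looks wrong\n", node_id);
    return true;
}
```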

-Tony
