Message-ID: <20140403064337.GA29274@gmail.com>
Date:	Thu, 3 Apr 2014 08:43:37 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Igor Mammedov <imammedo@...hat.com>
Cc:	Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
	x86@...nel.org, bp@...e.de, paul.gortmaker@...driver.com,
	JBeulich@...e.com, prarit@...hat.com, drjones@...hat.com,
	toshi.kani@...com, riel@...hat.com, gong.chen@...ux.intel.com
Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU
 with infinite wait loop


* Igor Mammedov <imammedo@...hat.com> wrote:

> > I've seen that. Kernel still boots. With your patch it would hang.

Nonsense: not booting is OK when critical hardware is genuinely bad - 
this isn't a disk drive or networking where bad IO 'happens sometimes' 
and failure is something we have to engineer for - this is the CPU!

If a critical piece of hardware like the CPU or RAM is non-functional 
then it should be excluded by the user explicitly, not worked around 
after some ugly, non-deterministic and fragile timeout.

The timeout in the SMP bringup code was really an ancient property, 
introduced more than a decade ago when hardware makers were 
ignorant of Linux and we were ignorant of how to properly interface 
with SMP hardware.

Today a 'timeout' means one of 3 things:

  - bad, fragile hardware - this we don't want to hide, unless 
    explicitly told so by the user. I've seen such symptoms related to 
    overclocking for example - so not booting is perfectly justified, 
    it can prevent reporting a bogus kernel crash down the line.

  - buggy SMP bringup. That is a bug that needs to be fixed, not 
    worked around.

  - timeout fragility in virtualized environments

I'm not aware of any genuine case where timing out is the correct 
thing to do.

So the patches look fine to me as-is, I planned on looking at them 
more closely after the merge window.

Thanks,

	Ingo
