Date:	Fri, 20 Jun 2014 11:41:27 -0400
From:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:	Borislav Petkov <bp@...en8.de>
CC:	tony.luck@...el.com, linux-kernel@...r.kernel.org,
	linux-edac@...r.kernel.org, mattieu.souchaud@...e.fr
Subject: Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error
 path

On 06/20/2014 11:23 AM, Borislav Petkov wrote:
> On Fri, Jun 20, 2014 at 10:28:13AM -0400, Boris Ostrovsky wrote:
>> Commit 9c15a24b038f4d8da93a2bc2554731f8953a7c17 (x86/mce: Improve
>> mcheck_init_device() error handling) unregisters (or never registers)
>> MCE's hotplug notifier if an error is encountered.
> Well, mcheck_init_device() did encounter errors before that commit too,
> can you please go into detail on how exactly you're triggering this?
> Which error are you talking about exactly?

You can simulate this on baremetal by having, for example,
misc_register() fail (just add 'err = -EIO' after the call). Or you can
return an error right upon entry to mcheck_init_device() (I haven't
tested that though).

Then, after you have booted, do a couple of
     echo 0 > /sys/devices/system/cpu/cpu1/online
     echo 1 > /sys/devices/system/cpu/cpu1/online

Then sit still for about 10 minutes. I don't think any activity is 
necessary.

You are dead now. If you are lucky you may see messages about soft 
lockups or RCU stalls but often nothing.

> Lemme guess: some xen special handling which baremetal doesn't need.

Only in the sense that on Xen misc_register() often fails. But any 
failure on baremetal will result in the same behavior.

>
>> Since unplugging a CPU would normally result in the notifier deleting
>> the MCE timer, we are now left with the timer running if a CPU is
>> removed on a system where mcheck_init_device() had failed.
>>
>> If we later hotplug this CPU back we add this timer again in
>> mcheck_cpu_init(). Eventually the two timers start interfering with each
>> other, causing soft lockups or system hangs.
>>
>> We should leave the notifier always on and, in fact, set it up early
>> during the boot.
> We do leave it always on - we only unregister it if we've encountered an
> error.

Right. And I think we shouldn't because we leave undeleted timers.
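
The sequence described above can be illustrated with a small userspace
model. This is only a sketch under simplifying assumptions, not kernel
code: the names armed_timers, notifier_registered, model_cpu_init(),
model_cpu_offline() and timers_after_hotplug_cycle() are all
hypothetical, and a plain counter stands in for the per-CPU MCE timer.
It only shows the bookkeeping: without the notifier nothing deletes the
timer on offline, so the next online arms it a second time.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* How many times the periodic timer is armed per CPU; should never exceed 1. */
static int armed_timers[NR_CPUS];

/* Models whether the CPU hotplug notifier is still registered. */
static bool notifier_registered;

/* Models the timer being added when a CPU is initialized (boot or online). */
static void model_cpu_init(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    armed_timers[cpu]++;
}

/* Models CPU offline: the timer is deleted only if the notifier runs. */
static void model_cpu_offline(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (notifier_registered && armed_timers[cpu] > 0)
        armed_timers[cpu]--;
}

/*
 * Runs one offline/online cycle on CPU 1 and returns how many timers
 * end up armed on it. keep_notifier == false models the case where the
 * notifier was unregistered (or never registered) after an init error.
 */
int timers_after_hotplug_cycle(bool keep_notifier)
{
    for (int i = 0; i < NR_CPUS; i++)
        armed_timers[i] = 0;

    notifier_registered = keep_notifier;

    model_cpu_init(1);     /* initial bring-up at boot */
    model_cpu_offline(1);  /* echo 0 > .../cpu1/online */
    model_cpu_init(1);     /* echo 1 > .../cpu1/online */

    return armed_timers[1];
}
```

In this model, calling timers_after_hotplug_cycle(false) leaves two
timers armed on the CPU, while timers_after_hotplug_cycle(true) leaves
exactly one, which is the argument for keeping the notifier registered.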

-boris

