Message-ID: <4F181ACF.20505@linux.vnet.ibm.com>
Date:	Thu, 19 Jan 2012 18:59:51 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Kay Sievers <kay.sievers@...y.org>,
	Alan Stern <stern@...land.harvard.edu>,
	"Luck, Tony" <tony.luck@...el.com>, Greg KH <gregkh@...e.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Sergei Trofimovich <slyich@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linux PM mailing list <linux-pm@...r.kernel.org>,
	Borislav Petkov <bp@...64.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"prasad@...ux.vnet.ibm.com" <prasad@...ux.vnet.ibm.com>,
	Ming Lei <tom.leiming@...il.com>,
	Djalal Harouni <tixxdz@...ndz.org>,
	Borislav Petkov <borislav.petkov@....com>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	Andi Kleen <ak@...ux.intel.com>,
	"gouders@...bocholt.fh-gelsenkirchen.de" 
	<gouders@...bocholt.fh-gelsenkirchen.de>,
	Marcos Souza <marcos.mage@...il.com>,
	"justinmattock@...il.com" <justinmattock@...il.com>,
	Jeff Chua <jeff.chua.linux@...il.com>
Subject: Re: [PATCH] mce: fix warning messages about static struct mce_device

On 01/19/2012 06:02 PM, Ingo Molnar wrote:

> 
> * Kay Sievers <kay.sievers@...y.org> wrote:
> 
>>> There's nothing special about the driver model code in this 
>>> respect. The same restriction applies wherever object 
>>> lifetimes are controlled by reference counting.
>>
>> Right. But the background here might not be obvious:
>>
>> An allocated device object (memory) usually represents an 
>> actual device (hardware). The object can have N users. Each of 
>> the users is required to take a reference to the object, which 
>> pins the object's memory as long as any of the N users might 
>> need to access it.
>>
>> In a hotplug world, we deal with device removal. On 
>> disconnect, we usually just orphan the object: we remove it 
>> from visibility and sever the device <-> object relation.
>>
>> All of the N users with a reference can still access the 
>> memory; they just do not talk to a real device anymore. The 
>> invalidated/orphaned state is communicated separately, by locks 
>> and flags in the device object. Only after all of the N users 
>> have dropped their references is the memory of the orphan freed.
> 
> But this is not what happened here - it's a special piece of 
> fundamental hardware that doesn't hot-plug separately from the 
> CPU and that has just a single "user".
> 
> So i'm curious, why wasn't the memset() enough? It should have 
> resolved the bug AFAICS.
> 


It did! The memset _did_ fix the bug.

See commit a3301b7 (x86/mce: Fix CPU hotplug and suspend regression
related to MCE).

Just to clarify: the bug was that a CPU offline followed by a CPU
online would lead to the use of stale pointers in a device structure
related to MCE, and hence suspend/resume would fail on the second
attempt to suspend. The other symptom of this bug (as expected) was
that a CPU offline + CPU online would cause the machine to oops,
because it tried to dereference an invalid pointer.

And the memset() fixed this bug. Completely.
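
To illustrate why clearing the structure matters, here is a minimal
user-space sketch, not the actual MCE code. It assumes a statically
allocated per-CPU object that is reused across offline/online cycles;
every name in it (toy_device, toy_register, toy_cpu_online, ...) is
hypothetical. Where the real kernel would oops on a stale pointer,
this model just returns an error so the effect is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a statically allocated per-CPU device. */
struct toy_device {
	const void *parent;	/* set at registration time */
	int registered;
};

static struct toy_device toy_devs[4];
static const int toy_bus = 0;	/* dummy object for 'parent' to point at */

/*
 * Registration expects a pristine object. State left over from an
 * earlier life is reported as an error here; in the kernel the
 * stale pointer would have been dereferenced instead.
 */
static int toy_register(struct toy_device *dev)
{
	assert(dev != NULL);
	if (dev->parent != NULL || dev->registered)
		return -1;
	dev->parent = &toy_bus;
	dev->registered = 1;
	return 0;
}

/* Unregistration hides the object but leaves 'parent' stale. */
static void toy_unregister(struct toy_device *dev)
{
	assert(dev != NULL);
	dev->registered = 0;
}

/* CPU online path; clear_first models the memset() fix. */
static int toy_cpu_online(int cpu, int clear_first)
{
	struct toy_device *dev = &toy_devs[cpu];

	if (clear_first)
		memset(dev, 0, sizeof(*dev));
	return toy_register(dev);
}
```

In this model, an online/offline/online sequence without the memset()
fails on the second online, while the same sequence with clear_first
set succeeds every time.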

What remained after the memset was only a harmless warning about
machinecheck not having a release() function. This merely reflected
the semantics that the driver core imposes; it was not really a bug
as such. (And as I mentioned in one of my earlier posts, this warning
existed in much older kernels too, but was hidden because pr_debug()
was used to print it. Since the call paths changed with the switch
from sysdev to struct device, we now hit a WARN() instead of a mild
pr_debug(). But the message conveyed by either of these is exactly
the same.)

So, the discussion in this thread was about how best to get rid of
that warning by playing by the rules of the driver core, instead of
circumventing them with a dummy release function whose only purpose
is to silence the warning.
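
As a rough sketch of what "playing by the rules" means, here is a
user-space model of a reference-counted object whose final put calls
a release() callback. It is an assumption-laden toy, not the driver
core: toy_obj, toy_get, toy_put, toy_alloc and the two counters are
all hypothetical names. A static object with no release() trips the
warning path on the last put; a dynamically allocated one is freed by
its release() once the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted object with a release callback. */
struct toy_obj {
	int refcount;
	void (*release)(struct toy_obj *obj);
};

static int toy_warnings;	/* counts "no release() function" events */
static int toy_released;	/* counts objects actually freed */

static void toy_get(struct toy_obj *obj)
{
	assert(obj != NULL);
	obj->refcount++;
}

/* Drop one reference; the last put hands the object to release(). */
static void toy_put(struct toy_obj *obj)
{
	assert(obj != NULL && obj->refcount > 0);
	if (--obj->refcount > 0)
		return;
	if (obj->release == NULL) {
		/* Models the warning: nobody can free this object. */
		toy_warnings++;
		return;
	}
	obj->release(obj);
}

/* A real release function: it owns freeing the memory. */
static void toy_free_release(struct toy_obj *obj)
{
	toy_released++;
	free(obj);
}

/* Dynamic allocation, so that release() has something to free. */
static struct toy_obj *toy_alloc(void)
{
	struct toy_obj *obj = calloc(1, sizeof(*obj));

	if (obj == NULL)
		return NULL;
	obj->refcount = 1;
	obj->release = toy_free_release;
	return obj;
}
```

A dummy release() that does nothing would keep toy_warnings at zero
for the static object as well, but it would only hide the fact that
the object's lifetime is not really governed by its reference count.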

Regards,
Srivatsa S. Bhat
