Message-ID: <4F10F117.40006@linux.vnet.ibm.com>
Date: Sat, 14 Jan 2012 08:35:59 +0530
From: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Ming Lei <tom.leiming@...il.com>,
Djalal Harouni <tixxdz@...ndz.org>,
Borislav Petkov <borislav.petkov@....com>,
Tony Luck <tony.luck@...el.com>,
Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
Ingo Molnar <mingo@...e.hu>, Andi Kleen <ak@...ux.intel.com>,
linux-kernel@...r.kernel.org, Greg Kroah-Hartman <gregkh@...e.de>,
Kay Sievers <kay.sievers@...y.org>,
gouders@...bocholt.fh-gelsenkirchen.de,
Marcos Souza <marcos.mage@...il.com>,
Linux PM mailing list <linux-pm@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
"tglx@...utronix.de" <tglx@...utronix.de>,
prasad@...ux.vnet.ibm.com, justinmattock@...il.com,
Jeff Chua <jeff.chua.linux@...il.com>,
Suresh B Siddha <suresh.b.siddha@...el.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mel Gorman <mgorman@...e.de>,
Gilad Ben-Yossef <gilad@...yossef.com>
Subject: Re: x86/mce: machine check warning during poweroff

On 01/14/2012 08:23 AM, Linus Torvalds wrote:
> On Fri, Jan 13, 2012 at 6:41 PM, Srivatsa S. Bhat
> <srivatsa.bhat@...ux.vnet.ibm.com> wrote:
>>
>> YES!! Finally I have a fix for this whole MCE thing! :-)
>
> Goodie.
>
>> The patch below works perfectly for me - I tested multiple CPU hotplug
>> operations as well as multiple pm_test runs at core level. Please let me
>> know if this solves the suspend issue as well..
>
> Ok, I'll try, and I bet it does.
>
> HOWEVER.
>
> I'd be a whole lot happier knowing exactly which field in "struct
> device" that needed to be NULL before it gets registered.
>
> I don't like how
>
> device_register() + device_create_file(dev)..
>
> is not sufficiently undone by
>
> .. device_remove_file(dev) + device_unregister()
>
> so that it can't be repeated. Exactly *what* state is stale and
> re-used incorrectly if you do that device_register() a second time.
>
> It smells like a misfeature of the device core handling.
>
> But that does obviously explain why this started happening with a
> fairly straightforward conversion from sysdev to struct device. It
> just makes me worry about any *other* such conversions.
>
> Of course, normal users will allocate and free the memory, so never
> see this "re-use the same piece of memory" issue. But still..
>
I totally agree with you. I too had set out to find out *exactly* what
was going wrong. After spending a significant amount of time digging through
the code (unsuccessfully), the idea of zeroing out everything struck me,
and it worked, as expected. Yes, it is definitely important to know the
exact issue so that we can fix the driver core and avoid other mishaps,
but I guess finding that out is not all that simple. As of now I am
rather exhausted from following those zillions of pointers continuously
for the past few hours.. ;-/
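
For illustration only, the "zero everything before re-registering" idea can
be sketched in plain userspace C. This is not the actual patch and not the
driver core: struct toy_device, its fields, and the toy_* helpers are
hypothetical stand-ins. The field that survives unregistration is invented
here precisely because the real stale field is still unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object that lives in static storage
 * and is registered more than once over its lifetime. */
struct toy_device {
	int id;
	int registered;   /* set by register, cleared by unregister */
	int initialized;  /* internal state that unregister forgets to reset */
};

/* Returns 0 on success, -1 if stale state from an earlier
 * registration is still present. */
int toy_register(struct toy_device *dev)
{
	if (dev->initialized)
		return -1;
	dev->initialized = 1;
	dev->registered = 1;
	return 0;
}

/* Deliberately incomplete teardown: 'initialized' is left set,
 * modelling an unregister that does not fully undo register. */
void toy_unregister(struct toy_device *dev)
{
	dev->registered = 0;
}

/* The workaround: wipe the whole object, restore the fields the
 * caller owns, and only then register again. */
int toy_reregister_zeroed(struct toy_device *dev, int id)
{
	assert(dev != NULL);
	memset(dev, 0, sizeof(*dev));
	dev->id = id;
	return toy_register(dev);
}
```

In this toy model a second toy_register() on the same memory fails, while
toy_reregister_zeroed() succeeds. An object that is allocated and freed
each time would never hit the problem, which matches the observation above
about normal users.
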
Regards,
Srivatsa S. Bhat
IBM Linux Technology Center