Message-ID: <Pine.LNX.4.44L0.1201141121300.7598-100000@netrider.rowland.org>
Date: Sat, 14 Jan 2012 11:30:03 -0500 (EST)
From: Alan Stern <stern@...land.harvard.edu>
To: Greg KH <gregkh@...e.de>
cc: Linus Torvalds <torvalds@...ux-foundation.org>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
Ming Lei <tom.leiming@...il.com>,
Djalal Harouni <tixxdz@...ndz.org>,
Borislav Petkov <borislav.petkov@....com>,
Tony Luck <tony.luck@...el.com>,
Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
Ingo Molnar <mingo@...e.hu>, Andi Kleen <ak@...ux.intel.com>,
<linux-kernel@...r.kernel.org>, Kay Sievers <kay.sievers@...y.org>,
<gouders@...bocholt.fh-gelsenkirchen.de>,
Marcos Souza <marcos.mage@...il.com>,
Linux PM mailing list <linux-pm@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
"tglx@...utronix.de" <tglx@...utronix.de>,
<prasad@...ux.vnet.ibm.com>, <justinmattock@...il.com>,
Jeff Chua <jeff.chua.linux@...il.com>,
Suresh B Siddha <suresh.b.siddha@...el.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mel Gorman <mgorman@...e.de>,
Gilad Ben-Yossef <gilad@...yossef.com>
Subject: Re: x86/mce: machine check warning during poweroff

On Sat, 14 Jan 2012, Greg KH wrote:
> On Fri, Jan 13, 2012 at 06:53:04PM -0800, Linus Torvalds wrote:
> > On Fri, Jan 13, 2012 at 6:41 PM, Srivatsa S. Bhat
> > <srivatsa.bhat@...ux.vnet.ibm.com> wrote:
> > >
> > > YES!! Finally I have a fix for this whole MCE thing! :-)
> >
> > Goodie.
> >
> > > The patch below works perfectly for me - I tested multiple CPU hotplug
> > > operations as well as multiple pm_test runs at core level. Please let me
> > > know if this solves the suspend issue as well..
> >
> > Ok, I'll try, and I bet it does.
> >
> > HOWEVER.
> >
> > I'd be a whole lot happier knowing exactly which field in "struct
> > device" that needed to be NULL before it gets registered.
> >
> > I don't like how
> >
> > device_register() + device_create_file(dev)..
> >
> > is not sufficiently undone by
> >
> > .. device_remove_file(dev) + device_unregister()
> >
> > so that it can't be repeated. Exactly *what* state is stale and
> > re-used incorrectly if you do that device_register() a second time.
> >
> > It smells like a misfeature of the device core handling.
>
> It has to do with the fact that this is a "static" device that is being
> reused. Normally it would be cleaned up properly in the release
> function, but as there isn't one, some fields are being left in a bad
> state.

That's exactly right. In general, device structures should never be
reused. Apart from the reinitialization issues, in the general case
you have the problem that the references to the previous incarnation
may not all have been dropped. Now, perhaps in the MCE case you _do_
know that they're all gone (I can't tell), but relying on it is
dangerous.

The driver core isn't designed to handle device structures that get
unregistered and then spring back to life; callers are supposed to
allocate a fresh new structure instead. (We had to solve this very
same problem in the USB subsystem a number of years ago; figuring it
all out was tricky even back then.) And this is true regardless of
whether the original structure was allocated dynamically or not.

Alan Stern
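
The point about reuse can be illustrated with a small userspace model.
This is only a sketch and not kernel code: struct toy_device, the
toy_* helpers, the stale_state field, and the releases_run counter are
all hypothetical names invented for the illustration, and none of them
belong to the driver core. The model also makes one simplifying
assumption that the real driver core does not: toy_register() refuses
a structure that is not pristine, so the failure becomes an explicit
error code rather than a subtle misbehaviour.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a device structure (NOT struct device). */
struct toy_device {
    int refcount;
    int registered;
    int stale_state;  /* models internal state that unregister never resets */
    void (*release)(struct toy_device *dev);
};

static int releases_run;  /* counts how often a release callback ran */

/* Release callback: runs when the last reference is dropped. */
static void toy_release(struct toy_device *dev)
{
    releases_run++;
    free(dev);
}

/* Allocate a zeroed structure with a release callback attached. */
static struct toy_device *toy_alloc(void)
{
    struct toy_device *dev = calloc(1, sizeof(*dev));

    if (dev)
        dev->release = toy_release;
    return dev;
}

/* Modelling assumption: only a pristine structure can be registered. */
static int toy_register(struct toy_device *dev)
{
    if (dev->stale_state || dev->refcount != 0)
        return -1;
    dev->refcount = 1;
    dev->registered = 1;
    dev->stale_state = 1;
    return 0;
}

static void toy_put(struct toy_device *dev)
{
    if (--dev->refcount == 0 && dev->release)
        dev->release(dev);
}

static void toy_unregister(struct toy_device *dev)
{
    dev->registered = 0;
    toy_put(dev);
}

/* Reuse one long-lived structure with no release callback.
 * Returns the result of the second registration attempt. */
static int reuse_static_twice(void)
{
    struct toy_device dev = { 0 };  /* no release, never reinitialized */

    if (toy_register(&dev) != 0)
        return -2;                  /* unexpected: first attempt failed */
    toy_unregister(&dev);           /* stale_state is left behind */
    return toy_register(&dev);      /* rejected in this model */
}

/* Allocate a fresh structure for every registration cycle.
 * Returns the number of successful registrations out of two. */
static int fresh_alloc_twice(void)
{
    int ok = 0;

    for (int i = 0; i < 2; i++) {
        struct toy_device *dev = toy_alloc();

        if (!dev)
            return ok;
        if (toy_register(dev) != 0) {
            free(dev);
            continue;
        }
        ok++;
        toy_unregister(dev);        /* last reference: release frees it */
    }
    return ok;
}
```

In this model, reuse_static_twice() reports a failed second
registration because nothing cleaned up the leftover state, while
fresh_alloc_twice() succeeds both times because every cycle starts
from a zeroed structure whose release callback frees it.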