linux-kernel - Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1360010058.23410.169.camel@misato.fc.hp.com>
Date:	Mon, 04 Feb 2013 13:34:18 -0700
From:	Toshi Kani <toshi.kani@...com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Greg KH <gregkh@...uxfoundation.org>, lenb@...nel.org,
	akpm@...ux-foundation.org, linux-acpi@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
	bhelgaas@...gle.com, isimatu.yasuaki@...fujitsu.com,
	jiang.liu@...wei.com, wency@...fujitsu.com, guohanjun@...wei.com,
	yinghai@...nel.org, srivatsa.bhat@...ux.vnet.ibm.com
Subject: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device
 hotplug framework

On Mon, 2013-02-04 at 21:12 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > >   :
> > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > we *want* to do the removal in the first place.
> > > > > > 
> > > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > > > time isn't a good idea.
> > > > > >
> > > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > > 
> > > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > > realized that they just had to accept the fact that removing a PCI
> > > > > > device could either happen by:
> > > > > > 	- a user yanking out the device, at which time the OS better
> > > > > > 	  clean up properly no matter what happens
> > > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > > 	  as long as it wants to complete that action, including
> > > > > > 	  stalling for noticable amounts of time before eventually,
> > > > > > 	  always letting the action succeed.
> > > > > > 
> > > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > > can't, because memory is being used by someone else, either move them
> > > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > > 
> > > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > > if it is safe to unplug something at this particualr time (think about the
> > > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > > users' demand AFAIR).
> > > > > 
> > > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > > "forcible eject", whichever they prefer.
> > > > 
> > > > For system device hot-plug, it always needs to be "safe eject".  This
> > > > feature will be implemented on mission critical servers, which are
> > > > managed by professional IT folks.  Crashing a server causes serious
> > > > money to the business.
> > > 
> > > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > > in my opinion.
> > > 
> > > That said I would be perfectly fine with not supporting the "forcible eject" to
> > > start with and waiting for the first request to add support for it.  I also
> > > would be fine with taking bets on how much time it's going to take for such a
> > > request to appear. :-)
> > 
> > Sounds good.  In my experience, though, it actually takes a LONG time to
> > convince customers that "safe eject" is actually safe.  Enterprise
> > customers are so afraid of doing anything risky that might cause the
> > system to crash or hang due to some defect.  I would be very surprised
> > to see a customer asking for a force operation when we do not guarantee
> > its outcome.  I have not seen such enterprise customers yet.
> 
> But we're talking about a kernel that is supposed to run on mobile phones too,
> among other things.

I think using this feature for RAS i.e. replacing a faulty device
on-line, will continue to be limited for high-end systems.  For low-end
systems, it does not make sense for customers to pay much $$ for this
feature.  They can just shut the system down for replacement, or they
can simply buy a new system instead of repairing.

That said, using this feature on VM for workload balancing does not
require any special hardware.  So, I can see someone willing to try out
to see how it goes with a force option on VM for personal use.   

Thanks,
-Toshi


 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/