Message-ID: <87bn1nlibx.fsf@x220.int.ebiederm.org>
Date:	Sun, 24 Jul 2016 00:24:02 -0500
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:	"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
	Joel Stanley <joel.stanley@....ibm.com>,
	Jeremy Kerr <jeremy.kerr@....ibm.com>,
	Greg KH <gregkh@...uxfoundation.org>
Subject: Re: kexec: device shutdown vs. remove

Benjamin Herrenschmidt <benh@...nel.crashing.org> writes:

> Hi !
>
> This is somewhat of a recurring issue, some of my previous attempts on
> lkml, I suspect, were just drowned in the noise. Eric, we had a quick
> discussion about this a while back but I don't think we reached a
> conclusion.
>
> A bit of context: On OpenPOWER machines, we have a Linux based
> bootloader, so we rely heavily on kexec to boot distro kernels, and
> this has been causing us grief, mostly in the device driver space.
>
> Device drivers need to be quiesced before kexec. More specifically
> the device *hardware* needs that, ie we want DMAs to stop and the
> device to be put into a state where it can reliably be picked up by the
> driver in the new kernel.
>
> Today, kexec calls device_shutdown() to achieve that. I argue that this
> is the wrong thing to do and instead we should do something that causes
> the various drivers' ->remove() functions to be called (whether that
> implies actually unbinding the driver or not).
>
> I believe we do this for historical reasons, as ->remove() used to
> depend on CONFIG_HOTPLUG while ->shutdown() was always around but that
> is no longer the case.
>
> The most visible issue with ->shutdown() that we encounter is that a lot
> of drivers simply don't implement it.
>
> The *real* issue however is that it's the wrong thing to do anyway. It
> is a call intended to be made when the machine will be shut down; as
> such, not only is it very much optional (and rarely implemented), but it
> can also (and will in some cases) power bits of hardware off, which is
> not what you want to do if a new driver will try to pick up the pieces.
>
> Arguably, the most correct semantic is provided by ->remove() since
> that corresponds to removing a driver and binding a new one to the
> device. IE. the same flow as doing rmmod/insmod of a new driver.
>
> In practice, we observe that a lot more drivers implement ->remove(). A
> few were "fixed" to have ->shutdown() for kexec's sake over time, but in
> many cases it's a duplication of ->remove() (ugh...).
>
> So I would like to discuss this or at least get feedback and an overall
> agreement. I can provide patches to test fairly soon.

I thought I had given that feedback a while ago.

To recap: I wanted the reboot path and the kexec path to be the same
(because arguably they are the same and have the same requirements,
although a lot of firmware toggles the machine's reset line in the reboot
case, making that less true).
 
People didn't want all of the non-hardware-specific cleanup to run in
the reboot path, because it might cause problems with machines.  So
shutdown was born.

In practice as you have observed the remove code is tested and the
shutdown code is not.

In practice we have an emergency reboot path that doesn't do any
hardware shutdown.  Which probably better fills the original need
of a reboot that doesn't spend time cleaning up.

If you are willing to do the work to merge shutdown into remove and
simplify the drivers, perform the testing and the other state, I am in
favor of the change.  I think we have had enough time to see if having
two methods was maintainable for the driver authors.
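
To make the difference concrete, here is a rough userspace model in C,
not kernel code.  Every name in it (toy_device, toy_driver,
toy_kexec_via_shutdown, toy_kexec_via_remove) is hypothetical and exists
only for illustration; the real driver core is considerably more
involved.  It assumes a driver that implements remove but not shutdown,
which is the common case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device's state; not a kernel structure. */
struct toy_device {
    bool dma_active;  /* device may still be writing to memory */
    bool powered;     /* hardware is powered on */
    bool bound;       /* a driver is attached */
};

/* Hypothetical driver callbacks; shutdown is optional and often missing. */
struct toy_driver {
    void (*remove)(struct toy_device *dev);
    void (*shutdown)(struct toy_device *dev);  /* may be NULL */
};

/* Example remove: stop DMA and unbind, but leave the hardware powered
 * so that a new driver can pick the device up again. */
void toy_remove(struct toy_device *dev)
{
    assert(dev != NULL);
    dev->dma_active = false;
    dev->bound = false;
}

/* Models the shutdown-based path: a driver without a shutdown
 * callback is silently skipped, so its device is left running. */
void toy_kexec_via_shutdown(const struct toy_driver *drv,
                            struct toy_device *dev)
{
    if (drv->shutdown)
        drv->shutdown(dev);
}

/* Models the proposed path: quiesce through the remove callback,
 * which far more drivers implement and test. */
void toy_kexec_via_remove(const struct toy_driver *drv,
                          struct toy_device *dev)
{
    if (drv->remove)
        drv->remove(dev);
}
```

With a toy_driver whose shutdown pointer is NULL, the shutdown-based
path leaves dma_active set, while the remove-based path clears it and
keeps the device powered.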

Eric

