lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 8 Sep 2016 22:11:06 +0200
From:   "Luis R. Rodriguez" <mcgrof@...nel.org>
To:     Ming Lei <ming.lei@...onical.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Rafael J. Wysocki" <rjw@...k.pl>, Takashi Iwai <tiwai@...e.de>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        NeilBrown <neilb@...e.com>, Oleg Nesterov <oleg@...hat.com>
Cc:     Daniel Wagner <wagi@...om.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Daniel Wagner <daniel.wagner@...-carit.de>,
        "Luis R . Rodriguez" <mcgrof@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Dmitry Torokhov <dmitry.torokhov@...il.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Arend Van Spriel <arend@...adcom.com>,
        Johannes Berg <johannes@...solutions.net>,
        Mimi Zohar <zohar@...ux.vnet.ibm.com>,
        David Howells <dhowells@...hat.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Jeff Mahoney <jeffm@...e.com>,
        Andy Lutomirski <luto@...capital.net>,
        Wu Fengguang <fengguang.wu@...el.com>,
        Julia Lawall <Julia.Lawall@...6.fr>
Subject: Re: [PATCH v4 1/4] firmware: Move umh locking code into
 fw_load_from_user_helper()

On Thu, Sep 08, 2016 at 11:37:54PM +0800, Ming Lei wrote:
> On Wed, Sep 7, 2016 at 4:45 PM, Daniel Wagner <wagi@...om.org> wrote:
> > From: Daniel Wagner <daniel.wagner@...-carit.de>
> >
> > When we load the firmware directly we don't need to take the umh
> > lock.
> 
> I am wondering if it can be wrong.

If you disable the firmware UMH why would we need to lock if the lock is being
shown only used for the firmare UMH ?

> Actually in case of firmware loading, the usermode helper lock doesn't
> only mean the user helper is usable, and it also may serve to mark the
> filesystem/block device is ready for firmware loading, and of couse direct
> loading need fs/block to be ready too.

Yes but that's a race I've identified a while ago present even if you use initramfs *and*
use kernel_read_file_from_path() on any part of the kernel [0], I proposed a possible
solution recently [1], given tons of folks were *complaining* about this but despite there
being one solution proposed [2] it was still incorrect, as only *userspace* can know
for sure when your critical filesystems are mounted. Since we now have a shared
"read file fs" helper kernel_read_file_from_path() it implicates the race is possible
elsewhere as well.

The race is present given you simply cannot know when the root filesystem
(consider a series of pivot_root() calls, hey -- anyone can do that, who are we to
tell them they can't), or in this particular case importance, only considering firmware,
when /lib/firmware/ is ready.  In short I am saying this whole race thing is a
bigger issue, and as much as Linus hates my proposed solution I have not heard
any proposal alternatives to address this properly, not even clarifications on
what our expectations for drivers then are if they use kernel_read_file_from_path()
early on their driver code.

The original goal of the usermode helper lock came can be traced back in
time via a144c6a ("PM: Print a warning if firmware is requested when task
are frozen) when Rafael noticed drivers were calling to request firmware
on *resume* calls! Why would that be an issue? It was because of the stupid
delays incurred on resume when firmware *was not found* __due__ to the stupid
user mode helper timeout delay and people believing this often meant users
confusing resume *stalling* for resume was *broken! Later b298d289c7
("PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()")
addressed races. That code in turn was later massaged into shape to better
accept direct FS loading via commit 4e0c92d015235 ("firmware: Refactoring for
splitting user-mode helper code").

So for a while we've been nagging driver developers to not add request_firmware()
calls on resume calls. In fact the drivers/base/firmware_class.c code *kills*
firmware UMH calls when we go to suspend for the firmware cache, as such even
on suspend its a stupid idea to use the UMH, not only because it can
stall suspend but *also* because we kill these UMH calls. Its why I recently
proposed removing the possibility to call the firmware UMH from the firmware
cache [3]. If this patch is wrong please do chime in!

So, as I see it the user mode helper lock should have *nothing* to do with
the race between reading files from the kernel and having a path ready. If
it was *used* for that, that was papering over the real issue. Its no
different of a hack for instance as if a driver using request_firmware() tried
to use async probe to avoid the same race. Yes it will help, but no, it does
not mean it is deterministically safe.

Reviewing commit 247bc03742545 ("PM / Sleep: Mitigate race between the freezer
and request_firmware()") which originally extended umh state machine from
just being enabled/disabled, with the concepts of UMH_ENABLED, UMH_FREEZING,
UMH_DISABLED -- its goal was to prevent UMH uses during suspend. So -- the
"UMH lock" on firmware was actually added to help avoid races between freezing
and request_firmware(). We should not re-use UMH status notifiers when the
firmware UMH is disabled for the same concepts -- if we needed such a concept
then we should take this out from UMH code and generalize it.

In fact this shows there's still an issue here for UMH code as well. The
usermodehelper_enable() call 

rest_init() --> kernel_thread(kernel_init, NULL, CLONE_FS); -->
	kernel_init --> kernel_init_freeable() --> wait_for_completion(&kthreadd_done);

So after kernel_init_freeable() kthreadd_done should be done but note that
only userspace will know what partition set up to use. Shortly after
this though in kernel_init_freeable() we call prepare_namespace() so if
initramfs was set up its set up after.

Meanwhile driver's inits are called via do_initcalls() called from do_basic_setup()
and that is called within kernel_init_freeable() *prior* to prepare_namespace().

So calling usermodehelper_enable() doesn't necessarily ensure that the paths
for the UMH used will actually work either. This path also has a race.

We need to address both things then:

  1) *freeze* races / stalls
  2)  userspace paths ready for whatever the kernel needs to read off of the
      filesystem

These issues are not particular to firmware -- this applies to all
kernel_read_file_from_path() and as is being revealed now the UMH code. It gets
me wondering if we have any shared code possible between UMH and
kernel_read_file_from_path().

For 1) we could just generalize the code.

For 2) I proposed a solution as RFC although some hate it, my latest preference
would be either *declare* these uses early reads clearly as not supported or
have a proper init level that does guarantee to be safe against the userspace
paths.

I'm all ears for alternatives.

[0] https://marc.info/?t=147286207700002&r=1&w=2
[1] http://lkml.kernel.org/r/CAB=NE6UBRa0K7=PomJzKxsoj4GzAqkYrkp=O+UfVvu2fwM25pA@mail.gmail.com
[2] https://lkml.org/lkml/2014/6/15/47
[3] https://lkml.kernel.org/r/1473208930-6835-6-git-send-email-mcgrof@kernel.org

  Luis

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ