lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 23 Sep 2021 14:48:48 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Saravana Kannan <saravanak@...gle.com>
Cc:     Andrew Lunn <andrew@...n.ch>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Russell King <linux@...linux.org.uk>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, Len Brown <lenb@...nel.org>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        Vladimir Oltean <olteanv@...il.com>,
        "Cc: Android Kernel" <kernel-team@...roid.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>
Subject: Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD

On Tue, Sep 21, 2021 at 10:07 PM Saravana Kannan <saravanak@...gle.com> wrote:
>
> Sorry I've been busy with LPC and some other stuff and could respond earlier.
>
> On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@...n.ch> wrote:
> >
> > > It works at a device level, so it doesn't know about resources.  The
> > > only information it has is of the "this device may depend on that
> > > other device" type and it uses that information to figure out a usable
> > > probe ordering for drivers.
> >
> > And that simplification is the problem. A phandle does not point to a
> > device, it points to a resource of a device. It should really be doing
> > what the driver would do, follow the phandle to the resource and see
> > if it exists yet. If it does not exist then yes it can defer the
> > probe. If the resource does exist, allow the driver to probe.
> >
> > > Also if the probe has already started, it may still return
> > > -EPROBE_DEFER at any time in theory
> >
> > Sure it can, and does. And any driver which is not broken will
> > unregister its resources on the error path. And that causes users of
> > the resources to release them. It all nicely unravels, and then tries
> > again later. This all works, it is what these drivers do.
>
> One of the points of fw_devlink=on is to avoid the pointless deferred
> probes that'd happen in this situation. So saying "let this happen"
> when fw_devlink=on kinda beats the point of it. See further below.

Well, you need to define "pointless deferred probes" in the first
place.  fw_devlink adds deferred probes by itself, so why are those
not pointless whereas the others are?

> >
> > > However, making children wait for their parents to complete probing is
> > > generally artificial, especially in the cases when the children are
> > > registered by the parent's driver.  So waiting should be an exception
> > > in these cases, not a rule.
>
> Rafael,
>
> There are cases where the children try to probe too quickly (before
> the parent has had time to set up all the resources it's setting up)
> and the child defers the probe. Even Andrew had an example of that
> with some ethernet driver where the deferred probe is attempted
> multiple times wasting time and then it eventually succeeds.

You seem to be arguing that it may be possible to replace multiple
probe attempts that each are deferred with one probe deferral which
then is beneficial from the performance perspective.

Yes, there are cases like that, but when this is used as a general
rule, it introduces a problem if it does a deferred probe when there
is no need for a probe deferral at all (like in the specific problem
case at hand).  Also if the probing of the child is deferred just
once, adding an extra dependency on the parent to it doesn't really
help.

> Considering there's no guarantee that a device_add() will result in
> the device being bound immediately, why shouldn't we make the child
> device wait until the parent has completely probed and we know all the
> resources from the parent are guaranteed to be available? Why can't we
> treat drivers that assume a device will get bound as soon as it's
> added as the exception (because we don't guarantee that anyway)?

Because this adds artificial constraints that otherwise aren't there
in some cases to the picture and asking drivers to mark themselves as
"please don't add these artificial constraints for me" is not
particularly straightforward.  Moreover, it does that retroactively
causing things that are entirely correct and previously worked just
fine to now have to paint themselves red to continue working as
before.

The fact that immediate probe completion cannot be guaranteed in
general doesn't mean that it cannot be assumed in certain situations.
For example, a parent driver registering a child may know what the
child driver is and so it may know that the child will either probe
successfully right away or the probing of it will fail and your extra
constraint breaks that assumption.  You can't really know how many of
such cases there are and trying to cover them with a new flag is a
retroactive whack-a-mole game.

> Also, this assumption that the child will be bound successfully upon
> addition forces the parent/child drivers to play initcall chicken --
> the child's driver has to be registered before the parent's driver.

That's true, but why is this a general problem?  For example, they
both may be registered by the same function in the right order.
What's wrong with that?

> We should be getting away from those by fixing the parent driver that's
> making these assumptions (I'll be glad to help with that). We need to
> be moving towards reducing pointless deferred probes and initcall
> ordering requirements instead of saying "this bad assumption used to
> work, so allow me to continue doing that".

It is not always a bad assumption.  It may be code designed this way.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ