lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190618204746.GE203031@google.com>
Date:   Tue, 18 Jun 2019 13:47:46 -0700
From:   Sandeep Patil <sspatil@...roid.com>
To:     Saravana Kannan <saravanak@...gle.com>
Cc:     Rob Herring <robh+dt@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Frank Rowand <frowand.list@...il.com>,
        Mark Rutland <mark.rutland@....com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        David Collins <collinsd@...eaurora.org>,
        devicetree@...r.kernel.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [RESEND PATCH v1 1/5] of/platform: Speed up
 of_find_device_by_node()

On Wed, Jun 12, 2019 at 12:29:18PM -0700, 'Saravana Kannan' via kernel-team wrote:
> On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@...nel.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> > <gregkh@...uxfoundation.org> wrote:
> > >
> > > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > > <gregkh@...uxfoundation.org> wrote:
> > > > >
> > > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@...roid.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@...il.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Saravana,
> > > > > > > > >
> > > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > > > comments on the original?
> > > > > > > > >
> > > > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > > > >
> > > > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > > >
> > > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > > 1) This is a configuration property and not describing the device.
> > > > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > > > >
> > > > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > > > property. But even if that's not something the maintainers can agree
> > > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > > decide the dependencies, we still need some means to override the
> > > > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > > > correct?
> > > > > > > >
> > > > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > > > >
> > > > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > > > systems/archs too, but I can't speak for those).
> > > > > > > >
> > > > > > > > 3) Concerns about backwards compatibility
> > > > > > > >
> > > > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > > > >
> > > > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > > > functionalities but a limited driver.
> > > > > > >
> > > > > > > <snip>
> > > > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > > > load device drivers as modules for Android.
> > > > > > >
> > > > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > > > the drawing board" is the ask, could you please point us in the right
> > > > > > > direction?
> > > > > >
> > > > > > Use the dependencies which are already there in DT. That's clocks,
> > > > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > > > going to accept duplicating all those dependencies in DT. The downside
> > > > > > for the kernel is you have to address these one by one and can't have
> > > > > > a generic property the driver core code can parse. After that's in
> > > > > > place, then maybe we can consider handling any additional dependencies
> > > > > > not already captured in DT. Once all that is in place, we can probably
> > > > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > > > driver core already does that now?).
> > > > > >
> > > > > > Get rid of the auto disabling of clocks and regulators in
> > > > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > > > modules are involved. We probably can't get rid of it as lot's of
> > > > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > > > platform's responsibility for ensuring a consistent state.
> > > > > >
> > > > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > > > userspace in order to make progress if dependencies are missing.
> > > > >
> > > > > People have tried to do this multiple times, and you never really know
> > > > > when "boot is done" due to busses that have discoverable devices and
> > > > > async probing of other busses.
> > > >
> > > > Yes, I know which is why I proposed the second name with more limited
> > > > meaning/function.
> > >
> > > I still don't want to have the kernel have to rely on this.
> > >
> > > > > You do know "something" when you pivot to a new boot disk, and when you
> > > > > try to load init, but given initramfs and the fact that modules are
> > > > > usually included on them, that's not really a good indication that
> > > > > anything is "finished".
> > > > >
> > > > > I don't want userspace to be responsible for telling the kernel, "hey
> > > > > you should be finished now!", as that's an async notification that is
> > > > > going to be ripe for problems.
> > > >
> > > > The usecase I care about here is when the DT has the dependency
> > > > information, but the kernel doesn't have the driver and the dependency
> > > > is never resolved.
> > >
> > > Then we have the same situation as today and nothing different happens,
> > > right?
> >
> > Huh?
> >
> > This works today, but not for modules.
> 
> Replying to this a bit further down.
> 
> >
> > >
> > > > The same problem has to be solved with a
> > > > 'depends-on' property. This easily happens with a new DT with added
> > > > dependencies like pinctrl and an old kernel that doesn't have the
> > > > "new" driver.
> 
> Isn't this the perfect example of an "implicit dependency" in a DT
> node not being a mandatory dependency? The old kernel worked fine with
> older DT without the added pinctrl dependency, so treating it as a
> mandatory dependency seems wrong for that particular device?
> depends-on avoids all this because the older kernel won't parse
> depends-on. And for newer kernels, it'll parse only what depends-on
> says are dependencies and not make wrong assumptions.
> 
> > > > Another example is IOMMUs. We need some way to say stop
> > > > waiting for dependencies. It is really just a debug option (of course,
> > > > how to prevent a debug option from being used in production?). This
> > > > works now for built-in cases with the same late_initcall abuse.
> > >
> > > What is a debug option?  We need something "for real".
> > >
> > > > Using late_initcall_sync as an indicator has all the same problems
> > > > with userspace indicating boot finished. We should get rid of the
> > > > late_initcall_sync abuses and stop trying to work around them.
> > >
> > > I agree, but that's not the issue here.
> >
> > It is because the cover letter mentions it and downstream work around it.
> 
> This patch series is trying to remove the use of late_initcall_sync
> and instead relying on dependency information coming from DT. So, you
> are agreeing with the patchset.
> 
> > > > > I really like the "depends-on" information, as it shows a topology that
> > > > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > > > to be any other way to properly "know" this.
> > > >
> > > > As I said, DT *does* have this dependency information already. The
> > > > problem is the kernel probing doesn't use it. Fix that and then we can
> > > > discuss dependencies the DT doesn't provide that the kernel needs.
> > >
> > > Where can the kernel probing be fixed to use it?  What am I missing that
> > > can be done instead of what this patchset does?
> >
> > Somewhere, either in each subsystem or in the DT or core code creating
> > struct devices, you need to iterate thru the dependencies. Take clocks
> > as an example:
> >
> > for each node:
> >   for each 'clocks' phandle
> >     Lookup struct device from clock phandle
> >     Add the clock provider struct device to node's struct device links
> >
> > Now, repeat this for regulators, interrupts, etc.
> 
> I'm more than happy to do this if the maintainers can accept this as a
> solution, but then we need to agree that we need an override property
> if the implicit dependency isn't a mandatory dependency.

I don't quite understand what you mean by "isn't a mandatory dependency"
here. I think IIUC, what Rob said will solve the probe order problem,
correct?

Is there a problem if we split this in two and handle the
late_initcall_sync() + regulators separately and solve the probe ordering
here as suggested above?

I know the original intention of the series is to resolve the
late_initcall_sync() assumption and probe order was a "side-effect". However,
I think probing in the dependency order is still extremely valuable and can
resolve boot time issues ahead of time.

> We also need
> to agree on how we do this without breaking backwards compatibility.
> Either as a config option for this feature or have a property in the
> "chosen" node to say it's okay to assume existing bindings imply
> mandatory dependencies (it's just describing the DT more explicitly
> and the kernel will use it to enable this feature).
> 
> Although regulator binding are a "problem" because the kernel will
> have to parse every property in a node to check if it ends with
> -supply and then treat it as if it's a regulator binding (even though
> it might not be). So regulators will need some kind of "opt out" in
> the kernel (not DT).

Agree and it is going to immediately conflict with 'power-supply' for
example. If we are going this route, then we need to fix and agree on
standard regulator bindings too and make the changes everywhere in the
kernel.

> 
> > This series is pretty much doing the same thing, you just have to
> > parse each provider rather than only 'depends-on'.
> >
> > One issue is the struct device for the dependency may not be created
> > yet. I think this series would have the same issue, but haven't dug
> > into how it avoids that or whether it just ignores it and falls back
> > to deferring probe.
> 
> The patch series handles this properly and doesn't fall back to
> deferred probing.
> 
> > I'm also not clear on how you create struct devices and add
> > dependencies before probing gets attempted. If a driver is already
> > registered, probe is going to be attempted before any dependencies are
> > added. I guess the issue is avoided with drivers being modules, but
> > any solution should work for built-in too.
> 
> This is also handled properly in the series.
> 
> I've actually boot tested both these scenarios you call out and the
> patch series handles them properly.
> 
> But you are missing the main point here. The goal isn't to just
> eliminate deferred probing (it's a great side effect even it it just
> stops 99% of them), but also remove the bad assumption that
> late_initcall_sync() means all the devices are probed. The suppliers
> need a better signal (which the patch series provides) to tell when
> they can "unfreeze" the resources left on at boot.
> 

Is the summary here that we need to figure out a different approach / fix
regulator framework, or something else ? It wasn't clear from all other
emails from this thread, sorry for noise if I missed it.

- ssp

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ