linux-kernel - Re: [PATCH] regulator: wm8994: Use PROBE_FORCE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAD=FV=WUCqhq-wCoiir-1sNQpTNJfr-c2vAYvyJc6hKi8U4u_w@mail.gmail.com>
Date:   Fri, 24 Mar 2023 08:06:15 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Charles Keepax <ckeepax@...nsource.cirrus.com>
Cc:     Marek Szyprowski <m.szyprowski@...sung.com>,
        linux-kernel@...r.kernel.org, linux-samsung-soc@...r.kernel.org,
        Liam Girdwood <lgirdwood@...il.com>,
        Mark Brown <broonie@...nel.org>, patches@...nsource.cirrus.com
Subject: Re: [PATCH] regulator: wm8994: Use PROBE_FORCE_SYNCHRONOUS

Hi,

On Fri, Mar 24, 2023 at 2:23 AM Charles Keepax
<ckeepax@...nsource.cirrus.com> wrote:
>
> On Thu, Mar 23, 2023 at 11:00:32AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, Mar 23, 2023 at 10:45 AM Charles Keepax
> > <ckeepax@...nsource.cirrus.com> wrote:
> > >
> > > I think really the best place to look at this would be at the
> > > regulator level. It is fine if mfd_add_devices passes, the problem
> > > really is that the regulator core doesn't realise the regulator is
> > > going to be arriving, and thus returns a dummy regulator, rather
> > > than returning EPROBE_DEFER. If it did the MFD driver would probe
> > > defer at the point of requesting the regulator, which would all
> > > make sense.
> >
> > I think something like your suggestion could be made to work for the
> > "microphone" supply in the arizona MFD, but not for the others looked
> > at here.
> >
> > The problem is that if the MFD driver gets -EPROBE_DEFER then it will
> > go through its error handling path and call mfd_remove_devices().
> > That'll remove the sub-device providing the regulator. If you try
> > again, you'll just do the same. :-)
> >
> > Specifically in wm8994 after we've populated the regulator sub-devices
> > then we turn them on and start talking to the device.
> >
> > I think the two options I have could both work for wm8994's case:
> > either add some type of "my children have done probing" to MFD and
> > move the turning on of regulators / talking to devices there, or add
> > another stub-device and add it there. ;-)
>
> Is this true if we keep the regulator as sync though? Yes it will
> remove the children but when it re-adds them the reason that the
> regulator probe deferred in the first place will hopefully be
> removed. So it will now fully probe in path.

Ah, I see. So you keep it as synchronous _and_ make it so that it
won't get a dummy. Yeah, I was trying to brainstorm ways we could fix
it if we kept the regulator async. If we keep it as sync and fix the
dummy problem then, indeed, it should work as you say.

I've spent a whole lot of time dealing with similar issues, though,
and I think there is actually another related concern with that design
(where the regulator is synchronous). ;-) If the child device ends up
depending on a resource that _never_ shows up then you can get into an
infinite probe deferral loop at bootup. If it works the way it did
last time I analyzed similar code:

1. Your MFD starts to probe and kicks off probing of its children
(including the regulator).

2. Your regulator tries to probe and tries to get a resource that will
never exist, maybe because of a bug in dts or maybe because it won't
show up until userspace loads a module. It returns -EPROBE_DEFER.

3. The MFD realizes that the regulator didn't come up and it also
returns -EPROBE_DEFER after removing all its children.

4. That fact that we added/removed devices in the above means that the
kernel thinks it should retry probes of previously deferred devices
because, maybe, the device showed up that everyone was waiting for.
Thus, we go back to step #1.

...the system can actually loop forever in steps #1 - #4 and we ended
up in that situation several times during development with similar
architected systems.

-Doug