[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=Vow5_jv=-O=f2v4_5Nb4DiOUB1sQUx6r=-y5A-6rP4hw@mail.gmail.com>
Date: Mon, 30 Nov 2020 09:04:15 -0800
From: Doug Anderson <dianders@...omium.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Arnd Bergmann <arnd@...nel.org>, SoC Team <soc@...nel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ulf Hansson <ulf.hansson@...aro.org>
Subject: Re: [GIT PULL] ARM: SoC fixes for v5.10, part 3
Hi,
On Fri, Nov 27, 2020 at 2:56 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Fri, Nov 27, 2020 at 12:51 PM Arnd Bergmann <arnd@...nel.org> wrote:
> >
> > - Some DT patches for the Rockchip RK3399 platform,
> > in particular fixing the MMC device ordering that
> > recently became nondeterministic with async probe.
>
> Uhhuh.
>
> I didn't realize this MMC breakage happened.
>
> That's just an MMC bug. Other subsystems have been able to do async
> probing without making device ordering be random.
>
> So this really smells wrong, and I just never realized.
>
> Added Douglas Anderson to the cc - the whole PROBE_PREFER_ASYNCHRONOUS
> behavior appears broken.
>
> You basically should do the device numbering synchronously (or better
> yet - asynchronously, but single-threaded for the subsystem), and then
> asynchronously probe the actual device details after you've numbered
> them reliably.
+Ulf too
IMO if a sane/consistent ordering is needed for MMC devices that the
correct solution is that they should be assigned statically and not
rely on the happenstance of which drivers happened to probe in the
system in which order. Assigning numbers statically is exactly what
we can do now in the device tree and I believe that the patches you
were merging from Arnd were doing exactly that. To me it feels like
this is the correct long term solution here.
Without static assignment, maybe we could do numbering of MMC devices
in some type of a pre-probe routine? Is that what you're suggesting?
If so, it seems like it might be a pretty big change to figure out how
that ought to work (if it can be made to work).
If numbering is done by a probe routine then I think we can't
guarantee order. Among other things, I think EPROBE_DEFER could cause
us problems, like in this example:
1. A regulator driver starts probing, but runs async to allow other
drivers to load.
2. The MMC driver's probe routine runs.
3. In the MMC driver's probe routine, we try to get the regulator.
Depending on how far the regulator driver has run (which may depend on
external factors), we might get it or we might be told EPROBE_DEFER.
Let's assume EPROBE_DEFER.
4. A second MMC driver's probe routine runs.
5. The regulator finishes loading.
6. The first MMC driver's probe routine runs again and works this time.
In this case the first MMC driver's probe routine ran _after_ the
second MMC driver's probe routine even without the MMC drivers being
async.
I believe there are other examples.
Now, of course, it should be noted that on a given board with a given
kernel configuration it might be impossible to ever hit a case where
the order was inconsistent. However, I believe that there are
legitimate cases where the order would have been inconsistent. Also
note that code changes to other parts of the system (like adding async
to the regulator subsystem) could suddenly introduce different
ordering even if it was previously impossible.
> This is not something new - we do this for pretty much all the other
> block devices, and MMC is just doing things wrong.
Just to confirm, I assume you are specifically excluding all block
devices attached via USB, right? I believe that, by its very nature,
it's nearly impossible to guarantee the ordering of USB device
enumeration especially with complex USB hierarchies, but I certainly
could be wrong. Until devices are enumerated, we can't necessarily
know how many block devices will be provided.
> See SCSI probing for the traditional horrible cases (where just
> spinning up a disk could take tens of seconds). "Slow probing" is not
> an excuse of "random ordering".
It's odd that you say that. I actually fought for static numbering
years ago and one of the arguments _against_ allowing static numbering
for MMC devices [1] was that, to quote:
> Exactly the same issue arises if you have more than one ATA or SCSI
> adapter in your PC - things can be probed out of order and end up
> in different /dev/sd* slots.
That was over 4 years ago now, so perhaps things have changed in the
meantime, or perhaps the person who said that was wrong.
Ah, I just re-read more of the old thread and you're probably talking
about what Peter refers to [2]. I guess, though, that must rely on
all parent devices (like PCIe) doing their probes in a fast enough and
synchronous way, right? ...and I guess it relies on parent devices
not running into problems like the EPROBE_DEFER problem that I talked
about above? Specifically, someone could have a MMC or SCSI adapter
in a PCIe slot, right? Until we probe the PCIe slot we don't know if
it provides any block devices.
> Behaving randomly is simply not acceptable.
Agreed, which is why I'm a fan of the static assignment solution.
In the end, I believe that we could revert my patches and go back to
numbering being consistent for most (but not all) use cases. Even
before my patches, MMC already did most of its slow stuff async and my
patches only save ~40ms of boot time per MMC device, so the solution
you talked about for SCSI was already there, but I wanted to save the
extra 40 ms. If we revert, though, I'm not sure how we will ever get
to the goal of "async probe by default" then, but maybe that was a
pipe dream? ...presumably we'd also agree that we'll be limiting how
much async work can happen at boot in general since anything in the
path that might lead to a block device needs to be synchronous /
ordered, right?
In any cases, I'm interested to hear thoughts from you and others on the above.
[1] https://lore.kernel.org/r/20160429181248.GW19428@n2100.arm.linux.org.uk/
[2] https://lore.kernel.org/r/5723FCE3.9070308@hurleysoftware.com/
-Doug
Powered by blists - more mailing lists