[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220325175437.xb4wcfia2c3dgk2v@skbuf>
Date: Fri, 25 Mar 2022 17:54:37 +0000
From: Vladimir Oltean <vladimir.oltean@....com>
To: Robin Murphy <robin.murphy@....com>
CC: Marc Zyngier <maz@...nel.org>,
"devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Rob Herring <robh+dt@...nel.org>,
Shawn Guo <shawnguo@...nel.org>, Leo Li <leoyang.li@....com>,
Biwen Li <biwen.li@....com>, "Z.Q. Hou" <zhiqiang.hou@....com>,
Kurt Kanzenbach <kurt@...utronix.de>,
Rasmus Villemoes <linux@...musvillemoes.dk>
Subject: Re: [RFC PATCH devicetree 00/10] Do something about ls-extirq
interrupt-map breakage
On Fri, Mar 25, 2022 at 10:34:13AM +0000, Robin Murphy wrote:
> On 2022-03-24 19:09, Vladimir Oltean wrote:
> > On Thu, Mar 24, 2022 at 06:06:51PM +0000, Marc Zyngier wrote:
> > > > I was just raising this as what I thought would be a simple and
> > > > non-controversial counter example to your remark "If you change something,
> > > > you *must* guarantee forward *and* backward compatibility."
> > >
> > > If you change something *in the binding*, which was implicit in the
> > > context, and makes no sense out of context.
> > >
> > > > Practically speaking, what has happened is that the board DT appeared in
> > > > kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to
> > > > enable PHY interrupts in kernel N+2. That DT update practically broke
> > > > kernel N from running correctly on DTs taken from kernel N+2 onwards.
> > > > This is the observable behavior, we can find as many justifications for
> > > > it as we wish.
> > >
> > > Well, you can also argue that the DT was broken at N and N+1 for not
> > > describing the HW correctly and completely. No binding has changed
> > > here. Your DT was incomplete, and someone fixed it for you.
> > >
> > > We can argue this things forever and a half. I've laid down the ground
> > > rules for the stuff I maintain. If you're not happy with this, you can
> > > fix it by either removing the NXP hardware from the tree, or taking
> > > over from me as the irqchip maintainer. I'd be perfectly happy with
> > > any (and even more, with both) of these outcomes.
> >
> > Ok, my intention wasn't to inflame you even though the way in which I
> > presented the problem might have suggested otherwise.
> >
> > With my developer hat I still don't agree with you even with the
> > additional clarification you've made that you were referring only to
> > bindings and not to any and all DT changes. The reason being that the DT
> > blob is a whole, and it doesn't matter if there's a regression because
> > of a binding change or something else, you still need to be prepared to
> > update it, sometimes in lockstep with the kernel, like it or not.
> >
> > But as a user, I just wanted to get an opinion from you what can we do
> > to deal better with this situation: optional interrupt provided by
> > device with missing driver, which of_irq_get() doesn't seem to understand.
>
> FWIW, of_irq_get() absolutely understands how to handle a missing IRQ
> provider driver; it returns -EPROBE_DEFER. If a caller considers the IRQ
> optional, then it's up to that caller to decide how long to keep waiting for
> the provider to appear until giving up and carrying on without it. If your
> phy driver is making the dumb decision to wait for ever for something which
> isn't critical, then you're free to fix it, or perhaps even propose for
> of_irq_get() to opt in to the driver_deferred_probe_check_state() mechanism
> if you believe it's a sufficiently general case.
Thanks, I really needed that suggestion, at the moment I made this
change which seems to do what I want when I force-disable the ls-extirq
driver (which in turn simulates an unwritten driver from an old kernel):
diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 1becb1a731f6..1c1584fca632 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -43,6 +43,11 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
int rc;
rc = fwnode_irq_get(child, 0);
+ /* Don't wait forever if the IRQ provider doesn't become available,
+ * just fall back to poll mode
+ */
+ if (rc == -EPROBE_DEFER)
+ rc = driver_deferred_probe_check_state(&phy->mdio.dev);
if (rc == -EPROBE_DEFER)
return rc;
The trickier part seems to be to adjust this change for older kernels
where the MDIO bus code calls of_irq_get() directly, and not get
regressions in. That, plus I need to understand what changes in behavior
when the irqchip driver is built as module and the MDIO bus driver is
built-in, and the "driver_deferred_probe_timeout" kernel boot parameter
has the default value of 0.
> If a new DT with an additional new property (either on an existing machine,
> or on a completely new machine which has the property from the start)
> exposes a bug in a driver, that's unfortunate, but it is entirely irrelevant
> to the ABI implications of changing the interpretation of an existing
> property.
>
> Robin.
Agree, but I'd like to at least be shot down for a point I _am_ trying
to make, not for one I am not.
When I resumed this discussion I wasn't really focused on the patch set
that proposed to change some DT bindings. That was a bad idea, I abandoned
it, issue was solved (more or less) through other means, end of story.
Instead I focused on one of the arguments that Marc brought, that being
able to roll back kernel independently of firmware is important.
As a realist, I can't help but remark that this is effectively a non-goal.
There is always a risk that old kernels are caught off guard by DT
changes which they aren't prepared to handle, and even if I was aware of
the fact that I'm making a breaking change for old kernels when I'm
adding the 'interrupts-extended' property to the PHY (which I wasn't),
I'd still do it 10 out of 10 times.
I guess I'll always treat the 'old kernel works with new DT blob' case
as a pleasant surprise rather than the inviolable norm that Marc is
trying to make it sound like. Mentality difference between NXP/Marvell
vs Socionext aside, I really don't see how you can systematically avoid
these issues, so it's just a losing game to try to claim that every
firmware blob should work with every kernel, for the simple reason that
you can't change the past.
As to whether this has any implication on the point you and Marc are
trying to make, that the firmware ABI contract shouldn't change,
maybe not, probably not, especially if there are alternatives. But bring
a more solid argument to the table. Changing DT bindings is not off the
table _because_ old kernels will stop working with new DT blobs.
Powered by blists - more mailing lists