[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXGO=5MsbLYvng4JWdNhJ3Nb0TSFKvnT-ZhjF2xcO9dZaw@mail.gmail.com>
Date: Sun, 18 Oct 2020 12:35:13 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Andrew Lunn <andrew@...n.ch>
Cc: Daniel Thompson <daniel.thompson@...aro.org>,
Sumit Garg <sumit.garg@...aro.org>,
Alex Bennée <alex.bennee@...aro.org>,
Masami Hiramatsu <mhiramat@...nel.org>, steve@...val.com,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
"open list:BPF JIT for MIPS (32-BIT AND 64-BIT)"
<netdev@...r.kernel.org>, Willy Liu <willy.liu@...ltek.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Sasha Levin <sashal@...nel.org>,
Florian Fainelli <f.fainelli@...il.com>,
Heiner Kallweit <hkallweit1@...il.com>,
Masahisa Kojima <masahisa.kojima@...aro.org>
Subject: Re: realtek PHY commit bbc4d71d63549 causes regression
On Sun, 18 Oct 2020 at 01:02, Andrew Lunn <andrew@...n.ch> wrote:
>
> On Sun, Oct 18, 2020 at 12:19:25AM +0200, Ard Biesheuvel wrote:
> > (cc'ing some folks who may care about functional networking on SynQuacer)
> >
> > On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@...n.ch> wrote:
> > >
> > > > So we can fix this firmware by just setting phy-mode to the empty
> > > > string, right?
> > >
> > > I've never actually tried it, but i think so. There are no DT files
> > > actually doing that, so you really do need to test it and see. And
> > > there might be some differences between device_get_phy_mode() and
> > > of_get_phy_mode().
> > >
> >
> > Yes, that works fine. Fixed now in the latest firmware build [0]
>
> Thanks for reporting back on that. It probably needs to be something
> we always recommend for ACPI systems, either use "", or preferably no
> phy-mode at all, and let the driver default to NA. ACPI and networking
> is at a very early stage at the moment, since UEFA says nothing about
> it. It will take a while before we figure out the best practices, and
> some vendor gets something added to the ACPI specifications.
>
You mean MDIO topology, right? Every x86 PC and server in the world
uses ACPI, and networking doesn't seem to be a problem there, so it is
simply a matter of choosing the right abstraction level. I know this
is much easier to achieve when all the network controllers are on PCIe
cards, but the point remains valid: exhaustively describing the entire
SoC like we do for DT is definitely not the right choice for ACPI
systems. This also means that ACPI is simply not the right fit for all
designs, and we should push back harder against the tick-the-box
exercises that are going on to use ACPI for describing unusual designs
that are never going to boot RHEL or other ACPI-only distros anyway.
> > But that still leaves the question whether and how to work around this
> > for units in the field. Ignoring the PHY mode in the driver would
> > help, as all known hardware ships with firmware that configures the
> > PHY with the correct settings, but we will lose the ability to use
> > other PHY modes in the future, in case the SoC is ever used with DT
> > based minimal firmware that does not configure networking.
> >
> > Since ACPI implies rich firmware, we could make ACPI probing of the
> > driver ignore the phy-mode setting in the DSDT.
>
> I'm O.K. with that, for this driver, but please add a comment in the
> code about why ACPI ignores DSDT, because of older broken firmware.
>
Sure.
> > But if we don't do the same for DT, it would mean DT users are
> > forced to upgrade their firmware, and hopefully do so before
> > upgrading to a kernel that breaks networking.
>
> Nothing new there. As i said, we have been through this before with
> the Atheros PHY and others.
>
> One option would be to move the DT into the kernel and fix it. Most
> distributions already use the DT version shipped with the kernel, so
> they would automatically get the fixed DT at the same time as the
> kernel which needs the fix.
>
The DT is not a flat file that you can simply source from elsewhere -
it is constructed at boot using firmware settings and other data that
is different between systems.
I do have a question about the history here, btw. As I understand it,
before commit f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx
delays config"), the phy-mode setting did not have any effect on
Realtek PHYs in the first place, right? Since otherwise, this platform
would never have worked from the beginning.
So commit f81dadbcf7fd was backported to -stable, even though it
didn't actually work, and had to be fixed in bbc4d71d63549bcd ("net:
phy: realtek: fix rtl8211e rx/tx delay config"), which is the commit
that causes the regression on these boards.
So why was commit f81dadbcf7fd sent to -stable in the first place? It
doesn't have a fixes tag or cc:stable, and given that it is not
actually correct to begin with, there seems to be little justification
for this. Surely, no platform exists in the field that requires this
commit (as it is broken) or the followup fix (since only was never
taken into account before)
In summary, I think that - even if we agree that the way forward is to
fix the firmware and make the driver do the right thing without any
quirks - these patches do not belong in -stable, and should be
reverted, unless anyone can point out a system that was working
before, got broken and was fixed again by f81dadbcf7fd (i.e., a true
regression)
Powered by blists - more mailing lists