lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 18 Oct 2020 12:56:02 +0200
From:   Ard Biesheuvel <ardb@...nel.org>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Daniel Thompson <daniel.thompson@...aro.org>,
        Sumit Garg <sumit.garg@...aro.org>,
        Alex Bennée <alex.bennee@...aro.org>,
        Masami Hiramatsu <mhiramat@...nel.org>, steve@...val.com,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        "open list:BPF JIT for MIPS (32-BIT AND 64-BIT)" 
        <netdev@...r.kernel.org>, Willy Liu <willy.liu@...ltek.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Sasha Levin <sashal@...nel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Masahisa Kojima <masahisa.kojima@...aro.org>
Subject: Re: realtek PHY commit bbc4d71d63549 causes regression

On Sun, 18 Oct 2020 at 12:35, Ard Biesheuvel <ardb@...nel.org> wrote:
>
> On Sun, 18 Oct 2020 at 01:02, Andrew Lunn <andrew@...n.ch> wrote:
> >
> > On Sun, Oct 18, 2020 at 12:19:25AM +0200, Ard Biesheuvel wrote:
> > > (cc'ing some folks who may care about functional networking on SynQuacer)
> > >
> > > On Sat, 17 Oct 2020 at 21:49, Andrew Lunn <andrew@...n.ch> wrote:
> > > >
> > > > > So we can fix this firmware by just setting phy-mode to the empty
> > > > > string, right?
> > > >
> > > > I've never actually tried it, but i think so. There are no DT files
> > > > actually doing that, so you really do need to test it and see. And
> > > > there might be some differences between device_get_phy_mode() and
> > > > of_get_phy_mode().
> > > >
> > >
> > > Yes, that works fine. Fixed now in the latest firmware build [0]
> >
> > Thanks for reporting back on that. It probably needs to be something
> > we always recommend for ACPI systems, either use "", or preferably no
> > phy-mode at all, and let the driver default to NA. ACPI and networking
> > is at a very early stage at the moment, since UEFA says nothing about
> > it. It will take a while before we figure out the best practices, and
> > some vendor gets something added to the ACPI specifications.
> >
>
> You mean MDIO topology, right? Every x86 PC and server in the world
> uses ACPI, and networking doesn't seem to be a problem there, so it is
> simply a matter of choosing the right abstraction level. I know this
> is much easier to achieve when all the network controllers are on PCIe
> cards, but the point remains valid: exhaustively describing the entire
> SoC like we do for DT is definitely not the right choice for ACPI
> systems. This also means that ACPI is simply not the right fit for all
> designs, and we should push back harder against the tick-the-box
> exercises that are going on to use ACPI for describing unusual designs
> that are never going to boot RHEL or other ACPI-only distros anyway.
>
> > > But that still leaves the question whether and how to work around this
> > > for units in the field. Ignoring the PHY mode in the driver would
> > > help, as all known hardware ships with firmware that configures the
> > > PHY with the correct settings, but we will lose the ability to use
> > > other PHY modes in the future, in case the SoC is ever used with DT
> > > based minimal firmware that does not configure networking.
> > >
> > > Since ACPI implies rich firmware, we could make ACPI probing of the
> > > driver ignore the phy-mode setting in the DSDT.
> >
> > I'm O.K. with that, for this driver, but please add a comment in the
> > code about why ACPI ignores DSDT, because of older broken firmware.
> >
>
> Sure.
>
> > > But if we don't do the same for DT, it would mean DT users are
> > > forced to upgrade their firmware, and hopefully do so before
> > > upgrading to a kernel that breaks networking.
> >
> > Nothing new there. As i said, we have been through this before with
> > the Atheros PHY and others.
> >
> > One option would be to move the DT into the kernel and fix it. Most
> > distributions already use the DT version shipped with the kernel, so
> > they would automatically get the fixed DT at the same time as the
> > kernel which needs the fix.
> >
>
> The DT is not a flat file that you can simply source from elsewhere -
> it is constructed at boot using firmware settings and other data that
> is different between systems.
>
>
> I do have a question about the history here, btw. As I understand it,
> before commit f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx
> delays config"), the phy-mode setting did not have any effect on
> Realtek PHYs in the first place, right? Since otherwise, this platform
> would never have worked from the beginning.
>
> So commit f81dadbcf7fd was backported to -stable, even though it
> didn't actually work, and had to be fixed in bbc4d71d63549bcd ("net:
> phy: realtek: fix rtl8211e rx/tx delay config"), which is the commit
> that causes the regression on these boards.
>
> So why was commit f81dadbcf7fd sent to -stable in the first place? It
> doesn't have a fixes tag or cc:stable, and given that it is not
> actually correct to begin with, there seems to be little justification
> for this. Surely, no platform exists in the field that requires this
> commit (as it is broken) or the followup fix (since only was never
> taken into account before)
>
> In summary, I think that - even if we agree that the way forward is to
> fix the firmware and make the driver do the right thing without any
> quirks - these patches do not belong in -stable, and should be
> reverted, unless anyone can point out a system that was working
> before, got broken and was fixed again by f81dadbcf7fd (i.e., a true
> regression)

Oops - f81dadbcf7fd was never backported, only the fix was. Apologies
for mixing that up.

However, that leaves the question why bbc4d71d63549bcd was backported,
although I understand why the discussion is a bit trickier there. But
if it did not fix a regression, only broken code that never worked in
the first place, I am not convinced it belongs in -stable.

Powered by blists - more mailing lists