lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 18 Jul 2022 09:48:51 +0100
From:   "Russell King (Oracle)" <>
To:     Vladimir Oltean <>
Cc:     Andrew Lunn <>,
        Heiner Kallweit <>,
        Alexandre Belloni <>,
        Alvin __ipraga <>,
        Andy Shevchenko <>,
        Claudiu Manoil <>,
        Daniel Scally <>,
        "David S. Miller" <>,
        DENG Qingfang <>,
        Eric Dumazet <>,
        Florian Fainelli <>,
        George McCollister <>,
        Greg Kroah-Hartman <>,
        Hauke Mehrtens <>,
        Heikki Krogerus <>,
        Jakub Kicinski <>,
        Kurt Kanzenbach <>,
        Landen Chao <>,
        Linus Walleij <>,,,,
        Matthias Brugger <>,, Paolo Abeni <>,
        "Rafael J. Wysocki" <>,
        Sakari Ailus <>,
        Sean Wang <>,,
        Vivien Didelot <>,
        Woojung Huh <>,
        Marek BehĂșn <>
Subject: Re: [PATCH net-next 3/6] net: dsa: add support for retrieving the
 interface mode

On Sat, Jul 16, 2022 at 03:36:08PM +0300, Vladimir Oltean wrote:
> On Sat, Jul 16, 2022 at 12:13:55PM +0100, Russell King (Oracle) wrote:
> > > > and is exactly what I'm trying to get rid of, so we have _consistency_
> > > > in the implementation, to prevent fuckups like I've created by
> > > > converting many DSA drivers to use phylink_pcs. Any DSA driver that
> > > > used a PCS for the DSA or CPu port and has been converted to
> > > > phylink_pcs support has been broken in the last few kernel cycles. I'm
> > > > trying to address that breakage before converting the Marvell DSA
> > > > driver - which is the driver that highlighted the problem.
> > > 
> > > You are essentially saying that it's of no use to keep in DSA the
> > > fallback logic of not registering with phylink, because the phylink_pcs
> > > conversions have broken the defaulting functionality already in all
> > > other drivers.
> > 
> > Correct, and I don't want these exceptions precisely because it creates
> > stupid mistakes like the one I've highlighted. If we have one way of
> > doing something, then development becomes much easier. When we have
> > multiple different ways, mistakes happen, stuff breaks.
> Agree that when we have multiple different ways, mistakes happen and
> stuff breaks. This is also why I want to avoid the proliferation of the
> default_interface reporting when most of the drivers were just fine
> without it. It needs to be opt-in, and checking for the presence of a
> dedicated function is an easy way to check for that opt-in.
> > > I may have missed something, but this is new information to me.
> > > Specifically, before you've said that it is *this* patch set which would
> > > risk introducing breakage (by forcing a link down + a phylink creation).
> > >
> > > What you're saying now directly contradicts that.
> > 
> > No, it is not contradictory at all.
> > 
> > There is previous breakage caused by converting DSA drivers to
> > phylink_pcs - because the PCS code will *not* be called where a driver
> > makes use of the "default-to-fastest-speed" mechanism, so is likely
> > broken. Some drivers work around this by doing things manually.
> Marvell conversion to phylink_pcs doesn't count as "previous" breakage
> if it's not in net-next.

Thank you for stating the obvious. I never claimed it was.

> > This series *also* risks breakage, because it means phylink gets used
> > in more situations which could confuse drivers - such as those which
> > have manually brought stuff up and are not expecting phylink to also
> > do it. Or those which do not fill in the mac_capabilities but do
> > default to this "fastest-speed" mechanism. Or some other unforseen
> > scenario.
> I don't exactly understand the mechanics through which a phylink_pcs
> conversion would break a driver that used defaulting behavior,
> but I can only imagine it involves moving code that didn't depend on
> phylink, to phylink callbacks. It's hard to say whether that happened
> for any of the phylink_pcs conversions in net-next.

Situation before phylink_pcs conversion:
- code paths that configure the PCS are integrated with the rest of the
  driver and would be called where appropriate.

Situation after phylink_pcs conversion:
- code paths that configure the PCS are no longer integrated with the
  rest of the driver and are now dependent on phylink being used for
  them to be called.

If there is no problem with doing this conversion, then why has the
proposed conversion of mv88e6xxx caused breakage when it was working
fine previously - it's because of exactly the above.

> But drivers could also have their CPU port working simply because those
> are internal to an SoC and don't need any software configuration to pass
> traffic. In their case, there is no breakage caused by the phylink_pcs
> conversion, but breakage caused by sudden registration of phylink is
> plausible, if phylink doesn't get the link parameters right.
> And that breakage is preventable. Gradually more drivers could be
> converted to create a fixed-link software node by printing a warning
> that they should, and still keep the logic to avoid phylink registration
> and putting the respective port down. Driver authors might not be very
> responsive to RFC patch sets, but they do look at new warnings in dmesg
> and try to see what they're about.

Are you going to do that conversion then? Good luck trying to find all
the drivers, sending out series of patches, trying to get people to test
the changes.

> What I'm missing is the proof that the phylink_pcs conversion has broken
> those kinds of switches, and that it's therefore pointless to keep the
> existing logic for them. Sorry, but you didn't provide it.

I don't have evidence that existing drivers have broken because I don't
have the hardware to test. I only have Marvell DSA switch hardware and
that is it.

Everything else has to be based on theory because no one bothers to
respond to my patches, so for 99% of the DSA drivers I'm working in the
dark - and that makes development an utter shitpile of poo in hell.

As I've said many times, we have NO CLUE which DSA drivers make use of
this defaulting behaviour - the only one I'm aware of that does is
mv88e6xxx. It all depends on the firmware description, driver behaviour
and hardware behaviour. There is not enough information in the kernel 
to be able to derive this.

If there was a reported regression, then I would be looking to get this
into the NET tree not the NET-NEXT tree.

RMK's Patch system:
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists