lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 21 Jul 2022 01:44:47 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        Alvin __ipraga <alsi@...g-olufsen.dk>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Daniel Scally <djrscally@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        DENG Qingfang <dqfext@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        George McCollister <george.mccollister@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Hauke Mehrtens <hauke@...ke-m.de>,
        Heikki Krogerus <heikki.krogerus@...ux.intel.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Kurt Kanzenbach <kurt@...utronix.de>,
        Landen Chao <Landen.Chao@...iatek.com>,
        Linus Walleij <linus.walleij@...aro.org>,
        linux-acpi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org,
        Matthias Brugger <matthias.bgg@...il.com>,
        netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Sakari Ailus <sakari.ailus@...ux.intel.com>,
        Sean Wang <sean.wang@...iatek.com>,
        UNGLinuxDriver@...rochip.com,
        Vivien Didelot <vivien.didelot@...il.com>,
        Woojung Huh <woojung.huh@...rochip.com>,
        Marek BehĂșn <kabel@...nel.org>
Subject: Re: [PATCH net-next 3/6] net: dsa: add support for retrieving the
 interface mode

On Mon, Jul 18, 2022 at 09:48:51AM +0100, Russell King (Oracle) wrote:
> > But drivers could also have their CPU port working simply because those
> > are internal to an SoC and don't need any software configuration to pass
> > traffic. In their case, there is no breakage caused by the phylink_pcs
> > conversion, but breakage caused by sudden registration of phylink is
> > plausible, if phylink doesn't get the link parameters right.
> > 
> > And that breakage is preventable. Gradually more drivers could be
> > converted to create a fixed-link software node by printing a warning
> > that they should, and still keep the logic to avoid phylink registration
> > and putting the respective port down. Driver authors might not be very
> > responsive to RFC patch sets, but they do look at new warnings in dmesg
> > and try to see what they're about.
> 
> Are you going to do that conversion then? Good luck trying to find all
> the drivers, sending out series of patches, trying to get people to test
> the changes.

Not sure if that's a rhetorical question and if it is, what it's trying
to prove. Of course I'm not going to do any conversion, that's literally
my whole point, I'd rather err on the side of letting sleeping dogs cry
than force-converting everyone at once.

If there is a breakage report and the driver maintainer won't respond,
then yeah, maybe I'll consider looking at that particular issue and
converting if it helps, but that's kind of why I'm here.

Otherwise, we may end up pushing phylink to drivers which have some
wacky link speeds on the CPU port (like 2000, see b53_force_port_config),
which the software node auto-creation logic absolutely won't get right.
I know Florian said that "we won't see a regression since we do not use
the NATP accelerator which would be the reason to run the port at
2Gbits/sec", but frankly I'm not entirely sure what that even means or
what Florian counts as "regression". Lower overall termination throughput
counts or not? Still not worth risking if this isn't the only instance
of a non-standard speed.

> > What I'm missing is the proof that the phylink_pcs conversion has broken
> > those kinds of switches, and that it's therefore pointless to keep the
> > existing logic for them. Sorry, but you didn't provide it.
> 
> I don't have evidence that existing drivers have broken because I don't
> have the hardware to test. I only have Marvell DSA switch hardware and
> that is it.
> 
> Everything else has to be based on theory because no one bothers to
> respond to my patches, so for 99% of the DSA drivers I'm working in the
> dark - and that makes development an utter shitpile of poo in hell.
> 
> As I've said many times, we have NO CLUE which DSA drivers make use of
> this defaulting behaviour - the only one I'm aware of that does is
> mv88e6xxx. It all depends on the firmware description, driver behaviour
> and hardware behaviour. There is not enough information in the kernel 
> to be able to derive this.
> 
> If there was a reported regression, then I would be looking to get this
> into the NET tree not the NET-NEXT tree.

I think there are authors who weren't even aware they were opting into
this interesting DSA feature when they wrote their driver, but didn't
have the inspiration at the time to validate strict DT bindings either.

We have no proof that they don't have DT blobs using the defaulting
feature somewhere out there, but we don't have any proof that they do,
either. The whole problem could become more manageable if we would let
maintainers say to a user "hey, for my driver this feature was never
intended to work, so sorry if by accident it did, it was a marginal and
undocumented condition and now it's broken, so please use something that
was documented".

The ocelot driver is in this exact state, in fact. I really wish there
was a ready-made helper for validating phylink's OF node; I mentioned this
already, it needs to cater for all of fixed-link/phy-handle/managed/sfp.
The more drivers would be calling this (I think the vast majority of
post-phylink drivers would do it), the more we'd minimize the spectrum
of unforeseen breakage. In turn this would make the issue more tractable.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ