[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPv3WKdWr0zfuTkK+x6u7C6FpFxkVtRFrEq1FvemVpLYw2+5ng@mail.gmail.com>
Date: Thu, 10 Dec 2020 18:43:50 +0100
From: Marcin Wojtas <mw@...ihalf.com>
To: Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Sasha Levin <sashal@...nel.org>,
Antoine Tenart <antoine.tenart@...tlin.com>,
stable@...r.kernel.org,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>,
Gabor Samu <samu_gabor@...oo.ca>,
Jon Nettleton <jon@...id-run.com>,
Andrew Elwell <andrew.elwell@...il.com>
Subject: Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port() helper
Hi Russell,
czw., 10 gru 2020 o 16:46 Russell King - ARM Linux admin
<linux@...linux.org.uk> napisał(a):
>
> On Thu, Dec 10, 2020 at 03:35:29PM +0100, Marcin Wojtas wrote:
> > Hi Greg,
> >
> > śr., 9 gru 2020 o 11:59 Greg Kroah-Hartman
> > <gregkh@...uxfoundation.org> napisał(a):
> > > What part fixes the issue? I can't see it...
> >
> > I re-checked in my setup and here's the smallest part of the original
> > patch, that fixes previously described issue:
> > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > index e98be8372780..9d71a4fe1750 100644
> > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > @@ -4767,6 +4767,11 @@ static void mvpp2_port_copy_mac_addr(struct
> > net_device *dev, struct mvpp2 *priv,
> > eth_hw_addr_random(dev);
> > }
> >
> > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config *config)
> > +{
> > + return container_of(config, struct mvpp2_port, phylink_config);
> > +}
> > +
> > static void mvpp2_phylink_validate(struct phylink_config *config,
> > unsigned long *supported,
> > struct phylink_link_state *state)
> > @@ -5105,13 +5110,12 @@ static void mvpp2_gmac_config(struct
> > mvpp2_port *port, unsigned int mode,
> > static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode,
> > const struct phylink_link_state *state)
> > {
> > - struct net_device *dev = to_net_dev(config->dev);
> > - struct mvpp2_port *port = netdev_priv(dev);
> > + struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > bool change_interface = port->phy_interface != state->interface;
> >
> > /* Check for invalid configuration */
> > if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
> > - netdev_err(dev, "Invalid mode on %s\n", dev->name);
> > + netdev_err(port->dev, "Invalid mode on %s\n", port->dev->name);
> > return;
> > }
> >
> > @@ -5151,8 +5155,7 @@ static void mvpp2_mac_link_up(struct
> > phylink_config *config,
> > int speed, int duplex,
> > bool tx_pause, bool rx_pause)
> > {
> > - struct net_device *dev = to_net_dev(config->dev);
> > - struct mvpp2_port *port = netdev_priv(dev);
> > + struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> > u32 val;
> >
> > if (mvpp2_is_xlg(interface)) {
> > @@ -5199,7 +5202,7 @@ static void mvpp2_mac_link_up(struct
> > phylink_config *config,
> >
> > mvpp2_egress_enable(port);
> > mvpp2_ingress_enable(port);
> > - netif_tx_wake_all_queues(dev);
> > + netif_tx_wake_all_queues(port->dev);
> > }
> >
> > static void mvpp2_mac_link_down(struct phylink_config *config,
>
> The problem is caused by this hack:
>
> /* Phylink isn't used as of now for ACPI, so the MAC has to be
> * configured manually when the interface is started. This will
> * be removed as soon as the phylink ACPI support lands in.
> */
> struct phylink_link_state state = {
> .interface = port->phy_interface,
> };
> mvpp2_mac_config(&port->phylink_config, MLO_AN_INBAND, &state);
> mvpp2_mac_link_up(&port->phylink_config, MLO_AN_INBAND,
> port->phy_interface, NULL);
>
> which passes an un-initialised (zeroed) port->phylink_config, as
> phylink is not used in ACPI setups.
>
> The problem occurs because port->phylink_config.dev (which is a
> NULL pointer in this instance) is passed to to_net_dev():
>
> #define to_net_dev(d) container_of(d, struct net_device, dev)
>
> Which then means netdev_priv(dev) attempts to dereference a not-quite
> NULL pointer, leading to an oops.
Correct. Hopefully the MDIO+ACPI patchset will land and I'll be able
to get rid of this hack and do not maintain dual handling paths any
longer.
>
> The problem here is that the bug was not noticed; it seems hardly
> anyone bothers to run mainline kernels with ACPI on Marvell platforms,
> or if they do, they don't bother reporting to mainline communities
> when they have problems. Likely, there's posts on some random web-based
> bulletin board or mailing list that kernel developers don't read
> somewhere complaining that there's an oops.
>
I must admit that due to other duties I did not follow the mainline
mvpp2 for a couple revisions (and I am not maintainer of it). However
recently I got reached-out directly by different developers - the
trigger was different distros upgrading the kernel above v5.4+ and for
some reasons the DT path is not chosen there (and the ACPI will be
chosen more and more in the SystemReady world).
> Like...
>
> https://lists.einval.com/pipermail/macchiato/2020-January/000309.html
> https://gist.github.com/AdrianKoshka/ff9862da2183a2d8e26d47baf8dc04b9
>
> This kind of segmentation is very disappointing; it means potentially
> lots of bugs go by unnoticed by kernel developers, and bugs only get
> fixed by chance. Had it been reported to somewhere known earlier
> this year, it is likely that a proper fix patch would have been
> created.
Totally agree. As soon as I got noticed directly it took me no time to
find the regression root cause.
>
> How this gets handled is ultimately up to the stable developers to
> decide what they wish to accept. Do they wish to accept a back-ported
> full version of my commit 6c2b49eb9671 ("net: mvpp2: add
> mvpp2_phylink_to_port() helper") that unintentionally fixed this
> unknown issue, or do they want a more minimal fix such as the cut-down
> version of that commit that Marcin has supplied.
>
> Until something changes in the way bugs get reported, I suspect this
> won't be the only instance of bug-fixing-by-accident happening.
>
Thank you Russell for your input.
Best regards,
Marcin
Powered by blists - more mailing lists