[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201210154651.GV1551@shell.armlinux.org.uk>
Date: Thu, 10 Dec 2020 15:46:51 +0000
From: Russell King - ARM Linux admin <linux@...linux.org.uk>
To: Marcin Wojtas <mw@...ihalf.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Sasha Levin <sashal@...nel.org>
Cc: Antoine Tenart <antoine.tenart@...tlin.com>,
stable@...r.kernel.org,
Alexandre Belloni <alexandre.belloni@...tlin.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>,
Gabor Samu <samu_gabor@...oo.ca>,
Jon Nettleton <jon@...id-run.com>,
Andrew Elwell <andrew.elwell@...il.com>
Subject: Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port()
helper
On Thu, Dec 10, 2020 at 03:35:29PM +0100, Marcin Wojtas wrote:
> Hi Greg,
>
> śr., 9 gru 2020 o 11:59 Greg Kroah-Hartman
> <gregkh@...uxfoundation.org> napisał(a):
> > What part fixes the issue? I can't see it...
>
> I re-checked in my setup and here's the smallest part of the original
> patch, that fixes previously described issue:
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index e98be8372780..9d71a4fe1750 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -4767,6 +4767,11 @@ static void mvpp2_port_copy_mac_addr(struct
> net_device *dev, struct mvpp2 *priv,
> eth_hw_addr_random(dev);
> }
>
> +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config *config)
> +{
> + return container_of(config, struct mvpp2_port, phylink_config);
> +}
> +
> static void mvpp2_phylink_validate(struct phylink_config *config,
> unsigned long *supported,
> struct phylink_link_state *state)
> @@ -5105,13 +5110,12 @@ static void mvpp2_gmac_config(struct
> mvpp2_port *port, unsigned int mode,
> static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode,
> const struct phylink_link_state *state)
> {
> - struct net_device *dev = to_net_dev(config->dev);
> - struct mvpp2_port *port = netdev_priv(dev);
> + struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> bool change_interface = port->phy_interface != state->interface;
>
> /* Check for invalid configuration */
> if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
> - netdev_err(dev, "Invalid mode on %s\n", dev->name);
> + netdev_err(port->dev, "Invalid mode on %s\n", port->dev->name);
> return;
> }
>
> @@ -5151,8 +5155,7 @@ static void mvpp2_mac_link_up(struct
> phylink_config *config,
> int speed, int duplex,
> bool tx_pause, bool rx_pause)
> {
> - struct net_device *dev = to_net_dev(config->dev);
> - struct mvpp2_port *port = netdev_priv(dev);
> + struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> u32 val;
>
> if (mvpp2_is_xlg(interface)) {
> @@ -5199,7 +5202,7 @@ static void mvpp2_mac_link_up(struct
> phylink_config *config,
>
> mvpp2_egress_enable(port);
> mvpp2_ingress_enable(port);
> - netif_tx_wake_all_queues(dev);
> + netif_tx_wake_all_queues(port->dev);
> }
>
> static void mvpp2_mac_link_down(struct phylink_config *config,
The problem is caused by this hack:
/* Phylink isn't used as of now for ACPI, so the MAC has to be
* configured manually when the interface is started. This will
* be removed as soon as the phylink ACPI support lands in.
*/
struct phylink_link_state state = {
.interface = port->phy_interface,
};
mvpp2_mac_config(&port->phylink_config, MLO_AN_INBAND, &state);
mvpp2_mac_link_up(&port->phylink_config, MLO_AN_INBAND,
port->phy_interface, NULL);
which passes an un-initialised (zeroed) port->phylink_config, as
phylink is not used in ACPI setups.
The problem occurs because port->phylink_config.dev (which is a
NULL pointer in this instance) is passed to to_net_dev():
#define to_net_dev(d) container_of(d, struct net_device, dev)
Which then means netdev_priv(dev) attempts to dereference a not-quite
NULL pointer, leading to an oops.
The problem here is that the bug was not noticed; it seems hardly
anyone bothers to run mainline kernels with ACPI on Marvell platforms,
or if they do, they don't bother reporting to mainline communities
when they have problems. Likely, there's posts on some random web-based
bulletin board or mailing list that kernel developers don't read
somewhere complaining that there's an oops.
Like...
https://lists.einval.com/pipermail/macchiato/2020-January/000309.html
https://gist.github.com/AdrianKoshka/ff9862da2183a2d8e26d47baf8dc04b9
This kind of segmentation is very disappointing; it means potentially
lots of bugs go by unnoticed by kernel developers, and bugs only get
fixed by chance. Had it been reported to somewhere known earlier
this year, it is likely that a proper fix patch would have been
created.
How this gets handled is ultimately up to the stable developers to
decide what they wish to accept. Do they wish to accept a back-ported
full version of my commit 6c2b49eb9671 ("net: mvpp2: add
mvpp2_phylink_to_port() helper") that unintentionally fixed this
unknown issue, or do they want a more minimal fix such as the cut-down
version of that commit that Marcin has supplied.
Until something changes in the way bugs get reported, I suspect this
won't be the only instance of bug-fixing-by-accident happening.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Powered by blists - more mailing lists