[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c4244c9a-28cb-7e37-684d-64e6cdc89b67@gmail.com>
Date: Thu, 17 Oct 2019 15:22:45 -0700
From: Florian Fainelli <f.fainelli@...il.com>
To: Vladimir Oltean <olteanv@...il.com>,
Florian Fainelli <f.fainelli@...il.com>, netdev@...r.kernel.org
Cc: Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
open list <linux-kernel@...r.kernel.org>, hkallweit1@...il.com,
bcm-kernel-feedback-list@...adcom.com, rmk+kernel@...linux.org.uk,
cphealy@...il.com, Jose Abreu <joabreu@...opsys.com>
Subject: Re: [PATCH net-next 2/2] net: phy: Add ability to debug RGMII
connections
On 10/17/2019 3:06 PM, Vladimir Oltean wrote:
>> +static int phy_rgmii_debug_rcv(struct sk_buff *skb, struct net_device
>> *dev,
>> + struct packet_type *pt, struct net_device *unused)
>> +{
>> + struct phy_rgmii_debug_priv *priv = pt->af_packet_priv;
>> + u32 fcs;
>> +
>> + /* If we receive something, the Ethernet header was valid and so was
>> + * the Ethernet type, so to re-calculate the FCS we need to undo
>> what
>> + * eth_type_trans() just did.
>> + */
>> + if (!__skb_push(skb, ETH_HLEN))
>> + return 0;
>
> Why would this return NULL?
I don't think it can, good point.
>
>> +
>> + fcs = phy_rgmii_probe_skb_fcs(skb);
>> + if (skb->len != priv->skb->len || fcs != priv->fcs) {
>
> I feel like this logic is broken. How do you know that this skb is that
> skb? Everybody else can still enqueue to the netdev, right?
That is true, so I could be defeated by someone sending an Ethernet
Frame with a 0xdada ethernet type through, e.g.: raw sockets, good point.
>
> Actually if I'm right about the FCS errors resulting in drops below,
> then any news here is good news, no need to even compare the FCS of two
> frames which you don't know whether they're in fact one and the same.
FCS is a bit overstated here, although it actually is what the HW would
generate/verify but the point was really that if you have a RGMII issue
you may very well end-up with two packets instead of one, because of the
clock/data misalignment.
>
>> + print_hex_dump(KERN_INFO, "RX probe skb: ",
>> + DUMP_PREFIX_OFFSET, 16, 1, skb->data, 32,
>> + false);
>> + netdev_warn(dev, "Calculated FCS: 0x%08x expected: 0x%08x\n",
>> + fcs, priv->fcs);
>> + } else {
>> + priv->rcv_ok = 1;
>> + }
>> +
>> + complete(&priv->compl);
>> +
>> + return 0;
>> +}
>> +
>> +static int phy_rgmii_trigger_config(struct phy_device *phydev,
>> + phy_interface_t interface)
>> +{
>> + int ret = 0;
>> +
>> + /* Configure the interface mode to be tested */
>> + phydev->interface = interface;
>> +
>> + /* Forcibly run the fixups and config_init() */
>> + ret = phy_init_hw(phydev);
>> + if (ret) {
>> + phydev_err(phydev, "phy_init_hw failed: %d\n", ret);
>> + return ret;
>> + }
>> +
>> + /* Some PHY drivers configure RGMII delays in their config_aneg()
>> + * callback, so make sure we run through those as well.
>> + */
>> + ret = phy_start_aneg(phydev);
>> + if (ret) {
>> + phydev_err(phydev, "phy_start_aneg failed: %d\n", ret);
>> + return ret;
>> + }
>> +
>> + /* Put back in loopback mode since phy_init_hw() may have issued
>> + * a software reset.
>> + */
>> + ret = phy_loopback(phydev, true);
>> + if (ret)
>> + phydev_err(phydev, "phy_loopback failed: %d\n", ret);
>> +
>> + return ret;
>> +}
>> +
>> +static void phy_rgmii_probe_xmit_work(struct work_struct *work)
>> +{
>> + struct phy_rgmii_debug_priv *priv;
>> +
>> + priv = container_of(work, struct phy_rgmii_debug_priv, work);
>> +
>> + dev_queue_xmit(priv->skb);
>
> Oops, you just lost ownership of priv->skb here. Anything happening
> further is in a race with the netdev driver. You need to hold a
> reference to it with skb_get().
Doh, yes, thanks!
>
>> +}
>> +
>> +static int phy_rgmii_prepare_probe(struct phy_rgmii_debug_priv *priv)
>> +{
>> + struct phy_device *phydev = priv->phydev;
>> + struct net_device *ndev = phydev->attached_dev;
>> + struct sk_buff *skb;
>> + int ret;
>> +
>> + skb = netdev_alloc_skb(ndev, ndev->mtu);
>> + if (!skb)
>> + return -ENOMEM;
>> +
>> + priv->skb = skb;
>
> Could you assign priv->skb at the end, not here? This way you won't risk
> leaking a freed pointer into priv->skb if eth_header below fails.
Makes sense.
>
>> + skb->dev = ndev;
>> + skb_put(skb, ndev->mtu);
>> + memset(skb->data, 0xaa, skb->len);
>> +
>
> I think you need to do something like this before skb_put:
>
> + skb->protocol = htons(ETH_P_EDSA);
> + skb_reset_network_header(skb);
> + skb_reset_transport_header(skb);
>
> Otherwise I get a lot of these errors on a bridged net device:
>
> [ 142.919783] protocol 0000 is buggy, dev swp2
> [ 142.924436] protocol 0000 is buggy, dev eth2
>
>> + /* Build the header */
>> + ret = eth_header(skb, ndev, ETH_P_EDSA, ndev->dev_addr,
>> + NULL, ndev->mtu);
>
> A switch net device will complain about having SMAC == DMAC and drop the
> frame. Don't you want to send broadcast frames here?
Yes, that makes sense, if you do not have broadcast in your network
filter, your network adapter is not great use.
>
>> + if (ret != ETH_HLEN) {
>> + kfree_skb(skb);
>> + return -EINVAL;
>> + }
>> +
>> + priv->fcs = phy_rgmii_probe_skb_fcs(skb);
>> +
>
> I'm far from a checksumming expert, but if the FCS was invalid, wouldn't
> the RX MAC just drop the frame?
Depends if the user has requested NETIF_F_RXALL, this was just a
convenient way to produce a strong enough checksum to compare against,
the HW will have to insert it and strip it back on its way back to itself.
>
>> + return 0;
>> +}
>> +
>> +static int phy_rgmii_probe_interface(struct phy_rgmii_debug_priv *priv,
>> + phy_interface_t iface)
>> +{
>> + struct phy_device *phydev = priv->phydev;
>> + struct net_device *ndev = phydev->attached_dev;
>> + unsigned long timeout;
>> + int ret;
>> +
>> + ret = phy_rgmii_trigger_config(phydev, iface);
>> + if (ret) {
>> + netdev_err(ndev, "%s rejected by driver(s)\n",
>> + phy_modes(iface));
>> + return ret;
>> + }
>> +
>> + netdev_info(ndev, "Trying \"%s\" PHY interface\n",
>> phy_modes(iface));
>> +
>> + /* Prepare probe frames now */
>> + ret = phy_rgmii_prepare_probe(priv);
>> + if (ret)
>> + return ret;
>> +
>> + priv->rcv_ok = 0;
>> + reinit_completion(&priv->compl);
>> +
>> + cancel_work_sync(&priv->work);
>> + schedule_work(&priv->work);
>> +
>> + timeout = wait_for_completion_timeout(&priv->compl,
>> + msecs_to_jiffies(3000));
>> + if (!timeout) {
>> + netdev_err(ndev, "transmit timeout!\n");
>> + ret = -ETIMEDOUT;
>> + goto out;
>> + }
>> +
>> + ret = priv->rcv_ok == 1 ? 0 : -EINVAL;
>> +out:
>> + phy_loopback(phydev, false);
>> + dev_consume_skb_any(priv->skb);
>
> Don't consume the skb if the xmit has timed out. The driver will have
> already freed it in that case, leading to:
>
> [ 145.994328] sja1105 spi0.1 swp2: transmit timeout!
> [ 145.999259] ------------[ cut here ]------------
> [ 146.003901] WARNING: CPU: 0 PID: 163 at lib/refcount.c:190
> refcount_sub_and_test_checked+0xb8/0xc0
> [ 146.013029] refcount_t: underflow; use-after-free.
>
> That means, in practice, moving the kfree_skb call to phy_rgmii_debug_rcv.
>
>> + return ret;
>> +}
>> +
>> +static struct packet_type phy_rgmii_probes_type __read_mostly = {
>> + .type = cpu_to_be16(ETH_P_EDSA),
>> + .func = phy_rgmii_debug_rcv,
>> +};
>> +
>> +static int phy_rgmii_can_debug(struct phy_device *phydev)
>> +{
>> + struct net_device *ndev = phydev->attached_dev;
>> +
>> + if (!ndev) {
>> + netdev_err(ndev, "No network device attached\n");
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + if (!phy_interface_is_rgmii(phydev)) {
>> + netdev_info(ndev, "Not RGMII configured, nothing to do\n");
>> + return 0;
>> + }
>> +
>> + if (!phydev->is_gigabit_capable) {
>> + netdev_err(ndev, "not relevant in non-Gigabit mode\n");
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + if (phy_driver_is_genphy(phydev) ||
>> phy_driver_is_genphy_10g(phydev)) {
>> + netdev_err(ndev, "only relevant with non-generic drivers\n");
>> + return -EOPNOTSUPP;
>> + }
>> + return 1;
>> +}
>> +
>> +int phy_rgmii_debug_probe(struct phy_device *phydev)
>> +{
>> + struct net_device *ndev = phydev->attached_dev;
>> + unsigned char operstate = ndev->operstate;
>> + phy_interface_t rgmii_modes[4] = {
>> + PHY_INTERFACE_MODE_RGMII,
>> + PHY_INTERFACE_MODE_RGMII_ID,
>> + PHY_INTERFACE_MODE_RGMII_RXID,
>> + PHY_INTERFACE_MODE_RGMII_TXID
>> + };
>> + struct phy_rgmii_debug_priv *priv;
>> + unsigned int i, count;
>> + int ret;
>> +
>> + ret = phy_rgmii_can_debug(phydev);
>> + if (ret <= 0)
>> + return ret;
>> +
>> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> + if (!priv)
>> + return -ENOMEM;
>> +
>> + if (phy_rgmii_probes_type.af_packet_priv)
>> + return -EBUSY;
>> +
>> + phy_rgmii_probes_type.af_packet_priv = priv;
>> + priv->phydev = phydev;
>> + INIT_WORK(&priv->work, phy_rgmii_probe_xmit_work);
>> + init_completion(&priv->compl);
>> +
>> + /* We are now testing this network device */
>> + ndev->operstate = IF_OPER_TESTING;
>> +
>
> Shouldn't you put the netdev in promisc mode somewhere?
If we send with a broadcast MAC SA (which is a good suggestion) and our
own MAC DA, then no.
[snip]
>>
>
> Despite the above, I couldn't actually get this running successfully. At
> the end of the test I always get "-bash: echo: write error: Connection
> timed out".
> It's a fun toy, but I don't really think it's very useful in catching
> any bug.
Looks like it just did, with itself :)
> It's basically a glorified ping test, and brainless ping tests are
> precisely the reason why people get this wrong most of the time. You
> can't have a generic software tool identify for you a configuration
> problem that depends entirely upon a private hardware implementation of
> a specification that is vague.
>
> I mean in theory, the arithmetic is simple enough for a MAC-to-PHY
> connection. These 2 equalities always need to hold true:
>
> MAC TX delay + PCB TX delay + PHY TX delay == 1
> MAC RX delay + PCB RX delay + PHY RX delay == 1
>
> meaning that delays in each direction need to be applied at most once.
>
> For a PHY-to-MAC connection, there is this unwritten Linux rule that the
> PHY should apply the requested delays in both directions. This already
> contradicts common sense, as it is not uncommon, from a hardware point
> of view, for each device to add the delays in its own TX direction (so
> the MAC would add the TX delays and the PHY would add the RX delays).
> That is not possible to specify with Linux. But let's go with the flow.
> So the PHY adds all specified delays, and one can assume that the
> unspecified delays up to rgmii-id were added by the PCB. This small
> kernel thread would basically probe for PCB delays, in this case,
> assuming that the MAC driver and the PHY driver are both compliant.
>
> Let's say there is more than one phy-mode that works. Andrew said to
> raise a red flag in that case, because the PHY driver is surely not
> doing the right thing with the delays. But:
> - Maybe it is, but the equalities above aren't completely set in stone.
> Maybe the inserted propagation delays aren't high enough that two of
> them would break the link again.
> - Which of the multiple phy-mode configurations that work is the right
> one? A tool that can't tell me that is pointless, IMO. My PHY works due
> to pin strapping, but the driver is buggy. Do I care? No, as long as it
> works, and as long as it will continue to work after somebody fixes the
> driver. How do I know what delay mode is right? Well, of course, if it
> works with the configuration out of pin strapping, then obviously I
> should put the pin strapping settings in the DT. End of story. Can this
> kernel thread tell me that? No....
>
> And then, there's the RGMII fixed-link. The rules are cloudy for that
> one, because now there's potentially 2 phy-modes that operate on the
> same link. To complicate matters even further, your patch does not
> consider the fixed-link (no PHY) case, and there is no generic interface
> to even add selftests for that in the future. You would need to unbind
> the MAC driver, mangle the DT bindings, then bind it back again...
>
> I guess I'm just concerned about the chaos that a tool returning false
> positives would create for people who don't really follow what's going
> on ("look, but the tool said this!").
And maybe I should have marked this RFC, the commit subject is clear
that this not fool proof, it cannot be, for all the reasons you
outlined. The thing is that I have spent many hours of my life (like
you, like Andrew) helping people troubleshoot why RGMII does not work,
if we have a good litmus test we can submit, that gets us half-way there.
I am completely fine dropping this if you believe this is going to cause
more harm than good.
--
Florian
Powered by blists - more mailing lists