Message-ID: <20250723220517.063c204b@wsk>
Date: Wed, 23 Jul 2025 22:05:17 +0200
From: Lukasz Majewski <lukma@...x.de>
To: Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, davem@...emloft.net, Eric Dumazet
<edumazet@...gle.com>, Rob Herring <robh@...nel.org>, Krzysztof Kozlowski
<krzk+dt@...nel.org>, Conor Dooley <conor+dt@...nel.org>, Shawn Guo
<shawnguo@...nel.org>, Sascha Hauer <s.hauer@...gutronix.de>, Pengutronix
Kernel Team <kernel@...gutronix.de>, Fabio Estevam <festevam@...il.com>,
Richard Cochran <richardcochran@...il.com>, netdev@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
imx@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org, Stefan Wahren
<wahrenst@....net>, Simon Horman <horms@...nel.org>
Subject: Re: [net-next v15 06/12] net: mtip: Add net_device_ops functions to
the L2 switch driver
Hi Jakub, Paolo,
Do you have more comments and questions regarding this driver after my
explanation?
Shall I do something more?
Thanks in advance for your feedback.
> Hi Jakub,
>
> > On Wed, 16 Jul 2025 23:47:25 +0200 Lukasz Majewski wrote:
> > > +static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
> > > > + struct net_device *dev, int port)
> > > > +{
> > > + struct mtip_ndev_priv *priv = netdev_priv(dev);
> > > + struct switch_enet_private *fep = priv->fep;
> > > + unsigned short status;
> > > + struct cbd_t *bdp;
> > > + void *bufaddr;
> > > +
> > > + spin_lock(&fep->hw_lock);
> >
> > I see some inconsistencies in how you take this lock.
> > Bunch of bare spin_lock() calls from BH context, but there's also
> > a _irqsave() call in mtip_adjust_link().
>
> In the legacy NXP (Freescale) code for this IP block (i.e. MTIP
> switch) the recommended way to re-setup it, when link or duplex
> changes, is to reset and reconfigure it.
>
> It requires setting up interrupts as well... In that situation, IMHO
> disabling system interrupts is required to avoid some undefined
> behaviour.
>
> > Please align to the strictest
> > context (not sure if the irqsave is actually needed, at a glance,
> > IOW whether the lock is taken from an IRQ)
>
> The spin_lock() for the xmit port is similar to what is done in
> fec_main.c. As this switch uses a single uDMA for both ports, and
> there is no support (or need) for multiple queues, it can be omitted.
>
> >
> > > + if (!fep->link[0] && !fep->link[1]) {
> > > > + /* Link is down or autonegotiation is in progress. */
> > > + netif_stop_queue(dev);
> > > + spin_unlock(&fep->hw_lock);
> > > + return NETDEV_TX_BUSY;
> > > + }
> > > +
> > > + /* Fill in a Tx ring entry */
> > > + bdp = fep->cur_tx;
> > > +
> > > > + /* Force read memory barrier on the current transmit descriptor */
> >
> > Barrier are between things. What is this barrier separating, and
> > what write barrier does it pair with? As far as I can tell cur_tx
> > is just a value in memory, and accesses are under ->hw_lock, so
> > there should be no ordering concerns.
>
> The bdp is the uDMA descriptor (memory allocated in the coherent DMA
> area). It is used by the uDMA when data is transferred to the MTIP
> switch internal buffer.
>
> The bdp->cbd_sc is a half word, which is modified by the uDMA engine
> to indicate whether there were errors or the transfer has ended.
>
> The rmb() should improve robustness - it ensures that the status
> corresponds to what was set by the uDMA. On the other hand, the DMA
> coherent allocation should do this as well.
>
> The fec_main.c places the rmb() in similar places, so I followed their
> approach.
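
To make the "barriers are between things" point concrete, here is a minimal user-space sketch using C11 atomics. It is an analogy only, not kernel code: the struct, the DESC_READY value and both function names are hypothetical. The ordering that matters is between the status word and the other descriptor fields: the writer fills the fields and then publishes the status (release), and the reader checks the status and only then reads the fields (acquire). In the kernel the corresponding pair would be a write barrier before the status store and a read barrier after the status load.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_READY 0x8000u /* hypothetical "descriptor is valid" bit */

/* Hypothetical descriptor: two payload fields plus a status word. */
struct desc {
    uint32_t bufaddr;
    uint16_t datlen;
    _Atomic uint16_t sc;
};

/* Writer: fill the payload first, then publish it via the status word.
 * The release store orders the payload writes before the status write.
 */
static void desc_publish(struct desc *d, uint32_t addr, uint16_t len)
{
    d->bufaddr = addr;
    d->datlen = len;
    atomic_store_explicit(&d->sc, DESC_READY, memory_order_release);
}

/* Reader: load the status first; the acquire load orders the payload
 * reads after it, so they observe what the writer published.
 */
static bool desc_consume(struct desc *d, uint32_t *addr, uint16_t *len)
{
    uint16_t sc = atomic_load_explicit(&d->sc, memory_order_acquire);

    if (!(sc & DESC_READY))
        return false;

    *addr = d->bufaddr;
    *len = d->datlen;
    return true;
}
```

A barrier placed before the very first access to the descriptor has nothing earlier to be ordered against, which is the question raised in the review.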
>
> >
> > > + rmb();
> > > + status = bdp->cbd_sc;
> > > +
> > > + if (status & BD_ENET_TX_READY) {
> > > + /* All transmit buffers are full. Bail out.
> > > > + * This should not happen, since dev->tbusy should be set.
> > > + */
> > > + netif_stop_queue(dev);
> > > > + dev_err(&fep->pdev->dev, "%s: tx queue full!.\n",
> > > > + dev->name);
> >
> > This needs to be rate limited, we don't want to flood the logs in
> > case there's a bug.
>
> +1
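
For reference, the idea behind rate limiting a message can be sketched in plain user-space C as below. This is only a model of an interval/burst limiter; the struct and function names are hypothetical and it is not the kernel implementation. In the driver itself one would presumably just switch to an existing helper such as dev_err_ratelimited().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: allow at most 'burst' messages per
 * 'interval_ms' window and count the ones that were suppressed.
 */
struct ratelimit_model {
    uint64_t interval_ms;
    unsigned int burst;
    uint64_t window_start_ms;
    unsigned int printed;
    unsigned int missed;
};

/* Returns true if a message may be printed at time 'now_ms'. */
static bool ratelimit_allow(struct ratelimit_model *rl, uint64_t now_ms)
{
    /* Start a new window once the interval has elapsed. */
    if (now_ms - rl->window_start_ms >= rl->interval_ms) {
        rl->window_start_ms = now_ms;
        rl->printed = 0;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }

    rl->missed++;
    return false;
}
```

With a 5 second interval and a burst of 10, a bug that hits the "tx queue full" path on every packet would produce at most 10 log lines per window instead of one per packet.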
>
> >
> > Also at a glance it seems like you have one fep for multiple
> > netdevs.
>
> Yes.
>
> > So stopping one netdev's Tx queue when fep fills up will not stop
> > the other ports from pushing frames, right?
>
> This is a bit more complicated...
>
> Other solutions - like cpsw_new - are conceptually simple; there are
> two DMAs to two separate eth IP blocks.
> During startup two separate devices are created. When one wants to
> enable the bridge (i.e. start in-hw offloading) - just a single bit
> is set and ... that's it.
>
> With vf610 / imx287 and MTIP it is a bit different (imx287 is even
> worse as second ETH interface has incomplete functionality by design).
>
> When switch is not active - you have two uDMA ports to two ENET IP
> blocks. Full separation. That is what is done with fec_main.c driver.
>
> When you enable MTIP switch - then you have just a single uDMA0 active
> for "both" ports. In fact you "bridge" two ports into a single one -
> that is why Freescale/NXP driver (for 2.6.y) just had eth0 to "model"
> bridged interfaces. That was "simpler" (PHY management was done in the
> driver as well).
>
> Now, in this driver, we do have two network devices, which are
> "bridged" (so there is br0). And of course there must be separation
> between lan0/1 when this driver is used, but bridge is not (yet)
> created. This works :-)
>
>
> So I do have - 2x netdevs (handled by single uDMA0) + 2 PHYs + br0 +
> NAPI + switchdev (to avoid broadcast frame storms + {R}STP + FDB -
> WIP).
>
>
> Just pure fun :-) to model it all ... and make all maintainers happy
> :-)
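
One way to address the concern about a single fep shared by several netdevs is to stop and wake every port's queue together, since they all feed the same descriptor ring. The user-space model below is only a sketch of that idea: all names are hypothetical and it is not what the posted driver does.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 2 /* hypothetical: lan0 and lan1 */

struct port_model {
    bool queue_stopped;
};

/* One descriptor ring shared by all ports (single uDMA). */
struct shared_ring {
    unsigned int size; /* total descriptors */
    unsigned int used; /* descriptors currently queued */
    struct port_model ports[NUM_PORTS];
};

static void ring_set_all_stopped(struct shared_ring *r, bool stopped)
{
    for (int i = 0; i < NUM_PORTS; i++)
        r->ports[i].queue_stopped = stopped;
}

/* Queue one frame from 'port'. Returns false ("busy") if the ring is
 * full. When the ring fills up, every port is stopped, not only the
 * one that queued the last frame.
 */
static bool ring_xmit(struct shared_ring *r, int port)
{
    assert(port >= 0 && port < NUM_PORTS);

    if (r->used == r->size) {
        ring_set_all_stopped(r, true);
        return false;
    }

    r->used++;
    if (r->used == r->size)
        ring_set_all_stopped(r, true);

    return true;
}

/* One descriptor completed: there is room again, so wake all ports. */
static void ring_tx_complete(struct shared_ring *r)
{
    if (r->used == 0)
        return;

    r->used--;
    ring_set_all_stopped(r, false);
}
```

In a real driver the stop and wake steps would map to the netif queue helpers called for each port's net_device.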
>
> >
> > > + spin_unlock(&fep->hw_lock);
> > > + return NETDEV_TX_BUSY;
> > > + }
> > > +
> > > + /* Clear all of the status flags */
> > > + status &= ~BD_ENET_TX_STATS;
> > > +
> > > + /* Set buffer length and buffer pointer */
> > > + bufaddr = skb->data;
> > > + bdp->cbd_datlen = skb->len;
> > > +
> > > + /* On some FEC implementations data must be aligned on
> > > + * 4-byte boundaries. Use bounce buffers to copy data
> > > > + * and get it aligned.
> > > + */
> > > + if ((unsigned long)bufaddr & MTIP_ALIGNMENT) {
> >
> > I think you should add
> >
> > if ... ||
> > fep->quirks & FEC_QUIRK_SWAP_FRAME)
> >
> > here. You can't modify skb->data without calling skb_cow_data()
> > but you already have buffers allocated so can as well use them.
>
> The vf610 doesn't need the frame to be swapped, but has requirements
> for alignment as well.
>
> I would keep things as they are now - as they just improve
> readability.
>
> Please keep in mind that this version only supports imx287, but the
> plan is to add vf610 as well (to be more specific - this driver also
> works on vf610, but I plan to add those patches after this one is
> accepted and pulled).
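
For illustration, the reviewer's suggestion (use the bounce buffer whenever the data is unaligned or has to be byte-swapped, so that the original packet data is never modified in place) could look roughly like the user-space sketch below. ALIGN_MASK stands in for MTIP_ALIGNMENT with an assumed value of 0x3 (4-byte alignment), and the function name and the swap_quirk flag are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_MASK 0x3u /* assumed value, i.e. 4-byte alignment */

/* Pick the buffer that will be handed to the DMA engine.
 *
 * If 'data' is not suitably aligned, or if the frame must be swapped
 * in place later ('swap_quirk'), copy it into the caller-provided,
 * aligned 'bounce' buffer and return that. Otherwise the original
 * data can be used directly.
 */
static const uint8_t *tx_pick_buffer(const uint8_t *data, size_t len,
                                     uint8_t *bounce, bool swap_quirk)
{
    if (((uintptr_t)data & ALIGN_MASK) || swap_quirk) {
        memcpy(bounce, data, len);
        return bounce;
    }

    return data;
}
```

With this shape any later in-place swap only ever touches the bounce copy.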
>
> >
> > > + unsigned int index;
> > > +
> > > + index = bdp - fep->tx_bd_base;
> > > + memcpy(fep->tx_bounce[index],
> > > + (void *)skb->data, skb->len);
> >
> > this fits on one 80 char line BTW, quite easily:
> >
> > memcpy(fep->tx_bounce[index], (void *)skb->data,
> > skb->len);
> >
> > Also the cast to void * is not necessary in C.
>
> +1
>
> >
> > > + bufaddr = fep->tx_bounce[index];
> > > + }
> > > +
> > > + if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > > + swap_buffer(bufaddr, skb->len);
> > > +
> > > + /* Save skb pointer. */
> > > + fep->tx_skbuff[fep->skb_cur] = skb;
> > > +
> > > + fep->skb_cur = (fep->skb_cur + 1) & TX_RING_MOD_MASK;
> >
> > Not sure if this is buggy, but maybe delay updating things until the
> > mapping succeeds? Fewer things to unwind.
>
> Yes, the skb storage as well as ring buffer modification can be done
> after dma mapping code.
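
A minimal user-space model of that reordering is sketched below: the ring state is touched only after the (fake) mapping step has succeeded, so the error path has nothing to unwind. The ring size of 8, fake_map() and all other names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RING_SIZE 8 /* assumed; must be a power of two */
#define TX_RING_MOD_MASK (TX_RING_SIZE - 1)

struct tx_state {
    const void *slots[TX_RING_SIZE]; /* saved buffer pointers */
    uintptr_t handles[TX_RING_SIZE]; /* fake "DMA" handles */
    unsigned int cur;                /* next slot to fill */
    unsigned int dropped;            /* frames dropped on map failure */
};

/* Stand-in for the DMA mapping step; fails for a NULL buffer. */
static bool fake_map(const void *buf, uintptr_t *handle)
{
    if (!buf)
        return false;

    *handle = (uintptr_t)buf;
    return true;
}

/* Queue one buffer. Bookkeeping is updated only after mapping worked. */
static bool tx_queue(struct tx_state *st, const void *buf)
{
    uintptr_t handle;

    if (!fake_map(buf, &handle)) {
        st->dropped++; /* count as dropped; ring state is untouched */
        return false;
    }

    st->slots[st->cur] = buf;
    st->handles[st->cur] = handle;
    st->cur = (st->cur + 1) & TX_RING_MOD_MASK;
    return true;
}
```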
>
> >
> > > > + /* Push the data cache so the CPM does not get stale memory
> > > > + * data.
> > > > + */
> > > > + bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
> > > > + MTIP_SWITCH_TX_FRSIZE,
> > > > + DMA_TO_DEVICE);
> > > > + if (unlikely(dma_mapping_error(&fep->pdev->dev,
> > > > + bdp->cbd_bufaddr))) {
> > > + dev_err(&fep->pdev->dev,
> > > + "Failed to map descriptor tx buffer\n");
> > > + dev->stats.tx_errors++;
> > > + dev->stats.tx_dropped++;
> >
> > dropped and errors are two different counters
> > I'd stick to dropped
>
> Ok.
>
> >
> > > + dev_kfree_skb_any(skb);
> > > + goto err;
> > > + }
> > > +
> > > > + /* Send it on its way. Tell FEC it's ready, interrupt when done,
> > > > + * it's the last BD of the frame, and to put the CRC on the end.
> > > > + */
> > > +
> > > + status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR
> > > + | BD_ENET_TX_LAST | BD_ENET_TX_TC);
> >
> > The | goes at the end of the previous line, start of new line
> > adjusts to the opening brackets..
> >
>
> I've refactored it.
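
For readers following along, the requested layout (operator at the end of the line, continuation aligned with the opening bracket) looks like the sketch below. The bit values are assumptions used only to make the example compile; the real definitions come from the driver headers. The helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define BD_ENET_TX_READY 0x8000u
#define BD_ENET_TX_INTR  0x1000u
#define BD_ENET_TX_LAST  0x0800u
#define BD_ENET_TX_TC    0x0400u

/* Mark a descriptor status word as ready for transmission. */
static uint16_t tx_mark_ready(uint16_t status)
{
    status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR |
               BD_ENET_TX_LAST | BD_ENET_TX_TC);

    return status;
}
```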
>
> > > +
> > > + /* Synchronize all descriptor writes */
> > > + wmb();
> > > + bdp->cbd_sc = status;
> > > +
> > > + netif_trans_update(dev);
> >
> > Is this call necessary?
>
> I've added it when I was forward porting the old driver. It can be
> removed.
>
> >
> > > + skb_tx_timestamp(skb);
> > > +
> > > + /* Trigger transmission start */
> > > + writel(MCF_ESW_TDAR_X_DES_ACTIVE, fep->hwp + ESW_TDAR);
> > > +
> > > + dev->stats.tx_bytes += skb->len;
> > > + /* If this was the last BD in the ring,
> > > + * start at the beginning again.
> > > + */
> > > + if (status & BD_ENET_TX_WRAP)
> > > + bdp = fep->tx_bd_base;
> > > + else
> > > + bdp++;
> > > +
> > > + if (bdp == fep->dirty_tx) {
> > > + fep->tx_full = 1;
> > > + netif_stop_queue(dev);
> > > + }
> > > +
> > > + fep->cur_tx = bdp;
> > > + err:
> > > + spin_unlock(&fep->hw_lock);
> > > +
> > > + return NETDEV_TX_OK;
> > > +}
>
>
> Thanks for the feedback.
>
> Best regards,
>
> Lukasz Majewski
>
> --
>
> DENX Software Engineering GmbH, Managing Director: Johanna Denk,
> Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194
> Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@...x.de
Best regards,
Lukasz Majewski