linux-kernel - Re: [net-next v13 05/11] net: mtip: Add net_device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250624233342.5fdb37af@wsk>
Date: Tue, 24 Jun 2025 23:33:42 +0200
From: Lukasz Majewski <lukma@...x.de>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, davem@...emloft.net, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Rob Herring
 <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>, Conor Dooley
 <conor+dt@...nel.org>, Shawn Guo <shawnguo@...nel.org>, Sascha Hauer
 <s.hauer@...gutronix.de>, Pengutronix Kernel Team <kernel@...gutronix.de>,
 Fabio Estevam <festevam@...il.com>, Richard Cochran
 <richardcochran@...il.com>, netdev@...r.kernel.org,
 devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
 imx@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org, Stefan Wahren
 <wahrenst@....net>, Simon Horman <horms@...nel.org>
Subject: Re: [net-next v13 05/11] net: mtip: Add net_device_ops functions to
 the L2 switch driver

Hi Paolo,

> On 6/22/25 11:37 AM, Lukasz Majewski wrote:
> > This patch provides callbacks for struct net_device_ops for MTIP
> > L2 switch.
> > 
> > Signed-off-by: Lukasz Majewski <lukma@...x.de>
> > 
> > ---
> > Changes for v13:
> > - New patch - created by excluding some code from large (i.e. v12
> > and earlier) MTIP driver
> > ---
> >  .../net/ethernet/freescale/mtipsw/mtipl2sw.c  | 273
> > ++++++++++++++++++ 1 file changed, 273 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> > b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c index
> > 5142f647d939..813cd39d6d56 100644 ---
> > a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c +++
> > b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c @@ -43,6 +43,15
> > @@ 
> >  #include "mtipl2sw.h"
> >  
> > +static void swap_buffer(void *bufaddr, int len)
> > +{
> > +	int i;
> > +	unsigned int *buf = bufaddr;
> > +
> > +	for (i = 0; i < len; i += 4, buf++)
> > +		swab32s(buf);
> > +}
> > +
> >  /* Set the last buffer to wrap */
> >  static void mtip_set_last_buf_to_wrap(struct cbd_t *bdp)
> >  {
> > @@ -444,6 +453,128 @@ static void mtip_config_switch(struct
> > switch_enet_private *fep) fep->hwp + ESW_IMR);
> >  }
> >  
> > +static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
> > +					struct net_device *dev,
> > int port) +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +	unsigned short status;
> > +	struct cbd_t *bdp;
> > +	void *bufaddr;
> > +
> > +	spin_lock_bh(&fep->hw_lock);
> > +
> > +	if (!fep->link[0] && !fep->link[1]) {
> > +		/* Link is down or autonegotiation is in progress.
> > */
> > +		netif_stop_queue(dev);
> > +		spin_unlock_bh(&fep->hw_lock);
> > +		return NETDEV_TX_BUSY;
> > +	}
> > +
> > +	/* Fill in a Tx ring entry */
> > +	bdp = fep->cur_tx;
> > +
> > +	status = bdp->cbd_sc;
> > +
> > +	if (status & BD_ENET_TX_READY) {
> > +		/* All transmit buffers are full. Bail out.
> > +		 * This should not happen, since dev->tbusy should
> > be set.
> > +		 */
> > +		netif_stop_queue(dev);
> > +		dev_err(&fep->pdev->dev, "%s: tx queue full!.\n",
> > dev->name);
> > +		spin_unlock_bh(&fep->hw_lock);
> > +		return NETDEV_TX_BUSY;
> > +	}
> > +
> > +	/* Clear all of the status flags */
> > +	status &= ~BD_ENET_TX_STATS;
> > +
> > +	/* Set buffer length and buffer pointer */
> > +	bufaddr = skb->data;
> > +	bdp->cbd_datlen = skb->len;
> > +
> > +	/* On some FEC implementations data must be aligned on
> > +	 * 4-byte boundaries. Use bounce buffers to copy data
> > +	 * and get it aligned.spin
> > +	 */
> > +	if ((unsigned long)bufaddr & MTIP_ALIGNMENT) {
> > +		unsigned int index;
> > +
> > +		index = bdp - fep->tx_bd_base;
> > +		memcpy(fep->tx_bounce[index],
> > +		       (void *)skb->data, skb->len);
> > +		bufaddr = fep->tx_bounce[index];
> > +	}
> > +
> > +	if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > +		swap_buffer(bufaddr, skb->len);
> > +
> > +	/* Save skb pointer. */
> > +	fep->tx_skbuff[fep->skb_cur] = skb;
> > +
> > +	fep->skb_cur = (fep->skb_cur + 1) & TX_RING_MOD_MASK;
> > +
> > +	/* Push the data cache so the CPM does not get stale memory
> > +	 * data.
> > +	 */
> > +	bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
> > +					  MTIP_SWITCH_TX_FRSIZE,
> > +					  DMA_TO_DEVICE);
> > +	if (unlikely(dma_mapping_error(&fep->pdev->dev,
> > bdp->cbd_bufaddr))) {
> > +		dev_err(&fep->pdev->dev,
> > +			"Failed to map descriptor tx buffer\n");
> > +		dev->stats.tx_errors++;
> > +		dev->stats.tx_dropped++;
> > +		dev_kfree_skb_any(skb);
> > +		goto err;
> > +	}
> > +
> > +	/* Send it on its way.  Tell FEC it's ready, interrupt
> > when done,
> > +	 * it's the last BD of the frame, and to put the CRC on
> > the end.
> > +	 */
> > +
> > +	status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR
> > +			| BD_ENET_TX_LAST | BD_ENET_TX_TC);
> > +
> > +	/* Synchronize all descriptor writes */
> > +	wmb();
> > +	bdp->cbd_sc = status;
> > +
> > +	netif_trans_update(dev);
> > +	skb_tx_timestamp(skb);
> > +
> > +	/* Trigger transmission start */
> > +	writel(MCF_ESW_TDAR_X_DES_ACTIVE, fep->hwp + ESW_TDAR);
> > +
> > +	dev->stats.tx_bytes += skb->len;
> > +	/* If this was the last BD in the ring,
> > +	 * start at the beginning again.
> > +	 */
> > +	if (status & BD_ENET_TX_WRAP)
> > +		bdp = fep->tx_bd_base;
> > +	else
> > +		bdp++;
> > +
> > +	if (bdp == fep->dirty_tx) {
> > +		fep->tx_full = 1;
> > +		netif_stop_queue(dev);  
> 
> You may want to stop the queue earlier, i.e. when 75% or the like of
> the tx ring is full. Also you can use netif_txq_maybe_stop() - with
> txq == netdev_get_tx_queue(dev, 0)

There are two main reasons why the netif queue management is so rugged:

1. Due to simplicity - this driver is not using txq (queues), so I
cannot use APIs using as input argument queues. That is why functions
accepting only struct netdev pointer are used.

2. My feeling is that I would need to use queues abstraction only for
one queue - so this would be extra code overhead. I'm trying to
upstream driver which in fact has very simple internals (i.e. ringbuf
with 16 descriptors for tx/rx).

> 
> [...]
> > +static void mtip_timeout(struct net_device *dev, unsigned int
> > txqueue) +{
> > +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> > +	struct switch_enet_private *fep = priv->fep;
> > +	struct cbd_t *bdp;
> > +	int i;
> > +
> > +	dev->stats.tx_errors++;
> > +
> > +	if (IS_ENABLED(CONFIG_SWITCH_DEBUG)) {
> > +		dev_info(&dev->dev, "%s: transmit timed out.\n",
> > dev->name);
> > +		dev_info(&dev->dev,
> > +			 "Ring data: cur_tx %lx%s, dirty_tx %lx
> > cur_rx: %lx\n",
> > +			 (unsigned long)fep->cur_tx,
> > +			 fep->tx_full ? " (full)" : "",
> > +			 (unsigned long)fep->dirty_tx,
> > +			 (unsigned long)fep->cur_rx);
> > +
> > +		bdp = fep->tx_bd_base;
> > +		dev_info(&dev->dev, " tx: %u buffers\n",
> > TX_RING_SIZE);
> > +		for (i = 0; i < TX_RING_SIZE; i++) {
> > +			dev_info(&dev->dev, "  %08lx: %04x %04x
> > %08x\n",
> > +				 (kernel_ulong_t)bdp, bdp->cbd_sc,
> > +				 bdp->cbd_datlen,
> > (int)bdp->cbd_bufaddr);
> > +			bdp++;
> > +		}
> > +
> > +		bdp = fep->rx_bd_base;
> > +		dev_info(&dev->dev, " rx: %lu buffers\n",
> > +			 (unsigned long)RX_RING_SIZE);
> > +		for (i = 0 ; i < RX_RING_SIZE; i++) {
> > +			dev_info(&dev->dev, "  %08lx: %04x %04x
> > %08x\n",
> > +				 (kernel_ulong_t)bdp,
> > +				 bdp->cbd_sc, bdp->cbd_datlen,
> > +				 (int)bdp->cbd_bufaddr);
> > +			bdp++;
> > +		}  
> 
> Here you are traversing both rings without any lock, which looks race
> prone.

I will add  spin_{un}lock_bh(&fep->hw_lock); (this is only code used
for debugging, not production)

> 
> /P
> 




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@...x.de

Content of type "application/pgp-signature" skipped