netdev - Re: [net-next,v8] net: ethernet: rtsn: Add support for Renesas Ethernet-TSN

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <adbe998e-d410-4797-b8a7-24d45ca7a1bc@bp.renesas.com>
Date: Mon, 17 Jun 2024 14:41:52 +0100
From: Paul Barker <paul.barker.ct@...renesas.com>
To: Niklas Söderlund <niklas.soderlund+renesas@...natech.se>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
 linux-renesas-soc@...r.kernel.org
Subject: Re: [net-next,v8] net: ethernet: rtsn: Add support for Renesas
 Ethernet-TSN

On 13/06/2024 13:49, Niklas Söderlund wrote:
> Hi Paul,
> 
> Thanks for your feedback.
> 
> On 2024-06-13 13:29:30 +0100, Paul Barker wrote:
>> On 13/06/2024 11:41, Niklas Söderlund wrote:
>>> Add initial support for Renesas Ethernet-TSN End-station device of R-Car
>>> V4H. The Ethernet End-station can connect to an Ethernet network using a
>>> 10 Mbps, 100 Mbps, or 1 Gbps full-duplex link via MII/GMII/RMII/RGMII.
>>> Depending on the connected PHY.
>>>
>>> The driver supports Rx checksum and offload and hardware timestamps.
>>>
>>> While full power management and suspend/resume is not yet supported the
>>> driver enables runtime PM in order to enable the module clock. While
>>> explicit clock management using clk_enable() would suffice for the
>>> supported SoC, the module could be reused on SoCs where the module is
>>> part of a power domain.
>>>
>>> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@...natech.se>
>>> ---
>>> * Changes since v7
>>> - Properly handle Rx when netpoll is called with 0 budget. Previously
>>>    one Rx packet would be processed in this case.
>>> - Use napi_alloc_skb() instead of netdev_alloc_skb() in Rx.
>>> - Use device implemented 64 bit stats instead of using stats member in
>>>    struct net_device.
>>> - Implement ndo_hwtstamp_{g,s}et instead of acting on SIOC{G,S}HWTSTAMP
>>>    in do_ioctl().
>>> - Fix incorrect error sign when using PTR_ERR().
>>> - Add missing cosmetic space when creating struct of_device_id.
>>> - Rename label error_alloc to error_free.
>>>
>>> * Changes since v6
>>> - Fix warning added when removing delays depending on phy-mode logic.
>>>
>>> * Changes since v5
>>> - Removed delays depending on phy-mode logic. This is only needed on
>>>    some SoCs (V4H multiple boards), and currently we can't test on them.
>>>    As this have been a hot topic in review drop it for now so we can
>>>    support V4H single boards which we can test and lets work on the delay
>>>    on-top when we can test it.
>>>
>>> * Changes since v4
>>> - Enable GPOUT_RDM and GPOUT_TDM delays depending on phy-mode.
>>>
>>> * Changes since v3
>>> - Add description to commit message why PM operations are used instead
>>>    of explicit management of the clock.
>>>
>>> * Changes since v2
>>> - Drop extra call to ether_setup(), already called by
>>>    alloc_etherdev_mqs().
>>> - Remove dependency on MII.
>>> - Improve error paths when requestion IRQs.
>>> - Correct the interpretation of which  phy-mode to apply delays for, and
>>>    mask the phy-mode passed to the PHY if MAC adds delays.
>>>
>>> * Changes since v1
>>> - Remove C45 read/write functions, the phy-core can do that for us.
>>> - Rework the driver to register the MDIO bus at probe time instead of at
>>>    ndo_open().
>>> - Rework rtsn_rx() to take advantages of the improved design proposed
>>>    and upstreamed for R-Car RAVB driver Rx code-path.
>>> - Use napi_complete_done() instead of napi_complete().
>>> - Update commit message to list that the driver supports Rx CSUM
>>>    offload.
>>> - Drop unneeded __iowmb() left from development as well as a stray ;.
>>> - Consider all RGMII modes.
>>> - Move the phy_stop() call to mirror where phy_start() is called.
>>> - Forward IOCTLS from rtsn_do_ioctl() to PHY using ndo_eth_ioctl()
>>> - Align variable names to match core for IOCTLS functions.
>>> - Make sure DMA mask and PTP is registered before registering the ndev.
>>> - Document why the RTSN driver needs to be able to apply delays
>>> - Add checks for which phy-modes the MAC driver can apply delays
>>> - Use snprintf() instead of sprintf()
>>>
>>> * Changes since RFC
>>> - Fix issues in MDIO communication.
>>> - Use a dedicated OF node for the MDIO bus.
>>> ---
>>>   MAINTAINERS                           |    8 +
>>>   drivers/net/ethernet/renesas/Kconfig  |   10 +
>>>   drivers/net/ethernet/renesas/Makefile |    2 +
>>>   drivers/net/ethernet/renesas/rtsn.c   | 1381 +++++++++++++++++++++++++
>>>   drivers/net/ethernet/renesas/rtsn.h   |  464 +++++++++
>>>   5 files changed, 1865 insertions(+)
>>>   create mode 100644 drivers/net/ethernet/renesas/rtsn.c
>>>   create mode 100644 drivers/net/ethernet/renesas/rtsn.h
>>
>> [snip]
>>
>>> +static int rtsn_rx(struct net_device *ndev, int budget)
>>> +{
>>> +	struct rtsn_private *priv = netdev_priv(ndev);
>>> +	unsigned int ndescriptors;
>>> +	unsigned int rx_packets;
>>> +	unsigned int i;
>>> +	bool get_ts;
>>> +
>>> +	get_ts = priv->ptp_priv->tstamp_rx_ctrl &
>>> +		RCAR_GEN4_RXTSTAMP_TYPE_V2_L2_EVENT;
>>> +
>>> +	ndescriptors = priv->dirty_rx + priv->num_rx_ring - priv->cur_rx;
>>> +	rx_packets = 0;
>>> +	for (i = 0; i < ndescriptors; i++) {
>>> +		const unsigned int entry = priv->cur_rx % priv->num_rx_ring;
>>> +		struct rtsn_ext_ts_desc *desc = &priv->rx_ring[entry];
>>> +		struct sk_buff *skb;
>>> +		dma_addr_t dma_addr;
>>> +		u16 pkt_len;
>>> +
>>> +		/* Stop processing descriptors if budget is consumed. */
>>> +		if (rx_packets >= budget)
>>> +			break;
>>> +
>>> +		/* Stop processing descriptors on first empty. */
>>> +		if ((desc->die_dt & DT_MASK) == DT_FEMPTY)
>>> +			break;
>>> +
>>> +		dma_rmb();
>>> +		pkt_len = le16_to_cpu(desc->info_ds) & RX_DS;
>>> +
>>> +		skb = priv->rx_skb[entry];
>>> +		priv->rx_skb[entry] = NULL;
>>> +		dma_addr = le32_to_cpu(desc->dptr);
>>> +		dma_unmap_single(ndev->dev.parent, dma_addr, PKT_BUF_SZ,
>>> +				 DMA_FROM_DEVICE);
>>> +
>>> +		/* Get timestamp if enabled. */
>>> +		if (get_ts) {
>>> +			struct skb_shared_hwtstamps *shhwtstamps;
>>> +			struct timespec64 ts;
>>> +
>>> +			shhwtstamps = skb_hwtstamps(skb);
>>> +			memset(shhwtstamps, 0, sizeof(*shhwtstamps));
>>> +
>>> +			ts.tv_sec = (u64)le32_to_cpu(desc->ts_sec);
>>> +			ts.tv_nsec = le32_to_cpu(desc->ts_nsec & cpu_to_le32(0x3fffffff));
>>> +
>>> +			shhwtstamps->hwtstamp = timespec64_to_ktime(ts);
>>> +		}
>>> +
>>> +		skb_put(skb, pkt_len);
>>> +		skb->protocol = eth_type_trans(skb, ndev);
>>> +		netif_receive_skb(skb);
>>> +
>>> +		/* Update statistics. */
>>> +		priv->stats.rx_packets++;
>>
>> I think it's better to do `priv->stats.rx_packets += rx_packets` once at
>> the end of this function instead of repeatedly incrementing another
>> variable on the hot path.
> 
> We could do that, but I opted for this as it makes more sens to update 
> rx_packets and rx_bytes in the same location IMHO.
> 

Ok, I have no strong opinion here.

>>
>>> +		priv->stats.rx_bytes += pkt_len;
>>> +
>>> +		/* Update counters. */
>>> +		priv->cur_rx++;
>>> +		rx_packets++;
>>> +	}
>>> +
>>> +	/* Refill the RX ring buffers */
>>> +	for (; priv->cur_rx - priv->dirty_rx > 0; priv->dirty_rx++) {
>>> +		const unsigned int entry = priv->dirty_rx % priv->num_rx_ring;
>>> +		struct rtsn_ext_ts_desc *desc = &priv->rx_ring[entry];
>>> +		struct sk_buff *skb;
>>> +		dma_addr_t dma_addr;
>>> +
>>> +		desc->info_ds = cpu_to_le16(PKT_BUF_SZ);
>>> +
>>> +		if (!priv->rx_skb[entry]) {
>>> +			skb = napi_alloc_skb(&priv->napi,
>>> +					     PKT_BUF_SZ + RTSN_ALIGN - 1);
>>> +			if (!skb)
>>> +				break;
>>> +			skb_reserve(skb, NET_IP_ALIGN);
>>> +			dma_addr = dma_map_single(ndev->dev.parent, skb->data,
>>> +						  le16_to_cpu(desc->info_ds),
>>> +						  DMA_FROM_DEVICE);
>>> +			if (dma_mapping_error(ndev->dev.parent, dma_addr))
>>> +				desc->info_ds = cpu_to_le16(0);
>>> +			desc->dptr = cpu_to_le32(dma_addr);
>>> +			skb_checksum_none_assert(skb);
>>> +			priv->rx_skb[entry] = skb;
>>> +		}
>>> +
>>> +		dma_wmb();
>>> +		desc->die_dt = DT_FEMPTY | D_DIE;
>>> +	}
>>> +
>>> +	priv->rx_ring[priv->num_rx_ring].die_dt = DT_LINK;
>>> +
>>> +	return rx_packets;
>>> +}
>>
>> [snip]
>>
>>> +static netdev_tx_t rtsn_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>> +{
>>> +	struct rtsn_private *priv = netdev_priv(ndev);
>>> +	struct rtsn_ext_desc *desc;
>>> +	int ret = NETDEV_TX_OK;
>>> +	unsigned long flags;
>>> +	dma_addr_t dma_addr;
>>> +	int entry;
>>> +
>>> +	spin_lock_irqsave(&priv->lock, flags);
>>> +
>>> +	if (priv->cur_tx - priv->dirty_tx > priv->num_tx_ring) {
>>> +		netif_stop_subqueue(ndev, 0);
>>> +		ret = NETDEV_TX_BUSY;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (skb_put_padto(skb, ETH_ZLEN))
>>> +		goto out;
>>> +
>>> +	dma_addr = dma_map_single(ndev->dev.parent, skb->data, skb->len,
>>> +				  DMA_TO_DEVICE);
>>> +	if (dma_mapping_error(ndev->dev.parent, dma_addr)) {
>>> +		dev_kfree_skb_any(skb);
>>> +		goto out;
>>> +	}
>>> +
>>> +	entry = priv->cur_tx % priv->num_tx_ring;
>>> +	priv->tx_skb[entry] = skb;
>>> +	desc = &priv->tx_ring[entry];
>>> +	desc->dptr = cpu_to_le32(dma_addr);
>>> +	desc->info_ds = cpu_to_le16(skb->len);
>>
>> Should we check against the maximum TX frame size supported by the
>> hardware here?
>>
>> Whatever we do here, we should also do in the ravb driver as that makes
>> a similar cpu_to_le16() call to fill the DS field with no check that the
>> HW actually supports transmitting a frame of the given size.
> 
> Compared to RAVB the RTSN driver do not support splitting a packet over 
> multiple descriptors, so the max frame size adhering to the MTU will 
> always fit using a single descriptor.
> 

My concern here is with pathological or malicious packets. For example,
you can use stacked VLANS (QinQ, QinQinQ, etc) to expand the size of the
TX frame for a given MTU since the bytes used by the extra VLAN tags are
not counted as payload bytes.

At least with the RZ/G2L family, no verification has been performed on
sending packets larger than 1526 bytes to my knowledge. Even using only
one TX descriptor, I was able to completely lock up the GbEth IP by
pushing a 2kB frame into the TX queue. So I do think it is worth adding
some checks here.

Thanks,

-- 
Paul Barker
Download attachment "OpenPGP_0x27F4B3459F002257.asc" of type "application/pgp-keys" (3521 bytes)

Download attachment "OpenPGP_signature.asc" of type "application/pgp-signature" (237 bytes)