Message-ID: <Z5QegAP71PkbppOO@LQ3V64L9R2>
Date: Fri, 24 Jan 2025 15:13:04 -0800
From: Joe Damato <jdamato@...tly.com>
To: Basharath Hussain Khaja <basharath@...thit.com>
Cc: danishanwar@...com, rogerq@...nel.org, andrew+netdev@...n.ch,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, robh@...nel.org, krzk+dt@...nel.org,
conor+dt@...nel.org, nm@...com, ssantosh@...nel.org,
tony@...mide.com, richardcochran@...il.com, parvathi@...thit.com,
schnelle@...ux.ibm.com, rdunlap@...radead.org,
diogo.ivo@...mens.com, m-karicheri2@...com, horms@...nel.org,
jacob.e.keller@...el.com, m-malladi@...com,
javier.carrasco.cruz@...il.com, afd@...com, s-anna@...com,
linux-arm-kernel@...ts.infradead.org, netdev@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-omap@...r.kernel.org, pratheesh@...com, prajith@...com,
vigneshr@...com, praneeth@...com, srk@...com, rogerq@...com,
krishna@...thit.com, pmohan@...thit.com, mohan@...thit.com
Subject: Re: [RFC v2 PATCH 04/10] net: ti: prueth: Adds link detection, RX
and TX support.
On Fri, Jan 24, 2025 at 06:07:01PM +0530, Basharath Hussain Khaja wrote:
> From: Roger Quadros <rogerq@...com>
>
> Changes corresponding to link configuration such as speed and duplex.
> IRQ and handler initializations are performed for packet reception.
> Firmware receives the packet from the wire and stores it in an OCMC
> queue, then notifies the CPU via an interrupt. Upon receiving the
> interrupt, the CPU services the IRQ and the packet is processed by
> pushing a newly allocated SKB up to the upper layers.
>
> When a user application wants to transmit a packet, it invokes
> sys_send(), which in turn invokes the PRUETH driver, which writes the
> packet into the OCMC queues. The PRU firmware picks up the packet and
> transmits it on the wire.
>
> Signed-off-by: Roger Quadros <rogerq@...com>
> Signed-off-by: Andrew F. Davis <afd@...com>
> Signed-off-by: Parvathi Pudi <parvathi@...thit.com>
> Signed-off-by: Basharath Hussain Khaja <basharath@...thit.com>
> ---
> drivers/net/ethernet/ti/icssm/icssm_prueth.c | 599 ++++++++++++++++++-
> drivers/net/ethernet/ti/icssm/icssm_prueth.h | 46 ++
> 2 files changed, 640 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/icssm/icssm_prueth.c b/drivers/net/ethernet/ti/icssm/icssm_prueth.c
> index 82ed0e3a0d88..0ba1d16a7a15 100644
> --- a/drivers/net/ethernet/ti/icssm/icssm_prueth.c
> +++ b/drivers/net/ethernet/ti/icssm/icssm_prueth.c
[...]
> +/**
> + * icssm_prueth_tx_enqueue - queue a packet to firmware for transmission
> + *
> + * @emac: EMAC data structure
> + * @skb: packet data buffer
> + * @queue_id: priority queue id
> + *
> + * Return: 0 on success, -ENOBUFS if the queue is out of space
> + */
> +static int icssm_prueth_tx_enqueue(struct prueth_emac *emac,
> + struct sk_buff *skb,
> + enum prueth_queue_id queue_id)
> +{
[...]
> +
> + /* which port to tx: MII0 or MII1 */
> + txport = emac->tx_port_queue;
> + src_addr = skb->data;
> + pktlen = skb->len;
> + /* Get the tx queue */
> + queue_desc = emac->tx_queue_descs + queue_id;
> + txqueue = &queue_infos[txport][queue_id];
> +
> + buffer_desc_count = txqueue->buffer_desc_end -
> + txqueue->buffer_desc_offset;
> + buffer_desc_count /= BD_SIZE;
> + buffer_desc_count++;
> +
> + bd_rd_ptr = readw(&queue_desc->rd_ptr);
> + bd_wr_ptr = readw(&queue_desc->wr_ptr);
> +
> + /* the PRU firmware deals mostly in pointers already
> + * offset into ram, we would like to deal in indexes
> + * within the queue we are working with for code
> + * simplicity, calculate this here
> + */
> + write_block = (bd_wr_ptr - txqueue->buffer_desc_offset) / BD_SIZE;
> + read_block = (bd_rd_ptr - txqueue->buffer_desc_offset) / BD_SIZE;
Seems like there's a lot of similar code repeated in the rx code
path.
Maybe there's a way to simplify it all with a helper of some sort?
> + if (write_block > read_block) {
> + free_blocks = buffer_desc_count - write_block;
> + free_blocks += read_block;
> + } else if (write_block < read_block) {
> + free_blocks = read_block - write_block;
> + } else { /* they are all free */
> + free_blocks = buffer_desc_count;
> + }
> +
> + pkt_block_size = DIV_ROUND_UP(pktlen, ICSS_BLOCK_SIZE);
> + if (pkt_block_size > free_blocks) /* out of queue space */
> + return -ENOBUFS;
> +
> + /* calculate end BD address post write */
> + update_block = write_block + pkt_block_size;
> +
> + /* Check for wrap around */
> + if (update_block >= buffer_desc_count) {
> + update_block %= buffer_desc_count;
> + buffer_wrapped = true;
> + }
> +
> + /* OCMC RAM is not cached and write order is not important */
> + ocmc_ram = (__force void *)emac->prueth->mem[PRUETH_MEM_OCMC].va;
> + dst_addr = ocmc_ram + txqueue->buffer_offset +
> + (write_block * ICSS_BLOCK_SIZE);
> +
> + /* Copy the data from socket buffer(DRAM) to PRU buffers(OCMC) */
> + if (buffer_wrapped) { /* wrapped around buffer */
> + int bytes = (buffer_desc_count - write_block) * ICSS_BLOCK_SIZE;
> + int remaining;
> +
> + /* bytes is integral multiple of ICSS_BLOCK_SIZE but
> + * entire packet may have fit within the last BD
> + * if pkt_info.length is not integral multiple of
> + * ICSS_BLOCK_SIZE
> + */
> + if (pktlen < bytes)
> + bytes = pktlen;
> +
> + /* copy non-wrapped part */
> + memcpy(dst_addr, src_addr, bytes);
> +
> + /* copy wrapped part */
> + src_addr += bytes;
> + remaining = pktlen - bytes;
> + dst_addr = ocmc_ram + txqueue->buffer_offset;
> + memcpy(dst_addr, src_addr, remaining);
> + } else {
> + memcpy(dst_addr, src_addr, pktlen);
> + }
> +
> + /* update first buffer descriptor */
> + wr_buf_desc = (pktlen << PRUETH_BD_LENGTH_SHIFT) &
> + PRUETH_BD_LENGTH_MASK;
> + writel(wr_buf_desc, dram + bd_wr_ptr);
> +
> + /* update the write pointer in this queue descriptor, the firmware
> + * polls for this change so this will signal the start of transmission
> + */
> + update_wr_ptr = txqueue->buffer_desc_offset + (update_block * BD_SIZE);
> + writew(update_wr_ptr, &queue_desc->wr_ptr);
> +
> + return 0;
> +}
[...]
> +
> +/* get packet from queue
> + * negative for error
> + */
The comment above seems superfluous and does not follow the format of
the tx comment, which uses kernel-doc style.
> +int icssm_emac_rx_packet(struct prueth_emac *emac, u16 *bd_rd_ptr,
> + struct prueth_packet_info *pkt_info,
> + const struct prueth_queue_info *rxqueue)
> +{
[...]
> +
> + /* the PRU firmware deals mostly in pointers already
> + * offset into ram, we would like to deal in indexes
> + * within the queue we are working with for code
> + * simplicity, calculate this here
> + */
> + buffer_desc_count = rxqueue->buffer_desc_end -
> + rxqueue->buffer_desc_offset;
> + buffer_desc_count /= BD_SIZE;
> + buffer_desc_count++;
> + read_block = (*bd_rd_ptr - rxqueue->buffer_desc_offset) / BD_SIZE;
> + pkt_block_size = DIV_ROUND_UP(pkt_info->length, ICSS_BLOCK_SIZE);
> +
> + /* calculate end BD address post read */
> + update_block = read_block + pkt_block_size;
> +
> + /* Check for wrap around */
> + if (update_block >= buffer_desc_count) {
> + update_block %= buffer_desc_count;
> + if (update_block)
> + buffer_wrapped = true;
> + }
> +
> + /* calculate new pointer in ram */
> + *bd_rd_ptr = rxqueue->buffer_desc_offset + (update_block * BD_SIZE);
Seems like there's a lot of repeated math here and in the above
function. Maybe this can be simplified with a helper function to
avoid repeating the math in multiple places?
> +
> + /* Pkt len w/ HSR tag removed, If applicable */
> + actual_pkt_len = pkt_info->length - start_offset;
> +
> + /* Allocate a socket buffer for this packet */
> + skb = netdev_alloc_skb_ip_align(ndev, actual_pkt_len);
> + if (!skb) {
> + if (netif_msg_rx_err(emac) && net_ratelimit())
> + netdev_err(ndev, "failed rx buffer alloc\n");
> + return -ENOMEM;
> + }
> +
> + dst_addr = skb->data;
> +
> + /* OCMC RAM is not cached and read order is not important */
> + ocmc_ram = (__force void *)emac->prueth->mem[PRUETH_MEM_OCMC].va;
> +
> + /* Get the start address of the first buffer from
> + * the read buffer description
> + */
> + src_addr = ocmc_ram + rxqueue->buffer_offset +
> + (read_block * ICSS_BLOCK_SIZE);
> + src_addr += start_offset;
> +
> + /* Copy the data from PRU buffers(OCMC) to socket buffer(DRAM) */
> + if (buffer_wrapped) { /* wrapped around buffer */
> + int bytes = (buffer_desc_count - read_block) * ICSS_BLOCK_SIZE;
> + int remaining;
> + /* bytes is integral multiple of ICSS_BLOCK_SIZE but
> + * entire packet may have fit within the last BD
> + * if pkt_info.length is not integral multiple of
> + * ICSS_BLOCK_SIZE
> + */
> + if (pkt_info->length < bytes)
> + bytes = pkt_info->length;
> +
> + /* If applicable, account for the HSR tag removed */
> + bytes -= start_offset;
> +
> + /* copy non-wrapped part */
> + memcpy(dst_addr, src_addr, bytes);
> +
> + /* copy wrapped part */
> + dst_addr += bytes;
> + remaining = actual_pkt_len - bytes;
> +
> + src_addr = ocmc_ram + rxqueue->buffer_offset;
> + memcpy(dst_addr, src_addr, remaining);
> + src_addr += remaining;
> + } else {
> + memcpy(dst_addr, src_addr, actual_pkt_len);
> + src_addr += actual_pkt_len;
> + }
> +
> + if (!pkt_info->sv_frame) {
> + skb_put(skb, actual_pkt_len);
> +
> + /* send packet up the stack */
> + skb->protocol = eth_type_trans(skb, ndev);
> + local_bh_disable();
> + netif_receive_skb(skb);
> + local_bh_enable();
> + } else {
> + dev_kfree_skb_any(skb);
> + }
> +
> + /* update stats */
> + ndev->stats.rx_bytes += actual_pkt_len;
> + ndev->stats.rx_packets++;
See comment below about atomicity.
> + return 0;
> +}
> +
> +/**
> + * icssm_emac_rx_thread - EMAC Rx interrupt thread handler
> + * @irq: interrupt number
> + * @dev_id: pointer to net_device
> + *
> + * EMAC Rx Interrupt thread handler - function to process the rx frames in
> + * an irq thread. There is only a limited buffer at the ingress to
> + * queue the frames. As the frames must be emptied as quickly as
> + * possible to avoid overflow, an irq thread is necessary. The current
> + * implementation based on NAPI poll results in packet loss due to overflow
> + * at the ingress queues. Industrial use cases require loss-free packet
> + * processing. Tests show that with threaded irq based processing,
> + * no overflow happens when receiving at ~92Mbps for MTU sized frames, thus
> + * meeting the requirement for industrial use cases.
That's interesting. Any idea why this is the case? Is there not
enough CPU for softirq/NAPI processing to run or something? I
suppose I'd imagine that NAPI would run and if data is arriving fast
enough, the NAPI would be added to the repoll list and processed
later.
So I guess either there isn't enough CPU, or the ingress queues don't
have many descriptors, or something like that?
> + *
> + * Return: interrupt handled condition
> + */
> +static irqreturn_t icssm_emac_rx_thread(int irq, void *dev_id)
> +{
> + struct net_device *ndev = (struct net_device *)dev_id;
> + struct prueth_emac *emac = netdev_priv(ndev);
> + struct prueth_queue_desc __iomem *queue_desc;
> + const struct prueth_queue_info *rxqueue;
> + struct prueth *prueth = emac->prueth;
> + struct net_device_stats *ndevstats;
> + struct prueth_packet_info pkt_info;
> + int start_queue, end_queue;
> + void __iomem *shared_ram;
> + u16 bd_rd_ptr, bd_wr_ptr;
> + u16 update_rd_ptr;
> + u8 overflow_cnt;
> + u32 rd_buf_desc;
> + int used = 0;
> + int i, ret;
> +
> + ndevstats = &emac->ndev->stats;
FWIW the docs in include/linux/netdevice.h say:
/**
...
* @stats: Statistics struct, which was left as a legacy, use
* rtnl_link_stats64 instead
...
*/
struct net_device {
...
struct net_device_stats stats; /* not used by modern drivers */
...
};
perhaps consider using rtnl_link_stats64 as suggested instead?
> + shared_ram = emac->prueth->mem[PRUETH_MEM_SHARED_RAM].va;
> +
> + start_queue = emac->rx_queue_start;
> + end_queue = emac->rx_queue_end;
> +retry:
> + /* search host queues for packets */
> + for (i = start_queue; i <= end_queue; i++) {
> + queue_desc = emac->rx_queue_descs + i;
> + rxqueue = &queue_infos[PRUETH_PORT_HOST][i];
> +
> + overflow_cnt = readb(&queue_desc->overflow_cnt);
> + if (overflow_cnt > 0) {
> + emac->ndev->stats.rx_over_errors += overflow_cnt;
> + /* reset to zero */
> + writeb(0, &queue_desc->overflow_cnt);
> + }
> +
> + bd_rd_ptr = readw(&queue_desc->rd_ptr);
> + bd_wr_ptr = readw(&queue_desc->wr_ptr);
> +
> + /* while packets are available in this queue */
> + while (bd_rd_ptr != bd_wr_ptr) {
> + /* get packet info from the read buffer descriptor */
> + rd_buf_desc = readl(shared_ram + bd_rd_ptr);
> + icssm_parse_packet_info(prueth, rd_buf_desc, &pkt_info);
> +
> + if (pkt_info.length <= 0) {
> + /* a packet length of zero will cause us to
> + * never move the read pointer ahead, locking
> + * the driver, so we manually have to move it
> + * to the write pointer, discarding all
> + * remaining packets in this queue. This should
> + * never happen.
> + */
> + update_rd_ptr = bd_wr_ptr;
> + ndevstats->rx_length_errors++;
See question below.
> + } else if (pkt_info.length > EMAC_MAX_FRM_SUPPORT) {
> + /* if the packet is too large we skip it but we
> + * still need to move the read pointer ahead
> + * and assume something is wrong with the read
> + * pointer as the firmware should be filtering
> + * these packets
> + */
> + update_rd_ptr = bd_wr_ptr;
> + ndevstats->rx_length_errors++;
in netdevice.h it says:
* struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
* Called when a user wants to get the network device usage
* statistics. Drivers must do one of the following:
* 1. Define @ndo_get_stats64 to fill in a zero-initialised
* rtnl_link_stats64 structure passed by the caller.
* 2. Define @ndo_get_stats to update a net_device_stats structure
* (which should normally be dev->stats) and return a pointer to
* it. The structure may be changed asynchronously only if each
* field is written atomically.
* 3. Update dev->stats asynchronously and atomically, and define
* neither operation.
I didn't look in the other patches to see if ndo_get_stats is
defined or not, but are these increments atomic?