lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1925493932.450322.1738129268925.JavaMail.zimbra@couthit.local>
Date: Wed, 29 Jan 2025 11:11:08 +0530 (IST)
From: Basharath Hussain Khaja <basharath@...thit.com>
To: Joe Damato <jdamato@...tly.com>
Cc: basharath <basharath@...thit.com>, danishanwar <danishanwar@...com>, 
	rogerq <rogerq@...nel.org>, andrew+netdev <andrew+netdev@...n.ch>, 
	davem <davem@...emloft.net>, edumazet <edumazet@...gle.com>, 
	kuba <kuba@...nel.org>, pabeni <pabeni@...hat.com>, 
	Rob Herring <robh@...nel.org>, krzk+dt <krzk+dt@...nel.org>, 
	conor+dt <conor+dt@...nel.org>, nm <nm@...com>, 
	ssantosh <ssantosh@...nel.org>, tony <tony@...mide.com>, 
	richardcochran <richardcochran@...il.com>, 
	parvathi <parvathi@...thit.com>, schnelle <schnelle@...ux.ibm.com>, 
	rdunlap <rdunlap@...radead.org>, diogo ivo <diogo.ivo@...mens.com>, 
	m-karicheri2 <m-karicheri2@...com>, horms <horms@...nel.org>, 
	jacob e keller <jacob.e.keller@...el.com>, 
	m-malladi <m-malladi@...com>, 
	javier carrasco cruz <javier.carrasco.cruz@...il.com>, 
	afd <afd@...com>, s-anna <s-anna@...com>, 
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, 
	netdev <netdev@...r.kernel.org>, 
	devicetree <devicetree@...r.kernel.org>, 
	linux-kernel <linux-kernel@...r.kernel.org>, 
	linux-omap <linux-omap@...r.kernel.org>, 
	pratheesh <pratheesh@...com>, prajith <prajith@...com>, 
	vigneshr <vigneshr@...com>, praneeth <praneeth@...com>, 
	srk <srk@...com>, rogerq <rogerq@...com>, 
	krishna <krishna@...thit.com>, pmohan <pmohan@...thit.com>, 
	mohan <mohan@...thit.com>
Subject: Re: [RFC v2 PATCH 04/10] net: ti: prueth: Adds link detection, RX
 and TX support.

> On Fri, Jan 24, 2025 at 06:07:01PM +0530, Basharath Hussain Khaja wrote:
>> From: Roger Quadros <rogerq@...com>
>> 
>> Changes corresponding to link configuration such as speed and duplexity.
>> IRQ and handler initializations are performed for packet reception.Firmware
>> receives the packet from the wire and stores it into OCMC queue. Next, it
>> notifies the CPU via interrupt. Upon receiving the interrupt CPU will
>> service the IRQ and packet will be processed by pushing the newly allocated
>> SKB to upper layers.
>> 
>> When the user application want to transmit a packet, it will invoke
>> sys_send() which will inturn invoke the PRUETH driver, then it will write
>> the packet into OCMC queues. PRU firmware will pick up the packet and
>> transmit it on to the wire.
>> 
>> Signed-off-by: Roger Quadros <rogerq@...com>
>> Signed-off-by: Andrew F. Davis <afd@...com>
>> Signed-off-by: Parvathi Pudi <parvathi@...thit.com>
>> Signed-off-by: Basharath Hussain Khaja <basharath@...thit.com>
>> ---
>>  drivers/net/ethernet/ti/icssm/icssm_prueth.c | 599 ++++++++++++++++++-
>>  drivers/net/ethernet/ti/icssm/icssm_prueth.h |  46 ++
>>  2 files changed, 640 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/ti/icssm/icssm_prueth.c
>> b/drivers/net/ethernet/ti/icssm/icssm_prueth.c
>> index 82ed0e3a0d88..0ba1d16a7a15 100644
>> --- a/drivers/net/ethernet/ti/icssm/icssm_prueth.c
>> +++ b/drivers/net/ethernet/ti/icssm/icssm_prueth.c
> 
> [...]
> 
>> +/**
>> + * icssm_prueth_tx_enqueue - queue a packet to firmware for transmission
>> + *
>> + * @emac: EMAC data structure
>> + * @skb: packet data buffer
>> + * @queue_id: priority queue id
>> + *
>> + * Return: 0 (Success)
>> + */
>> +static int icssm_prueth_tx_enqueue(struct prueth_emac *emac,
>> +				   struct sk_buff *skb,
>> +				   enum prueth_queue_id queue_id)
>> +{
> 
> [...]
> 
>> +
>> +	/* which port to tx: MII0 or MII1 */
>> +	txport = emac->tx_port_queue;
>> +	src_addr = skb->data;
>> +	pktlen = skb->len;
>> +	/* Get the tx queue */
>> +	queue_desc = emac->tx_queue_descs + queue_id;
>> +	txqueue = &queue_infos[txport][queue_id];
>> +
>> +	buffer_desc_count = txqueue->buffer_desc_end -
>> +			    txqueue->buffer_desc_offset;
>> +	buffer_desc_count /= BD_SIZE;
>> +	buffer_desc_count++;
>> +
>> +	bd_rd_ptr = readw(&queue_desc->rd_ptr);
>> +	bd_wr_ptr = readw(&queue_desc->wr_ptr);
>> +
>> +	/* the PRU firmware deals mostly in pointers already
>> +	 * offset into ram, we would like to deal in indexes
>> +	 * within the queue we are working with for code
>> +	 * simplicity, calculate this here
>> +	 */
>> +	write_block = (bd_wr_ptr - txqueue->buffer_desc_offset) / BD_SIZE;
>> +	read_block = (bd_rd_ptr - txqueue->buffer_desc_offset) / BD_SIZE;
> 
> Seems like there's a lot of similar code repeated in the rx code
> path.
> 
> Maybe there's a way to simplify it all with a helper of some sort?
> 

We will plan to re-look into common code (more than twice) and create a helper
function and use it.

>> +	if (write_block > read_block) {
>> +		free_blocks = buffer_desc_count - write_block;
>> +		free_blocks += read_block;
>> +	} else if (write_block < read_block) {
>> +		free_blocks = read_block - write_block;
>> +	} else { /* they are all free */
>> +		free_blocks = buffer_desc_count;
>> +	}
>> +
>> +	pkt_block_size = DIV_ROUND_UP(pktlen, ICSS_BLOCK_SIZE);
>> +	if (pkt_block_size > free_blocks) /* out of queue space */
>> +		return -ENOBUFS;
>> +
>> +	/* calculate end BD address post write */
>> +	update_block = write_block + pkt_block_size;
>> +
>> +	/* Check for wrap around */
>> +	if (update_block >= buffer_desc_count) {
>> +		update_block %= buffer_desc_count;
>> +		buffer_wrapped = true;
>> +	}
>> +
>> +	/* OCMC RAM is not cached and write order is not important */
>> +	ocmc_ram = (__force void *)emac->prueth->mem[PRUETH_MEM_OCMC].va;
>> +	dst_addr = ocmc_ram + txqueue->buffer_offset +
>> +		   (write_block * ICSS_BLOCK_SIZE);
>> +
>> +	/* Copy the data from socket buffer(DRAM) to PRU buffers(OCMC) */
>> +	if (buffer_wrapped) { /* wrapped around buffer */
>> +		int bytes = (buffer_desc_count - write_block) * ICSS_BLOCK_SIZE;
>> +		int remaining;
>> +
>> +		/* bytes is integral multiple of ICSS_BLOCK_SIZE but
>> +		 * entire packet may have fit within the last BD
>> +		 * if pkt_info.length is not integral multiple of
>> +		 * ICSS_BLOCK_SIZE
>> +		 */
>> +		if (pktlen < bytes)
>> +			bytes = pktlen;
>> +
>> +		/* copy non-wrapped part */
>> +		memcpy(dst_addr, src_addr, bytes);
>> +
>> +		/* copy wrapped part */
>> +		src_addr += bytes;
>> +		remaining = pktlen - bytes;
>> +		dst_addr = ocmc_ram + txqueue->buffer_offset;
>> +		memcpy(dst_addr, src_addr, remaining);
>> +	} else {
>> +		memcpy(dst_addr, src_addr, pktlen);
>> +	}
>> +
>> +       /* update first buffer descriptor */
>> +	wr_buf_desc = (pktlen << PRUETH_BD_LENGTH_SHIFT) &
>> +		       PRUETH_BD_LENGTH_MASK;
>> +	writel(wr_buf_desc, dram + bd_wr_ptr);
>> +
>> +	/* update the write pointer in this queue descriptor, the firmware
>> +	 * polls for this change so this will signal the start of transmission
>> +	 */
>> +	update_wr_ptr = txqueue->buffer_desc_offset + (update_block * BD_SIZE);
>> +	writew(update_wr_ptr, &queue_desc->wr_ptr);
>> +
>> +	return 0;
>> +}
> 
> [...]
> 
>> +
>> +/* get packet from queue
>> + * negative for error
>> + */
> 
> The comment above seems superfluous and does not seem to follow the
> format of the tx comment which appears to use kdoc style
> 

We will address in the next version.

>> +int icssm_emac_rx_packet(struct prueth_emac *emac, u16 *bd_rd_ptr,
>> +			 struct prueth_packet_info *pkt_info,
>> +			 const struct prueth_queue_info *rxqueue)
>> +{
> 
> [...]
> 
>> +
>> +	/* the PRU firmware deals mostly in pointers already
>> +	 * offset into ram, we would like to deal in indexes
>> +	 * within the queue we are working with for code
>> +	 * simplicity, calculate this here
>> +	 */
>> +	buffer_desc_count = rxqueue->buffer_desc_end -
>> +			    rxqueue->buffer_desc_offset;
>> +	buffer_desc_count /= BD_SIZE;
>> +	buffer_desc_count++;
>> +	read_block = (*bd_rd_ptr - rxqueue->buffer_desc_offset) / BD_SIZE;
>> +	pkt_block_size = DIV_ROUND_UP(pkt_info->length, ICSS_BLOCK_SIZE);
>> +
>> +	/* calculate end BD address post read */
>> +	update_block = read_block + pkt_block_size;
>> +
>> +	/* Check for wrap around */
>> +	if (update_block >= buffer_desc_count) {
>> +		update_block %= buffer_desc_count;
>> +		if (update_block)
>> +			buffer_wrapped = true;
>> +	}
>> +
>> +	/* calculate new pointer in ram */
>> +	*bd_rd_ptr = rxqueue->buffer_desc_offset + (update_block * BD_SIZE);
> 
> Seems like there's a lot of repeated math here and in the above
> function. Maybe this can be simplified with a helper function to
> avoid repeating the math in multiple places?
> 

We will plan to re-look into common code (more than twice) and create a helper
function and use it.

>> +
>> +	/* Pkt len w/ HSR tag removed, If applicable */
>> +	actual_pkt_len = pkt_info->length - start_offset;
>> +
>> +	/* Allocate a socket buffer for this packet */
>> +	skb = netdev_alloc_skb_ip_align(ndev, actual_pkt_len);
>> +	if (!skb) {
>> +		if (netif_msg_rx_err(emac) && net_ratelimit())
>> +			netdev_err(ndev, "failed rx buffer alloc\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	dst_addr = skb->data;
>> +
>> +	/* OCMC RAM is not cached and read order is not important */
>> +	ocmc_ram = (__force void *)emac->prueth->mem[PRUETH_MEM_OCMC].va;
>> +
>> +	/* Get the start address of the first buffer from
>> +	 * the read buffer description
>> +	 */
>> +	src_addr = ocmc_ram + rxqueue->buffer_offset +
>> +		   (read_block * ICSS_BLOCK_SIZE);
>> +	src_addr += start_offset;
>> +
>> +	/* Copy the data from PRU buffers(OCMC) to socket buffer(DRAM) */
>> +	if (buffer_wrapped) { /* wrapped around buffer */
>> +		int bytes = (buffer_desc_count - read_block) * ICSS_BLOCK_SIZE;
>> +		int remaining;
>> +		/* bytes is integral multiple of ICSS_BLOCK_SIZE but
>> +		 * entire packet may have fit within the last BD
>> +		 * if pkt_info.length is not integral multiple of
>> +		 * ICSS_BLOCK_SIZE
>> +		 */
>> +		if (pkt_info->length < bytes)
>> +			bytes = pkt_info->length;
>> +
>> +		/* If applicable, account for the HSR tag removed */
>> +		bytes -= start_offset;
>> +
>> +		/* copy non-wrapped part */
>> +		memcpy(dst_addr, src_addr, bytes);
>> +
>> +		/* copy wrapped part */
>> +		dst_addr += bytes;
>> +		remaining = actual_pkt_len - bytes;
>> +
>> +		src_addr = ocmc_ram + rxqueue->buffer_offset;
>> +		memcpy(dst_addr, src_addr, remaining);
>> +		src_addr += remaining;
>> +	} else {
>> +		memcpy(dst_addr, src_addr, actual_pkt_len);
>> +		src_addr += actual_pkt_len;
>> +	}
>> +
>> +	if (!pkt_info->sv_frame) {
>> +		skb_put(skb, actual_pkt_len);
>> +
>> +		/* send packet up the stack */
>> +		skb->protocol = eth_type_trans(skb, ndev);
>> +		local_bh_disable();
>> +		netif_receive_skb(skb);
>> +		local_bh_enable();
>> +	} else {
>> +		dev_kfree_skb_any(skb);
>> +	}
>> +
>> +	/* update stats */
>> +	ndev->stats.rx_bytes += actual_pkt_len;
>> +	ndev->stats.rx_packets++;
> 
> See comment below about atomicity.
> 
>> +	return 0;
>> +}
>> +
>> +/**
>> + * icssm_emac_rx_thread - EMAC Rx interrupt thread handler
>> + * @irq: interrupt number
>> + * @dev_id: pointer to net_device
>> + *
>> + * EMAC Rx Interrupt thread handler - function to process the rx frames in a
>> + * irq thread function. There is only limited buffer at the ingress to
>> + * queue the frames. As the frames are to be emptied as quickly as
>> + * possible to avoid overflow, irq thread is necessary. Current implementation
>> + * based on NAPI poll results in packet loss due to overflow at
>> + * the ingress queues. Industrial use case requires loss free packet
>> + * processing. Tests shows that with threaded irq based processing,
>> + * no overflow happens when receiving at ~92Mbps for MTU sized frames and thus
>> + * meet the requirement for industrial use case.
> 
> That's interesting. Any idea why this is the case? Is there not
> enough CPU for softirq/NAPI processing to run or something? I
> suppose I'd imagine that NAPI would run and if data is arriving fast
> enough, the NAPI would be added to the repoll list and processed
> later.
> 
> So I guess either there isn't enough CPU or the ingress queues don't
> have many descriptors or something like that ?
> 

This is due to combination of both limited number of buffer descriptors 
(memory constraints) and also not having enough CPU.

>> + *
>> + * Return: interrupt handled condition
>> + */
>> +static irqreturn_t icssm_emac_rx_thread(int irq, void *dev_id)
>> +{
>> +	struct net_device *ndev = (struct net_device *)dev_id;
>> +	struct prueth_emac *emac = netdev_priv(ndev);
>> +	struct prueth_queue_desc __iomem *queue_desc;
>> +	const struct prueth_queue_info *rxqueue;
>> +	struct prueth *prueth = emac->prueth;
>> +	struct net_device_stats *ndevstats;
>> +	struct prueth_packet_info pkt_info;
>> +	int start_queue, end_queue;
>> +	void __iomem *shared_ram;
>> +	u16 bd_rd_ptr, bd_wr_ptr;
>> +	u16 update_rd_ptr;
>> +	u8 overflow_cnt;
>> +	u32 rd_buf_desc;
>> +	int used = 0;
>> +	int i, ret;
>> +
>> +	ndevstats = &emac->ndev->stats;
> 
> FWIW the docs in include/linux/netdevice.h say:
> 
> /**
> ...
> *      @stats:         Statistics struct, which was left as a legacy, use
> *                      rtnl_link_stats64 instead
> ...
> */
> struct net_device {
>  ...
>  struct net_device_stats stats; /* not used by modern drivers */
>  ...
> };
> 
> perhaps consider using rtnl_link_stats64 as suggested instead?
> 
>> +	shared_ram = emac->prueth->mem[PRUETH_MEM_SHARED_RAM].va;
>> +
>> +	start_queue = emac->rx_queue_start;
>> +	end_queue = emac->rx_queue_end;
>> +retry:
>> +	/* search host queues for packets */
>> +	for (i = start_queue; i <= end_queue; i++) {
>> +		queue_desc = emac->rx_queue_descs + i;
>> +		rxqueue = &queue_infos[PRUETH_PORT_HOST][i];
>> +
>> +		overflow_cnt = readb(&queue_desc->overflow_cnt);
>> +		if (overflow_cnt > 0) {
>> +			emac->ndev->stats.rx_over_errors += overflow_cnt;
>> +			/* reset to zero */
>> +			writeb(0, &queue_desc->overflow_cnt);
>> +		}
>> +
>> +		bd_rd_ptr = readw(&queue_desc->rd_ptr);
>> +		bd_wr_ptr = readw(&queue_desc->wr_ptr);
>> +
>> +		/* while packets are available in this queue */
>> +		while (bd_rd_ptr != bd_wr_ptr) {
>> +			/* get packet info from the read buffer descriptor */
>> +			rd_buf_desc = readl(shared_ram + bd_rd_ptr);
>> +			icssm_parse_packet_info(prueth, rd_buf_desc, &pkt_info);
>> +
>> +			if (pkt_info.length <= 0) {
>> +				/* a packet length of zero will cause us to
>> +				 * never move the read pointer ahead, locking
>> +				 * the driver, so we manually have to move it
>> +				 * to the write pointer, discarding all
>> +				 * remaining packets in this queue. This should
>> +				 * never happen.
>> +				 */
>> +				update_rd_ptr = bd_wr_ptr;
>> +				ndevstats->rx_length_errors++;
> 
> See question below.
> 
>> +			} else if (pkt_info.length > EMAC_MAX_FRM_SUPPORT) {
>> +				/* if the packet is too large we skip it but we
>> +				 * still need to move the read pointer ahead
>> +				 * and assume something is wrong with the read
>> +				 * pointer as the firmware should be filtering
>> +				 * these packets
>> +				 */
>> +				update_rd_ptr = bd_wr_ptr;
>> +				ndevstats->rx_length_errors++;
> 
> in netdevice.h it says:
> 
> * struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
> *      Called when a user wants to get the network device usage
> *      statistics. Drivers must do one of the following:
> *      1. Define @ndo_get_stats64 to fill in a zero-initialised
> *         rtnl_link_stats64 structure passed by the caller.
> *      2. Define @ndo_get_stats to update a net_device_stats structure
> *         (which should normally be dev->stats) and return a pointer to
> *         it. The structure may be changed asynchronously only if each
> *         field is written atomically.
> *      3. Update dev->stats asynchronously and atomically, and define
> *         neither operation.
> 
> I didn't look in the other patches to see if ndo_get_stats is
> defined or not, but are these increments atomic?

This module maintain 32 bit statistics inside PRU firmware. we will 
check the feasibility of using rtnl_link_stats64 and make appropriate 
changes as per your suggestion in next version. 

Thanks & Best Regards,
Basharath

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ