netdev - Re: [PATCH 1/1] can: m_can: Control tx flow to avoid message stuck

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <fzbw7i5wrpngg4ycapbo2g4b6d7ecykj4an3flcrxgwrp5t6cr@ogqcnsnvlwi2>
Date: Thu, 9 Jan 2025 16:43:32 +0100
From: Markus Schneider-Pargmann <msp@...libre.com>
To: subramanian.mohan@...el.com
Cc: rcsekar@...sung.com, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, balbi@...nel.org, raymond.tan@...el.com, 
	jarkko.nikula@...ux.intel.com, linux-can@...r.kernel.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux@...tq-group.com, lst@...gutronix.de, 
	matthias.hahn@...el.com, srinivasan.chinnadurai@...el.com
Subject: Re: [PATCH 1/1] can: m_can: Control tx flow to avoid message stuck

Hi,

On Wed, Jan 08, 2025 at 02:31:12PM +0530, subramanian.mohan@...el.com wrote:
> From: Subramanian Mohan <subramanian.mohan@...el.com>
> 
> Prolonged testing of passing CAN messages between
> two ElkhartLake platforms resulted in stuck messages,
> i.e. messages were not received at the receiver side.

Can you please describe the reason for the stuck messages in your
commit message? I am reading this but I don't understand why this
happens or why your proposed solution helps.

> 
> Controlling TX, i.e. the TEFN bit, helped to resolve the stuck
> message issue.
> 
> The current solution is an enhanced/optimized version of the patch below:
> https://lore.kernel.org/lkml/20230623051124.64132-1-kumari.pallavi@intel.com/T/
> 
> Setup used to reproduce the issue:
> 
> +---------------------+         +----------------------+
> |Intel ElkhartLake    |         |Intel ElkhartLake     |
> |       +--------+    |         |       +--------+     |
> |       |m_can 0 |    |<=======>|       |m_can 0 |     |
> |       +--------+    |         |       +--------+     |
> +---------------------+         +----------------------+
> 
> Steps to be run on the two ElkhartLake systems:
> 1) Bus rate is 1 Mbit/s.
> 2) Bus load during the test is about 40%.
> 3) Initialize CAN with the following commands:
>    ip link set can0 txqueuelen 100/1024/2048
>    ip link set can0 up type can bitrate 1000000
> 
> Python scripts are used to send and receive the CAN messages
> between the EHL systems.
> 
> Signed-off-by: Hahn Matthias <matthias.hahn@...el.com>
> Signed-off-by: Subramanian Mohan <subramanian.mohan@...el.com>
> ---
>  drivers/net/can/m_can/m_can.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> index 97cd8bbf2e32..0a2c9a622842 100644
> --- a/drivers/net/can/m_can/m_can.c
> +++ b/drivers/net/can/m_can/m_can.c
> @@ -1220,7 +1220,7 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
>  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
>  {
>  	struct net_device *dev = cdev->net;
> -	u32 ir = 0, ir_read;
> +	u32 ir = 0, ir_read, new_interrupts;
>  	int ret;
>  
>  	if (pm_runtime_suspended(cdev->dev))
> @@ -1283,6 +1283,9 @@ static int m_can_interrupt_handler(struct m_can_classdev *cdev)
>  			ret = m_can_echo_tx_event(dev);
>  			if (ret != 0)
>  				return ret;
> +
> +			new_interrupts = cdev->active_interrupts & ~(IR_TEFN);
> +			m_can_interrupt_enable(cdev, new_interrupts);

Here is a theoretical situation with two messages being sent. The first
one completes and is handled in this interrupt handler, which then
disables the TEFN interrupt, right? If the second message has not
finished sending yet, how would the interrupt handler ever be called for
it while the interrupt is disabled?

Also, you are disabling this interrupt here regardless of the type of
mcan device and regardless of the coalescing state. In the transmit
path you are only enabling it for non-peripheral devices. For peripheral
mcan devices this would also introduce two additional transfers per
transmit.

In which situations is this really necessary? Does it help to implement
coalescing for non-peripheral devices?

Best
Markus

>  		}
>  	}
>  
> @@ -1989,6 +1992,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>  	struct m_can_classdev *cdev = netdev_priv(dev);
>  	unsigned int frame_len;
>  	netdev_tx_t ret;
> +	u32 new_interrupts;
>  
>  	if (can_dev_dropped_skb(dev, skb))
>  		return NETDEV_TX_OK;
> @@ -2008,8 +2012,11 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>  
>  	if (cdev->is_peripheral)
>  		ret = m_can_start_peripheral_xmit(cdev, skb);
> -	else
> +	else {
> +		new_interrupts = cdev->active_interrupts | IR_TEFN;
> +		m_can_interrupt_enable(cdev, new_interrupts);
>  		ret = m_can_tx_handler(cdev, skb);
> +	}
>  
>  	if (ret != NETDEV_TX_OK)
>  		netdev_completed_queue(dev, 1, frame_len);
> -- 
> 2.35.3
> 
