Open Source and information security mailing list archives
 
Message-ID: <afd9eece-49a0-1ee5-5ffc-ff9b92f4f80c@grandegger.com>
Date:   Mon, 11 Mar 2019 12:19:58 +0100
From:   Wolfgang Grandegger <wg@...ndegger.com>
To:     Dan Murphy <dmurphy@...com>, mkl@...gutronix.de,
        davem@...emloft.net
Cc:     linux-can@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v7 1/4] can: m_can: Create a m_can platform framework

Hello Dan,

On 08.03.19 at 21:36, Dan Murphy wrote:
> On 3/8/19 12:06 PM, Wolfgang Grandegger wrote:
>>
>>
>> On 08.03.19 at 18:52, Dan Murphy wrote:
>>> On 3/8/19 11:40 AM, Wolfgang Grandegger wrote:
>>>> Hello Dan,
>>>>
>>>> On 08.03.19 at 18:25, Dan Murphy wrote:
>>>>> On 3/8/19 11:08 AM, Wolfgang Grandegger wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On 08.03.19 at 16:48, Dan Murphy wrote:
>>>>>>> Wolfgang
>>>>>>>
>>>>>>> On 3/8/19 8:41 AM, Wolfgang Grandegger wrote:
>>>>>>>> Hello Dan,
>>>>>>>>
>>>>>>>> thinking more about it...
>>>>>>>>
>>>>>>>> On 08.03.19 at 14:29, Wolfgang Grandegger wrote:
>>>>>>>>> Hello Dan,
>>>>>>>>>
>>>>>>>>> On 08.03.19 at 13:44, Dan Murphy wrote:
>>>>>>>>>> Wolfgang
>>>>>>>>>>
>>>>>>>>>> On 3/8/19 4:10 AM, Wolfgang Grandegger wrote:
>>>>>>>>>>> Hello Dan,
>>>>>>>>>>>
>>>>>>>>>>> On 05.03.19 at 16:52, Dan Murphy wrote:
>>>>>>>>>>>> Create a m_can platform framework that peripheral
>>>>>>>>>>>> devices can register to and use common code and register sets.
>>>>>>>>>>>> The peripheral devices may provide read/write and configuration
>>>>>>>>>>>> support of the IP.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Dan Murphy <dmurphy@...com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> v7 - Fixed remaining new checkpatch issues, removed CSR setting, fixed tx hard
>>>>>>>>>>>> start function to return tx_busy, and renamed device callbacks - https://lore.kernel.org/patchwork/patch/1047220/
>>>>>>>>>>>>
>>>>>>>>>>>> v6 - Squashed platform patch to this patch for bisectability, fixed coding style
>>>>>>>>>>>> issues, updated Kconfig help, placed mcan reg offsets back into c file, renamed
>>>>>>>>>>>> priv->skb to priv->tx_skb and cleared perp interrupts at ISR start -
>>>>>>>>>>>> Patch 1 comments - https://lore.kernel.org/patchwork/patch/1042446/
>>>>>>>>>>>> Patch 2 comments - https://lore.kernel.org/patchwork/patch/1042442/
>>>>>>>>>>>>
>>>>>>>>>>>>  drivers/net/can/m_can/Kconfig          |  13 +-
>>>>>>>>>>>>  drivers/net/can/m_can/Makefile         |   1 +
>>>>>>>>>>>>  drivers/net/can/m_can/m_can.c          | 700 +++++++++++++------------
>>>>>>>>>>>>  drivers/net/can/m_can/m_can.h          | 110 ++++
>>>>>>>>>>>>  drivers/net/can/m_can/m_can_platform.c | 202 +++++++
>>>>>>>>>>>>  5 files changed, 682 insertions(+), 344 deletions(-)
>>>>>>>>>>>>  create mode 100644 drivers/net/can/m_can/m_can.h
>>>>>>>>>>>>  create mode 100644 drivers/net/can/m_can/m_can_platform.c
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/net/can/m_can/Kconfig b/drivers/net/can/m_can/Kconfig
>>>>>>>>>>>> index 04f20dd39007..f7119fd72df4 100644
>>>>>>>>>>>> --- a/drivers/net/can/m_can/Kconfig
>>>>>>>>>>>> +++ b/drivers/net/can/m_can/Kconfig
>>>>>>>>>>>> @@ -1,5 +1,14 @@
>>>>>>>>>>>>  config CAN_M_CAN
>>>>>>>>>>>> +	tristate "Bosch M_CAN support"
>>>>>>>>>>>> +	---help---
>>>>>>>>>>>> +	  Say Y here if you want support for Bosch M_CAN controller framework.
>>>>>>>>>>>> +	  This is common support for devices that embed the Bosch M_CAN IP.
>>>>>>>>>>>> +
>>>>>>>>>>>> +config CAN_M_CAN_PLATFORM
>>>>>>>>>>>> +	tristate "Bosch M_CAN support for io-mapped devices"
>>>>>>>>>>>>  	depends on HAS_IOMEM
>>>>>>>>>>>> -	tristate "Bosch M_CAN devices"
>>>>>>>>>>>> +	depends on CAN_M_CAN
>>>>>>>>>>>>  	---help---
>>>>>>>>>>>> -	  Say Y here if you want to support for Bosch M_CAN controller.
>>>>>>>>>>>> +	  Say Y here if you want support for IO Mapped Bosch M_CAN controller.
>>>>>>>>>>>> +	  This support is for devices that have the Bosch M_CAN controller
>>>>>>>>>>>> +	  IP embedded into the device and the IP is IO Mapped to the processor.
>>>>>>>>>>>> diff --git a/drivers/net/can/m_can/Makefile b/drivers/net/can/m_can/Makefile
>>>>>>>>>>>> index 8bbd7f24f5be..057bbcdb3c74 100644
>>>>>>>>>>>> --- a/drivers/net/can/m_can/Makefile
>>>>>>>>>>>> +++ b/drivers/net/can/m_can/Makefile
>>>>>>>>>>>> @@ -3,3 +3,4 @@
>>>>>>>>>>>>  #
>>>>>>>>>>>>  
>>>>>>>>>>>>  obj-$(CONFIG_CAN_M_CAN) += m_can.o
>>>>>>>>>>>> +obj-$(CONFIG_CAN_M_CAN_PLATFORM) += m_can_platform.o
>>>>>>>>>>>> diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
>>>>>>>>>>>> index 9b449400376b..a60278d94126 100644
>>>>>>>>>>>> --- a/drivers/net/can/m_can/m_can.c
>>>>>>>>>>>> +++ b/drivers/net/can/m_can/m_can.c
>>>>>>>>>>>
>>>>>>>>>>> ... snip...
>>>>>>>>>>>
>>>>>>>>>>>> +static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
>>>>>>>>>>>> +				    struct net_device *dev)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +	struct m_can_priv *priv = netdev_priv(dev);
>>>>>>>>>>>> +
>>>>>>>>>>>> +	if (can_dropped_invalid_skb(dev, skb))
>>>>>>>>>>>> +		return NETDEV_TX_OK;
>>>>>>>>>>>> +
>>>>>>>>>>>> +	if (priv->is_peripherial) {
>>>>>>>>>>>> +		if (priv->tx_skb) {
>>>>>>>>>>>> +			netdev_err(dev, "hard_xmit called while tx busy\n");
>>>>>>>>>>>> +			return NETDEV_TX_BUSY;
>>>>>>>>>>>> +		}
>>>>>>>>>>>
>>>>>>>>>>> The problem with that approach is that the upper layer will try to
>>>>>>>>>>> resubmit the current "skb", not the previous "tx_skb", and the
>>>>>>>>>>> previous "tx_skb" has not been freed yet. I would just drop and free the
>>>>>>>>>>> skb and return NETDEV_TX_OK in m_can_tx_handler() for peripheral devices
>>>>>>>>>>> (like can_dropped_invalid_skb() does).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OK.
>>>>>>>>>>
>>>>>>>>>> So would this also be a bug in the hi3110 and mcp251x drivers (line 521)? Apart from checking tx_length,
>>>>>>>>>> this is how those drivers are written.
>>>>>>>>>
>>>>>>>>> This is different. When entering the "start_xmit" routine, the previous
>>>>>>>>> TX is still in progress. It will (hopefully) complete soon, so
>>>>>>>>> returning NETDEV_TX_BUSY is OK: the "start_xmit" routine will be
>>>>>>>>> called again soon with the same "skb". Besides, that scenario
>>>>>>>>> should not really happen.
>>>>>>>>
>>>>>>>> In principle, this also applies to the m_can peripheral devices. If
>>>>>>>> tx_skb is not NULL, the TX is still in progress and returning
>>>>>>>> NETDEV_TX_BUSY is just fine.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In contrast, in "m_can_tx_handler()", the skb could not be handled
>>>>>>>>> because the FIFO is full. For peripheral devices, the "start_xmit"
>>>>>>>>> routine already returned NETDEV_TX_OK for that skb. Therefore the only
>>>>>>>>> meaningful action is to drop the skb. Also, this error should not happen;
>>>>>>>>> if it does, something is going really wrong. Therefore I think a
>>>>>>>>> WARN_ONCE() would be even more appropriate, but that should be a
>>>>>>>>> separate patch.
>>>>>>>>
>>>>>>>> But that's a different issue/error. The tx_skb cannot be processed in
>>>>>>>> "m_can_tx_handler()". Either we drop it or we re-queue it (retry later).
>>>>>>>>
>>>>>>>
>>>>>>> OK, I am a bit confused by this. Are you saying this is not an issue?
>>>>>>> Or are you saying I need to check for tx_len like the other code?
>>>>>>
>>>>>> If you check for tx_skb in the "start_xmit" routine like the hi3110 and
>>>>>> mcp251x drivers do, it will work the same way, but only if "tx_handler()"
>>>>>> has fully processed the message. It simply means that the TX is still in
>>>>>> progress and will complete soon. But in "m_can_tx_handler()" we return
>>>>>> without handling the message! It will never be sent or freed. Or will
>>>>>> the "m_can_tx_handler()" retry?
>>>>>>
>>>>>
>>>>> I am not seeing where we are not handling the message in the m_can_tx_handler()
>>>>
>>>> static void m_can_tx_handler(struct m_can_classdev *priv)
>>>> {
>>>> 		...
>>>> 		/* Check if FIFO full */
>>>> 		if (m_can_tx_fifo_full(priv)) {
>>>> 			/* This shouldn't happen */
>>>> 			netif_stop_queue(dev);
>>>> 			netdev_warn(dev,
>>>> 				    "TX queue active although FIFO is full.");
>>>> 			return;
>>>> 		}
>>>>
>>>> We simply return here. When is the message (tx_skb) processed (sent or freed)?
>>>> What happens with tx_skb?
>>>>
>>>
>>> Are you sure you are looking at the right code?
>>>
>>> For patch version v7 I have the following
>>>
>>> 		/* Check if FIFO full */
>>> 		if (m_can_tx_fifo_full(cdev)) {
>>> 			/* This shouldn't happen */
>>> 			netif_stop_queue(dev);
>>> 			netdev_warn(dev,
>>> 				    "TX queue active although FIFO is full.");
>>> 			return NETDEV_TX_BUSY;
>>> 		}
>>>
>>> Which is no change from the original source code.
>>
>> I know, but for the peripheral devices you have:
>>
>>   static void m_can_tx_work_queue(struct work_struct *ws)
>>   {
>> 	struct m_can_priv *priv = container_of(ws, struct m_can_priv,
>> 						tx_work);
>> 	netdev_tx_t ret;
>>
>> 	ret = m_can_tx_handler(priv);
>> 	if (ret == NETDEV_TX_OK)
>> 		priv->tx_skb = NULL;
>>   }
>>
>> What will happen with tx_skb if NETDEV_TX_BUSY is returned? It has not
>> been dropped/freed yet.
>>
> 
> OK I think I see the issue there.

The key point is that the "skb" passed to "start_xmit" must be
released/freed once it has been processed (i.e. once "start_xmit" has
returned NETDEV_TX_OK for it). This is trickier for peripheral devices
because the "skb" is handled in a deferred manner.
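To illustrate the ownership rule, here is a minimal userspace model (not kernel code). All `model_*` names are hypothetical stand-ins: `model_start_xmit()` takes the skb and returns at once (0 stands in for NETDEV_TX_OK), and the deferred `model_tx_work()` must release `tx_skb` on every path, including the FIFO-full one. The `live_skbs` counter only exists to make a leak visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only: every model_* name is a hypothetical stand-in,
 * not a kernel API. Return value 0 plays the role of NETDEV_TX_OK. */
static int live_skbs;			/* allocated but not yet freed */

struct model_skb {
	int id;
};

struct model_dev {
	struct model_skb *tx_skb;	/* owned by the driver after TX_OK */
	bool work_queued;		/* stands in for queue_work() */
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

static struct model_skb *model_alloc_skb(int id)
{
	struct model_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->id = id;
	live_skbs++;
	return skb;
}

static void model_free_skb(struct model_skb *skb)
{
	free(skb);
	live_skbs--;
}

/* Deferred path: take ownership of the skb and return "TX_OK" at once. */
static int model_start_xmit(struct model_dev *dev, struct model_skb *skb)
{
	dev->tx_skb = skb;
	dev->work_queued = true;
	return 0;
}

/* Work handler: every exit path must release tx_skb, because the stack
 * already got TX_OK and will never hand this skb back to us. */
static void model_tx_work(struct model_dev *dev, bool fifo_full)
{
	dev->work_queued = false;
	if (dev->tx_skb == NULL)
		return;

	if (fifo_full)
		dev->tx_dropped++;	/* cannot send it: drop it */
	else
		dev->tx_packets++;	/* model: TX and echo complete at once */

	model_free_skb(dev->tx_skb);
	dev->tx_skb = NULL;
}
```

A work handler that simply returned on the FIFO-full path would leave `live_skbs` above zero, which is the leak discussed in this thread.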

> 
> I should probably add can_put_echo_skb if NETDEV_TX_BUSY and always NULL out the SKB.

can_put_echo_skb() should only be called after the TX has been
initiated. The normal flow for the skb is:

 start-xmit -> initiate tx -> can_put_echo_skb -> return NETDEV_TX_OK...
   tx done interrupt -> can_get_echo_skb -> free skb
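A minimal userspace sketch of that flow, assuming a single echo slot. `model_put_echo_skb()` and `model_get_echo_skb()` are hypothetical stand-ins that only mimic the roles of can_put_echo_skb() and can_get_echo_skb(); in the real stack the echoed skb is looped back to local receivers rather than freed directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only: model_* names are hypothetical and merely
 * mimic the roles of can_put_echo_skb()/can_get_echo_skb(). */
#define MODEL_ECHO_SLOTS 1

static int live_skbs;			/* allocated but not yet released */

struct model_skb {
	int len;
};

struct model_dev {
	struct model_skb *echo[MODEL_ECHO_SLOTS];
	int tx_packets;
	int tx_bytes;
};

static struct model_skb *model_alloc_skb(int len)
{
	struct model_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->len = len;
	live_skbs++;
	return skb;
}

/* Park the skb until TX-done; only valid once the TX was initiated. */
static void model_put_echo_skb(struct model_dev *dev, struct model_skb *skb,
			       unsigned int idx)
{
	assert(idx < MODEL_ECHO_SLOTS && dev->echo[idx] == NULL);
	dev->echo[idx] = skb;
}

/* TX-done: take the skb out of its slot and release it (the real helper
 * loops it back to local receivers instead). Returns the frame length. */
static int model_get_echo_skb(struct model_dev *dev, unsigned int idx)
{
	struct model_skb *skb = dev->echo[idx];
	int len;

	assert(skb != NULL);
	len = skb->len;
	dev->echo[idx] = NULL;
	free(skb);
	live_skbs--;
	return len;
}

static int model_start_xmit(struct model_dev *dev, struct model_skb *skb)
{
	/* 1. initiate TX (FIFO/register writes omitted in this model) */
	/* 2. only then hand the skb to the echo slot */
	model_put_echo_skb(dev, skb, 0);
	return 0;			/* 3. stands in for NETDEV_TX_OK */
}

static void model_tx_done_irq(struct model_dev *dev)
{
	dev->tx_bytes += model_get_echo_skb(dev, 0);
	dev->tx_packets++;
}
```

Between `model_start_xmit()` and `model_tx_done_irq()` the skb lives only in the echo slot, so nothing else may free or reuse it in that window.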

I would just drop the message/skb:

		/* Check if FIFO full */
		if (m_can_tx_fifo_full(cdev)) {
			/* This shouldn't happen */
			netif_stop_queue(dev);
			netdev_warn(dev,
				    "TX queue active although FIFO is full.");
			if (cdev->is_peripherial) {
				kfree_skb(skb);
				dev->stats.tx_dropped++;
				return NETDEV_TX_OK;
			} else {
				return NETDEV_TX_BUSY;
			}
		}
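The branch above can be modelled in userspace as follows. This is a sketch, not driver code: `model_tx_fifo_full()` and the MODEL_TX_* values are hypothetical, with `is_peripheral` playing the role of `cdev->is_peripherial`. It shows why the two cases differ: the peripheral path must drop and count the skb, while the memory-mapped path leaves it with the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; MODEL_TX_* mirror NETDEV_TX_OK/NETDEV_TX_BUSY. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

static int live_skbs;			/* allocated but not yet freed */

struct model_skb {
	int id;
};

struct model_dev {
	bool is_peripheral;		/* role of cdev->is_peripherial */
	bool queue_stopped;		/* role of netif_stop_queue() */
	unsigned long tx_dropped;
};

static struct model_skb *model_alloc_skb(int id)
{
	struct model_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->id = id;
	live_skbs++;
	return skb;
}

/* The FIFO-full branch of the TX handler. */
static enum model_tx model_tx_fifo_full(struct model_dev *dev,
					struct model_skb *skb)
{
	dev->queue_stopped = true;
	if (dev->is_peripheral) {
		/* Deferred path: start_xmit already returned TX_OK for this
		 * skb, so nobody will resubmit it. Drop and account for it. */
		free(skb);
		live_skbs--;
		dev->tx_dropped++;
		return MODEL_TX_OK;
	}
	/* Direct path: still inside start_xmit, so the caller keeps the
	 * skb and requeues it. */
	return MODEL_TX_BUSY;
}
```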

> This appears to be the way the other perp drivers do it as they just put and null the skb
> regardless of the return of the handlers.

You can also use:

		if (cdev->tx_skb) {
			netdev_err(dev, "hard_xmit called while tx busy\n");
			return NETDEV_TX_BUSY;
		}

to check for "hard_xmit called while tx busy". But you still need to
handle the "m_can_tx_fifo_full(cdev)" case properly. See above.
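A small model of that check, again hypothetical and without any kernel APIs: while `tx_skb` is still pending, a second skb is refused with BUSY and stays with the caller; once the deferred work has handed `tx_skb` on, the same skb is accepted on retry. No memory is freed here because the point is only who owns which skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; MODEL_TX_OK/MODEL_TX_BUSY only mirror
 * NETDEV_TX_OK/NETDEV_TX_BUSY. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

struct model_skb {
	int id;
};

struct model_dev {
	struct model_skb *tx_skb;	/* accepted but not yet processed */
};

static enum model_tx model_start_xmit(struct model_dev *dev,
				      struct model_skb *skb)
{
	if (dev->tx_skb != NULL)
		return MODEL_TX_BUSY;	/* caller keeps skb and retries it */
	dev->tx_skb = skb;		/* driver owns skb; work would be queued */
	return MODEL_TX_OK;
}

/* Deferred work finished: tx_skb has been handed on (e.g. to the echo
 * slot), so a new skb can be accepted. */
static struct model_skb *model_tx_work_done(struct model_dev *dev)
{
	struct model_skb *skb = dev->tx_skb;

	dev->tx_skb = NULL;
	return skb;
}
```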

> And clean is called when the BUS is off or coming out of suspend.

We probably need that as well, even though other drivers don't care
whether the device goes bus-off while TX messages are pending.
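If such a cleanup is added, it could look roughly like the following userspace sketch. `model_clean()` is hypothetical and is not the existing m_can clean function; it ignores locking and the race with a still-queued work item, and only shows a pending `tx_skb` being counted as a TX error (an assumption), freed and cleared.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only; model_clean() is a hypothetical sketch and
 * ignores locking and any still-queued TX work item. */
static int live_skbs;			/* allocated but not yet freed */

struct model_skb {
	int id;
};

struct model_dev {
	struct model_skb *tx_skb;	/* accepted but never transmitted */
	unsigned long tx_errors;
};

static struct model_skb *model_alloc_skb(int id)
{
	struct model_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->id = id;
	live_skbs++;
	return skb;
}

/* Run on bus-off or resume: nothing pending may survive. */
static void model_clean(struct model_dev *dev)
{
	if (dev->tx_skb == NULL)
		return;
	dev->tx_errors++;	/* assumption: count it as a TX error */
	free(dev->tx_skb);
	live_skbs--;
	dev->tx_skb = NULL;
}
```

Calling it twice is harmless, since the second call finds `tx_skb` already cleared.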

Wolfgang
