linux-kernel - Re: [virtio-dev] [RFC PATCH 1/1] can: virtio: Initial virtio CAN driver.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8bd1dc3b-e1f0-e7f9-bf65-8d243c65adb5@opensynergy.com>
Date:   Thu, 3 Nov 2022 13:26:35 +0100
From:   Harald Mommer <hmo@...nsynergy.com>
To:     Arnd Bergmann <arnd@...nel.org>,
        Harald Mommer <harald.mommer@...nsynergy.com>
Cc:     virtio-dev@...ts.oasis-open.org, linux-can@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        Wolfgang Grandegger <wg@...ndegger.com>,
        Marc Kleine-Budde <mkl@...gutronix.de>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Dariusz Stojaczyk <Dariusz.Stojaczyk@...nsynergy.com>,
        Stratos Mailing List <stratos-dev@...lists.linaro.org>
Subject: Re: [virtio-dev] [RFC PATCH 1/1] can: virtio: Initial virtio CAN
 driver.

Hello,

currently in the preparation that changed code can go out to the list.

On 25.08.22 20:21, Arnd Bergmann wrote:

>
>>   drivers/net/can/Kconfig                 |    1 +
>>   drivers/net/can/Makefile                |    1 +
>>   drivers/net/can/virtio_can/Kconfig      |   12 +
>>   drivers/net/can/virtio_can/Makefile     |    5 +
>>   drivers/net/can/virtio_can/virtio_can.c | 1176 +++++++++++++++++++++++
>>   include/uapi/linux/virtio_can.h         |   69 ++
> Since the driver is just one file, you probably don't need the subdirectory.
Easy to do, makes the changes smaller.
>> +struct virtio_can_tx {
>> +       struct list_head list;
>> +       int prio; /* Currently always 0 "normal priority" */
>> +       int putidx;
>> +       struct virtio_can_tx_out tx_out;
>> +       struct virtio_can_tx_in tx_in;
>> +};
> Having a linked list of these appears to add a little extra complexity.
> If they are always processed in sequence, using an array would be
> much simpler, as you just need to remember the index.

The messages are not necessarily processed in sequence by the CAN stack. 
CAN is priority based. The lower the CAN ID the higher the priority. So 
a message with CAN ID 0x100 can surpass a message with ID 0x123 if the 
hardware is not just simple basic CAN controller using a single TX 
mailbox with a FIFO queue on top of it.

Thinking about this the code becomes more complex with the array. What I 
get from the device when the message has been processed is a pointer to 
the processed message by virtqueue_get_buf(). I can then simply do a 
list_del(), free the message and done.

>> +#ifdef DEBUG
>> +static void __attribute__((unused))
>> +virtio_can_hexdump(const void *data, size_t length, size_t base)
>> +{
>> +#define VIRTIO_CAN_MAX_BYTES_PER_LINE 16u
> This seems to duplicate print_hex_dump(), maybe just use that?
Checked where it's still used. The code is not disabled by #ifdef DEBUG 
but simply commented out. Under this circumstances it's for now best to 
simply remove the code now and also the commented out places where is 
was used at some time in the past.
>> +
>> +       while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
>> +               cpu_relax();
>> +
>> +       mutex_unlock(&priv->ctrl_lock);
> A busy loop is probably not what you want here. Maybe just
> wait_for_completion() until the callback happens?

Was done in the same way as elsewhere 
(virtio_console.c/__send_control_msg() & 
virtio_net.c/virtnet_send_command()). Yes, wait_for_completion() is 
better, this avoids polling.

>> +       /* Push loopback echo. Will be looped back on TX interrupt/TX NAPI */
>> +       can_put_echo_skb(skb, dev, can_tx_msg->putidx, 0);
>> +
>> +       err = virtqueue_add_sgs(vq, sgs, 1u, 1u, can_tx_msg, GFP_ATOMIC);
>> +       if (err != 0) {
>> +               list_del(&can_tx_msg->list);
>> +               virtio_can_free_tx_idx(priv, can_tx_msg->prio,
>> +                                      can_tx_msg->putidx);
>> +               netif_stop_queue(dev);
>> +               spin_unlock_irqrestore(&priv->tx_lock, flags);
>> +               kfree(can_tx_msg);
>> +               if (err == -ENOSPC)
>> +                       netdev_info(dev, "TX: Stop queue, no space left\n");
>> +               else
>> +                       netdev_warn(dev, "TX: Stop queue, reason = %d\n", err);
>> +               return NETDEV_TX_BUSY;
>> +       }
>> +
>> +       if (!virtqueue_kick(vq))
>> +               netdev_err(dev, "%s(): Kick failed\n", __func__);
>> +
>> +       spin_unlock_irqrestore(&priv->tx_lock, flags);
> There should not be a need for a spinlock or disabling interrupts
> in the xmit function. What exactly are you protecting against here?

I'm using 2 NAPIs, one for TX and one for RX. The RX NAPI just receives 
RX messages and is of no interest here. The TX NAPI handles the TX 
messages which have been processed by the virtio CAN device in 
virtio_can_read_tx_queue(). If this was done without the TX NAPI this 
would have been done by the TX interrupt directly, no difference.

In virtio_can_start_xmit()

* Reserve putidx - done by an own mechanism using list operations in 
tx_putidx_list

Could be that it's simpler to use idr_alloc() and friends getting those 
numbers to get rid of this own mechanism, not sure yet. But this needs a 
locks as it's based on a linked list and the list operation has to be 
protected.

* Add the TX message to the pending list

Again a list operation which has to be protected.

* Try to send the message

Now it may happen that at the same time while we do something with the 
lists in virtio_can_start_xmit() the function virtio_can_read_tx_queue() 
is active accessing the same queue. Comment above virtqueue_add_sgs(): 
"Caller must ensure that we don't call this with other virtqueue 
operations at the same time (except when noted)."

Also tried, virtqueue_add_sgs() needs this lock.

* And then there is also a list operation on failure of the function

But the code needed to reworked to understand the necessity of each lock 
again.

> As a further optimization, you may want to use the xmit_more()
> function, as the virtqueue kick is fairly expensive and can be
> batched here.
Looked elsewhere how it works and did.
>> +       kfree(can_tx_msg);
>> +
>> +       /* Flow control */
>> +       if (netif_queue_stopped(dev)) {
>> +               netdev_info(dev, "TX ACK: Wake up stopped queue\n");
>> +               netif_wake_queue(dev);
>> +       }
> You may want to add netdev_sent_queue()/netdev_completed_queue()
> based BQL flow control here as well, so you don't have to rely on the
> queue filling up completely.
Not addressed, not yet completely understood.
>> +static int virtio_can_probe(struct virtio_device *vdev)
>> +{
>> +       struct net_device *dev;
>> +       struct virtio_can_priv *priv;
>> +       int err;
>> +       unsigned int echo_skb_max;
>> +       unsigned int idx;
>> +       u16 lo_tx = VIRTIO_CAN_ECHO_SKB_MAX;
>> +
>> +       BUG_ON(!vdev);
> Not a useful debug check, just remove the BUG_ON(!vdev), here and elsewhere
A lot of BUG_ON() were removed when not considered useful, some were 
reworked to contain better error handling code when this was possible, 
others were kept to ease further development. If anyone catches 
something would be seriously broken and had to be fixed in the code. But 
this then we want to know.
>> +
>> +       echo_skb_max = lo_tx;
>> +       dev = alloc_candev(sizeof(struct virtio_can_priv), echo_skb_max);
>> +       if (!dev)
>> +               return -ENOMEM;
>> +
>> +       priv = netdev_priv(dev);
>> +
>> +       dev_info(&vdev->dev, "echo_skb_max = %u\n", priv->can.echo_skb_max);
> Also remove the prints, I assume this is left over from
> initial debugging.
Yes, this thing was overall too noisy.
>> +
>> +       register_virtio_can_dev(dev);
>> +
>> +       /* Initialize virtqueues */
>> +       err = virtio_can_find_vqs(priv);
>> +       if (err != 0)
>> +               goto on_failure;
> Should the register_virtio_can_dev() be done here? I would expect this to be
> the last thing after setting up the queues.
Doing so makes the code somewhat simpler and shorter = better.
>> +static struct virtio_driver virtio_can_driver = {
>> +       .feature_table = features,
>> +       .feature_table_size = ARRAY_SIZE(features),
>> +       .feature_table_legacy = NULL,
>> +       .feature_table_size_legacy = 0u,
>> +       .driver.name =  KBUILD_MODNAME,
>> +       .driver.owner = THIS_MODULE,
>> +       .id_table =     virtio_can_id_table,
>> +       .validate =     virtio_can_validate,
>> +       .probe =        virtio_can_probe,
>> +       .remove =       virtio_can_remove,
>> +       .config_changed = NULL,
>> +#ifdef CONFIG_PM_SLEEP
>> +       .freeze =       virtio_can_freeze,
>> +       .restore =      virtio_can_restore,
>> +#endif
> You can remove the #ifdef here and above, and replace that with the
> pm_sleep_ptr() macro in the assignment.

This pm_sleep_ptr(_ptr) macro returns either the argument when 
CONFIG_PM_SLEEP is defined or NULL. But in struct virtio_driver there is

#ifdef CONFIG_PM   int(*freeze) ...;   int(*restore) ...; #endif

so without CONFIG_PM there are no freeze and restore structure members.

So

   .freeze = pm_sleep_ptr(virtio_can_freeze)

won't work.

>> diff --git a/include/uapi/linux/virtio_can.h b/include/uapi/linux/virtio_can.h
>> new file mode 100644
>> index 000000000000..0ca75c7a98ee
>> --- /dev/null
>> +++ b/include/uapi/linux/virtio_can.h
>> @@ -0,0 +1,69 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause */
>> +/*
>> + * Copyright (C) 2021 OpenSynergy GmbH
>> + */
>> +#ifndef _LINUX_VIRTIO_VIRTIO_CAN_H
>> +#define _LINUX_VIRTIO_VIRTIO_CAN_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/virtio_types.h>
>> +#include <linux/virtio_ids.h>
>> +#include <linux/virtio_config.h>
> Maybe a link to the specification here? I assume the definitions in this file
> are all lifted from that document, rather than specific to the driver, right?
>
>           Arnd

The driver as made in parallel to the specification work. So there is no 
finished specification yet. To avoid traffic (mistake here) I've not 
sent the patch to the specification to all mailing lists.

Patch to the virtio specification is now here:

https://lore.kernel.org/all/20220825133410.18367-1-harald.mommer@opensynergy.com/

This was made on top of https://github.com/oasis-tcs/virtio-spec.git 
commit 26ed30ccb049

Harald

As mentioned, the updated code will be sent out to the list(s) soon.

-- 
Dipl.-Ing. Harald Mommer
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone:  +49 (30) 60 98 540-0 <== Zentrale
Fax:    +49 (30) 60 98 540-99
E-Mail:harald.mommer@...nsynergy.com

www.opensynergy.com

Handelsregister: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Regis Adjamah