lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fa51745b-efc2-f513-adf9-35253405ec1c@opensynergy.com>
Date:   Wed, 24 May 2023 15:30:40 +0200
From:   Harald Mommer <harald.mommer@...nsynergy.com>
To:     Vincent MAILHOL <mailhol.vincent@...adoo.fr>
Cc:     Mikhail Golubev-Ciuchea <Mikhail.Golubev-Ciuchea@...nsynergy.com>,
        Marc Kleine-Budde <mkl@...gutronix.de>,
        virtio-dev@...ts.oasis-open.org, linux-can@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org,
        Wolfgang Grandegger <wg@...ndegger.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Damir Shaikhutdinov <Damir.Shaikhutdinov@...nsynergy.com>
Subject: Re: [RFC PATCH v3] can: virtio: Initial virtio CAN driver.

Hello Vincent,

On 15.05.23 08:31, Vincent MAILHOL wrote:
>>>             if (err) {
>> Opened Eclipse, searched for "if (err != 0)" in the kernel code. 290
>> matches. For "if (ret != 0)" I found now 1970 matches.
> Read my comment again. I never mentioned err vs. ret.
>
>   I am asking to replace "if (err != 0)" by "if (err)". We are not
> using MISRA and there is no concept of essential boolean type here.
> You can pass an integer to an if ().
>
> I do not use eclipse, but git can give a few relevant statistics:
>
>    $ git grep "if (err != 0)" | wc -l
>    277
>    $ git grep "if (err)" | wc -l
>    34307
>
> And while this is not the topic, "ret" is more popular than "err":
>
>    $ git grep "if (ret != 0)" | wc -l
>    1956
>    $ git grep "if (ret)" | wc -l
>    67927
>
> but both are well established usage so I do not really care which one
> of "ret" or "err" you use.

Looks like major practice is indeed to omit the != 0 for error 
indicating return codes so I will adapt to the majority but adding an 
unlikely. The block is not expected to be entered at all as the comment 
says however it needs to be  there due to defensive programming practice.

=> if (unlikely(ret))

>>>> +               /* Not expected to happen */
>>>> +               dev_err(dev, "%s(): virtqueue_add_sgs() failed\n", __func__);
>>>> +       }
>>>> +
>>>> +       if (!virtqueue_kick(vq)) {
>>>> +               /* Not expected to happen */
>>>> +               dev_err(dev, "%s(): Kick failed\n", __func__);
>>>> +       }
>>>> +
>>>> +       while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq))
>>>> +               wait_for_completion(&priv->ctrl_done);
>>>> +
>>>> +       mutex_unlock(&priv->ctrl_lock);
>>>> +
>>>> +       return priv->cpkt_in.result;
>>>> +}
...
>>>> +static netdev_tx_t virtio_can_start_xmit(struct sk_buff *skb,
>>>> +                                        struct net_device *dev)
>>>> +{
>>>> +       struct virtio_can_priv *priv = netdev_priv(dev);
>>>> +       struct canfd_frame *cf = (struct canfd_frame *)skb->data;
>>>> +       struct virtio_can_tx *can_tx_msg;
>>>> +       struct virtqueue *vq = priv->vqs[VIRTIO_CAN_QUEUE_TX];
>>>> +       struct scatterlist sg_out[1];
>>>> +       struct scatterlist sg_in[1];
>>>> +       struct scatterlist *sgs[2];
>>> Instead declaring two times an array of 1, can we do:
>>>
>>>           struct scatterlist sgs[2];
> Ooopsy on my side. sgs is an array of pointers so the above is not equivalent.
>
> And doing this:
>
>            struct scatterlist *sgs[2];
>
> Would be problematic as the memory of the two elements would not be allocated.
>
>>> and then use sgs[0] for out and sgs[1] for in?
>>>
>>> Or, if you really want to keep sg_out and sg_in, at least do:
>>>
>>>             struct scatterlist sg_out, sg_in;
>>>             struct scatterlist *sgs[] = {&sg_out, &sg_in};
>>>
>>> N.B. The same comment also applies to the other places where you are
>>> doing some sg[1] declarations.
>> Makes thing worse. I'm not even sure whether this is a null change only
>> or introduces a problem.
>>
>> virtio strictly separates devices readable and device writeable data.
>> Therefore I want really to have here 2 separate definitions. The one is
>> data to the device, the other is data from the device.
> ACK. My second example does that:
>
>              struct scatterlist sg_out, sg_in;
>              struct scatterlist *sgs[] = {&sg_out, &sg_in};

First I remove the [1] from sg_out and sg_in to make those scalars 
instead of arrays of [1].

Your 2nd example above violates the reverse xmas tree rule so I cannot 
take this one as is.

But got the thing through the compiler with the following construct:

     struct scatterlist sg_out, sg_in, *sgs[2] = { &sg_out, &sg_in };

>> If this had any advantage, I could separate the data further. For
>> example I could separate the payload from the preceding data. In this
>> case I had  struct scatterlist sg_out[2]. As long as the payload is
>> small the memcpy for the payload can be justified and [1] is good. In
>> fact, those are still arrays even if by coincident now the number of
>> elements is 1.
> sg_out and sg_in are only passed to one function: sg_init_one(). And
> as the name suggests, sg_init_one expects a single scatterlist, not an
> array.
>
> A look at:
>
>    $ git grep sg_init_one
>
> show me that doing as "sg_init_one(&foo[0], ...)" is not a popular
> solution. The majority does sg_init_one(&foo, ...).
ACK. Checked this also now, saw &foo[0] once or so.
> I do get that sgs is an array of arrays. I am just not comfortable
> with sg_out and sg_in being declared as arrays because these never get
> used as such.
>
>>>> +       unsigned long flags;
>>>> +       u32 can_flags;
>>>> +       int err;
>>>> +       int putidx;
>>>> +       netdev_tx_t xmit_ret = NETDEV_TX_OK;
>>>> +       const unsigned int hdr_size = offsetof(struct virtio_can_tx_out, sdu);
>>>> +
>>>> +       if (can_dev_dropped_skb(dev, skb))
>>>> +               goto kick; /* No way to return NET_XMIT_DROP here */
>>>> +
>>>> +       /* No local check for CAN_RTR_FLAG or FD frame against negotiated
>>>> +        * features. The device will reject those anyway if not supported.
>>>> +        */
>>>> +
>>>> +       can_tx_msg = kzalloc(sizeof(*can_tx_msg), GFP_ATOMIC);
>>>> +       if (!can_tx_msg)
>>>> +               goto kick; /* No way to return NET_XMIT_DROP here */
I missed here to update the number of dropped messages.
>>>> +
>>>> +       can_tx_msg->tx_out.msg_type = cpu_to_le16(VIRTIO_CAN_TX);
>>>> +       can_flags = 0;
>>>> +
>>>> +       if (cf->can_id & CAN_EFF_FLAG) {
>>>> +               can_flags |= VIRTIO_CAN_FLAGS_EXTENDED;
>>>> +               can_tx_msg->tx_out.can_id = cpu_to_le32(cf->can_id & CAN_EFF_MASK);
>>>> +       } else {
>>>> +               can_tx_msg->tx_out.can_id = cpu_to_le32(cf->can_id & CAN_SFF_MASK);
>>>> +       }
>>>> +       if (cf->can_id & CAN_RTR_FLAG)
>>>> +               can_flags |= VIRTIO_CAN_FLAGS_RTR;
>>>> +       else
>>>> +               memcpy(can_tx_msg->tx_out.sdu, cf->data, cf->len);
>>>> +       if (can_is_canfd_skb(skb))
>>>> +               can_flags |= VIRTIO_CAN_FLAGS_FD;
>>>> +
>>>> +       can_tx_msg->tx_out.flags = cpu_to_le32(can_flags);
>>>> +       can_tx_msg->tx_out.length = cpu_to_le16(cf->len);
>>>> +
>>>> +       /* Prepare sending of virtio message */
>>>> +       sg_init_one(&sg_out[0], &can_tx_msg->tx_out, hdr_size + cf->len);
>>>> +       sg_init_one(&sg_in[0], &can_tx_msg->tx_in, sizeof(can_tx_msg->tx_in));
>>>> +       sgs[0] = sg_out;
>>>> +       sgs[1] = sg_in;
>>>> +
>>>> +       putidx = virtio_can_alloc_tx_idx(priv);
>>>> +
>>>> +       if (unlikely(putidx < 0)) {
>>>> +               netif_stop_queue(dev);
>>>> +               kfree(can_tx_msg);
>>>> +               netdev_warn(dev, "TX: Stop queue, no putidx available\n");
>>> ida_alloc_range() can also return -ENOMEM. So the error is not
>>> necessarily because of no putidx available. Maybe better to print the
>>> error message (with %pe to show the mnemotechnic).
>> %pe does not do that. It works for an error coded in a pointer. I have
>> here an int.
> Yes, and you can use the ERR_PTR() to turn your int into an error pointer.
>
> Do:
>
>    $ git grep -A1 "%pe"
>
> if you need examples.

Thanks for this trick with ERR_PTR(). Does not show me a clear text 
error but dumps the pointer in hex so that an additional hex to decimal 
conversion becomes necessary to lookup for the error code. May be due to 
my small embedded setup.

But I realized thinking about all those comments that there is major 
problem in this block:

- The queue is stopped in this block
- The only place where the queue is re-enabled is when a pending TX 
message is processed
- If there is a -ENOMEM and no TX message is pending the driver blocks 
until doomsday

Received also a review comment being to noisy here

- ENOMEM is a rare but valid condition so not tracing this anymore
- ENOSPC should be impossible if flow control works so WARN_ON(putidx == 
-ENOSPC)

=> Remove queue stop, trace only impossible condition not to be noisy, 
drop message always and don't forget to increase drop count.

With current code -ENOSPC was seen. There was a bug in 
virtio_can_read_tx_queue() where the netif_wake_queue() is not done when 
the tx_lock is still held but immediately afterwards and that was a problem.

>>>
>>> Why do we need both VIRTIO_CAN_F_RTR_FRAMES VIRTIO_CAN_FLAGS_RTR?
>>>
>>> Is it to manage devices not able to sent remote frames? If so, we may
>>> also need to add a CAN_CTRLMODE_RTR in linux/can/netlink.h?
>> VIRTIO_CAN_F_RTR_FRAMES is a feature flag. RTR frames may or may not be
>> supported. AUTOSAR CAN drivers do not know anything about RTR frames.
> Now that you say it, it rings a bell.
>
> So indeed, we will probably need a new flag in can/netlink.h to report
> to the userland whether a device is capable or not to manage remote
> frames.

Currently our virtio devices run in the Linux environment but this may 
not to be the case for all our virtio devices in the future. It is 
likely that at some point in time some devices will be migrated to a 
smaller RTOS environment and then it is well imaginable that someone 
wants to use an existing AUTOSAR CAN driver as backend for a virtio CAN 
device. With such a setup no VIRTIO_CAN_F_RTR_FRAMES.

Yes, probably, if we make such a setup possible and want to be clean.

>> if someone wants to build a virtio CAN device on top of an AUTOSAR CAN
>> driver with the additional requirement not to change the existing
>> AUTOSAR CAN driver RTR frames cannot be supported and the feature flag
>> won't be offered by the virtio device.
>>
>> VIRTIO_CAN_FLAGS_RTR is a bit to indicate a frame type in a CAN message.
>> Used internally in an Linux SocketCAN interface.
> ACK.
>
> On a side note, did you have a look at:
>
>    https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2felixir.bootlin.com%2flinux%2flatest%2fsource%2finclude%2fuapi%2flinux%2fcan%2fnetlink.h%23L95&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-75dedb6bf65ef4342bf7c18b89f4f30ed7dc9dc6
>
> ?
>
> It lists the different hardware capabilities which may or may not be
> present at the hardware level. This list can be used as an input to
> decide how to extend the feature bit list.

Looked at this and got some more headache especially with FD. There is 
CAN_CTRLMODE_FD and CAN_CTRLMODE_FD_NON_ISO. Elsewhere is CANFD_BRS. 
Virtio CAN with (too much?) AUTOSAR in mind has only an idea of FD and 
does not distinguish any of this. In AUTOSAR Can_Write() and friends 
there is only bit 30 which says "this is a CAN FD frame".

Virtio CAN was intended to get CAN for automotive usage into the virtual 
machine. However it should be open to be extended later if there were 
requirements.

The hard requirements I got for now are

 1. Bring CAN for automotive usage into the virtual machine
 2. Do not overload the very first version with features

2.) ensures 1.) will happen.

>>>> +
>>>> +/* CAN Result Types */
>>>> +#define VIRTIO_CAN_RESULT_OK            0
>>>> +#define VIRTIO_CAN_RESULT_NOT_OK        1
>>> Silly question, but what is the rationale of not using the error code
>>> from errno.h?
>> Looked into the AUTOSAR_SWS_StandardTypes.pdf  Std_ReturnType:
>>
>> E_OK = 0
>>
>> E_NOT_OK = 1
>>
>> other are "Available for user specific errors" like CAN_BUSY.
>>
>> Linux is only one operating system. An important one but there are
>> others. AUTOSAR people may ask you "What is errno.h?" (and also "What is
>> malloc?").
> Sorry, but I do not buy this argument. Do you really now AUTOSAR
> developpers who do not know about malloc()?

They know. But neither it's provided by their classic AUTOSAR 
environment nor are they allowed to use it if MISRA is enforced which is 
likely the case.

MISRA C 2012 Dir 4.12: "Dynamic memory allocation shall not be used".

MISRA C 2012 Rule 21.3 "The memory allocation and deallocation functions 
of <stdlib.h> shall not be used".

"What's malloc()" at least in the last company meant "I'm doing MISRA so 
I'm not supposed to use malloc()".

>> Our internal client is interested in a Virtio AUTOSAR CAN
>> driver. So there were reasons to look first into AUTOSAR.
> Is it AUTOSAR Classic or AUTOSAR Adaptive?

Looking currently into the specification(s) for the "Classic Platform". 
There exists no "errno.h".

> AUTOSAR Adaptive is POSIX (to some extends):
>
>    [SWS_OSI_01001] POSIX PSE51 Interface: [The OSI shall provide OS
>    functionality with POSIX PSE51 interface, according to the
>    1003.13-2003 specification.]
>
> Ref: AUTOSAR AP R22-11 - Specification of Operating System Interface
> https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fwww.autosar.org%2ffileadmin%2fstandards%2fR22%2d11%2fAP%2fAUTOSAR%5fSWS%5fOperatingSystemInterface.pdf&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-1b9dde25c1a8497018be33058ca535bcfe2c272a
>
>> There is also a CAN_BUSY for the AUTOSAR Can_Write() to be returned but
>> this is not needed at this interface as a virtio AUTOSAR CAN driver was
>> busy when there are no sufficient messages available in the virtqueue,
>> so for this condition we need no defined error code to be used in a
>> virtio message.
>>
>> Virtio block defines VIRTIO_BLK_S_OK 0, VIRTIO_BLK_S_IOERR 1,
>> VIRTIO_BLK_S_UNSUPP 2.
>>
>>> I do see that some other virtio devices do the same:
>>>
>>>     https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2felixir.bootlin.com%2flinux%2fv4.6%2fsource%2finclude%2fuapi%2flinux%2fvirtio%5fnet.h%23L140&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-698619745d87a25323604ab5a614cc946b24e642
>>>
>>> But I am genuinely curious to understand why virtio do not use more
>>> standard error codes.
>> I cannot speak for virtio. errno.h is ANSI C and POSIX but virtio does
>> not only address those environments. It is a more general specification.
> That's my point. ISO C is so predominant that those error codes are
> available nearly everywhere. And this being just some #define, it can
> easily be integrated to the few system which do not have this header.
>
> If there is a requirement to make virtio header self contained, then I
> would understand why POSIX error code can not be used. But as I said,
> I am genously curious to understand the reason behind this choice.
>
>> For the virtio RPMB device the result codes in the virtio specification
>> come for example directly from an JEDEC specification for eMMC. Which
>> has some connection to a JEDEC UFS specification, same result codes
>> there. Makes a lot of sense to use those result codes in this context.
>>
>> As virtio is more general, I have for this also my doubts whether it
>> really was a good idea to take over the CAN RX and CAN TX message
>> definitions 1:1 from Linux (if this is possible). Someone proposed but
>> I've my doubts.
> Yes, I did ask here:
>
>    https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2flore.kernel.org%2flinux%2dcan%2fCAMZ6RqLALOYFWQJ4C4HTaRw7y%2dwaUbqOX0WzrWVNiQG51QexHw%40mail.gmail.com%2f&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-6453b89fa56fa0a97905d808b20a7df3eb16f85a
>
> I am still waiting for your answer.
>
> I do understand that Linux is not the only OS. However, it is the only
> one with a complehensive set of open source CAN dirvers. Reusing the
> Linux structures would allow to reuse bigger chunks of code,
> decreasing the amount of effort needed implement drivers in the virtio
> host.
>
> That said, the POSIX error code and reusing the Linux CAN structures
> are two different topics.
>
>>>> +/* CAN flags to determine type of CAN Id */
>>>> +#define VIRTIO_CAN_FLAGS_EXTENDED       0x8000
>>>> +#define VIRTIO_CAN_FLAGS_FD             0x4000
>>>> +#define VIRTIO_CAN_FLAGS_RTR            0x2000
>>>> +
>>>> +struct virtio_can_config {
>>>> +#define VIRTIO_CAN_S_CTRL_BUSOFF (1u << 0) /* Controller BusOff */
>>>> +       /* CAN controller status */
>>>> +       __le16 status;
>>>> +};
>>>> +
>>>> +/* TX queue message types */
>>>> +struct virtio_can_tx_out {
>>>> +#define VIRTIO_CAN_TX                   0x0001
>>>> +       __le16 msg_type;
>>>> +       __le16 length; /* 0..8 CC, 0..64 CAN-FD, 0..2048 CAN-XL, 12 bits */
>>>> +       __le32 reserved; /* May be needed in part for CAN XL priority */
>>>> +       __le32 flags;
>>>> +       __le32 can_id;
>>>> +       __u8 sdu[64];
>>>> +};
>>>> +
>>>> +struct virtio_can_tx_in {
>>>> +       __u8 result;
>>>> +};
>>>> +
>>>> +/* RX queue message types */
>>>> +struct virtio_can_rx {
>>>> +#define VIRTIO_CAN_RX                   0x0101
>>>> +       __le16 msg_type;
>>>> +       __le16 length; /* 0..8 CC, 0..64 CAN-FD, 0..2048 CAN-XL, 12 bits */
>>>> +       __le32 reserved; /* May be needed in part for CAN XL priority */
>>> Can we use this field to report the classical CAN DLC greater than 8?
>>> If also needed by CAN XL, the field can be turned into a union.
>> Classical CAN cannot have a DLC > 8. CAN FD has a length up to 64 bytes.
> No, The DLC is coded on four bits and ranges from 0 to 15 for both
> Classical CAN and CAN-FD.
>
> Please refer to:
>
>    commit ea7800565a12 ("can: add optional DLC element to Classical CAN
> frame structure")
>    Link:https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fgit.kernel.org%2ftorvalds%2fc%2fea7800565a12&umid=0b3549ac-d010-4d96-a348-2702485d82af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-83a945dba88675fd8f98628601040c6860123b6d
>
> For more details, please refor to section 8.4.2.4 "DLC field" of ISO
> 11898-1:2015.
>
> I do believe that AUTOSAR do not allow Classical CAN frames with a DLC
> greater than 8, but the virtio implementation should support the ISO
> definitions.

I see now. The "length" in the more familiar (to me) AUTOSAR CAN_PduType 
is a "length" and not a "dlc". This detail with the DLC is ISO only and 
so I missed this. Not only therefore.

The CAN XL priority is 11 bits only and so fits in a __le16. Nobody else 
used an union so maybe:

__le32 reserved; =>

     __u8 reserved_classic_dlc; /* if CAN classic length = 8 then DLC 
can be 8..15 */
     __u8 padding;
     __le16 reserved_xl_priority; /* May be needed for CAN XL priority */

The member length is already __le16 to be able to put the 12 bits of 
2048 for CAN XL.

What made it more easy to miss is that also the cantools from Ubuntu 
20.04 do not support DLC > 8. The change in the cantools came in 
20-Nov-2020 by some commits from Oliver Hartkopp. Using newer cantools I 
can send and receive CAN frames with a DLC > 8 over vcan on my machine 
without having to hack the cantools. So the prerequisites to develop 
this feature seem to be available without having to hack cantools or vcan.

Now also not only a new CAN frame structure was introduced with 5.11.0 
but also CAN_CTRLMODE_CC_LEN8_DLC. CAN_CTRLMODE_CC_LEN8_DLC seems to be 
a feature if I understand this correctly. If we want to allow to use an 
AUTOSAR CAN driver as virtio device back end then also a new feature 
flag (e.g. VIRTIO_CAN_F_CLASSIC_DLC) would be needed.

If possible I would like to have this as a future feature only changing 
now the "reserved" field to make clear the planned purposes.

>> CAN XL into it.
>>
>> But for CAN XL we need anyway a more critical look from CAN XL experts
>> on the list. Here in the house there is already only fewer experience
>> with CAN FD in comparison with classic CAN but none at all with CAN XL.
>> Too new. If something is done in a stupid way we can define in the
>> future completely new messages as we have the msg_type. But if no
>> mistake is made now we can avoid this and enhancing things will be more
>> simple later. The RX and TX messages are really critical. Some bugs in
>> the software can be fixed easily. But if we define here something not
>> future proof this can only be addressed later in the spec with some more
>> effort.
> ACK.
>
>>>> +       __le32 flags;
>>>> +       __le32 can_id;
>>>> +       __u8 sdu[64];
>>>> +};
>>>> +
>>>> +/* Control queue message types */
>>>> +struct virtio_can_control_out {
>>>> +#define VIRTIO_CAN_SET_CTRL_MODE_START  0x0201
>>>> +#define VIRTIO_CAN_SET_CTRL_MODE_STOP   0x0202
>>>> +       __le16 msg_type;
>>>> +};
>>>> +
>>>> +struct virtio_can_control_in {
>>>> +       __u8 result;
>>>> +};
>>>> +
>>>> +#endif /* #ifndef _LINUX_VIRTIO_VIRTIO_CAN_H */
> Yours sincerely,
> Vincent Mailhol
>
Regards

Harald Mommer


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ