Date:   Fri, 15 Mar 2019 10:04:27 +0000
From:   Appana Durga Kedareswara Rao <appanad@...inx.com>
To:     Oliver Hartkopp <socketcan@...tkopp.net>,
        Dave Taht <dave@...t.net>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Andre Naujoks <nautsch2@...il.com>,
        "wg@...ndegger.com" <wg@...ndegger.com>,
        "mkl@...gutronix.de" <mkl@...gutronix.de>,
        "davem@...emloft.net" <davem@...emloft.net>
CC:     "linux-can@...r.kernel.org" <linux-can@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] net: can: Increase tx queue length

Hi All,

<Snip> 
> Hi all,
> 
> On 3/10/19 6:07 AM, Dave Taht wrote:
> > Toke Høiland-Jørgensen <toke@...hat.com> writes:
> >
> >> Appana Durga Kedareswara Rao <appanad@...inx.com> writes:
> >>
> >>> Hi Andre,
> >>>
> >>> <Snip>
> >>>>
> >>>> On 3/9/19 3:07 PM, Appana Durga Kedareswara rao wrote:
> >>>>> While stress testing the CAN interface on the xilinx axi can in
> >>>>> loopback mode we get the message "write: no buffer space available".
> >>>>> Increasing the device tx queue length resolved the above issue.
> >>>>
> >>>> No need to patch the kernel:
> >>>>
> >>>> $ ip link set <dev-name> txqueuelen 500
> >>>>
> >>>> does the same thing.
> >>>
> >>> Thanks for the review...
> >>> Agreed, but it is not an out-of-the-box solution, right?
> >>> Do you have any idea why for socket can devices the tx queue length
> >>> is 10, whereas for other network devices (e.g. ethernet) it is 1000?
> >>
> >> Probably because you don't generally want a long queue adding latency
> >> on a CAN interface? The default 1000 is already way too much even for
> >> an Ethernet device in a lot of cases.
> >>
> >> If you get "out of buffer" errors it means your application is
> >> sending things faster than the receiver (or device) can handle them.
> >> If you solve this by increasing the queue length you are just
> >> papering over the underlying issue, and trading latency for fewer
> >> errors. This tradeoff
> >> *may* be appropriate for your particular application, but I can
> >> imagine it would not be appropriate as a default. Keeping the buffer
> >> size small allows errors to propagate up to the application, which
> >> can then back off, or do something smarter, as appropriate.
> >>
> >> I don't know anything about the actual discussions going on when the
> >> defaults were set, but I can imagine something along the lines of the
> >> above was probably a part of it :)
> >>
> >> -Toke
> >
> > In a related discussion, loud and often difficult, over here on the
> > can bus,
> >
> > https://github.com/systemd/systemd/issues/9194#issuecomment-469403685
> >
> > we found that applying fq_codel as the default qdisc via sysctl was a
> > bad idea for at least one model of can device.
> >
> > If you scroll back on the bug, there is a good description of what the
> > can subsystem expects from the qdisc - it mandates an in-order fifo
> > qdisc or no queue at all. The CAN protocol expects each packet to be
> > either transmitted successfully or rejected; on rejection the error is
> > passed up to userspace, which is supposed to stop sending further input.
> >
> > As this was the first serious bug ever reported against using fq_codel
> > as the default in 5+ years of systemd and 7 of openwrt deployment I've
> > been taking it very seriously. It's worse than just systemd - openwrt
> > patches out pfifo_fast entirely. pfifo_fast is the wrong qdisc - the
> > right choices are noqueue and possibly pfifo.
> >
> > However, the vcan device already exposes noqueue, and so far only one
> > device (an 8Devices socketcan USB2CAN) whose driver did not do this
> > was misbehaving.
> >
> > Which was just corrected with a simple:
> >
> > static int usb_8dev_probe(struct usb_interface *intf,
> > 			 const struct usb_device_id *id)
> > {
> >       ...
> >       netdev->netdev_ops = &usb_8dev_netdev_ops;
> >
> >       netdev->flags |= IFF_ECHO; /* we support local echo */
> > +    netdev->priv_flags |= IFF_NO_QUEUE;
> >       ...
> > }
> >
> > and successfully tested on that bug report.
> >
> > So at the moment, my thought is that all can devices should default to
> > noqueue, if they are not already. I think a pfifo_fast and a qlen of
> > any size is the wrong thing, but I still don't know enough about what
> > other can devices do or did to be certain.
> >
> 
> Having about 10 elements in a CAN driver tx queue allows working with
> queueing disciplines
> (http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf) and also
> maintains nearly real-time behaviour for outgoing traffic.
> 
> When the CAN interface is not able to cope with the (intended) outgoing
> traffic load, the applications should get instant feedback about it.
> 
> There is a difference between running CAN applications in the real world and
> doing performance tests, where it makes sense to increase the tx-queue-len to
> e.g. 1000 and dump 1000 frames into the driver to check the hardware
> performance.

Thanks Oliver, Martin, Andre, Toke and Dave for your inputs...
So to conclude: the default txqueuelen of 10 is ideal for real-time CAN traffic,
and for stress/performance tests the user needs to increase the txqueuelen manually based on their requirements.

Please correct me if my understanding is wrong. 

Regards,
Kedar.

> 
> Best regards,
> Oliver
