linux-kernel - Re: Serial data loss

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200407091611.GA657631@kroah.com>
Date:   Tue, 7 Apr 2020 11:16:11 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     gianluca <gianlucarenzi@...ek.it>
Cc:     jslaby@...e.com, linux-serial@...r.kernel.org,
        linux-kernel@...r.kernel.org, Gianluca Renzi <icjtqr@...il.com>,
        dimka@...eddedalley.com, linux@...pel-privat.de
Subject: Re: Serial data loss

On Tue, Apr 07, 2020 at 11:01:08AM +0200, gianluca wrote:
> On 04/07/2020 10:24 AM, Greg Kroah-Hartman wrote:
> > On Tue, Apr 07, 2020 at 09:30:21AM +0200, gianluca wrote:
> > > I have a BIG trouble having dataloss when using two internal serial ports of
> > > my boards based on NXP/FreeScale iMX28 SoC ARMv5Te ARM920ej-s architecture.
> > > 
> > > It runs at 454Mhz.
> > > 
> > > Kernel used 4.9.x
> > 
> > That's a very old kernel, you are going to have to get support for that
> > from the vendor you bought it from :(
> > 
> 
> We are the vendor. ;-)

Good luck!  :)

> Jokes apart, I can try to use the latest kernel 5.6, and see how is going on
> them, but at the first check the driver seems exactly the same as in kernel
> 4.9.
> 
> > > When using my test case unit software between two serial ports connect each
> > > other by a null modem cable, it fails when the speed rate are different,
> > 
> > Of course, how would that work?
> > 
> 
> I am not native english speaker so I am misleading to a misunderstanding: my
> test case is a software with two pthreads which the main thread is working
> with a differnet baud rate than the other pthread. Using the same software
> in two different machines, and using the same baudrate for each
> corrispondant port it should work.
> 
> i.e. /dev/ttyAPP1 is running at 9600 and /dev/ttyAPP2 is running at 38400
> 
> The same in the other machine. Both ports are null-modem connected:
> 
> 	9600  /dev/ttyAPP1 <----> /dev/ttyAPP1 9600
> 	38400 /dev/ttyAPP2 <----> /dev/ttyAPP2 38400
> 
> I hope to be clear now. ;-)

Ok, yes, that makes more sense now, thank you.

> > > and
> > > dataloss is increasing higher the speed rate.
> > 
> > What type of flow control are you using?
> > 
> 
> Unfortunately no flow control. Because the I cannot use it. When connected
> to the real-hardware those two ports are connected to a microcontroller unit
> which does not have flow control, only RX & TX connected (i.e. no
> RTS/CTS/DTE/DCE lines)

Then you are going to have problems, that is exactly what flow control
was designed for.  To ignore that is to have problems.

Also, there is software flow control when you do not have any control
lines.  This "issue" was solved decades ago :)

> > > I am talking about the high latency from the IRQ up to the service routine
> > > when flushing the FIFO and another IRQ is called by another uart in the same
> > > time at different speed.
> > > 
> > > The code I was looking is: drivers/tty/serial/mxs-auart.c __but__ all other
> > > serial drivers are acting in the same way: they are reading one character at
> > > time from the FIFO (if it exists) and put it into the circular buffer so
> > > serial/tty driver can pass them to the user read routine.
> > > 
> > > Each function call has some overhead and it is time-consuming, and if
> > > another interrupt is invoked by the same UART Core but from another serial
> > > port (different context) the continuos insertion done by hardware UART into
> > > the FIFO cannot be served fast enough to have an overrun. I think this can
> > > be applied __almost__ to every serial driver as they are written in the same
> > > way.
> > > 
> > > And it is __NOT__ an issue because of the CPU and its speed! Using two
> > > serial converter (FTDI and Prolific PL2303 based) on each board, the problem
> > > does not appear at all even after 24 hours running at more than 115200!!!
> > 
> > usb-serial devices are totally different and send data to the host in a
> > completly different way.
> > 
> > Your hardware might just not be able to handle really high baud rates at
> > a continous stream, what baud rate were you using?
> > 
> 
> I suppose that, but the same issue can be proven with all single core (NO
> FIFO UART) processors using two ports on the same uart core, running Linux
> kernel @ 450 Mhz or less.
> 
> The irq latency it is the same.

Again, usb-serial devices do not use a uart on the host, so they have a
totally different design and code flow.

> > And again, this is what flow control was designed for, please use it.
> > 
> 
> I know and usually I am using a sort of protocol which can check correctness
> of packet, and if not, the packet has to be reasked/resent.
> In this case the microcontroller board I am connected to is not built by us,
> and the software is a custom protocol (and I do not know if an error on
> transfer can be accomplished by another request).
> 
> So the flow control __CANNOT_BE_USED_AT_ALL__...

Then that is a design mistake, please fix that.

good luck!

greg k-h