linux-kernel - Re: [PATCH v1 tty] 8250: microchip: pci1xxxx: Refactor TX Burst code to use pre-existing APIs


Message-ID: <1cc70895-b520-4dde-971e-692041dfbcce@kernel.org>
Date: Tue, 5 Mar 2024 08:19:27 +0100
From: Jiri Slaby <jirislaby@...nel.org>
To: Rengarajan.S@...rochip.com, linux-serial@...r.kernel.org,
 gregkh@...uxfoundation.org, Kumaravel.Thiagarajan@...rochip.com,
 UNGLinuxDriver@...rochip.com, Tharunkumar.Pasumarthi@...rochip.com,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 tty] 8250: microchip: pci1xxxx: Refactor TX Burst code
 to use pre-existing APIs

On 05. 03. 24, 5:15, Rengarajan.S@...rochip.com wrote:
> Hi Jiri,
> 
> On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
>> On 04. 03. 24, 5:37, Rengarajan.S@...rochip.com wrote:
>>> Hi Jiri,
>>>
>>> On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
>>>> On 23. 02. 24, 10:21, Rengarajan.S@...rochip.com wrote:
>>>>> On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
>>>>>> On 22. 02. 24, 14:49, Rengarajan S wrote:
>>>>>>> Updated the TX Burst implementation by replacing the circular
>>>>>>> buffer processing with the pre-existing kernel APIs. Also updated
>>>>>>> conditional statements and fixed alignment issues for better
>>>>>>> readability.
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> so why are you keeping the nested double loop?
>>>>>>
>>>>>
>>>>> Hi, to differentiate burst mode handling from byte mode, we had
>>>>> separate loops for both. Since a single while loop also does not
>>>>> align with the RX implementation (where we have separate handling
>>>>> for burst and byte), we have retained the double loop.
>>>>
>>>> So obviously, align RX to a single loop if possible. The current TX
>>>> code is very hard to follow and sort of unmaintainable (and buggy).
>>>> And IMO it's unnecessary as I proposed [1]. And even if RX cannot be
>>>> one loop, you still can make TX easy to read as the two need not be
>>>> the same.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@kernel.org/
>>>
>>>
>>> while (data_empty_count) {
>>>      cnt = CIRC_CNT_TO_END();
>>>      if (!cnt)
>>>           break;
>>>      if (cnt < UART_BURST_SIZE || (tail & 3)) { // is_unaligned()
>>>           writeb();
>>>           cnt = 1;
>>>      } else {
>>>           writel();
>>>           cnt = UART_BURST_SIZE;
>>>      }
>>>      uart_xmit_advance(cnt);
>>>      data_empty_count -= cnt;
>>> }
>>>
>>> With the above implementation we are observing a performance drop of
>>> 2 Mbps at a baud rate of 4 Mbps. The reason is that on each iteration
>>> we check whether the data needs to be processed as DWORDs or as
>>> bytes. This per-iteration condition check is causing the drop in
>>> performance.
>>
>> Hi,
>>
>> the check is several orders of magnitude faster than the I/O proper.
>> So I don't think that's the root cause.
>>
>>> With the previous implementation (with nested loops) the performance
>>> is found to be around 4 Mbps at a baud rate of 4 Mbps. In that
>>> implementation we send DWORDs continuously until the transfer size
>>> is < 4. Can you let us know of any alternatives that avoid the above
>>> performance drop?
>>
>> Could you attach the patch you are testing?
> 
> Please find the updated pci1xxxx_process_write_data
> 
>          u32 xfer_cnt;
> 
>          while (*valid_byte_count) {
>                  xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
>                                             UART_XMIT_SIZE);
> 
>                  if (!xfer_cnt)
>                          break;
> 
>                  if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {

Hi,

OK, is it different if you remove the alignment checking (which should
be the correct™ thing to do, but may/will slow down things on platforms
which don't care)?

>                          writeb(xmit->buf[xmit->tail], port->membase +
>                                 UART_TX_BYTE_FIFO);
>                          xfer_cnt = UART_BYTE_SIZE;
>                  } else {
>                          writel(*(u32 *)&xmit->buf[xmit->tail],

If you remove the "tail & 3" check, you can use get_unaligned() here and 
need not care about unaligned accesses after all...

>                                 port->membase + UART_TX_BURST_FIFO);
>                          xfer_cnt = UART_BURST_SIZE;
>                  }
> 
>                  uart_xmit_advance(port, xfer_cnt);
>                  *data_empty_count -= xfer_cnt;
>                  *valid_byte_count -= xfer_cnt;
>          }
> 
> Testing is done via minicom by transferring a 10 MB file at 4 Mbps.
> 
> Results after the minicom transfer with a single instance:
> 
> Previous implementation (nested while loops):
> Transferred 10 MB at 3900000 CPS
> 
> Current implementation:
> Transferred 10 MB at 2459999 CPS
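
For illustration only (this is not the patch under test): a minimal
userspace sketch of the single loop with the "tail & 3" check dropped
and the DWORD read done through an unaligned-safe load. Everything
below is an assumption made to get a runnable example: struct xmit,
struct fake_fifo, tx_drain(), fifo_write8()/fifo_write32() and
load_unaligned32() are hypothetical names, the FIFO is a plain array
rather than MMIO, and load_unaligned32() is a memcpy()-based stand-in
for the kernel's get_unaligned(). The clamp to a byte budget is also an
addition of this sketch, so that the unsigned counters cannot wrap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XMIT_SIZE  16u  /* stand-in for UART_XMIT_SIZE, must be a power of two */
#define BURST_SIZE 4u   /* stand-in for UART_BURST_SIZE */

struct xmit {           /* hypothetical, simplified circular TX buffer */
	uint8_t buf[XMIT_SIZE];
	unsigned int head, tail;
};

struct fake_fifo {      /* hypothetical: records what the device would receive */
	uint8_t out[64];
	unsigned int len, byte_writes, burst_writes;
};

/* Same arithmetic as the kernel's CIRC_CNT_TO_END(). */
static unsigned int cnt_to_end(unsigned int head, unsigned int tail)
{
	unsigned int end = XMIT_SIZE - tail;
	unsigned int n = (head + end) & (XMIT_SIZE - 1);

	return n < end ? n : end;
}

/* Userspace stand-in for get_unaligned(): safe for any address. */
static uint32_t load_unaligned32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Stand-in for writeb() to the byte FIFO. */
static void fifo_write8(struct fake_fifo *f, uint8_t v)
{
	assert(f->len + 1 <= sizeof(f->out));
	f->out[f->len++] = v;
	f->byte_writes++;
}

/* Stand-in for writel() to the burst FIFO. */
static void fifo_write32(struct fake_fifo *f, uint32_t v)
{
	assert(f->len + BURST_SIZE <= sizeof(f->out));
	memcpy(&f->out[f->len], &v, BURST_SIZE);
	f->len += BURST_SIZE;
	f->burst_writes++;
}

/* Send up to @budget bytes; returns the number actually sent. */
static unsigned int tx_drain(struct xmit *x, struct fake_fifo *f,
			     unsigned int budget)
{
	unsigned int sent = 0;

	while (sent < budget) {
		unsigned int cnt = cnt_to_end(x->head, x->tail);

		if (cnt > budget - sent)        /* sketch-only clamp */
			cnt = budget - sent;
		if (!cnt)
			break;

		if (cnt < BURST_SIZE) {         /* no "tail & 3" test here */
			fifo_write8(f, x->buf[x->tail]);
			cnt = 1;
		} else {
			fifo_write32(f, load_unaligned32(&x->buf[x->tail]));
			cnt = BURST_SIZE;
		}

		x->tail = (x->tail + cnt) & (XMIT_SIZE - 1);
		sent += cnt;
	}

	return sent;
}
```

With tail = 1 and 11 bytes pending, this sketch issues two 4-byte
writes (both starting at unaligned offsets) followed by three
single-byte writes. Whether that ordering helps on the real hardware is
exactly what the measurement asked for above would show.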



-- 
js
suse labs