Message-ID: <1cc70895-b520-4dde-971e-692041dfbcce@kernel.org>
Date: Tue, 5 Mar 2024 08:19:27 +0100
From: Jiri Slaby <jirislaby@...nel.org>
To: Rengarajan.S@...rochip.com, linux-serial@...r.kernel.org,
gregkh@...uxfoundation.org, Kumaravel.Thiagarajan@...rochip.com,
UNGLinuxDriver@...rochip.com, Tharunkumar.Pasumarthi@...rochip.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 tty] 8250: microchip: pci1xxxx: Refactor TX Burst code
to use pre-existing APIs
On 05. 03. 24, 5:15, Rengarajan.S@...rochip.com wrote:
> Hi Jiri,
>
> On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
>> On 04. 03. 24, 5:37, Rengarajan.S@...rochip.com wrote:
>>> Hi Jiri,
>>>
>>> On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
>>>> On 23. 02. 24, 10:21, Rengarajan.S@...rochip.com wrote:
>>>>> On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
>>>>>> On 22. 02. 24, 14:49, Rengarajan S wrote:
>>>>>>> Updated the TX Burst implementation by changing the circular buffer
>>>>>>> processing with the pre-existing APIs in kernel. Also updated
>>>>>>> conditional statements and alignment issues for better readability.
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> so why are you keeping the nested double loop?
>>>>>>
>>>>>
>>>>> Hi, in order to differentiate burst mode handling from byte mode, we
>>>>> had separate loops for both. Since a single while loop also does not
>>>>> align with the RX implementation (where we have separate handling for
>>>>> burst and byte), we have retained the double loop.
>>>>
>>>> So obviously, align RX to a single loop if possible. The current TX
>>>> code is very hard to follow and sort of unmaintainable (and buggy).
>>>> And IMO it's unnecessary as I proposed [1]. And even if RX cannot be
>>>> one loop, you still can make TX easy to read as the two need not be
>>>> the same.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@kernel.org/
>>>
>>>
>>> while (data_empty_count) {
>>>         cnt = CIRC_CNT_TO_END();
>>>         if (!cnt)
>>>                 break;
>>>         if (cnt < UART_BURST_SIZE || (tail & 3)) { // is_unaligned()
>>>                 writeb();
>>>                 cnt = 1;
>>>         } else {
>>>                 writel();
>>>                 cnt = UART_BURST_SIZE;
>>>         }
>>>         uart_xmit_advance(cnt);
>>>         data_empty_count -= cnt;
>>> }
>>>
>>> With the above implementation we are observing a performance drop of
>>> 2 Mbps at a baud rate of 4 Mbps. The reason is that on each iteration
>>> we check whether the data needs to be processed via DWORDs or bytes.
>>> The condition check on each iteration is causing the drop in
>>> performance.
>>
>> Hi,
>>
>> the check is several orders of magnitude faster than the I/O proper.
>> So I don't think that's the root cause.
>>
>>> With the previous implementation (with nested loops) the performance
>>> is found to be around 4 Mbps at a baud rate of 4 Mbps. In that
>>> implementation we keep sending DWORDs continuously until the transfer
>>> size is < 4. Can you let us know of any alternatives that avoid the
>>> above performance drop?
>>
>> Could you attach the patch you are testing?
>
> Please find the updated pci1xxxx_process_write_data
>
> u32 xfer_cnt;
>
> while (*valid_byte_count) {
>         xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
>                                    UART_XMIT_SIZE);
>
>         if (!xfer_cnt)
>                 break;
>
>         if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {

Hi,

OK, is it different if you remove the alignment checking (which should
be the correct™ thing to do, but may/will slow things down on platforms
which don't care)?

>                 writeb(xmit->buf[xmit->tail], port->membase +
>                        UART_TX_BYTE_FIFO);
>                 xfer_cnt = UART_BYTE_SIZE;
>         } else {
>                 writel(*(u32 *)&xmit->buf[xmit->tail],
If you remove the "tail & 3" check, you can use get_unaligned() here and
need not care about unaligned accesses after all...
>                        port->membase + UART_TX_BURST_FIFO);
>                 xfer_cnt = UART_BURST_SIZE;
>         }
>
>         uart_xmit_advance(port, xfer_cnt);
>         *data_empty_count -= xfer_cnt;
>         *valid_byte_count -= xfer_cnt;
> }
>
> Testing is done via minicom by transferring a 10 MB file at 4 Mbps.
>
> After the minicom transfer with a single instance:
>
> Previous implementation (nested while loops):
> Transferred 10 MB at 3900000 CPS
>
> Current implementation:
> Transferred 10 MB at 2459999 CPS
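
To make the suggestion concrete, here is a minimal user-space sketch of the single loop with the "tail & 3" test dropped. It is not the driver code and has not been measured on the hardware. Everything named below is hypothetical and exists only for illustration: struct fake_fifo with fifo_writeb()/fifo_writel() stands in for the writeb()/writel() MMIO accesses, circ_cnt_to_end() mirrors the arithmetic of CIRC_CNT_TO_END(), load_unaligned_u32() (a memcpy-based load) plays the role of get_unaligned(), and tx_drain() is the loop itself. The extra "budget < BURST_SIZE" test is an assumption of this sketch, added so the unsigned budget cannot wrap below zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XMIT_SIZE  4096u /* power of two, standing in for UART_XMIT_SIZE */
#define BURST_SIZE 4u    /* one DWORD per burst write */
#define BYTE_SIZE  1u

/* Hypothetical stand-in for the device: records what was written and how. */
struct fake_fifo {
	unsigned char out[XMIT_SIZE];
	unsigned int len;
	unsigned int byte_writes;
	unsigned int burst_writes;
};

/* Contiguous bytes from tail up to head or the buffer end. */
static unsigned int circ_cnt_to_end(unsigned int head, unsigned int tail)
{
	unsigned int end = XMIT_SIZE - tail;
	unsigned int n = (head + end) & (XMIT_SIZE - 1);

	return n < end ? n : end;
}

/* Portable unaligned 32-bit load; plays the role of get_unaligned(). */
static uint32_t load_unaligned_u32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static void fifo_writeb(struct fake_fifo *f, unsigned char v)
{
	assert(f->len + BYTE_SIZE <= sizeof(f->out));
	f->out[f->len++] = v;
	f->byte_writes++;
}

static void fifo_writel(struct fake_fifo *f, uint32_t v)
{
	assert(f->len + BURST_SIZE <= sizeof(f->out));
	memcpy(&f->out[f->len], &v, sizeof(v));
	f->len += BURST_SIZE;
	f->burst_writes++;
}

/* Drain up to 'budget' bytes from the ring; returns the new tail. */
static unsigned int tx_drain(struct fake_fifo *f, const unsigned char *buf,
			     unsigned int head, unsigned int tail,
			     unsigned int budget)
{
	while (budget) {
		unsigned int cnt = circ_cnt_to_end(head, tail);

		if (!cnt)
			break;

		if (cnt < BURST_SIZE || budget < BURST_SIZE) {
			fifo_writeb(f, buf[tail]);
			cnt = BYTE_SIZE;
		} else {
			/* No (tail & 3) test: the load tolerates any alignment. */
			fifo_writel(f, load_unaligned_u32(&buf[tail]));
			cnt = BURST_SIZE;
		}

		tail = (tail + cnt) & (XMIT_SIZE - 1);
		budget -= cnt;
	}

	return tail;
}
```

With head = 14 and tail = 1, this loop issues three DWORD writes starting at odd offsets and one trailing byte write, whereas the loop with the "tail & 3" test would send all 13 bytes one at a time because the tail never becomes 4-byte aligned.
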
--
js
suse labs