Message-ID: <27407669-a580-482c-8c60-226b56562ce6@microchip.com>
Date: Wed, 4 Jun 2025 06:15:31 +0000
From: <Dharma.B@...rochip.com>
To: <dlechner@...libre.com>, <kamel.bouhara@...tlin.com>, <wbg@...nel.org>,
<Nicolas.Ferre@...rochip.com>, <alexandre.belloni@...tlin.com>,
<claudiu.beznea@...on.dev>
CC: <linux-arm-kernel@...ts.infradead.org>, <linux-iio@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] counter: microchip-tcb-capture: Add DMA support for
TC_RAB register reads
On 29/05/25 9:03 pm, David Lechner wrote:
> On 5/28/25 1:13 AM, Dharma Balasubiramani wrote:
>> Add optional DMA-based data transfer support to read the TC_RAB register,
>> which provides the next unread captured value from either RA or RB. This
>> improves performance and offloads CPU when mchp,use-dma-cap is enabled in
>> the device tree.
>
> It looks like this is using DMA to read a single register in the implementation
> of a sysfs read. Do you have measurements to show the performance difference?
> I find it hard to believe that this would actually make a significant difference
> compared to the overhead of the read syscall to read the sysfs attribute.
>
Hi David,
Thanks for the feedback.
You're right: in our current testing setup, I didn't observe any
significant performance benefit from using DMA to read the TC_RAB
register via sysfs. I benchmarked both DMA-based and direct MMIO
register access using a userspace program generating high-frequency
capture events, and the overhead of the sysfs read path seems to
dominate in both cases.
Our initial motivation for using DMA was that the TCB IP in Microchip
SoCs includes optional DMA support specifically for capture value
transfers. I wanted to evaluate the potential benefit of offloading CPU
load when frequent capture events are occurring. However, in practice,
the complexity added (especially due to blocking behavior in atomic
contexts like watch) does not appear to be justified, at least via sysfs
or simple polling.
I also tried routing the DMA-based read through the
COUNTER_COMPONENT_EXTENSION watch path, but as you may expect, that
ended up hanging due to blocking behavior in non-sleepable contexts. So
that route seems unsuitable without a more complex asynchronous
buffering model.
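To make "asynchronous buffering model" a bit more concrete, here is a
minimal userspace sketch in C, not driver code. It assumes one producer
(standing in for a DMA completion callback, which must not sleep) and
one consumer (standing in for a sleepable reader). The names cap_ring,
cap_ring_push, cap_ring_pop and the depth CAP_RING_SIZE are hypothetical
and chosen only for illustration; an in-kernel version would more
likely build on an existing facility such as kfifo.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical depth, for illustration only; must be a power of two. */
#define CAP_RING_SIZE 16u

/* Single-producer/single-consumer ring of captured values. */
struct cap_ring {
	uint32_t slots[CAP_RING_SIZE];
	atomic_uint head;	/* advanced only by the producer */
	atomic_uint tail;	/* advanced only by the consumer */
};

/*
 * Producer side: models the non-sleepable context. It never blocks;
 * when the ring is full the sample is dropped and false is returned.
 */
static bool cap_ring_push(struct cap_ring *r, uint32_t value)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == CAP_RING_SIZE)
		return false;	/* full */

	r->slots[head & (CAP_RING_SIZE - 1u)] = value;
	/* Publish the slot before making it visible to the consumer. */
	atomic_store_explicit(&r->head, head + 1u, memory_order_release);
	return true;
}

/*
 * Consumer side: models the sleepable reader. Returns false when no
 * unread sample is available instead of waiting.
 */
static bool cap_ring_pop(struct cap_ring *r, uint32_t *value)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;	/* empty */

	*value = r->slots[tail & (CAP_RING_SIZE - 1u)];
	atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
	return true;
}
```

A zero-initialized struct cap_ring starts out empty. The point of the
sketch is only that the producer never waits, so the blocking moves
entirely to the reader side, where sleeping is allowed.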
Would you suggest exploring a different approach or a more appropriate
interface for DMA-based capture (e.g., via a dedicated ioctl or char
device with async support)? I’m happy to rework it if there's a suitable
context where DMA adds measurable value.
Thanks again for your review and time.
--
With Best Regards,
Dharma B.