linux-kernel - Re: [PATCH 2/2] counter: microchip-tcb-capture: Add DMA support for TC

Message-ID: <27407669-a580-482c-8c60-226b56562ce6@microchip.com>
Date: Wed, 4 Jun 2025 06:15:31 +0000
From: <Dharma.B@...rochip.com>
To: <dlechner@...libre.com>, <kamel.bouhara@...tlin.com>, <wbg@...nel.org>,
	<Nicolas.Ferre@...rochip.com>, <alexandre.belloni@...tlin.com>,
	<claudiu.beznea@...on.dev>
CC: <linux-arm-kernel@...ts.infradead.org>, <linux-iio@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] counter: microchip-tcb-capture: Add DMA support for
 TC_RAB register reads

On 29/05/25 9:03 pm, David Lechner wrote:
> On 5/28/25 1:13 AM, Dharma Balasubiramani wrote:
>> Add optional DMA-based data transfer support to read the TC_RAB register,
>> which provides the next unread captured value from either RA or RB. This
>> improves performance and offloads CPU when mchp,use-dma-cap is enabled in
>> the device tree.
> 
> It looks like this is using DMA to read a single register in the implementation
> of a sysfs read. Do you have measurements to show the performance difference?
> I find it hard to believe that this would actually make a significant difference
> compared to the overhead of the read syscall to read the sysfs attribute.
> 
Hi David,

Thanks for the feedback.

You're right. In my current test setup, I did not observe any
significant performance benefit from using DMA to read the TC_RAB
register via sysfs. I benchmarked both DMA-based and direct MMIO
register access with a userspace program while high-frequency capture
events were being generated, and the overhead of the sysfs read path
dominates in both cases.
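
For reference, a comparison of this kind can be sketched as a small
userspace helper that times repeated reads of one attribute file. This
is only an illustrative sketch, not the program used for the
measurements above: the helper name avg_read_ns() is hypothetical, and
the caller is assumed to pass the path of whichever sysfs attribute
exposes the capture value.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * pread() from offset 0 of `path`. Returns -1.0 if the file cannot be
 * opened, a read fails, or `iters` is not positive.
 */
static double avg_read_ns(const char *path, int iters)
{
	char buf[64];
	struct timespec t0, t1;
	int fd, i;

	if (iters <= 0)
		return -1.0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		/* Each pread() is one full trip through the read syscall path. */
		if (pread(fd, buf, sizeof(buf), 0) < 0) {
			close(fd);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Running the same helper against the attribute once with the DMA path
enabled and once with plain MMIO gives two per-read averages; if they
are close, the syscall and sysfs overhead is the dominant cost.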

Our initial motivation for using DMA was that the TCB IP in Microchip 
SoCs includes optional DMA support specifically for capture value 
transfers. I wanted to evaluate the potential benefit of reducing CPU 
load when capture events occur frequently. However, in practice, the 
added complexity (especially due to blocking behavior in atomic 
contexts such as the watch path) does not appear to be justified, at 
least via sysfs or simple polling.

I also tried routing the DMA-based read through the 
COUNTER_COMPONENT_EXTENSION watch path, but as you may expect, that 
ended up hanging due to blocking behavior in non-sleepable contexts. So 
that route seems unsuitable without a more complex asynchronous 
buffering model.
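
To make the idea of an asynchronous buffering model concrete, here is
a minimal userspace sketch of a single-producer, single-consumer ring
buffer. It is not driver code and not part of the posted patch; the
names cap_ring, cap_ring_push() and cap_ring_pop() are hypothetical.
The assumption is that a DMA completion callback (which must not sleep)
would be the producer, and a sleepable reader would be the consumer.
In-kernel code would more likely build on an existing facility such as
kfifo.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_RING_SIZE 8u	/* must be a power of two */

struct cap_ring {
	uint32_t slots[CAP_RING_SIZE];
	atomic_uint head;	/* advanced only by the producer */
	atomic_uint tail;	/* advanced only by the consumer */
};

/*
 * Producer side: never blocks. If the ring is full the sample is
 * dropped and false is returned, so it is safe in atomic context.
 */
static bool cap_ring_push(struct cap_ring *r, uint32_t value)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == CAP_RING_SIZE)
		return false;

	r->slots[head % CAP_RING_SIZE] = value;
	/* Publish the slot contents before making the new head visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when no captured value is pending. */
static bool cap_ring_pop(struct cap_ring *r, uint32_t *out)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;

	*out = r->slots[tail % CAP_RING_SIZE];
	/* Release the slot back to the producer after reading it. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

With this split, the non-sleepable side only ever pushes and returns
immediately, while any waiting happens on the consumer side, which is
allowed to sleep.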

Would you suggest exploring a different approach or a more appropriate 
interface for DMA-based capture (e.g., via a dedicated ioctl or char 
device with async support)? I’m happy to rework it if there's a suitable 
context where DMA adds measurable value.

Thanks again for your review and time.

-- 
With Best Regards,
Dharma B.