Message-ID: <27407669-a580-482c-8c60-226b56562ce6@microchip.com>
Date: Wed, 4 Jun 2025 06:15:31 +0000
From: <Dharma.B@...rochip.com>
To: <dlechner@...libre.com>, <kamel.bouhara@...tlin.com>, <wbg@...nel.org>,
	<Nicolas.Ferre@...rochip.com>, <alexandre.belloni@...tlin.com>,
	<claudiu.beznea@...on.dev>
CC: <linux-arm-kernel@...ts.infradead.org>, <linux-iio@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] counter: microchip-tcb-capture: Add DMA support for
 TC_RAB register reads

On 29/05/25 9:03 pm, David Lechner wrote:
> On 5/28/25 1:13 AM, Dharma Balasubiramani wrote:
>> Add optional DMA-based data transfer support to read the TC_RAB register,
>> which provides the next unread captured value from either RA or RB. This
>> improves performance and offloads CPU when mchp,use-dma-cap is enabled in
>> the device tree.
> 
> It looks like this is using DMA to read a single register in the implementation
> of a sysfs read. Do you have measurements to show the performance difference?
> I find it hard to believe that this would actually make a significant difference
> compared to the overhead of the read syscall to read the sysfs attribute.
> 
Hi David,

Thanks for the feedback.

You're right: in our current test setup, I didn't observe any
significant performance benefit from using DMA to read the TC_RAB 
register via sysfs. I benchmarked both DMA-based and direct MMIO 
register access using a userspace program generating high-frequency 
capture events, and the overhead of the sysfs read path seems to 
dominate in both cases.
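
For reference, a comparison like this can be timed from userspace
roughly as follows. This is a minimal sketch, not the program used for
the measurements above: avg_read_ns() is a hypothetical helper, and the
attribute path is left as a parameter because it depends on the counter
device and on where the capture value is exposed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Average wall-clock time, in nanoseconds, of one pread() on `path`,
 * measured over `iterations` reads. Returns a negative value on error.
 * `path` is an assumption: pass whichever sysfs attribute is under test.
 */
double avg_read_ns(const char *path, int iterations)
{
	char buf[64];
	struct timespec start, end;
	int fd, i;

	if (iterations <= 0)
		return -1.0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++) {
		/* Reading at offset 0 each time should fetch a fresh value. */
		if (pread(fd, buf, sizeof(buf), 0) < 0) {
			close(fd);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);

	return ((end.tv_sec - start.tv_sec) * 1e9 +
		(end.tv_nsec - start.tv_nsec)) / iterations;
}
```

Running this against the same attribute with and without the DMA path
enabled gives a per-read figure that includes the syscall and sysfs
overhead, which is the quantity that matters for this interface.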

Our initial motivation for using DMA was that the TCB IP in Microchip
SoCs includes optional DMA support specifically for capture value
transfers. I wanted to evaluate the potential benefit of reducing CPU
load when capture events occur frequently. However, in practice, the
added complexity (especially because the DMA read blocks, which is not
allowed in atomic contexts such as the watch path) does not appear to
be justified, at least for sysfs reads or simple polling.

I also tried routing the DMA-based read through the 
COUNTER_COMPONENT_EXTENSION watch path, but as you may expect, that 
ended up hanging due to blocking behavior in non-sleepable contexts. So 
that route seems unsuitable without a more complex asynchronous 
buffering model.
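
To illustrate what such a buffering model could look like, here is a
minimal userspace sketch of a single-producer/single-consumer ring in
which the producer never blocks. It is only an illustration under
stated assumptions, not driver code: struct cap_ring, cap_ring_push()
and cap_ring_pop() are hypothetical names, and in the kernel a kfifo
would be the more natural building block. The idea is that a DMA
completion callback would push captured values, and a sleepable reader
would drain them later.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_RING_SIZE 16u /* must be a power of two */

/* A zero-initialized struct cap_ring is an empty ring. */
struct cap_ring {
	uint32_t slots[CAP_RING_SIZE];
	atomic_uint head; /* advanced only by the producer */
	atomic_uint tail; /* advanced only by the consumer */
};

/* Producer side: never blocks; reports failure if the ring is full. */
bool cap_ring_push(struct cap_ring *r, uint32_t value)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == CAP_RING_SIZE)
		return false; /* full: the caller decides whether to drop */

	r->slots[head & (CAP_RING_SIZE - 1)] = value;
	/* Publish the slot before making it visible to the consumer. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when no captured value is pending. */
bool cap_ring_pop(struct cap_ring *r, uint32_t *value)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false; /* empty */

	*value = r->slots[tail & (CAP_RING_SIZE - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

With this split, the non-sleepable side only ever does a bounded,
non-blocking push, and any waiting happens on the reader side.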

Would you suggest exploring a different approach or a more appropriate 
interface for DMA-based capture (e.g., via a dedicated ioctl or char 
device with async support)? I’m happy to rework it if there's a suitable 
context where DMA adds measurable value.

Thanks again for your review and time.

-- 
With Best Regards,
Dharma B.
