Message-ID: <20160309105807.GO8418@lukather>
Date:	Wed, 9 Mar 2016 11:58:07 +0100
From:	Maxime Ripard <maxime.ripard@...e-electrons.com>
To:	Vinod Koul <vinod.koul@...el.com>
Cc:	Hans de Goede <hdegoede@...hat.com>,
	Boris Brezillon <boris.brezillon@...e-electrons.com>,
	Dan Williams <dan.j.williams@...el.com>,
	dmaengine@...r.kernel.org, Chen-Yu Tsai <wens@...e.org>,
	linux-sunxi@...glegroups.com,
	Emilio López <emilio@...pez.com.ar>,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [linux-sunxi] Re: [PATCH] dma: sun4i: expose block size and wait
 cycle configuration to DMA users

On Tue, Mar 08, 2016 at 03:35:38PM +0530, Vinod Koul wrote:
> On Tue, Mar 08, 2016 at 09:42:31AM +0100, Hans de Goede wrote:
> > <wild speculation>
> > 
> > I see 2 possible reasons why waiting before checking for drq can help:
> > 
> > 1) A lot of devices have an internal fifo hooked up to a single mmio data
> > register which gets read using the general purpose dma-engine; waiting
> > allows this fifo to fill, and thus allows burst transfers
> > (We've seen similar issues with the scanout engine for the display which
> >  has its own dma engine, and doing larger transfers helps a lot).
> > 
> > 2) Physical memory on the sunxi SoCs is (often) divided into banks
> > with a shared data / address bus, and doing bank-switches is expensive,
> > so these wait cycles may introduce latency which allows a user of another
> > bank to complete its RAM accesses before the dma engine forces a
> > bank switch. This ends up avoiding a lot of (interleaved) bank switches
> > while both try to access a different bank, and thus waiting makes things
> > (much) faster in the end (again a known problem with the display
> > scanout engine).
> > 
> > </wild speculation>
> > 
> > Note the differences these kinds of tweaks make can be quite dramatic:
> > when using a 1920x1080p60 hdmi output on the A10 SoC with a 16 bit
> > memory bus (real world worst case scenario), the memory bandwidth
> > left for userspace processes (measured through memset) almost doubles
> > from 48 MB/s to 85 MB/s, source:
> > http://ssvb.github.io/2014/11/11/revisiting-fullhd-x11-desktop-performance-of-the-allwinner-a10.html
> > 
> > TL;DR: Waiting before starting DMA allows for doing larger burst
> > transfers which ends up making things more efficient.
> > 
> > Given this, I really expect there to be other dma-engines which
> > have some option to wait a bit before starting/unpausing a transfer
> > instead of starting it as soon as (more) data is available, so I think
> > this would make a good addition to dma_slave_config.
> 
> I tend to agree but before we do that I would like this hypothesis to be
> confirmed :)

We can't confirm it; we don't have access to any documentation that
might explain what this is about.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
