Date:	Wed, 11 Oct 2006 08:47:47 -0700
From:	Geoff Levand <geoffrey.levand@...sony.com>
To:	Linas Vepstas <linas@...tin.ibm.com>
CC:	jschopp <jschopp@...tin.ibm.com>, akpm@...l.org, jeff@...zik.org,
	Arnd Bergmann <arnd@...db.de>, netdev@...r.kernel.org,
	James K Lewis <jklewis@...ibm.com>,
	linux-kernel@...r.kernel.org, linuxppc-dev@...abs.org
Subject: Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

Linas Vepstas wrote:
> On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
>> > Linas Vepstas wrote:
>> >> The current driver code performs 512 DMA mappings of a bunch of 
>> >> 32-byte structures. This is silly, as they are all in contiguous 
>> >> memory. This patch changes the code to DMA map the entire area
>> >> with just one call.
>> 
>> Linas, 
>> 
>> Is the motivation for this change to improve performance by reducing the overhead
>> of the mapping calls?  
> 
> Yes.
> 
>> If so, there may be some benefit for some systems.  Could
>> you please elaborate?
> 
> I started writing the patch thinking it would have some huge effect on
> performance, based on a false assumption about how i/o was done on this
> machine.
> 
> *If* this were another pSeries system, then each call to 
> pci_map_single() chews up an actual hardware "translation 
> control entry" (TCE) that maps pci bus addresses into 
> system RAM addresses. These are somewhat limited resources,
> and so one shouldn't squander them.  Furthermore, I thought
> TCE's have TLB's associated with them (similar to how virtual
> memory page tables are backed by hardware page TLB's), of which 
> there are even fewer. I was thinking that TLB thrashing would 
> have a big hit on performance. 
> 
> Turns out that there was no difference to performance at all, 
> and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
> made it clear why: there's no fancy i/o address mapping.

OK, thanks for the explanation.  Actually, the current cell DMA mapping
implementation uses a simple 'linear' mapping, in that all of RAM is
mapped into the bus DMA address space at once, and in fact, it is all
done at system startup.
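
To illustrate why the number of mapping calls makes no difference under a
linear mapping, here is a minimal user-space sketch. It is not the actual
cell_map_single() code; the names linear_map_single, bus_addr_t and
LINEAR_DMA_OFFSET, and the offset value itself, are hypothetical and chosen
only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: where RAM starts in the bus DMA address space. */
#define LINEAR_DMA_OFFSET 0x80000000ULL

typedef uint64_t bus_addr_t;

/*
 * With a linear mapping set up once at startup, "mapping" a buffer is
 * pure arithmetic. No table entry is allocated, so the length is unused
 * and the cost per call is close to nothing.
 */
static bus_addr_t linear_map_single(uint64_t phys_addr, size_t len)
{
    (void)len;
    return phys_addr + LINEAR_DMA_OFFSET;
}
```

In this model, adjacent 32-byte descriptors land at adjacent bus addresses
whether they are mapped one at a time or as one area, which fits the
observation that the patch did not change performance.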

There is ongoing work to implement 'dynamic' mapping, where DMA pages are
mapped into the bus DMA address space on demand.  I think a key point in
understanding the benefit of this is that the cell processor's I/O controller
maps pages per device, so you can map one DMA page to one device.  I
currently have this working for my platform, but have not released that
work.  There is some overhead in managing the mapped buffers and in requesting
that pages be mapped by the hypervisor, etc., so I was thinking that this work
of yours to consolidate the memory buffers prior to requesting the mapping
could be of benefit if it were in an often executed code path.
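
As a rough illustration of that overhead, the sketch below counts mapping
requests for the 512 descriptors of 32 bytes each mentioned above, first
mapped one by one and then as a single contiguous area. This is a user-space
model only, not the unreleased dynamic mapping code: dyn_map, struct
map_stats and the 4 KiB I/O page size are assumptions, and a real mapper
might share or reference-count pages that are requested more than once.

```c
#include <stddef.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 12   /* assumption: 4 KiB I/O pages */
#define DESC_SIZE     32   /* from the patch description */
#define DESC_COUNT    512  /* from the patch description */

struct map_stats {
    unsigned long calls;   /* mapping requests issued */
    unsigned long pages;   /* page mappings requested in total */
};

/* Hypothetical dynamic mapper: each call requests every page it touches. */
static void dyn_map(struct map_stats *s, uint64_t addr, size_t len)
{
    uint64_t first = addr >> IO_PAGE_SHIFT;
    uint64_t last = (addr + len - 1) >> IO_PAGE_SHIFT;

    s->calls++;
    s->pages += (unsigned long)(last - first + 1);
}

/* Old behaviour: one mapping request per descriptor. */
static struct map_stats map_each_descriptor(uint64_t base)
{
    struct map_stats s = { 0, 0 };

    for (size_t i = 0; i < DESC_COUNT; i++)
        dyn_map(&s, base + i * DESC_SIZE, DESC_SIZE);
    return s;
}

/* Patched behaviour: one mapping request for the whole area. */
static struct map_stats map_whole_ring(uint64_t base)
{
    struct map_stats s = { 0, 0 };

    dyn_map(&s, base, (size_t)DESC_COUNT * DESC_SIZE);
    return s;
}
```

For a page-aligned base, the per-descriptor version issues 512 requests in
this model, while the consolidated version issues one request covering four
pages.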

-Geoff
