netdev - RE: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

Message-ID: <08FE5CC30C9A3F41BF819A502CF7BF6E0198249D@fmsmsx411.amr.corp.intel.com>
Date:	Fri, 6 Jul 2007 10:14:56 -0700
From:	"Williams, Mitch A" <mitch.a.williams@...el.com>
To:	"David Miller" <davem@...emloft.net>,
	<shemminger@...ux-foundation.org>
Cc:	<netdev@...r.kernel.org>
Subject: RE: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

David Miller wrote:
>> Okay, but then using SG lists makes skbuff's much bigger.
>>     
>>          fraglist  scatterlist  per skbuff
>> 32 bit   8         20           +12 * 18 = +216!
>> 64 bit   16        32           +16 * 18 = +288
>> 
>> So never mind...
>
>I know, this is why nobody ever really tries to tackle this.
>
>> I'll do a fraglist to scatter list set of routines, but not sure
>> if it's worth it.
>
>It's better to add dma_map_skb() et al. interfaces to be honest.
>
>Also even with the scatterlist idea, we'd still need to do two
>map calls, one for skb->data and one for the page vector.
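For reference, the per-skbuff figures in the quoted table can be reproduced with a few lines of arithmetic. This is only a sketch: the entry sizes and the multiplier of 18 are taken straight from the table, and the names FRAGS_PER_SKB and extra_bytes_per_skb are made up for illustration, not kernel identifiers.

```python
# Multiplier used in the quoted table (entries per skbuff); assumed, not read
# from any kernel header.
FRAGS_PER_SKB = 18

# (fraglist entry size, scatterlist entry size) in bytes, as quoted above.
SIZES = {
    "32 bit": (8, 20),
    "64 bit": (16, 32),
}


def extra_bytes_per_skb(frag_size, sg_size, nr_frags=FRAGS_PER_SKB):
    """Growth of one skbuff if every fraglist entry became a scatterlist entry."""
    return (sg_size - frag_size) * nr_frags


for arch, (frag, sg) in SIZES.items():
    print(f"{arch}: +{extra_bytes_per_skb(frag, sg)} bytes per skbuff")
```

With the quoted sizes this gives +216 bytes on 32 bit and +288 bytes on 64 bit, matching the table.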

FWIW, I tried this about a year ago in an attempt to improve e1000
performance on pSeries.  I was hoping to simplify the driver transmit
code and make IOMMU mapping easier.  This was on 2.6.16 or thereabouts.

Net result:  zilch.  No performance increase, no noticeable CPU
utilization benefits.  Nothing.  So I dropped it.

Slightly off topic:
The real problem that I saw on pSeries is lock contention for the IOMMU.
It's architected with a single table per slot, which is great in that
two boards in separate slots won't have lock contention.  However, this
all goes out the window when you drop a quad-port gigabit adapter in
there.  The time spent waiting for the IOMMU table lock goes up
exponentially as you activate each additional port.
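
The shape of the problem can be sketched in user space.  This is a toy model only, not the pSeries IOMMU code: SlotIommuTable, map_one, unmap_one and run_ports are hypothetical names, and a Python lock stands in for the per-slot table lock.  Each "port" is a thread that maps and unmaps entries from the one table its slot owns, so every port on the adapter serializes on the same lock.

```python
import threading


class SlotIommuTable:
    """Toy model (hypothetical): one mapping table and one lock per slot."""

    def __init__(self, entries):
        self.lock = threading.Lock()
        self.free = list(range(entries))
        self.contended = 0  # times a caller found the lock already held

    def map_one(self):
        # Try the lock without blocking first so contention can be counted.
        if not self.lock.acquire(blocking=False):
            self.lock.acquire()
            self.contended += 1  # updated while holding the lock
        try:
            return self.free.pop()
        finally:
            self.lock.release()

    def unmap_one(self, entry):
        with self.lock:
            self.free.append(entry)


def run_ports(table, ports, mappings_per_port):
    """Run one worker thread per port against a single shared slot table."""
    done = [0] * ports

    def port_worker(idx):
        for _ in range(mappings_per_port):
            entry = table.map_one()
            table.unmap_one(entry)
            done[idx] += 1

    threads = [threading.Thread(target=port_worker, args=(i,))
               for i in range(ports)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(done)


# A quad-port adapter in one slot: four ports, one table, one lock.
table = SlotIommuTable(entries=4)
total = run_ports(table, ports=4, mappings_per_port=1000)
print(f"{total} mappings completed, lock found busy {table.contended} times")
```

The contended count varies from run to run and says nothing about real hardware; the point is only that adding ports to a slot adds waiters on a single lock, whereas boards in separate slots would each get their own SlotIommuTable and never touch each other's lock.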

In my opinion, IOMMU table locking is the major issue with this type of
architecture.  Since both Intel and AMD are touting IOMMUs for
virtualization support, this is an issue that's going to need a lot of
scrutiny.

-Mitch