netdev - Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc

Date:	Tue, 30 Oct 2012 12:53:10 -0400
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Ian Campbell <Ian.Campbell@...rix.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	"xen-devel@...ts.xen.org" <xen-devel@...ts.xen.org>
Subject: Re: [PATCH] net: allow configuration of the size of page in
 __netdev_alloc_frag

On Wed, Oct 24, 2012 at 06:43:20PM +0200, Eric Dumazet wrote:
> On Wed, 2012-10-24 at 17:22 +0100, Ian Campbell wrote:
> > On Wed, 2012-10-24 at 16:21 +0100, Eric Dumazet wrote:
> 
> > > If you really have such problems, why doesn't locally generated TCP
> > > traffic also have it?
> > 
> > I think it does. The reason I noticed the original problem was that ssh
> > to the machine was virtually (no pun intended) unusable.
> > 
> > > Your patch doesn't touch sk_page_frag_refill(), does it?
> > 
> > That's right. It doesn't. When is (sk->sk_allocation & __GFP_WAIT) true?
> > Is it possible I'm just not hitting that case?
> > 
> 
> I hope not. GFP_KERNEL has __GFP_WAIT.
> 
> > Is it possible that this only affects certain traffic patterns (I only
> > really tried ssh/scp and ping)? Or perhaps it's just that the swiotlb is
> > only broken in one corner case and not the other.
> 
> Could you try a netperf -t TCP_STREAM ?
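
On the quoted question of when (sk->sk_allocation & __GFP_WAIT) is true: it is a plain bitmask test, and since GFP_KERNEL includes __GFP_WAIT, any socket allocating with GFP_KERNEL takes that branch. A minimal userspace sketch follows; the numeric flag values are assumed copies of the v3.x <linux/gfp.h> definitions, and alloc_may_wait() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Illustrative copies of the v3.x <linux/gfp.h> bit values (assumption;
 * kernel code must use the real header, never hard-coded numbers). */
typedef unsigned int gfp_t;
#define __GFP_WAIT 0x10u /* allocation may sleep */
#define __GFP_HIGH 0x20u /* may dip into emergency reserves */
#define __GFP_IO   0x40u /* may start physical I/O */
#define __GFP_FS   0x80u /* may call into the filesystem */

#define GFP_ATOMIC (__GFP_HIGH)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* The point made in the reply: GFP_KERNEL includes __GFP_WAIT. */
static_assert((GFP_KERNEL & __GFP_WAIT) != 0, "GFP_KERNEL may sleep");

/* Hypothetical helper: same test as the quoted
 * (sk->sk_allocation & __GFP_WAIT). */
static int alloc_may_wait(gfp_t sk_allocation)
{
    return (sk_allocation & __GFP_WAIT) != 0;
}
```

With these definitions alloc_may_wait(GFP_KERNEL) is true and alloc_may_wait(GFP_ATOMIC) is false, which is why an ordinary process-context socket should be hitting that case.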

For fun I did a couple of tests: I set up two machines (one r8169, the other
e1000e) and ran netperf/netserver between them. Both run a bare-metal
kernel, and one of them boots with 'iommu=soft swiotlb=force' to simulate the
worst case. This is using v3.7-rc3.

The r8169 machine boots without any extra arguments; the e1000e machine uses
'iommu=soft swiotlb=force'.

So r8169 -> e1000e gets ~940 Mbit/s. That surprised me: I expected the
e1000e on the receive side to be using the bounce buffer, but then I realized
it sets up its 'dma' pool with pci_alloc_coherent, and coherent memory is
already addressable by the device, so swiotlb has nothing to bounce.

The other direction, e1000e -> r8169, gets only ~128 Mbit/s. So it is the
sending side that ends up using the bounce buffer, and it slows down
considerably.
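
To put the two directions side by side: ~128 against ~940 means the bounced transmit path keeps only about 14% of the throughput, a drop of roughly 86%, which is larger than the 70% figure mentioned earlier in the thread. A trivial sketch of that arithmetic; throughput_drop() is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: fraction of throughput lost going from `baseline`
 * to `measured`. Both must be in the same units (here netperf's
 * 10^6 bits/s), so the units cancel. */
static double throughput_drop(double baseline, double measured)
{
    assert(baseline > 0.0);
    return 1.0 - measured / baseline;
}
```

throughput_drop(940.0, 128.0) evaluates to about 0.864, while the v3.6.3 result below (~940 in both directions) gives 0.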

I also swapped the e1000e machine for one with a tg3 and got roughly the
same numbers.

So all of this points at swiotlb. To make sure nothing was amiss, I wrote
a little driver that allocates a compound page, sets up a DMA mapping, does
some writes, syncs, and unmaps the DMA page. It works correctly, so swiotlb
(and the Xen variant) work just right. Attached for your fun.
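
The attachment itself is not reproduced here. As a rough userspace model of the sequence it exercises, the sketch below mimics forced bounce buffering: mapping copies the CPU's data into a separate bounce copy (the per-packet memcpy the transmit path pays under swiotlb=force), "device" writes land only in that copy, and a sync or unmap copies them back. All names (struct bounce_mapping, model_map(), model_sync_for_cpu(), model_unmap()) are hypothetical and only loosely follow swiotlb's direction rules; this is neither the attached driver nor the kernel DMA API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a forced (swiotlb=force style) streaming mapping.
 * None of these names exist in the kernel. */
enum dma_dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

struct bounce_mapping {
    unsigned char *orig;   /* CPU buffer, e.g. a compound page */
    unsigned char *bounce; /* copy the "device" actually sees */
    size_t len;
    enum dma_dir dir;
};

/* Map: grab a bounce slot and, for TO_DEVICE/BIDIRECTIONAL, copy the CPU
 * data into it. This copy is the extra per-packet cost on transmit. */
static int model_map(struct bounce_mapping *m, unsigned char *buf,
                     size_t len, enum dma_dir dir)
{
    assert(buf != NULL && len > 0);
    m->bounce = malloc(len);
    if (m->bounce == NULL)
        return -1;
    m->orig = buf;
    m->len = len;
    m->dir = dir;
    if (dir != FROM_DEVICE)
        memcpy(m->bounce, buf, len);
    return 0;
}

/* Sync for CPU: make "device" writes visible in the original buffer. */
static void model_sync_for_cpu(struct bounce_mapping *m)
{
    if (m->dir != TO_DEVICE)
        memcpy(m->orig, m->bounce, m->len);
}

/* Unmap: final copy-back (if any), then release the bounce slot. */
static void model_unmap(struct bounce_mapping *m)
{
    model_sync_for_cpu(m);
    free(m->bounce);
    m->bounce = NULL;
}
```

A round trip in the spirit of the attached test maps a multi-page buffer BIDIRECTIONAL, has the "device" write into m.bounce, and checks that the new data shows up in the original buffer only after model_sync_for_cpu().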

Then I decided to try v3.6.3 with exactly the same parameters, and the
problem went away.

The e1000e -> r8169 direction, which got ~128 Mbit/s on v3.7-rc3, now gets
~940 Mbit/s while still using the swiotlb bounce buffer.

> 
> Because ssh uses small packets, and small TCP packets don't use frags but
> skb->head.
> 
> You mentioned a 70% drop of performance, but what test have you used
> exactly ?

Note that I did not pass netperf any arguments besides the host, but it
picked the test you wanted anyway:

    $ netperf -H tst019
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to tst019.dumpdata.com (192.168.101.39) port 0 AF_INET

Attachment: "dma_test.c" (text/plain, 5699 bytes)