netdev - Re: [PATCH v2] tcp: splice as many packets as possible at once

Message-Id: <20090204.011228.88323558.davem@davemloft.net>
Date:	Wed, 04 Feb 2009 01:12:28 -0800 (PST)
From:	David Miller <davem@...emloft.net>
To:	w@....eu
Cc:	zbr@...emap.net, herbert@...dor.apana.org.au, jarkao2@...il.com,
	dada1@...mosbay.com, ben@...s.com, mingo@...e.hu,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	jens.axboe@...cle.com
Subject: Re: [PATCH v2] tcp: splice as many packets as possible at once

From: Willy Tarreau <w@....eu>
Date: Wed, 4 Feb 2009 07:19:47 +0100

> On Tue, Feb 03, 2009 at 04:47:34PM -0800, David Miller wrote:
> > From: Willy Tarreau <w@....eu>
> > Date: Tue, 3 Feb 2009 13:25:35 +0100
> > 
> > > Well, FWIW, I've always observed better performance with 4k MTU (4080 to
> > > be precise) than with 9K, and I think that the overhead of allocating 3
> > > contiguous pages is a major reason for this.
> > 
> > With what hardware?  If it's with myri10ge, that driver uses page
> > frags so would not be using 3 contiguous pages even for jumbo frames.
> 
> Yes myri10ge for the optimal 4080, but with e1000 too (though I don't
> remember the exact optimal value, I think it was slightly lower).
> 
> For the myri10ge, could this be caused by the cache footprint then ?
> I can also retry with various values between 4 and 9k, including
> values close to 8k. Maybe the fact that 4k is better than 9 is
> because we get better filling of all pages ?

Looking quickly, myri10ge's buffer manager is incredibly simplistic so
it wastes a lot of memory and gives terrible cache behavior.

When using JUMBO MTU it just gives whole pages to the chip.

So it looks like, assuming a 4096-byte PAGE_SIZE and a 9000-byte
jumbo MTU, the driver will hand the chip, for a full-size frame:

	FULL PAGE
	FULL PAGE
	FULL PAGE

and only ~1K of that last full page will be utilized.

Since every frame then starts at a page boundary, the headers will
always land on the same cache lines, and about PAGE_SIZE - 1K (roughly
3K) of that last page will be wasted.

Whereas for MTU selections below PAGE_SIZE, it will give MTU-sized
blocks to the chip for packet data allocation.
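
The page arithmetic for the jumbo MTU case above can be checked with a
small sketch. This is not driver code: the function name
jumbo_rx_footprint is hypothetical, and it assumes the frame length
equals the 9000-byte MTU (link-layer headers ignored) and that every
frame starts at the beginning of a fresh 4096-byte page.

```python
PAGE_SIZE = 4096  # bytes, the page size assumed in the message


def jumbo_rx_footprint(frame_len, page_size=PAGE_SIZE):
    """Return (pages, bytes_used_in_last_page, bytes_wasted).

    Hypothetical helper: models a receive path that hands out whole
    pages, with each frame starting at a page boundary.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    pages = -(-frame_len // page_size)  # ceiling division
    used_in_last = frame_len - (pages - 1) * page_size
    wasted = pages * page_size - frame_len
    return pages, used_in_last, wasted


pages, used_in_last, wasted = jumbo_rx_footprint(9000)
print(pages, used_in_last, wasted)  # 3 808 3288
```

Under these assumptions a 9000-byte frame occupies three pages, uses
808 bytes of the third, and leaves 3288 bytes of it unused, which
matches the "~1K utilized" estimate.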