netdev - Re: [RFC] New driver API to speed up small packets xmits




Message-ID: <p73wszfenni.fsf@bingen.suse.de>
Date:	11 May 2007 11:05:05 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Krishna Kumar <krkumar2@...ibm.com>
Cc:	netdev@...r.kernel.org, Krishna Kumar <krkumar2@...ibm.com>
Subject: Re: [RFC] New driver API to speed up small packets xmits

Krishna Kumar <krkumar2@...ibm.com> writes:

> Doing some measurements, I found that for small packets like 128 bytes,
> the bandwidth is approximately 60% of the line speed. To possibly speed
> up performance of small packet xmits, a method of "linking" skbs was
> thought of - where two pointers (skb_flink/blink) are added to the skb.

You don't need that. You can just use the normal next/prev pointers.
In general it's a good idea to lower lock overhead etc.; the VM has
used similar tricks very successfully in the past.
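A minimal sketch of the batching idea, assuming a hypothetical, stripped-down packet struct whose ordinary next pointer (standing in for the skb's existing next/prev) chains packets together. The TX lock is only simulated by a counter, so all this compares is how many lock round trips each path takes; none of these names are real kernel or driver APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff with only the fields this
 * sketch needs; not the real kernel layout. */
struct pkt {
    struct pkt *next;   /* ordinary list pointer reused for chaining */
    size_t len;
};

/* Hypothetical TX queue state. lock_count records how often the
 * simulated lock is taken so the two paths can be compared. */
struct txq {
    unsigned lock_count;
    unsigned pkts_sent;
    size_t bytes_sent;
};

static void txq_lock(struct txq *q)   { q->lock_count++; }
static void txq_unlock(struct txq *q) { (void)q; }

/* Queue one descriptor; assumes the (simulated) lock is held. */
static void hw_queue(struct txq *q, const struct pkt *p)
{
    assert(p->len > 0);  /* sketch assumes non-empty packets */
    q->pkts_sent++;
    q->bytes_sent += p->len;
}

/* Unbatched path: one lock round trip per packet. */
static void xmit_one_by_one(struct txq *q, struct pkt *head)
{
    for (struct pkt *p = head; p; p = p->next) {
        txq_lock(q);
        hw_queue(q, p);
        txq_unlock(q);
    }
}

/* Batched path: packets arrive linked through ->next and the lock is
 * taken once for the whole chain. */
static void xmit_chain(struct txq *q, struct pkt *head)
{
    txq_lock(q);
    for (struct pkt *p = head; p; p = p->next)
        hw_queue(q, p);
    txq_unlock(q);
}
```

For a chain of three 128-byte packets, both paths queue the same 384 bytes, but the batched path takes the lock once instead of three times.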

There were some thoughts about this earlier, but in high-end
NICs the direction instead seems to go towards LRO (large receive offload).

LRO is basically like TSO, just for receiving. The NIC aggregates
multiple packets into a single larger one that is then processed by
the stack as one skb. This typically doesn't use linked lists, but an
array of pages.
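To make the "array of pages" point concrete, here is a rough sketch of LRO-style aggregation, assuming hypothetical lro_pkt/frag types and an arbitrary MAX_FRAGS limit. Each received segment's payload is recorded as a (pointer, length) fragment rather than copied or linked, loosely like the page-fragment array an skb can carry. The header checks a real implementation needs (same flow, in-order sequence numbers) are deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 8  /* arbitrary limit chosen for this sketch */

/* A fragment points into a receive buffer; the payload is not copied. */
struct frag {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical aggregated packet: an array of payload fragments that
 * the stack would then process as a single unit. */
struct lro_pkt {
    struct frag frags[MAX_FRAGS];
    int nr_frags;
    size_t total_len;
};

/* Append one segment's payload. Returns 0 on success, or -1 when the
 * array is full and the caller should flush the aggregate upward. */
static int lro_add(struct lro_pkt *agg, const unsigned char *payload, size_t len)
{
    assert(agg != NULL && payload != NULL);
    if (agg->nr_frags == MAX_FRAGS)
        return -1;
    agg->frags[agg->nr_frags].data = payload;
    agg->frags[agg->nr_frags].len = len;
    agg->nr_frags++;
    agg->total_len += len;
    return 0;
}
```

Eight 128-byte segments become one 1024-byte aggregate; a ninth is refused until the aggregate is flushed.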

Your scheme would help old NICs that don't have this optimization.
It might be a worthy goal, although people often seem to be more interested
in modern hardware.

Another problem is that this setup typically requires the aggregated
packets to be from the same connection. Otherwise you will only
save a short trip into the stack until the linked packet needs
to be split again to pass to multiple sockets. With that, the scheme
probably helps much less.
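The cost of mixing connections can be illustrated with a small sketch, assuming a hypothetical packet type whose flow_id stands in for the connection identity. Cutting a chain into runs of same-flow packets shows how the benefit shrinks as flows interleave: with strictly alternating flows, every packet ends up in a run of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet: flow_id stands in for the connection 4-tuple. */
struct pkt {
    struct pkt *next;
    unsigned flow_id;
};

/* Cut a linked chain into maximal runs of same-flow packets, storing
 * each run's head in runs[]. Each run is NULL-terminated. Returns the
 * number of runs; assumes runs[] has room for one entry per packet. */
static size_t split_by_flow(struct pkt *head, struct pkt **runs)
{
    size_t n = 0;

    assert(runs != NULL);
    while (head) {
        struct pkt *tail = head;

        runs[n++] = head;
        while (tail->next && tail->next->flow_id == head->flow_id)
            tail = tail->next;
        head = tail->next;   /* first packet of the next run, or NULL */
        tail->next = NULL;   /* terminate the current run */
    }
    return n;
}
```

A chain whose packets belong to flows 1, 1, 2, 1 splits into three runs, so only the first two packets actually travelled together.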

The hardware schemes typically use at least some kind of hash to
aggregate connections. You might need to implement something similar
too if it doesn't save enough time. I don't know whether it would be very
efficient in software.
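One way a software version of hash-based aggregation might look, as a sketch only: flow_key, bucket, aggregate() and the multiplicative hash below are hypothetical choices for illustration, not the scheme any NIC or the kernel actually uses. Packets are appended to a per-flow chain in a small table; on a collision with a different flow, the sketch simply refuses and leaves it to the caller to flush or bypass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 6
#define NBUCKETS (1u << HASH_BITS)   /* arbitrary table size */

/* Hypothetical IPv4 connection key. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct pkt {
    struct pkt *next;
    struct flow_key key;
};

/* One aggregation slot: the flow it holds and the chain built so far. */
struct bucket {
    struct flow_key key;
    struct pkt *head, *tail;
    unsigned count;
};

/* Mix the key, then keep the top bits of a multiplicative hash. */
static unsigned flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ (k->daddr * 0x9E3779B1u)
               ^ (((uint32_t)k->sport << 16) | k->dport);
    return (uint32_t)(h * 0x9E3779B1u) >> (32 - HASH_BITS);
}

static int same_flow(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Append p to its flow's chain. Returns 0 on success, or -1 if the
 * bucket already holds a different flow (caller flushes or bypasses). */
static int aggregate(struct bucket *tbl, struct pkt *p)
{
    assert(tbl != NULL && p != NULL);
    struct bucket *b = &tbl[flow_hash(&p->key)];

    p->next = NULL;
    if (b->count == 0) {
        b->key = p->key;
        b->head = b->tail = p;
    } else if (same_flow(&b->key, &p->key)) {
        b->tail->next = p;
        b->tail = p;
    } else {
        return -1;
    }
    b->count++;
    return 0;
}
```

Taking the top bits of the multiplied value instead of h % NBUCKETS means the source port, which sits in the high half of the mixed word, still influences the bucket index. Every packet pays for a hash and a key comparison, which is the software cost being questioned above.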

Or you could do this only if multiple packets belong to the same
single connection (basically with a one-hit cache); but then it would
smell a bit like a benchmark hack.
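The one-hit cache variant is easy to sketch, assuming the same kind of hypothetical packet type: only the most recently seen flow is remembered, and a packet from any other flow forces the current chain to be handed up. Here flush() merely counts chains; it stands in for passing them to the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet; flow_id stands in for the full connection key. */
struct pkt {
    struct pkt *next;
    unsigned flow_id;
};

/* One-entry cache: the chain being built for the last flow seen. */
struct one_hit {
    struct pkt *head, *tail;
    unsigned flushes;   /* number of chains handed up so far */
};

/* Stand-in for delivering the finished chain to the stack. */
static void flush(struct one_hit *c)
{
    if (c->head)
        c->flushes++;
    c->head = c->tail = NULL;
}

/* Extend the current chain if p belongs to the cached flow; otherwise
 * flush the chain and start a new one with p. */
static void one_hit_add(struct one_hit *c, struct pkt *p)
{
    assert(c != NULL && p != NULL);
    p->next = NULL;
    if (c->head && c->head->flow_id == p->flow_id) {
        c->tail->next = p;
        c->tail = p;
        return;
    }
    flush(c);
    c->head = c->tail = p;
}
```

Three back-to-back packets of one flow leave as a single chain, while the interleaved sequence 1, 2, 1, 2 produces four single-packet chains. That is the weakness noted above: it mainly pays off for single-stream traffic such as a benchmark.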

-Andi