netdev - Re: Using ethernet device as efficient small packet generator


Message-ID: <1293005302.4317.19.camel@edumazet-laptop>
Date:	Wed, 22 Dec 2010 09:08:22 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	juice@...gman.org
Cc:	Stephen Hemminger <shemminger@...tta.com>, netdev@...r.kernel.org
Subject: Re: Using ethernet device as efficient small packet generator

On Wednesday 22 December 2010 at 09:30 +0200, juice wrote:
> > On Tue, 21 Dec 2010 11:56:42 +0200 shemminger wrote:
> > I regularly get full 1G line rate of 64 byte packets using old Opteron
> > box and pktgen.  It does require some tuning of IRQ's and interrupt
> > mitigation but no patches. Did you remember to do the basic stuff like
> > setting IRQ affinity and not enabling debugging or tracing in the
> > kernel? This is on sky2, but also using e1000 and tg3. Others have
> > reported 7M packets per second over 10G cards.
> > The r8169 hardware is low end consumer hardware and doesn't work as
> > well.
> > It is possible to get close to 1G line rate forwarding with a single
> > core with current generation processors. Actual rate depends on
> > hardware and configuration (size of route table, firewalling, etc).
> > Much better performance with multi-queue hardware to spread load over
> > multiple cores.
> 
> I did my testing on two kinds of boxes we use in our lab, an older Pomi
> Supermicro with e1000 and a newer Dell T3500 with tg3 and r8169.
> Both computers have dual-core 2.4G Xeon Cpus, but with somewhat different
> model and stepping.
> Both boxes are running the same OS, Ubuntu 2.6.32-26-generic #48.
> 

Hmm, it might be better with Ubuntu 10.10, which has a 2.6.35 kernel.

> Could you share some information on the required interrupt tuning? It
> would certainly be easiest if the full line rate can be achieved without
> any patching of drivers or hindering normal eth/ip interface operation.
> 

That's pretty easy.

Say your card has 8 queues, do:

echo 01 >/proc/irq/*/eth1-fp-0/../smp_affinity
echo 02 >/proc/irq/*/eth1-fp-1/../smp_affinity
echo 04 >/proc/irq/*/eth1-fp-2/../smp_affinity
echo 08 >/proc/irq/*/eth1-fp-3/../smp_affinity
echo 10 >/proc/irq/*/eth1-fp-4/../smp_affinity
echo 20 >/proc/irq/*/eth1-fp-5/../smp_affinity
echo 40 >/proc/irq/*/eth1-fp-6/../smp_affinity
echo 80 >/proc/irq/*/eth1-fp-7/../smp_affinity
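
The eight commands above follow one pattern: queue N gets the mask with
only bit N set, i.e. it is pinned to CPU N. A minimal sketch of the same
thing as a loop is below. The function names cpu_mask and
set_queue_affinity are made up for illustration, the "-fp-" part of the
IRQ name is driver-specific and taken from the example above, and the
mask format assumes CPU numbers below 32. It has to run as root.

```shell
#!/bin/sh
# cpu_mask N: print the hex smp_affinity mask that selects only CPU N.
# (Hypothetical helper; assumes N < 32.)
cpu_mask() {
    printf '%02x\n' $((1 << $1))
}

# set_queue_affinity DEV NQUEUES: pin the IRQ of queue i of DEV to CPU i.
# (Hypothetical helper; the "DEV-fp-i" IRQ naming depends on the driver.)
set_queue_affinity() {
    dev=$1
    nqueues=$2
    q=0
    while [ "$q" -lt "$nqueues" ]; do
        for d in /proc/irq/*/"$dev-fp-$q"; do
            # Skip the unexpanded pattern when no such IRQ exists.
            [ -d "$d" ] || continue
            cpu_mask "$q" > "$d/../smp_affinity"
        done
        q=$((q + 1))
    done
}
```

Calling "set_queue_affinity eth1 8" should then have the same effect as
the eight echo lines. Check /proc/interrupts first to see how your
driver actually names its per-queue IRQs.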

Then start one pktgen thread per queue, each on the CPU that handles that
queue's IRQ, so that TX completion IRQs run on the same CPU.
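
A rough sketch of what that could look like through the pktgen /proc
interface, assuming the pktgen module is loaded and queue N was pinned
to CPU N as above. The helper names pgset and setup_queue_thread and the
PKTGEN_ROOT variable (only there so the script can be dry-run against a
scratch directory) are made up for illustration; the pktgen commands
themselves (rem_device_all, add_device, count, pkt_size, queue_map_min,
queue_map_max) are the regular ones. Packet size and count are just
example values.

```shell
#!/bin/sh
# Hypothetical override so the helpers can be tried on a scratch directory.
PKTGEN_ROOT=${PKTGEN_ROOT:-/proc/net/pktgen}

# pgset FILE CMD: write one pktgen command to a control file.
pgset() {
    echo "$2" > "$PKTGEN_ROOT/$1"
}

# setup_queue_thread DEV N: bind DEV@N to the pktgen thread of CPU N
# and make it transmit only on TX queue N.
setup_queue_thread() {
    dev=$1
    n=$2
    pgset "kpktgend_$n" "rem_device_all"
    pgset "kpktgend_$n" "add_device $dev@$n"
    pgset "$dev@$n" "count 0"          # 0 = run until stopped
    pgset "$dev@$n" "pkt_size 60"      # 60 bytes + 4 byte CRC = 64 on the wire
    pgset "$dev@$n" "queue_map_min $n"
    pgset "$dev@$n" "queue_map_max $n"
}
```

You would call "setup_queue_thread eth1 N" for N = 0..7, set dst and
dst_mac on each eth1@N for your own setup, and then write "start" to
pgctrl.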

I confirm getting 6Mpps (or more) out of the box is OK.

I did it one year ago on ixgbe, no patches needed.

With recent kernels, it should even be faster.


