Date:	Wed, 20 Feb 2008 00:15:08 -0800
From:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
To:	"Badalian Vyacheslav" <slavon@...telecom.ru>,
	<netdev@...r.kernel.org>
Subject: RE: e1000: Question about polling

Badalian Vyacheslav wrote:
> Hello all.
> 
> Interesting thing:
> 
> Have a PC that does NAT. Bandwidth is about 600 Mb/s.
> 
> Have 4 CPUs (2x Core 2 Duo, HT off, 3.2 GHz).
> 
> irqbalance in the kernel is off.
> 
> nat2 ~ # cat /proc/irq/217/smp_affinity
> 00000001
this binds all of irq 217's interrupts to cpu 0

> nat2 ~ # cat /proc/irq/218/smp_affinity
> 00000003

do you mean to be balancing interrupts between cpu 0 and cpu 1 here?
1 = cpu 0
2 = cpu 1
4 = cpu 2
8 = cpu 3

so 1 + 2 = 3 for irq 218, i.e. balancing between those two cpus.
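
for example, to pin each port to its own core (just a sketch, assuming
irq 217 is your first nic and 218 the second; adjust to whatever
/proc/interrupts shows on your box):

  echo 1 > /proc/irq/217/smp_affinity   # first nic -> cpu 0 only
  echo 2 > /proc/irq/218/smp_affinity   # second nic -> cpu 1 only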

sometimes the cpus will have a paired cache; depending on your bios it
will be organized like cpu 0/2 = shared cache and cpu 1/3 = shared
cache.  you can find this out by looking at physical id and core id in
/proc/cpuinfo.
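
something like this will show the pairing (the field names are the ones
/proc/cpuinfo uses):

  grep -E '^processor|^physical id|^core id' /proc/cpuinfo

cpus that report the same physical id are on the same package, and on
core 2 duo the two cores in a package share the L2.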

> Load SI on CPU0 and CPU1 is about 90%
> 
> Good... then try:
> echo ffffffff > /proc/irq/217/smp_affinity
> echo ffffffff > /proc/irq/218/smp_affinity
> 
> Get 100% SI on CPU0
> 
> Question: why?

because as each adapter's interrupt gets rotated through cpu0, it gets
"stuck" on cpu0: the napi polls can only run one at a time there, so
each adapter is always waiting in line behind the other to run its
poll, always fills its quota (work_done is always != 0), and keeps its
interrupts disabled "forever".
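
you can usually watch this happen from userspace with something like
(assuming the sysstat tools are installed):

  mpstat -P ALL 1    # per-cpu utilization, including %soft

the %soft column will pile up on cpu0 while the other cpus sit idle.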

> I heard that if I bind the IRQ from 1 netdevice to 1 CPU I can get 30%
> more performance... but I have 4 CPUs... I should get more performance
> if I write "ffffffff" to smp_affinity.

only if your performance is not cache limited but cpu horsepower
limited.  you're sacrificing cache coherency for cpu power, but if that
works for you then great.
 
> picture looks like this:
> CPUs 0-3 get over 50% SI... bandwidth goes up... 55% SI... bandwidth
> goes up... 100% SI on CPU0...
> 
> I remember a patch to fix a problem like this... it patched the
> function e1000_clean... the kernel on this pc has that patch
> (2.6.24-rc7-git2)... the e1000 driver works much better (I get
> 1.5-2x the bandwidth before I hit 100% SI), but I think it still
> doesn't reach 100% of what it could =)

the patch helps a little because it decreases the amount of time the
driver spends in napi mode, basically relaxing the exit condition
(which re-enables interrupts, and therefore balancing) to
work_done < budget instead of work_done == 0.

> Thanks for the answers, and sorry for my English

you basically can't get much more than one cpu's worth of work for each
nic.  it's possible to get a little more, but my guess is you won't get
much.  The best thing you can do is make sure as much traffic as
possible stays in the same cache, on two different cores.

you can try turning off NAPI mode, either in the .config or by building
the sourceforge driver with CFLAGS_EXTRA=-DE1000_NO_NAPI.  it seems
counterintuitive, but with the non-napi e1000 pushing packets to the
backlog queue on each cpu, you may actually get better performance due
to the balancing.
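
roughly, for the out-of-tree driver (a sketch, the tarball name/version
is just a placeholder):

  tar xzf e1000-<version>.tar.gz
  cd e1000-<version>/src
  make CFLAGS_EXTRA=-DE1000_NO_NAPI   # build without NAPI
  make install                        # install for the running kernel
  rmmod e1000; modprobe e1000         # reload to pick up the new module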

some day soon (maybe) we'll have some coherent way to have one tx and rx
interrupt per core, and enough queues for each port to be able to handle
1 queue per core.

good luck,
  Jesse  