Open Source and information security mailing list archives
 
Date:	Fri, 23 Feb 2007 20:51:01 +0200
From:	Gergely Imre <imre.gergely@...ral.ro>
To:	Arjan van de Ven <arjan@...radead.org>
CC:	linux-kernel@...r.kernel.org, hancockr@...w.ca
Subject: Re: irq balancing question


Arjan van de Ven wrote:
>> in fact i have two cards, and 4 CPUs, but i was interested in the answer
>> Robert gave, that only _some_ machines distribute interrupts in hardware.
>> software distribution is obviously not good. consider this scenario:
>>
>> you have one machine with 4 cpus, and two ethernet cards with a lot of
>> traffic on them. if you bind each card to one cpu, the other two cpus are
>> unused, so you really use only half the power. now let's say you have so
>> much traffic (with rate limiting enabled, htb or something) that the two
>> bound CPUs are at 100% all the time, while the other two are doing nothing.
>>
>> now if you could balance that across all 4 cpus, you could use all the
>> power AND no cpu would be at 100%.
> 
> actually this will give you worse performance than only using 2 cores.
> The reason for this is twofold:
> 1) If you rotate the irqs, TCP and IP packet fragments will arrive at
> different CPUs. This in turn means that a VERY expensive reassembly path
> gets taken, compared to local-cpu-only reassembly
> 2) If you rotate the irqs, you bounce cachelines between the caches ALL
> THE TIME, which is also very expensive.
> 
> Both make it more likely that you'll be slower than just using 2
> cores...
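A toy model of point 1 above, as a simplification rather than a measurement: it assumes one packet per interrupt and takes "rotating the irqs" to mean that each successive interrupt goes to the next CPU. It counts how often consecutive packets of a single flow land on a different CPU than the packet before; each such switch is a point where per-flow state has to move from one CPU's cache to another's. The function names are made up for illustration.

```python
from collections import Counter


def deliver(num_packets, num_cpus, policy):
    """Return the CPU that handles each packet of one flow (toy model).

    policy="pinned": the card's IRQ is bound to CPU 0, so every packet
    is handled there.
    policy="round_robin": each successive interrupt goes to the next CPU,
    a simplified stand-in for rotating the IRQ across all CPUs.
    Assumes one packet per interrupt.
    """
    if policy == "pinned":
        return [0] * num_packets
    if policy == "round_robin":
        return [i % num_cpus for i in range(num_packets)]
    raise ValueError(f"unknown policy: {policy}")


def cpu_switches(cpus):
    """Count consecutive packets of the same flow handled on different CPUs."""
    return sum(1 for a, b in zip(cpus, cpus[1:]) if a != b)


for policy in ("pinned", "round_robin"):
    cpus = deliver(1000, 4, policy)
    print(policy, "switches:", cpu_switches(cpus), "per-CPU:", dict(Counter(cpus)))
```

With the card pinned, all 1000 packets stay on CPU 0 and there are no switches. With strict rotation the load is spread evenly (250 per CPU), but every packet after the first switches CPU. The model says nothing about how large the cost of each switch is.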

and i guess it doesn't matter whether the distribution is done by the
hardware: from the kernel's point of view, i would still get the same
performance penalty.

and what if CPU0 and CPU1 are actually the same physical CPU, just dual
core, and i distribute one card's interrupts across them, and the other
card's across CPU2 and CPU3 (the two cores of the other physical CPU)?
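For the dual-core layout: /proc/irq/<N>/smp_affinity takes a hex bitmask of the CPUs an IRQ may be delivered to, so "one card on CPU0+CPU1, the other on CPU2+CPU3" corresponds to masks 3 and c. The sketch below derives those masks from a CPU-to-package map. On Linux the package of cpuN can be read from /sys/devices/system/cpu/cpuN/topology/physical_package_id. Here the map is a hand-written dict mirroring the layout in the question, and the IRQ numbers 16 and 17 are hypothetical. The sketch only prints the commands, since writing them needs root. Whether interrupts are then actually spread over both cores of a package, or all go to one of them, depends on the interrupt controller and delivery mode. That is the same hardware question raised earlier in the thread.

```python
def cpus_to_mask(cpus):
    """Build the hex CPU bitmask format used by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit for each allowed CPU
    return format(mask, "x")


def masks_by_package(package_of_cpu):
    """Group logical CPUs by physical package; return one mask per package.

    package_of_cpu maps logical CPU number -> physical package id.
    """
    groups = {}
    for cpu, pkg in sorted(package_of_cpu.items()):
        groups.setdefault(pkg, []).append(cpu)
    return {pkg: cpus_to_mask(cpus) for pkg, cpus in groups.items()}


# Hypothetical layout from the question: CPU0/CPU1 on one dual-core package,
# CPU2/CPU3 on the other.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
masks = masks_by_package(topology)

# Hypothetical IRQ numbers for the two cards; look up the real ones first.
for irq, pkg in ((16, 0), (17, 1)):
    print(f"echo {masks[pkg]} > /proc/irq/{irq}/smp_affinity")
```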

i'm just trying to figure this out; i have no real knowledge of the inner
workings of the kernel, so i don't know. but i really would like to use all
4 cores. just how expensive is that reassembly path?
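One way to check where a card's interrupts actually land, whether they are spread by hardware or stay on one CPU, is to compare the per-CPU columns of /proc/interrupts. A minimal parser, assuming the usual layout: a header row of CPU names, then one row per IRQ with one count per CPU, followed by the controller type and the device name(s). The function name is made up for illustration, and "eth0" is only an example device name.

```python
def per_cpu_counts(text, device):
    """Return {irq_label: [count_cpu0, count_cpu1, ...]} for rows naming `device`.

    `text` is the contents of /proc/interrupts. Counts are cumulative since
    boot, so compare two snapshots to see where new interrupts are going.
    """
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")  # "16:" -> IRQ label "16"
        fields = rest.split()
        counts = fields[:ncpus]
        # Remaining fields: controller type and device names ("eth1, uhci_hcd").
        names = [f.rstrip(",") for f in fields[ncpus:]]
        if device in names and all(c.isdigit() for c in counts):
            result[label.strip()] = [int(c) for c in counts]
    return result
```

For example, `per_cpu_counts(open("/proc/interrupts").read(), "eth0")` read twice a few seconds apart shows which CPU columns are growing for that card.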
