Date:	Thu, 30 Apr 2009 02:00:53 +0300
From:	Calin Velea <calin.velea@...enii.ro>
To:	Calin Velea <vcalinus@...enii.ro>
CC:	Radu Rendec <radu.rendec@...s.ro>,
	Jarek Poplawski <jarkao2@...il.com>,
	Jesper Dangaard Brouer <hawk@...u.dk>,
	Denys Fedoryschenko <denys@...p.net.lb>,
	netdev <netdev@...r.kernel.org>
Subject: Re[2]: htb parallelism on multi-core platforms

Hello Calin,

Thursday, April 30, 2009, 1:49:46 AM, you wrote:

> Wednesday, April 29, 2009, 7:21:11 PM, you wrote:

>> I finally managed to disable NAPI on e1000e - apparently it can only be
>> done on the "official" Intel driver (downloaded from their website), by
>> compiling with "make CFLAGS_EXTRA=-DE1000E_NO_NAPI". This doesn't seem
>> to be available in the (2.6.29) kernel driver.

>> With NAPI disabled, 4 (of 8) cores go to 100% (instead of only one), but
>> overall throughput *decreases* from ~110K pps (with NAPI) to ~80K pps.
>> This makes sense, since h/w interrupt is much more time consuming than
>> polling (that's the whole idea behind NAPI anyway).

>> Radu Rendec

>    I tested with e1000 only, on a single quad-core CPU - the L2 cache was
> shared between the cores.

>   For 8 cores I suppose you have 2 quad-core CPUs. If the cores actually
> used belong to different physical CPUs, L2 cache sharing does not occur -
> maybe this could explain the performance drop in your case.
>   Or there may be another explanation...


>   Anyway - coming back to David Miller's words:

> "HTB acts upon global state, so anything that goes into a particular 
> device's HTB ruleset is going to be single threaded. 
> There really isn't any way around this. "

>   It could be that the only way to get more power is to increase the number
> of devices on which you are shaping. You could split the IP space into 4 groups
> and direct the traffic to 4 IMQ devices with 4 iptables rules -

> -d 0.0.0.0/2 -j IMQ --todev 0,
> -d 64.0.0.0/2 -j IMQ --todev 1, etc...

> Or you can customize the split depending on the traffic distribution.
> An ipset nethash match can also be used.
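
  Spelled out, the full split could look like this (untested; the
mangle/PREROUTING hook is an assumption, taken from the usual IMQ examples,
and --todev takes the device number in the stock IMQ patch):

    iptables -t mangle -A PREROUTING -d 0.0.0.0/2   -j IMQ --todev 0
    iptables -t mangle -A PREROUTING -d 64.0.0.0/2  -j IMQ --todev 1
    iptables -t mangle -A PREROUTING -d 128.0.0.0/2 -j IMQ --todev 2
    iptables -t mangle -A PREROUTING -d 192.0.0.0/2 -j IMQ --todev 3

  The four /2 prefixes cover the whole IPv4 space, so every destination
matches exactly one rule and lands on exactly one IMQ device.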


>   The 4 devices can have the same htb ruleset; only the right parts
> of it will match.
>   You should test with 4 flows that use all the devices simultaneously and
> see what the aggregate throughput is.
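
  Replicating the ruleset could be as simple as the following (untested
sketch - the rates, handles and class ids are placeholders, not the real
ruleset):

    for dev in imq0 imq1 imq2 imq3; do
        ip link set $dev up
        # identical htb tree on every device; only the quarter of the
        # IP space directed at it will actually hit its classes
        tc qdisc add dev $dev root handle 1: htb default 10
        tc class add dev $dev parent 1: classid 1:1 htb rate 100mbit
        tc class add dev $dev parent 1:1 classid 1:10 htb rate 25mbit ceil 100mbit
        # ... per-customer classes and filters, repeated verbatim
    done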


>   The performance gained through parallelism might be a lot higher than the
> added overhead of iptables and/or ipset nethash match. Anyway - this is more of
> a "hack" than a clean solution :)


> p.s.: the latest IMQ patch at http://www.linuximq.net/ is for 2.6.26, so you
> will need to test with that kernel


  You will also need -i ethX (router case) or -m physdev --physdev-in ethX
(bridge case) to differentiate between upload and download in the iptables rules.
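
  For example (untested; the interface names are only illustrative - eth0
stands for the interface the traffic to be shaped enters on):

    # router: key on the input interface in addition to the /2 split
    iptables -t mangle -A PREROUTING -i eth0 -d 0.0.0.0/2 -j IMQ --todev 0
    # bridge: the bridge port is only visible to the physdev match
    iptables -t mangle -A PREROUTING -m physdev --physdev-in eth0 \
             -d 0.0.0.0/2 -j IMQ --todev 0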


-- 
Best regards,
 Calin                            mailto:calin.velea@...enii.ro
