netdev - Re: htb parallelism on multi-core platforms

Message-ID: <395864833.20090430014946@gemenii.ro>
Date:	Thu, 30 Apr 2009 01:49:46 +0300
From:	Calin Velea <calin.velea@...enii.ro>
To:	Radu Rendec <radu.rendec@...s.ro>
CC:	Jarek Poplawski <jarkao2@...il.com>,
	Jesper Dangaard Brouer <hawk@...u.dk>,
	Denys Fedoryschenko <denys@...p.net.lb>,
	netdev <netdev@...r.kernel.org>
Subject: Re: htb parallelism on multi-core platforms

Wednesday, April 29, 2009, 7:21:11 PM, you wrote:

> I finally managed to disable NAPI on e1000e - apparently it can only be
> done on the "official" Intel driver (downloaded from their website), by
> compiling with "make CFLAGS_EXTRA=-DE1000E_NO_NAPI". This doesn't seem
> to be available in the (2.6.29) kernel driver.

> With NAPI disabled, 4 (of 8) cores go to 100% (instead of only one), but
> overall throughput *decreases* from ~110K pps (with NAPI) to ~80K pps.
> This makes sense, since h/w interrupt is much more time consuming than
> polling (that's the whole idea behind NAPI anyway).

> Radu Rendec

  I tested only with e1000, on a single quad-core CPU, where the L2 cache
was shared between the cores.

  For 8 cores, I suppose you have two quad-core CPUs. If the cores actually
in use belong to different physical CPUs, they do not share an L2 cache;
that could explain the performance drop in your case.
  Or there may be another explanation...

  Anyway - coming back to David Miller's words:

"HTB acts upon global state, so anything that goes into a particular 
device's HTB ruleset is going to be single threaded. 
There really isn't any way around this. "

  It could be that the only way to get more throughput is to increase the
number of devices you shape on. You could split the IP space into 4 groups
and direct the traffic to 4 IMQ devices with 4 iptables rules:

-d 0.0.0.0/2 -j IMQ --todev 0     (to imq0)
-d 64.0.0.0/2 -j IMQ --todev 1    (to imq1), etc.
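  As a rough illustration of the equal split, here is a small Python sketch
(not part of any tested setup) that carves the IPv4 space into equal
prefixes, one per imqN device, and shows which device a given destination
lands on. The function names split_ipv4_space and device_for are
hypothetical, and the sketch assumes the number of devices is a power of
two, as in the /2 example with 4 devices.

```python
import ipaddress


def _prefix_bits(num_devices: int) -> int:
    """Extra prefix bits needed for num_devices (must be a power of two)."""
    if num_devices < 1 or num_devices & (num_devices - 1):
        raise ValueError("num_devices must be a power of two")
    return num_devices.bit_length() - 1  # 4 devices -> 2 bits -> /2 prefixes


def split_ipv4_space(num_devices: int) -> list[tuple[str, str]]:
    """Split 0.0.0.0/0 into equal prefixes, paired with imqN device names."""
    bits = _prefix_bits(num_devices)
    whole = ipaddress.ip_network("0.0.0.0/0")
    return [(str(net), f"imq{i}")
            for i, net in enumerate(whole.subnets(prefixlen_diff=bits))]


def device_for(dst: str, num_devices: int) -> str:
    """Pick the imqN device from the top bits of the destination address."""
    bits = _prefix_bits(num_devices)
    # Shifting a 32-bit value right by 32 yields 0, so one device -> imq0.
    index = int(ipaddress.IPv4Address(dst)) >> (32 - bits)
    return f"imq{index}"


if __name__ == "__main__":
    for net, dev in split_ipv4_space(4):
        print(f"-d {net} -> {dev}")
    print(device_for("200.1.2.3", 4))
```

  For 4 devices this reproduces the /2 prefixes used in the rules; each
(prefix, device) pair corresponds to one iptables rule.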

Or you can customize the split depending on the traffic distribution.
An ipset nethash match can also be used.
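  For an uneven traffic distribution, one possible way to customize the
split (an assumption, not something tested here) is a greedy
longest-processing-time (LPT) assignment: sort the prefixes by measured load
and always give the next one to the least-loaded device. The per-prefix
packet rates and the function name balance_prefixes are hypothetical, and
the prefixes are assumed not to overlap.

```python
import heapq


def balance_prefixes(pps_by_prefix: dict[str, float],
                     num_devices: int) -> dict[str, list[str]]:
    """Greedy LPT assignment of non-overlapping prefixes to imqN devices."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    # Min-heap of (current load, device index): least-loaded device on top.
    heap = [(0.0, i) for i in range(num_devices)]
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {f"imq{i}": [] for i in range(num_devices)}
    # Heaviest prefixes first; ties keep their original order (stable sort).
    for prefix, pps in sorted(pps_by_prefix.items(),
                              key=lambda kv: kv[1], reverse=True):
        load, dev = heapq.heappop(heap)
        assignment[f"imq{dev}"].append(prefix)
        heapq.heappush(heap, (load + pps, dev))
    return assignment


if __name__ == "__main__":
    # Hypothetical per-prefix packet rates, e.g. taken from per-rule counters.
    measured = {
        "10.0.0.0/8": 50_000,
        "192.168.0.0/16": 30_000,
        "172.16.0.0/12": 20_000,
        "100.64.0.0/10": 20_000,
    }
    for dev, prefixes in balance_prefixes(measured, 2).items():
        print(dev, prefixes)
```

  Each device's prefix list could then be loaded into its own nethash set
or expanded into one rule per prefix. LPT is not guaranteed to find the
optimal split, but its maximum load is known to stay within 4/3 of the
optimum.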

  The 4 devices can carry the same HTB ruleset; on each device only the
relevant part of it will match.
  You should test with 4 flows that use all the devices simultaneously and
measure the aggregate throughput.

  The performance gained through parallelism might be much higher than the
overhead added by iptables and/or the ipset nethash match. Anyway, this is
more of a "hack" than a clean solution :)

P.S.: the latest IMQ patch at http://www.linuximq.net/ is for 2.6.26, so you will need to test with that kernel version.
-- 
Best regards,
 Calin                            mailto:calin.velea@...enii.ro
